Curated Self Study Guide for Computer Science, DevOps, SRE & SysAdmin
-
Updated
Jun 11, 2024 - HTML
Site reliability engineering (SRE) is a set of principles and practices that incorporates aspects of software engineering and applies them to infrastructure and operations problems. The main goals are to create scalable and highly reliable software systems. Site reliability engineering is closely related to DevOps, a set of practices that combine software development and IT operations, and SRE has also been described as a specific implementation of DevOps.
Curated Self Study Guide for Computer Science, DevOps, SRE & SysAdmin
Website, courses documentation, blog and youtube video tracker.
An active monitoring software to detect failures before your customers do.
Terraform Pull Request Automation
Enable Self-Service Operations: Give specific users access to your existing tools, services, and scripts
⭐ 【开源书籍】深入讲解内核网络、Kubernetes、ServiceMesh、容器等云原生相关技术。经历实践检验的 DevOps、SRE指南。如发现错误,谢谢提issue
A collection of git utilities, useful extra git scripts, tutorials and other useful articles.
Web UI for Jaeger
Terraform provider for Nobl9
On-Call/DevOps Assistant - Get a head start on fixing alerts with AI investigation
🐒 🔥 Datadog Failure Injection System for Kubernetes
A curated list of awesome DevOps platforms, tools, practices and resources
DevOps Tutorials
Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization. DevOps Interview Questions
log data pre processing in python
Kaytu's AI platform boosts cloud efficiency by analyzing historical usage and delivering intelligent recommendations—such as optimizing instance sizes—that maintain reliability. Pay for what you need, without compromising your apps.
Massively parallel ssh client
A prometheus exporter for pg-promise
A prometheus exporter for node-postgres