Observability: Metrics, Logs and Traces
Observability: Metrics, Logs and Traces is our new workshop for companies and individuals. Over 4 days, application and site reliability engineers learn what modern observability tools offer to those who build and maintain cloud native applications.
- Want a workshop tailored to the specific needs of your company? Contact us
Structure
A 4-day online workshop with an instructor, covering both theory and practice
Audience
Application engineers and site reliability engineers who build and maintain cloud native applications and want to learn what modern observability tools can do for them.
Level and Prior Knowledge
Development tooling experience [make, mvn, yaml, vim]; Unix-like tooling [bash, scripting, vim].
Technical Requirements
Stable internet connection, a microphone and a webcam (optional but recommended). We will send you the webcast links after enrollment.
Language
English
Instructors
Mikhail Chinkov
DevOps/SRE consultant and software engineer with 8 years of hands-on experience in platform operations, security, CI/CD and AWS/cloud infrastructure
Curriculum
Day 1
Theory
- Definition, goals and purposes of observability
- The Chaos Factor of Distributed Systems
- Three pillars of observability
- Patterns & anti-patterns
- Alerts, On-Call, and Incident Management
Practice
- Set up a Kubernetes cluster via Terraform
- Deploy microservices via GitHub Actions
Day 2
Theory
- Definition, goals and purposes of metrics
- Examples of use (incident management, historical analysis, capacity planning)
- USE & RED Methods
- Statistics Primer
- Metric Domains: Frontend Metrics, Business Metrics etc.
- About Prometheus
- How Prometheus is different (pull vs. push monitoring)
- Prometheus architecture
- Metric types (gauges, counters, histograms)
- PromQL – the basics (range vectors, aggregation operators)
- Alerting (alerting rules, alert routing)
- Clustering (Thanos)
- About kube-prometheus-stack
- Prometheus alternatives
Practice
- Install the kube-prometheus-stack chart on the Kubernetes cluster
- Build a USE dashboard for the app cluster hosts
- Add the /metrics path to microservices
- Start scraping metrics by using ServiceMonitor
- Build a RED dashboard for the microservices
- Set up alerts on 5xx responses and request duration via Alertmanager
- Find the root cause of problem #1
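To give a feel for the "Add the /metrics path to microservices" step, here is a minimal sketch that serves one counter in the Prometheus text exposition format using only the Python standard library. It is an illustration under assumptions, not the workshop material: the `Counter` class, the `http_requests_total` metric, its labels and the port are hypothetical, and a real service would more likely use an official Prometheus client library.

```python
import threading
from collections import defaultdict
from http.server import BaseHTTPRequestHandler, HTTPServer


class Counter:
    """Hypothetical helper: a monotonically increasing metric keyed by label values."""

    def __init__(self, name, help_text, label_names):
        self.name = name
        self.help_text = help_text
        self.label_names = list(label_names)
        self._values = defaultdict(float)
        self._lock = threading.Lock()

    def inc(self, *label_values, amount=1.0):
        if len(label_values) != len(self.label_names):
            raise ValueError("one value per label name is required")
        with self._lock:
            self._values[label_values] += amount

    def render(self):
        """Return the metric in the Prometheus text exposition format."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        with self._lock:
            for label_values, value in sorted(self._values.items()):
                labels = ",".join(
                    f'{k}="{v}"' for k, v in zip(self.label_names, label_values)
                )
                lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


# Assumed metric name and labels, chosen to fit a RED-style request counter.
REQUESTS = Counter("http_requests_total", "Total HTTP requests.", ["method", "code"])


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = REQUESTS.render().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)


def serve(port=8000):
    """Block forever, exposing /metrics on the given (assumed) port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `REQUESTS.inc("GET", "200")` in request-handling code and then `serve()` would make the counter available for scraping at `/metrics`.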
Day 3
Theory
- Definition, goals and purposes of logs
- Examples of use (operations, testing, real-time analytics, historical analysis, auditing)
- Typical log system challenges
- Runtime Errors Tracking (Sentry, Rollbar etc.)
- About Loki
- How Loki is different (labels)
- Loki architecture (collection, aggregation, storage, querying, visualisation)
- LogQL – the basics
- Alerting via Alertmanager
- Clustering
- Loki alternatives
Practice
- Install Loki via Helm
- Start collecting logs from microservices
- Search logs and find errors within a time range
- Create visualizations from search results
- Set up log alerts in Loki
- Work with log format conversion (JSON etc.)
- Find the root cause of problem #2
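The log format conversion step can be illustrated with a small sketch that makes a service emit one JSON object per log line, which is a shape that LogQL's `json` parser can then split into fields. The field names (`ts`, `level`, `logger`, `message`) and the `build_logger` helper are assumptions made for this example, not a format prescribed by the workshop or by Loki.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the traceback inside the same JSON object.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Hypothetical helper: a logger that writes JSON lines to `stream` (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With this in place, `build_logger("checkout").error("payment failed for order %s", 42)` writes a single JSON line whose `level` and `message` fields can be filtered on after parsing.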
Day 4
Theory
- Definition, goals and purposes of distributed tracing
- Spans, traces and references
- Examples of use (upstream dependency analysis, etc.)
- Typical challenges of distributed tracing adoption
- About OpenTelemetry & Jaeger
- OpenTelemetry as a standard
- Jaeger as a backend
- Jaeger architecture (collection, aggregation, storage, querying, visualisation)
- About Jaeger Query
- Alerting via Alertmanager
- Backend Scaling
- Jaeger alternatives
Practice
- Install OpenTelemetry & Jaeger via Helm
- Add a tracing agent to each microservice and route all spans to Jaeger
- Get to know the Jaeger interface
- Finding Traces
- Understanding Traces
- Analyzing Calls
- Visualise traces in Grafana
- Find the root cause of problem #3
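The "Spans, traces and references" topic can be made concrete with a toy data model: every span carries the ID of its trace and a reference to its parent span, and a trace is simply the set of spans sharing one trace ID. The sketch below is a simplified illustration, not the OpenTelemetry or Jaeger data model; the `Span` fields, `child_of` and `self_time_ms` are hypothetical names, and the self-time calculation assumes that child spans do not overlap.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same trace
    span_id: str              # unique per span
    parent_id: Optional[str]  # reference to the parent span; None for the root
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def child_of(parent: Span, name: str, start_ms: float, end_ms: float) -> Span:
    """Create a span in the parent's trace that references the parent."""
    return Span(parent.trace_id, uuid.uuid4().hex[:16], parent.span_id,
                name, start_ms, end_ms)


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Time spent in the span itself, excluding its direct children.

    Assumes the children run one after another (no overlap).
    """
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(c.duration_ms for c in children)
```

For example, a 100 ms root span with a 70 ms database child and a 10 ms cache child has a self time of 20 ms, which points the investigation at the database call rather than at the service's own code.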
Participant assessment
It's horrible when you come to a workshop and half of the time is spent on things you already know. Or when a workshop starts with topics that are way too advanced for you.
To avoid such situations, we send each participant an assessment test. Based on the results, we adjust the workshop so that it suits every participant. And if we feel that it's too early for you to take part in the workshop, we will tell you that too.
Don’t worry! If we find that your level isn't sufficient for this workshop, we will refund your payment the same day and recommend something else.
Book workshop
We will contact you within 24 working hours to discuss the workshop dates and gather information on your attendees for further assessment.