Observability: Metrics, Logs and Traces

WIP

This workshop is still a work in progress and will be available soon. However, you can already book it today and help us release it sooner.

The workshop is built around practice rather than lectures: we do not repeat lecture material, but instead provide intense hands-on work, with explanations when needed and a strict focus on participants doing the work themselves. The list items below are therefore practical tasks to be completed; they are accompanied by slides from the videos and live coding by the instructor.

In this workshop, attendees will configure an RBAC-enabled vanilla Kubernetes cluster, deploy Prometheus, Loki, and Jaeger to observe and monitor a distributed microservice application, and instrument that application by introducing libraries and tooling. Each attendee will configure, update, and deploy a cloud-native application utilizing the following technologies:

  • Kubernetes - we will run everything on an RBAC-enabled vanilla Kubernetes cluster
  • Helm - we will utilize Helm to deploy policies, monitoring services, ingress rules, operators, and our demo application
  • Spring Petclinic Microservices - we will utilize a distributed Java/Spring Boot/Angular-based cloud-native application. We will start with an unmonitored version and work together to bring observability to our application (a first instrumentation sketch follows this list)
  • Grafana / Prometheus - we will demonstrate how to utilize Prometheus operators to configure Prometheus scrape targets. We will install custom dashboards and set up custom alerting rules
  • Grafana / Loki - we will demonstrate how to utilize Loki to collect the logs of your microservices and gain visibility into current application behavior. We will set up custom log pattern dashboards and create metrics based on log records
  • Jaeger / OpenTelemetry - we will demonstrate how to utilize the Jaeger and OpenTelemetry libraries to gain visibility into the communication between your microservices and database systems
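
To give a taste of the instrumentation work, here is a minimal sketch of exposing Prometheus metrics from one Spring Boot service (the Spring Boot counterpart of the /metrics path used later in the curriculum). It assumes spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath; the endpoint then appears at /actuator/prometheus.

    # application.yml of a single microservice (sketch; assumes the Actuator and
    # micrometer-registry-prometheus dependencies are present)
    management:
      endpoints:
        web:
          exposure:
            include: health,info,prometheus   # expose the Prometheus scrape endpoint
      metrics:
        tags:
          application: ${spring.application.name}   # common label attached to every metric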

During this workshop we will discuss and explore some of the following topics:

  • How Distributed Tracing works and why it's become a requirement when deploying cloud-native applications.
  • The pitfalls, challenges, and solutions to managing and deploying observability at scale.
  • What's missing? Important aspects of our operations aren't being monitored; let's find out what they are and, more importantly, how to fix that.
  • Where is the innovation happening in observability? Let's discuss automation, AI, and the future.

PRICE: To be announced

BOOK THIS WORKSHOP FOR YOUR TEAM
  • Want a workshop tailored to your specific needs? Contact us >

Structure

4 days of online workshop, combining theory and practice, with an instructor

Audience

The goal of this workshop is to familiarize application and site reliability engineers with the benefits that modern observability tools provide to the builders and curators of cloud-native applications.

Level and Prior Knowledge

  • Development tooling experience: make, mvn, YAML, vim
  • Unix-like tooling: bash, shell scripting, vim

Technical Requirements

A stable internet connection, a microphone, and a webcam (optional but recommended). We will provide you with the links to the webcast after enrollment.

Language

English

Costs

To be announced

Instructors

Mikhail Chinkov

DevOps/SRE consultant and software engineer with 8 years of hands-on experience in platform operations, security, CI/CD, and AWS/cloud infrastructure.

Learn more about the instructors and the team >

Curriculum

Day 1

Theory

  • Definition, goals and purposes of observability
  • The Chaos Factor of Distributed Systems
  • Three pillars of observability
  • Patterns & anti-patterns
  • Alerts, On-Call, and Incident Management

Practice

  • Set up a Kubernetes cluster via Terraform
  • Deploy microservices via GitHub Actions (a workflow sketch follows this list)
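
Below is a minimal sketch of what the GitHub Actions deployment step might look like. The chart path, release name, and secret name are assumptions for illustration, not the workshop's actual pipeline.

    # .github/workflows/deploy.yaml (sketch; chart path and secret names are hypothetical)
    name: deploy-petclinic
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: azure/setup-helm@v4
          - name: Configure cluster access
            run: echo "${{ secrets.KUBECONFIG }}" > kubeconfig   # assumes a kubeconfig stored as a repository secret
          - name: Deploy with Helm
            env:
              KUBECONFIG: kubeconfig
            run: |
              helm upgrade --install petclinic ./charts/petclinic \
                --namespace petclinic --create-namespace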

Day 2

Theory

  • Definition, goals and purposes of metrics
  • Examples of use (incident management, historical analysis, capacity planning)
  • USE & RED Methods
  • Statistics Primer
  • Metric Domains: Frontend Metrics, Business Metrics etc.
  • About Prometheus
    • How Prometheus is different (pull vs. push monitoring)
    • Prometheus architecture
    • Metric types (gauges, counters, histograms)
    • PromQL – the basics (range vectors, aggregation operators)
    • Alerting (alerting rules, alert routing)
    • Clusterization (Thanos)
    • About kube-prometheus-stack
  • Prometheus alternatives
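
To connect the kube-prometheus-stack material above with the install step on the practice side, here is a hedged sketch of Helm values we might start from. Key names can shift between chart versions, so treat this as an illustration rather than a drop-in file.

    # values.yaml for the kube-prometheus-stack chart (sketch; verify keys against your chart version)
    prometheus:
      prometheusSpec:
        retention: 7d
        # pick up ServiceMonitors from any release, not only the ones installed by this chart
        serviceMonitorSelectorNilUsesHelmValues: false
    alertmanager:
      enabled: true
    grafana:
      adminPassword: change-me   # placeholder; use a secret in practice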

Practice

  • Install kube-prometheus-stack chart to Kubernetes cluster
  • Build up the USE dashboard for app cluster hosts
  • Add the /metrics path to microservices
  • Start scraping metrics by using a ServiceMonitor (sketched after this list)
  • Build up the RED dashboard for microservices
  • Build up alerts on 5xx and duration via Alertmanager (also sketched after this list)
  • Find out the root cause of problem #1
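
For the ServiceMonitor and alerting steps above, a minimal sketch, assuming the services live in a petclinic namespace, expose /actuator/prometheus on a port named http, and use Micrometer's default metric names. Labels, selectors, and thresholds are illustrative.

    # servicemonitor.yaml – tell the Prometheus operator what to scrape
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: petclinic
      namespace: petclinic
      labels:
        release: monitoring            # match your Prometheus serviceMonitorSelector if one is configured
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/part-of: petclinic   # hypothetical label on the demo Services
      endpoints:
        - port: http
          path: /actuator/prometheus
          interval: 15s
    ---
    # prometheusrule.yaml – RED-style alerts on 5xx rate and request duration
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: petclinic-alerts
      namespace: petclinic
      labels:
        release: monitoring
    spec:
      groups:
        - name: petclinic.red
          rules:
            - alert: High5xxRate
              expr: sum by (service) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) > 1
              for: 5m
              labels:
                severity: warning
            - alert: HighRequestDuration
              expr: |
                histogram_quantile(0.95,
                  sum by (service, le) (rate(http_server_requests_seconds_bucket[5m]))) > 0.5
              for: 10m
              labels:
                severity: warning

Note that the duration alert assumes percentile histograms are enabled in Micrometer; without them the _bucket series will not exist.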

Day 3

Theory

  • Definition, goals and purposes of logs
  • Examples of use (operations, testing, real-time analytics, historical analysis, auditing)
  • Typical log system challenges
  • Runtime error tracking (Sentry, Rollbar etc.)
  • About Loki
    • How Loki is different (labels)
    • Loki architecture (collection, aggregation, storage, querying, visualisation)
    • LogQL – the basics
    • Alerting via Alertmanager
    • Clusterization
  • Loki alternatives
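
To make the label-centric model concrete, here is a hedged sketch of a Promtail pipeline that parses JSON log lines and promotes the log level to a Loki label. The field name and the choice of Promtail as the collector are assumptions; the excerpt omits the relabelling needed for pod log discovery.

    # promtail scrape config excerpt (sketch)
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - cri: {}              # unwrap the container runtime log format
          - json:
              expressions:
                level: level     # extract the "level" field from the JSON body
          - labels:
              level:             # turn the extracted value into a queryable Loki label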

Practice

  • Install Loki via Helm
  • Start collecting logs from microservices
  • Work with search: find errors over a period of time
  • Create visualizations for search selections
  • Set up log alerts in Loki (a ruler rule sketch follows this list)
  • Work with log format conversion (JSON etc.)
  • Find out the root cause of problem #2
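
For the log alerting step, a minimal sketch of a Loki ruler rule group. It assumes the logs carry namespace="petclinic" and app labels, and that Loki's ruler is configured to send alerts to Alertmanager; the selector and threshold are illustrative.

    # loki ruler rule group (sketch; same file format as Prometheus rules, with LogQL expressions)
    groups:
      - name: petclinic-logs
        rules:
          - alert: HighErrorLogRate
            expr: |
              sum by (app) (rate({namespace="petclinic"} |= "ERROR" [5m])) > 1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "More than 1 error log line per second for {{ $labels.app }}"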

Day 4

Theory

  • Definition, goals and purposes of distributed tracing
  • Spans, traces and references
  • Examples of use (upstream dependency analysis, etc.)
  • Typical challenges of distributed tracing adoption
  • About OpenTelemetry & Jaeger
    • OpenTelemetry as a standard
    • Jaeger as a backend
    • Jaeger architecture (collection, aggregation, storage, querying, visualisation)
    • About Jaeger Query
    • Alerting via Alertmanager
    • Backend Scaling
  • Jaeger alternatives
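
The split of roles above (OpenTelemetry as the standard, Jaeger as the backend) is easiest to see in a collector configuration. A hedged sketch, assuming a recent Jaeger that accepts OTLP directly on port 4317; the service name is hypothetical:

    # otel-collector config (sketch)
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      otlp:
        endpoint: jaeger-collector.observability.svc:4317   # hypothetical in-cluster Jaeger service
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]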

Practice

  • Install OpenTelemetry & Jaeger via Helm
  • Add a tracing agent to each microservice and route all spans to Jaeger (see the deployment sketch after this list)
  • Get to know the Jaeger interface
    • Finding Traces
    • Understanding Traces
    • Analyzing Calls
  • Visualise traces in Grafana
  • Find out the root cause of problem #3
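
For the agent step, a minimal sketch of attaching the OpenTelemetry Java agent to one microservice. The image name, the mount path of the agent jar, and the collector endpoint are assumptions for illustration.

    # deployment excerpt for one microservice (sketch)
    containers:
      - name: customers-service
        image: spring-petclinic/customers-service:latest   # hypothetical image
        env:
          - name: JAVA_TOOL_OPTIONS
            value: "-javaagent:/otel/opentelemetry-javaagent.jar"   # agent jar assumed to be baked in or mounted
          - name: OTEL_SERVICE_NAME
            value: customers-service
          - name: OTEL_TRACES_EXPORTER
            value: otlp
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: http://otel-collector.observability.svc:4317   # or Jaeger's own OTLP endpoint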

Participant assessment

It's frustrating to come to a workshop and spend half of the time on things you already know, or to discover that it starts with topics that are way too advanced for you.

To avoid such situations, we send each participant an assessment test. Based on the results, we adjust the workshop to suit each participant well. And if we feel that it is too early for you to take part in the workshop, we will tell you that, too.

Don't worry: if we find that the workshop is not the right fit for you yet, we will return your payment the same day and recommend something else.

Book workshop

We will contact you within 24 working hours to discuss the dates of the workshop and to gather information about your attendees for further assessment.