Observability: Metrics, Logs and Traces

Observability: Metrics, Logs and Traces is our new workshop for companies and individuals. In 4 days, application and site reliability engineers will learn what modern observability tools offer to the builders and curators of cloud native applications.

BOOK THIS WORKSHOP FOR YOUR TEAM

  • Want a workshop tailored to the specific needs of your company? Contact us >

Structure

A 4-day online workshop, combining theory and practice, led by an instructor

Audience

This workshop is for application and site reliability engineers who build and run cloud native applications and want to learn what modern observability tools can do for them.

Level and Prior Knowledge

Development tooling experience [make, mvn, yaml, vim]; Unix-like tooling [bash, scripting, vim].

Technical Requirements

Stable internet connection, a microphone and a webcam (optional but recommended). We will provide you with the links to the webcast after enrollment.

Language

English

Instructors

Mikhail Chinkov

DevOps/SRE consultant and software engineer with 8 years of hands-on experience in platform operations, security, CI/CD, and AWS/cloud infrastructure

Learn more about the instructors and the team >

Curriculum

Day 1

Theory

  • Definition, goals and purposes of observability
  • The Chaos Factor of Distributed Systems
  • Three pillars of observability
  • Patterns & anti-patterns
  • Alerts, On-Call, and Incident Management

Practice

  • Set up a Kubernetes cluster via Terraform
  • Deploy microservices via GitHub Actions

Day 2

Theory

  • Definition, goals and purposes of metrics
  • Examples of use (incident management, historical analysis, capacity planning)
  • USE & RED Methods
  • Statistics Primer
  • Metric Domains: Frontend Metrics, Business Metrics etc.
  • About Prometheus
    • How Prometheus is different (pull vs. push monitoring)
    • Prometheus architecture
    • Metric types (gauges, counters, histograms)
    • PromQL – the basics (range vectors, aggregation operators)
    • Alerting (alerting rules, alert routing)
    • Clusterization (Thanos)
    • About kube-prometheus-stack
  • Prometheus alternatives

Practice

  • Install the kube-prometheus-stack chart on the Kubernetes cluster
  • Build the USE dashboard for app cluster hosts
  • Add the /metrics path to microservices
  • Start scraping metrics by using ServiceMonitor
  • Build the RED dashboard for microservices
  • Set up alerts on 5xx responses and request duration via Alertmanager
  • Find the root cause of problem #1

Day 3

Theory

  • Definition, goals and purposes of logs
  • Examples of use (operations, testing, real-time analytics, historical analysis, auditing)
  • Typical log system challenges
  • Runtime Errors Tracking (Sentry, Rollbar etc.)
  • About Loki
    • How Loki is different (labels)
    • Loki architecture (collection, aggregation, storage, querying, visualisation)
    • LogQL – the basics
    • Alerting via Alertmanager
    • Clusterization
  • Loki alternatives

Practice

  • Install Loki via Helm
  • Start collecting logs from microservices
  • Search logs and find errors over a period of time
  • Create visualizations for search results
  • Set up log alerts in Loki
  • Work with log format conversion (JSON etc.)
  • Find the root cause of problem #2

Day 4

Theory

  • Definition, goals and purposes of distributed tracing
  • Spans, traces and references
  • Examples of use (upstream dependency analysis, etc.)
  • Typical challenges of distributed tracing adoption
  • About OpenTelemetry & Jaeger
    • OpenTelemetry as a standard
    • Jaeger as a backend
    • Jaeger architecture (collection, aggregation, storage, querying, visualisation)
    • About Jaeger Query
    • Alerting via Alertmanager
    • Backend Scaling
  • Jaeger alternatives

Practice

  • Install OpenTelemetry & Jaeger via Helm
  • Add tracing agent to each microservice, route all spans to Jaeger
  • Get to know the Jaeger interface
    • Finding Traces
    • Understanding Traces
    • Analyzing Calls
  • Visualise traces in Grafana
  • Find the root cause of problem #3

Participant assessment

It's horrible when you come to a workshop and half of the time is spent on things you already know. Or when a workshop starts with topics that are way too advanced for you.

To avoid such situations, we send each participant an assessment test. Based on this test, we adjust the particular workshop to suit each one of the participants well. And if we feel that it's too early for you to take part in the workshop, we will tell you that too.

Don’t worry! If we find that this workshop is not yet the right level for you, we will refund your payment the same day and recommend something else.

Book workshop

We will contact you within 24 working hours to discuss the workshop dates and gather information on your attendees for further assessment.