Observability: Metrics, Logs and Traces

WIP

This workshop is still a work in progress and will be available soon. However, you can already book it today and help us release it sooner.

The workshop is built around practice rather than lectures: we do not repeat lecture material, but instead provide intense hands-on work, with explanations when needed and a strict focus on participants doing the work themselves. The list items below are therefore practical tasks to be completed; they are accompanied by slides from the videos and live coding by the instructor.

In this workshop, attendees will configure an RBAC-enabled vanilla Kubernetes cluster, deploy Prometheus, Loki, and Jaeger to observe and monitor a distributed microservice application, and instrument that application by introducing libraries and tooling. Each attendee will configure, update, and deploy a cloud-native application utilizing the following technologies:

  • Kubernetes - we will run everything on an RBAC-enabled vanilla Kubernetes cluster
  • Helm - we will utilize Helm to deploy policies, monitoring services, ingress rules, operators, and our demo application
  • Spring Petclinic Microservices - we will utilize a distributed Java/Spring Boot/Angular-based cloud-native application. We will start with an unmonitored version and work together to bring observability to our application (a first instrumentation sketch follows this list)
  • Grafana / Prometheus - we will demonstrate how to utilize Prometheus operators to configure Prometheus scrape targets. We will install custom dashboards and set up custom alerting rules
  • Grafana / Loki - we will demonstrate how to utilize Loki to collect the logs of your microservices and gain visibility into current application behavior. We will set up custom log pattern dashboards and create metrics based on log records
  • Jaeger / OpenTelemetry - we will demonstrate how to utilize the Jaeger and OpenTelemetry libraries to gain visibility into the communication between your microservices and database systems
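
To give a taste of the instrumentation work, here is a minimal sketch of exposing Prometheus metrics from one Spring Boot service (the Spring Boot counterpart of the /metrics path used later in the curriculum). It assumes spring-boot-starter-actuator and micrometer-registry-prometheus are on the classpath; the endpoint then appears at /actuator/prometheus.

    # application.yml of a single microservice (sketch; assumes the Actuator and
    # micrometer-registry-prometheus dependencies are present)
    management:
      endpoints:
        web:
          exposure:
            include: health,info,prometheus   # expose the Prometheus scrape endpoint
      metrics:
        tags:
          application: ${spring.application.name}   # common label attached to every metric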

During this workshop we will discuss and explore some of the following topics:

  • How Distributed Tracing works and why it's become a requirement when deploying cloud-native applications.
  • The pitfalls, challenges, and solutions to managing and deploying observability at scale.
  • What's missing? Important aspects of our operations aren't being monitored; let's find out what they are and, more importantly, how to fix that.
  • Where is the innovation happening in observability? Let's discuss automation, AI, and the future.

PRICE: To be announced

BOOK THIS WORKSHOP FOR YOUR TEAM
  • Want a workshop tailored to your specific needs? Contact us >

Structure

4 days of online workshop, combining theory and practice, with an instructor

Audience

The goal of this workshop is to familiarize application and site reliability engineers with the benefits that modern observability tools provide to the builders and curators of cloud-native applications.

Level and Prior Knowledge

  • Development tooling experience: make, mvn, YAML, vim
  • Unix-like tooling: bash, shell scripting, vim

Technical Requirements

A stable internet connection, a microphone, and a webcam (optional but recommended). We will provide you with the links to the webcast after enrollment.

Language

English

Costs

To be announced

Instructors

Mikhail Chinkov

DevOps/SRE consultant and software engineer with 8 years of hands-on experience in platform operations, security, CI/CD, and AWS/cloud infrastructure.

Learn more about the instructors and the team >

Curriculum

Day 1

Theory

  • Definition, goals and purposes of observability
  • The Chaos Factor of Distributed Systems
  • Three pillars of observability
  • Patterns & anti-patterns
  • Alerts, On-Call, and Incident Management

Practice

  • Set up a Kubernetes cluster via Terraform
  • Deploy microservices via GitHub Actions (a workflow sketch follows this list)
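
Below is a minimal sketch of what the GitHub Actions deployment step might look like. The chart path, release name, and secret name are assumptions for illustration, not the workshop's actual pipeline.

    # .github/workflows/deploy.yaml (sketch; chart path and secret names are hypothetical)
    name: deploy-petclinic
    on:
      push:
        branches: [main]
    jobs:
      deploy:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: azure/setup-helm@v4
          - name: Configure cluster access
            run: echo "${{ secrets.KUBECONFIG }}" > kubeconfig   # assumes a kubeconfig stored as a repository secret
          - name: Deploy with Helm
            env:
              KUBECONFIG: kubeconfig
            run: |
              helm upgrade --install petclinic ./charts/petclinic \
                --namespace petclinic --create-namespace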

Day 2

Theory

  • Definition, goals and purposes of metrics
  • Examples of use (incident management, historical analysis, capacity planning)
  • USE & RED Methods
  • Statistics Primer
  • Metric Domains: Frontend Metrics, Business Metrics etc.
  • About Prometheus
    • How Prometheus is different (pull vs. push monitoring)
    • Prometheus architecture
    • Metric types (gauges, counters, histograms)
    • PromQL – the basics (range vectors, aggregation operators)
    • Alerting (alerting rules, alert routing)
    • Clusterization (Thanos)
    • About kube-prometheus-stack
  • Prometheus alternatives
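
To connect the kube-prometheus-stack material above with the install step on the practice side, here is a hedged sketch of Helm values we might start from. Key names can shift between chart versions, so treat this as an illustration rather than a drop-in file.

    # values.yaml for the kube-prometheus-stack chart (sketch; verify keys against your chart version)
    prometheus:
      prometheusSpec:
        retention: 7d
        # pick up ServiceMonitors from any release, not only the ones installed by this chart
        serviceMonitorSelectorNilUsesHelmValues: false
    alertmanager:
      enabled: true
    grafana:
      adminPassword: change-me   # placeholder; use a secret in practice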

Practice

  • Install kube-prometheus-stack chart to Kubernetes cluster
  • Build up the USE dashboard for app cluster hosts
  • Add the /metrics path to microservices
  • Start scraping metrics by using a ServiceMonitor (sketched after this list)
  • Build up the RED dashboard for microservices
  • Build up alerts on 5xx and duration via Alertmanager (also sketched after this list)
  • Find out the root cause of problem #1
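
For the ServiceMonitor and alerting steps above, a minimal sketch, assuming the services live in a petclinic namespace, expose /actuator/prometheus on a port named http, and use Micrometer's default metric names. Labels, selectors, and thresholds are illustrative.

    # servicemonitor.yaml – tell the Prometheus operator what to scrape
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: petclinic
      namespace: petclinic
      labels:
        release: monitoring            # match your Prometheus serviceMonitorSelector if one is configured
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/part-of: petclinic   # hypothetical label on the demo Services
      endpoints:
        - port: http
          path: /actuator/prometheus
          interval: 15s
    ---
    # prometheusrule.yaml – RED-style alerts on 5xx rate and request duration
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: petclinic-alerts
      namespace: petclinic
      labels:
        release: monitoring
    spec:
      groups:
        - name: petclinic.red
          rules:
            - alert: High5xxRate
              expr: sum by (service) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) > 1
              for: 5m
              labels:
                severity: warning
            - alert: HighRequestDuration
              expr: |
                histogram_quantile(0.95,
                  sum by (service, le) (rate(http_server_requests_seconds_bucket[5m]))) > 0.5
              for: 10m
              labels:
                severity: warning

Note that the duration alert assumes percentile histograms are enabled in Micrometer; without them the _bucket series will not exist.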

Day 3

Theory

  • Definition, goals and purposes of logs
  • Examples of use (operations, testing, real-time analytics, historical analysis, auditing)
  • Typical log system challenges
  • Runtime error tracking (Sentry, Rollbar etc.)
  • About Loki
    • How Loki is different (labels)
    • Loki architecture (collection, aggregation, storage, querying, visualisation)
    • LogQL – the basics
    • Alerting via Alertmanager
    • Clusterization
  • Loki alternatives
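
To make the label-centric model concrete, here is a hedged sketch of a Promtail pipeline that parses JSON log lines and promotes the log level to a Loki label. The field name and the choice of Promtail as the collector are assumptions; the excerpt omits the relabelling needed for pod log discovery.

    # promtail scrape config excerpt (sketch)
    scrape_configs:
      - job_name: kubernetes-pods
        kubernetes_sd_configs:
          - role: pod
        pipeline_stages:
          - cri: {}              # unwrap the container runtime log format
          - json:
              expressions:
                level: level     # extract the "level" field from the JSON body
          - labels:
              level:             # turn the extracted value into a queryable Loki label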

Practice

  • Install Loki via Helm
  • Start collecting logs from microservices
  • Work with search: find errors over a period of time
  • Create visualizations for search selections
  • Set up log alerts in Loki (a ruler rule sketch follows this list)
  • Work with log format conversion (JSON etc.)
  • Find out the root cause of problem #2
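
For the log alerting step, a minimal sketch of a Loki ruler rule group. It assumes the logs carry namespace="petclinic" and app labels, and that Loki's ruler is configured to send alerts to Alertmanager; the selector and threshold are illustrative.

    # loki ruler rule group (sketch; same file format as Prometheus rules, with LogQL expressions)
    groups:
      - name: petclinic-logs
        rules:
          - alert: HighErrorLogRate
            expr: |
              sum by (app) (rate({namespace="petclinic"} |= "ERROR" [5m])) > 1
            for: 5m
            labels:
              severity: warning
            annotations:
              summary: "More than 1 error log line per second for {{ $labels.app }}"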

Day 4

Theory

  • Definition, goals and purposes of distributed tracing
  • Spans, traces and references
  • Examples of use (upstream dependency analysis, etc.)
  • Typical challenges of distributed tracing adoption
  • About OpenTelemetry & Jaeger
    • OpenTelemetry as a standard
    • Jaeger as a backend
    • Jaeger architecture (collection, aggregation, storage, querying, visualisation)
    • About Jaeger Query
    • Alerting via Alertmanager
    • Backend Scaling
  • Jaeger alternatives
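
The split of roles above (OpenTelemetry as the standard, Jaeger as the backend) is easiest to see in a collector configuration. A hedged sketch, assuming a recent Jaeger that accepts OTLP directly on port 4317; the service name is hypothetical:

    # otel-collector config (sketch)
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
      batch:
    exporters:
      otlp:
        endpoint: jaeger-collector.observability.svc:4317   # hypothetical in-cluster Jaeger service
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [otlp]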

Practice

  • Install OpenTelemetry & Jaeger via Helm
  • Add a tracing agent to each microservice and route all spans to Jaeger (see the deployment sketch after this list)
  • Get to know the Jaeger interface
    • Finding Traces
    • Understanding Traces
    • Analyzing Calls
  • Visualise traces in Grafana
  • Find out the root cause of problem #3
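
For the agent step, a minimal sketch of attaching the OpenTelemetry Java agent to one microservice. The image name, the mount path of the agent jar, and the collector endpoint are assumptions for illustration.

    # deployment excerpt for one microservice (sketch)
    containers:
      - name: customers-service
        image: spring-petclinic/customers-service:latest   # hypothetical image
        env:
          - name: JAVA_TOOL_OPTIONS
            value: "-javaagent:/otel/opentelemetry-javaagent.jar"   # agent jar assumed to be baked in or mounted
          - name: OTEL_SERVICE_NAME
            value: customers-service
          - name: OTEL_TRACES_EXPORTER
            value: otlp
          - name: OTEL_EXPORTER_OTLP_ENDPOINT
            value: http://otel-collector.observability.svc:4317   # or Jaeger's own OTLP endpoint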

Participant assessment

It's frustrating to come to a workshop and spend half of the time on things you already know, or to discover that it starts with topics that are way too advanced for you.

To avoid such situations, we send each participant an assessment test. Based on the results, we adjust the workshop to suit each participant well. And if we feel that it is too early for you to take part in the workshop, we will tell you that, too.

Don't worry: if we find that the workshop is not the right fit for you yet, we will return your payment the same day and recommend something else.

Book workshop

We will contact you within 24 working hours to discuss the dates of the workshop and to gather information about your attendees for further assessment.