20 Years of Infrastructure with Thomas Falkenberg from PAYBACK |🎙️#51

Podcast promotional graphic for "DEVOPS ACCENTS EPISODE 51: 20 YEARS OF INFRASTRUCTURE WITH THOMAS FALKENBERG FROM PAYBACK," featuring an illustrated smiling man in orange.

What was the beginning of DevOps like? How does your project change when you spend more than a decade with it? In episode 51 of DevOps Accents, our guest is Thomas Falkenberg from PAYBACK, a multi-partner loyalty program well known in Europe that he has been working with since 2007. Also in this episode:

  • The health of the infrastructure;
  • How to choose a platform;
  • The beginnings of platform engineering;
  • The future of single cloud strategy;
  • How to scale up your business;
  • Architectural decision-making process;
  • Industry trends coming and going.

You can listen to episode 51 of DevOps Accents on Spotify.


Understanding the complexities of infrastructure and platform engineering is crucial for modern organizations aiming to scale effectively while maintaining robust operations. This article summarizes key insights from the conversation on navigating these challenges.

The Health of Infrastructure: Beyond Tools to Culture

Maintaining a healthy infrastructure goes beyond technology. Thomas Falkenberg emphasizes the importance of understanding a company's cultural DNA and business context. For instance, Payback's commitment to data protection shaped its infrastructure strategy, ensuring compliance with German regulations and public expectations. Meanwhile, Pablo pointed out that consultants, despite offering fresh perspectives, often lack deep organizational knowledge, underscoring the need for long-term team members to anchor infrastructure stability.


When you talk about the health of the infrastructure, to me, it's... I don't care that much about infrastructure on its own. It's like something to get a job done. What's important in the company are the principles, the culture, and the mindset of the people—and you know what you can do and what you cannot do because of the context the company is in.

So, Payback, for example, has a very strong focus on data protection because we own a lot of data. We get a lot of data when people use loyalty systems, and people are always worried about what happens with their data—especially in Germany, where this has been a concern for a long time. There are good reasons for them to have a focus on data protection, especially with the German history. There are reasons for that.

If you don't know that when you come in, for example, as a consultant, and you just say, "Can we not build this with this simple service in the cloud that processes your data?"—people get very nervous quickly here. You have to be very careful how you try to sell something like that. You have to start by, for example, talking about how the data is being protected and then move on to maybe some other benefits and things like that. This is what you learn, I think, over time: What's the DNA of the company? How many risks is it willing to take? Also, who are the right people to talk to, and how should you talk to them?

I think this is what really gets you further—more than the plain technology behind the things. That comes secondary, in my point of view. — Thomas Falkenberg


Choosing the Right Platform: A Multifaceted Decision

When Payback transitioned to the Google Cloud Platform (GCP), the decision was influenced by data management needs, cost-effectiveness, and specific cloud-native capabilities like BigQuery and low-latency networking. Falkenberg highlighted the importance of aligning platform choices with business priorities rather than focusing solely on technology. A single-cloud strategy was initially pursued for simplicity, but the approach evolved as acquisitions brought in multi-cloud elements. The shift shows why platform engineering needs to stay flexible.


I think the most straightforward one is just the pure amount of data that we have to manage. We did that—and still partially do—with our on-premise data center, where we have a co-location. In general, this was working fine, but you always reach the point where your box is just full. You have a database, and if you need to extend it, you have to buy another full appliance.

As we were using appliances—which are not cheap and come with big upfront investments to scale out—this approach became unsustainable. It’s not elastic; you cannot scale up and down. If you want to provide the same performance in non-production environments, like tests, for example, this also gets pretty expensive. A couple of things around the classical elasticity you get in the cloud became very attractive to us, but other factors played a role too. For example, services available in the cloud promised to take away the toil from our operations engineers, freeing them from just operating things. This was part of the reasoning. — Thomas Falkenberg


The Evolution of Platform Engineering

Platform engineering at Payback grew organically, transitioning from on-premise solutions to sophisticated cloud-based systems. As the concept of internal developer platforms (IDPs) emerged, Payback prioritized building foundational structures like resource hierarchies and permission models. Thomas noted that while tools like GCP's Cloud Build were effective initially, adapting to the evolving needs of the platform required exploring alternatives such as GitHub Actions. This evolution demonstrates the need to balance immediate functionality with long-term scalability.


I think it was like an evolution and also a bit of learning by doing, at least in my case. When we started platform engineering, maybe the term was around, but it wasn’t widely known. At least, I think we had something like a platform in the on-premise world, though we didn’t really call it that. Kirill knows what I’m talking about, as he helped build parts of it.

But the focus shifted. With moving to the cloud, we all had to learn that one extreme is just handing out access to the cloud to developers and letting them go. I think everyone agrees now that this is a very bad idea in general—except if you just want to experiment or try something out as a startup. If you have some experience, it might work for a time, but eventually, you’ll hit a point where this doesn’t scale. You end up with a big mess that’s hard to maintain.

We wanted to have a platform. I’m not even sure we called it that at the time. Yes, we called it a platform team, but terms like IDP (Internal Developer Platform) and related concepts were coined later. There wasn’t much tooling available then, which I think was one of our challenges. — Thomas Falkenberg


Scaling Up: Predictable and Unpredictable Challenges

Scaling infrastructure for tens of millions of users, as seen at Payback, involves both planned and unplanned challenges. Unexpected spikes in BigQuery costs and the intricacies of hybrid networking were managed through careful planning and monitoring. Falkenberg described how foundational changes to the platform, akin to replacing a car’s engine while driving, were implemented without disrupting users—a testament to meticulous architectural planning.


We had some financial surprises that probably everyone experiences when using BigQuery. Migrating some of the data and then analysts starting to explore BigQuery—running some bigger queries on it—spiked costs to a few thousand euros a day just for some queries on the data, which is not hard with that amount of data. I think they were maybe impressed by the performance.

If you use on-demand pricing in BigQuery, it just uses all the slots it can, making it very fast but also expensive if you pay by the amount of data being scanned in the query. The good thing is, we already had measures in place, like budget alerts and budget limits for the project. We had defined those defaults as a cloud platform team when we handed out the projects to the users. So, we saw this pretty quickly and took measures.

This was one of the surprises. Other than that, not really, because I think we started with a pretty sound foundation of having transparency into what’s going on and also being a bit careful in rolling out the more expensive things. Let’s put it like that. Most of the time, this worked quite well. — Thomas Falkenberg
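The cost mechanics described above can be illustrated with a small sketch. It assumes billing proportional to bytes scanned and a daily budget with an alert threshold. The price constant, the function names, and the threshold are hypothetical placeholders chosen for illustration; they are not Payback's actual setup or a quoted Google Cloud rate.

```python
# Sketch of on-demand cost estimation plus a simple budget check.
# ASSUMED_PRICE_PER_TIB_EUR is a placeholder, not a quoted Google Cloud rate.

ASSUMED_PRICE_PER_TIB_EUR = 6.0
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_cost_eur(bytes_scanned, price_per_tib=ASSUMED_PRICE_PER_TIB_EUR):
    """Cost of one query when billing is proportional to bytes scanned."""
    return bytes_scanned / TIB * price_per_tib


def budget_status(bytes_scanned_per_query, daily_budget_eur, alert_ratio=0.8):
    """Return 'ok', 'alert', or 'over' for one day's worth of queries."""
    spent = sum(on_demand_cost_eur(b) for b in bytes_scanned_per_query)
    if spent > daily_budget_eur:
        return "over"
    if spent >= alert_ratio * daily_budget_eur:
        return "alert"
    return "ok"


if __name__ == "__main__":
    # Ten exploratory queries that each scan 50 TiB (hypothetical numbers).
    heavy_day = [50 * TIB] * 10
    print(budget_status(heavy_day, daily_budget_eur=1000.0))  # prints "over"
```

With the assumed rate, ten large exploratory queries already reach a few thousand euros in a day, which is why defining budget defaults before handing projects to users surfaces this kind of spike quickly.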


Trends That Stay and Go in the Industry

Not all industry trends are worth adopting. Falkenberg advised focusing on open standards like OpenTelemetry for observability while avoiding proprietary solutions that could lead to vendor lock-in. He also stressed the importance of simplicity, noting that every new tool adds mental load and complexity. Conversely, the team embraced AI and machine learning as essential tools for the future, demonstrating a balanced approach to innovation and risk.


And then, of course, the hype currently with AI technology—I think here we are a bit faster compared to the last trends, more involved. This is also easier to achieve if you are in a cloud provider space, working together with Google on some topics here and also in the development and DevOps space. Things like Co-Pilot or Code Assistant, I think, will just stay and improve over time, becoming normal for people to use in their daily work. A lot of people are already using it.

The major challenges for us have always been the legal and compliance discussions. Technology-wise, it is relatively simple now to build your own applications with the help of AI APIs or language models and incorporate these into your applications. That comes with a new set of challenges, but that's probably a completely separate talk and a huge topic.

At least here, we found a way to easily experiment with these things without too much investment, just to get to know them and avoid being overly surprised when new things come up. Then we already know the basics, how it compares to other solutions, and things like that. So yes, this is something that will stay and that we are really looking closely at right now. — Thomas Falkenberg


Decision-Making in Architecture: The Role of Principles and Records

Effective architectural decisions stem from a mix of clear principles and collaborative processes. Payback’s architecture guild established high-level principles, emphasizing simplicity and the use of cloud-native solutions. Decision records maintained at both team and guild levels ensure transparency and continuity, enabling teams to understand not just the "what," but the "why" behind choices—a practice that aids both current operations and onboarding new talent.


To get more concrete on the architectural decisions, we implemented a guild concept. There’s, for example, an architecture guild. One of the first things it came up with was a set of architecture principles—around 10 or 12 high-level principles—from which we could derive practices or patterns.

One principle, for instance, focuses on essential complexity, avoiding the creation of unnecessary accidental complexity when building things. It sounds a bit abstract, but there are examples in the principles. Another is to use cloud-native technology. A more detailed example is the principle of using what is provided as a service unless there’s a valid reason not to—such as cost considerations or other constraints. The idea is to avoid reinventing the wheel whenever possible.

If you keep building your systems in an on-premise world, you often end up with a long history of custom, homegrown tools. These are often protected like little pets by the people who created them. While that might work for a time, it doesn’t scale well. As soon as those people leave the company, or new colleagues join, they have to dig into these custom technologies, learn them, and maintain them. Instead, we aim to use standard technologies, open-source tools, and services provided by the cloud provider.

In this case, our go-to solution was to take what Google offers. Most of the time, this has worked quite well. Sometimes, it hasn’t worked as well, but overall, it’s been effective. — Thomas Falkenberg
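A decision record that captures both the "what" and the "why" can be kept very small. The sketch below is one possible shape, not the format Payback uses: the `DecisionRecord` class, its fields, and the sample entry are all hypothetical and exist only to show the idea of tying a decision to its rationale and to the principles it follows.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical levels, mirroring the team-level and guild-level records in the text.
VALID_LEVELS = ("team", "guild")


@dataclass(frozen=True)
class DecisionRecord:
    title: str
    level: str          # "team" or "guild"
    decision: str       # the "what"
    rationale: str      # the "why"
    decided_on: date
    principles: tuple = ()  # principles the decision is derived from

    def __post_init__(self):
        # Reject records that cannot explain themselves later.
        if self.level not in VALID_LEVELS:
            raise ValueError(f"level must be one of {VALID_LEVELS}")
        if not self.rationale.strip():
            raise ValueError("a decision record needs a rationale")


def render(record):
    """Format a record as plain text for a wiki page or repository file."""
    lines = [
        f"{record.decided_on.isoformat()} [{record.level}] {record.title}",
        f"What: {record.decision}",
        f"Why: {record.rationale}",
    ]
    if record.principles:
        lines.append("Principles: " + ", ".join(record.principles))
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example entry, for illustration only.
    example = DecisionRecord(
        title="Message queue for order events",
        level="guild",
        decision="Use the cloud provider's managed queue",
        rationale="Avoids operating and staffing a self-hosted broker",
        decided_on=date(2024, 1, 15),
        principles=("use managed services", "avoid accidental complexity"),
    )
    print(render(example))
```

Making the rationale mandatory is the main design choice here: a new colleague reading the record later sees why a service was chosen, not just that it was.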


Final Thoughts

Organizations looking to build resilient and scalable infrastructure can learn much from Payback’s journey. From maintaining cultural alignment to experimenting cautiously with new tools, the insights shared here provide a roadmap for balancing innovation with operational stability. Smaller companies may not be able to replicate all of these processes immediately, but they can start by establishing the foundational practices that support long-term growth.



Show Notes:

  • Our guest, Thomas Falkenberg, on LinkedIn;
  • PAYBACK, a leading multipartner loyalty program and multichannel marketing platform.

Podcast editing: Mila Jones, milajonesproduction@gmail.com
