Leaving the Cloud | ✉️ #4

Illustration for "MKDEV DISPATCH #4 - LEAVING THE CLOUD" with a smiling person holding a small dog, paper airplanes, and a paper plane thought bubble.

Hey! 👋

The topic that has been on my mind a lot over the last couple of weeks is 37signals' announcement that they will migrate from the cloud back to their own data center.

Well, not “back” - their main product, Basecamp, has always been in the data center, while their latest product, Hey, has been running in AWS since day 1. I am a bit biased when someone talks about whether the cloud is a good choice. At mkdev, we love the cloud, and we help our customers with the cloud - both with using it and with migrating to it.

My first instinct, when I read 37signals' announcement, was the desire to look at their monthly bill and see whether it actually makes sense. Do they leverage the right services for the right workloads, or do they use AWS as a mere VPS provider? Is the cost driven by particular architecture decisions? In the end, cloud costs and cloud architecture go hand in hand. And in many cases, a fat AWS bill is just a sign of an improperly planned and built AWS environment. Is this the case at 37signals? According to them, they’ve already optimized everything they could and still have to pay $3 million/year for Hey.com cloud infrastructure.

What we do know is what 37signals tells us about this planned migration (and they admit this migration will take months, if not years). I highly recommend listening to the podcast with the company’s CTO and their lead infrastructure engineer, where they walk through the decision-making process. It explains many things.

One point is that they have an experienced infra team that is used to running the infrastructure outside the cloud. Per 37signals, just using the cloud does not actually make things any easier in terms of operations. This is true: the cloud brings different models of operating and configuration, but different does not always mean “requires fewer people on the infra team”. It just means different, and at a certain scale, every company ends up with a dedicated team responsible for the cloud infrastructure, not unlike having a dedicated team responsible for the on-premise infrastructure.

On the other hand, up to a certain scale, you can manage the cloud infrastructure with way fewer engineers than on-premise would require. Quite often, this work is even done by the developers, and it takes only a small part of their time. Both Basecamp and Hey have millions of users and quite some load on their systems. They do need a dedicated team to manage all of this. From this perspective, sticking to a single way of managing the infrastructure is just simpler organizationally.

The other point is that 37signals is not managing the complete data center - a lot of the hardware tasks are outsourced to a white glove data center provider, and the infrastructure team is in the game once everything is connected and remote access is in place. The cloud offers the shared responsibility model: AWS, or Azure, or GCP, take care of the physical data center and the machines inside, and expose to you only the software abstractions on top of this. In 37signals' model, there is a bit more responsibility on the company side, and a bit less responsibility on the data center service provider side, but it’s still not identical to building your own data center and having your own people doing all the work end to end.

37signals is basically saying: “We want to have full responsibility for purchasing the right hardware, and for configuring it to run our own applications. We don’t need any of the extra software abstractions the cloud offers us, we already have our preferred way of doing it ourselves.” Which is an absolutely valid approach in their particular case. It also means that there is a solid chance that 37signals was looking at the cloud as a fancier way to run servers.

I have an issue with the message “leaving the cloud”, because 37signals also says that they use AWS S3 for more or less all file storage. I don’t know for sure, but I wouldn’t be surprised if some other cloud services are in use - at the very least, it seems like DataDog is also in use over there. “Leaving the cloud” seems more like “using the cloud where it makes sense for our particular setup”, which makes all the sense in the world.

There is also a philosophical point that the CTO of the company brings up. Should we all use AWS? Is it a good idea to have all the world’s IT infrastructure rely on 2-3 tech companies? These questions are not asked often enough. We humans tend to say we hate centralization, but we are also lazy, and having centralized systems is just so convenient. Should convenience be the deciding factor? Is it the right choice long-term?

37signals are famous for being controversial, having their own strong opinions on many things, and giving really, really good arguments to support what they say.

Having over a decade of experience running things in your own DC, an experienced team responsible for all of this, and a close relationship with the data center service provider are all the right reasons to stick with this approach and evolve it.

If you are using the cloud as a way to run servers, then the cloud will always be significantly more expensive. Paying over $3 million/year to AWS makes zero sense if you can pay a fraction of that in your own DC, without hiring more people or retraining the existing team. Still, if you are using some of the cloud services (and today it’s close to impossible not to), you did not leave the cloud, you just have a hybrid-cloud setup, like the majority of big companies today.

Finally, the point about relying too much on just a few cloud providers is a very good one. How do we solve this? I think part of the answer is in defining open standards for running things in a cloud native way. Once those standards are in place, we should be able to switch between any providers, or use any number of them, without adjusting our infrastructure automation. This is especially important for regulated industries, where you might not even be allowed to use AWS, and might not have the people and skills to build your own abstractions and standards from scratch.

Where will those standards come from? We’ve got some good ones already, around packaging and running containers. Now we are all building more complex ones on top. I partially touched on this topic in this article, but I think this discussion is much bigger and needs further exploration. I’ll be happy to hear your thoughts on this!

What we've shared

What we've discovered

  • The future of Kubernetes – and why developers should look beyond Kubernetes in 2022

  • Impressive AWS numbers from Prime Day: It's important to keep in mind that those are numbers for Amazon.com itself, that is, the load AWS handled during Prime Day just for Amazon. These numbers are even more impressive if you consider that the same infra also runs Netflix, Pinterest and thousands of other infrastructures of all sizes.

  • Using Amazon EBS snapshots for persistent storage with your Amazon EKS cluster by leveraging add-ons.

  • Don’t use Kubernetes CPU Limits: An interesting take on CPU limits, suggesting that setting them is rarely a good idea.

  • Continuous Load Testing: We fully support the idea of continuously testing the performance and resiliency of a service. It's not so common, so it's refreshing to see how seriously Slack takes this topic.

  • A PostgreSQL server instance running in a virtual machine running in the browser by Supabase & Snaplet.

A random reminder

'DevOps Accents' is our new monthly podcast on everything around DevOps, Public Cloud and Cloud Native topics! The first episode is already out, and you can listen to it on Spotify, Apple Podcasts or the site itself.

The 5th mkdev dispatch will arrive on Friday, November 11th. See you next time!