Why and how mkdev switched to Fullstaq Ruby

Illustration of two young men, one holding a smartphone showing a social media notification icon, with both looking at the screen and smiling.

Ever since the first announcement of Fullstaq Ruby, I have been interested in trying it out. The idea behind it is simple: take the best server-oriented configuration of Ruby and distribute it as OS packages that are easy to install and use, deb and rpm in particular.

What is meant by "the best server-oriented configuration" is, first and foremost, that Fullstaq Ruby is compiled with Jemalloc.

Jemalloc is the FreeBSD libc memory allocator. It uses a more complex algorithm than the default Linux glibc memory allocator, but is faster and results in less memory fragmentation (which reduces memory usage significantly).

There is an excellent article, "Why Fullstaq Ruby", though you could say it's biased, as it's written by the distribution's developers. Regardless, I will not repeat the whole reasoning behind Fullstaq Ruby, as that has already been done elsewhere.

The two primary reasons we at mkdev were looking at Fullstaq Ruby are exactly its two main selling points:

  1. Significantly reduced memory consumption;

  2. OS packages that are easy to install and update.

Ruby before Fullstaq Ruby

Before Fullstaq Ruby we were using Ruby from Software Collections -- the simplest way to get new Ruby versions from OS packages on CentOS without compiling it ourselves.

There are certain nuances in using Software Collections. For example, you need to scl enable the collection (which essentially sets a few environment variables) to be able to use it. Overall, there was not much to complain about. We did wish new Ruby versions became available a bit faster, but other than that, using Ruby from SCL was straightforward and never resulted in any issues.

Even though we didn't have any trouble by just using SCL Ruby, we did think that Fullstaq Ruby is an important project for the Ruby community. We love Ruby, and we want it to prosper, so it is our duty to support new initiatives like this.

I did not want to take the claim that it halves memory consumption as a given, though. So the first thing I did was measure the memory consumption.

Analysing Ruby memory consumption with Performance Co-Pilot

There are plenty of articles with benchmarks of Jemalloc with Ruby and even of Fullstaq Ruby. All of them show a significant reduction in memory consumption, not just in artificial benchmarks, but on real projects.

What I was curious about was whether Fullstaq Ruby would result in a similar reduction for our application. Before rolling out Fullstaq Ruby to production, I conducted a very basic experiment: I ran the complete mkdev.me test suite and measured the RSS memory usage. I did it 10 times in a row, first with Ruby 2.5 from Software Collections, then with Fullstaq Ruby 2.5, and then with Fullstaq Ruby 2.6. Each set of 10 runs took over 1 hour and resulted in around 7,000 metric samples per set.

All tests ran on my laptop, with an i7-8565U CPU @ 1.80GHz × 8 and 16 GB of RAM.

Comparing SCL Ruby 2.5 with Fullstaq Ruby 2.6 is absolutely unfair. I added 2.6 into the table just to compare memory usage between those minor Ruby versions, purely out of curiosity and also because it was high time for mkdev to upgrade.

To collect metrics, I used the hotproc option of Performance Co-Pilot (PCP), which automatically collects metrics for each process that matches a certain pattern.

If you want to conduct a similar experiment with PCP, installing it with sudo dnf install pcp-zeroconf -y (if you are using Fedora, of course) should make your system ready for metrics collection. PCP provides hundreds of metrics out of the box and is probably the easiest tool to set up and use for such experiments.

Filtering for bundle was sufficient, because during the tests there is just a single bundle exec rspec spec process (we did not use any parallelized tests for this experiment):

pmstore hotproc.control.config 'fname == "bundle"'

To start collecting RSS memory usage, I ran pmrep hotproc.memory.rss -1 -p -o csv -F fullstaq-25.csv. I merged all the CSVs into one final CSV and used the free online tool DataWrapper to build some graphs.
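If you would rather get a quick numeric summary before building graphs, the per-run CSV files can be reduced to a mean and a peak with a few lines of Ruby. This is only a sketch, not the script used for this article. The function name summarise_rss is hypothetical, and it assumes that the first CSV column is a timestamp, that every other column holds the RSS of one matching process, and that values are in kilobytes. Check the header of your own pmrep output before relying on it.

```ruby
require "csv"

# Summarise one CSV export of RSS samples (hypothetical helper).
# Assumptions: column 1 is a timestamp, every other column is the RSS of
# one matching process in KB, and empty or non-numeric cells are skipped.
def summarise_rss(csv_text)
  totals = CSV.parse(csv_text, headers: true).map do |row|
    values = row.fields.drop(1).compact
    # Sum all processes seen in this sample; unparsable cells become nil
    values.map { |v| Float(v, exception: false) }.compact.sum
  end
  # Drop samples where no matching process was running
  totals = totals.reject(&:zero?)
  return { samples: 0, mean_kb: 0.0, peak_kb: 0.0 } if totals.empty?

  { samples: totals.size, mean_kb: totals.sum / totals.size, peak_kb: totals.max }
end

# Usage: ruby summarise.rb scl-25.csv fullstaq-25.csv fullstaq-26.csv
ARGV.each do |path|
  s = summarise_rss(File.read(path))
  puts format("%s: %d samples, mean %.1f MB, peak %.1f MB",
              path, s[:samples], s[:mean_kb] / 1024, s[:peak_kb] / 1024)
end
```

Comparing the mean_kb values of two files gives the absolute difference, and dividing that difference by the baseline mean gives the relative reduction.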

The results are:

Not unexpectedly, Fullstaq Ruby did consume less memory, though nowhere near a 50% reduction in our particular case. On average, it used 40-50 MB less when looking at Ruby 2.5 and even less than this when using 2.6. Though it is worth repeating that comparing Fullstaq Ruby 2.6 with SCL Ruby 2.5 is not fair.

Is it a valid test of memory consumption? Is it the correct way to see the benefits of Jemalloc? Most likely not. But our test suite is large and quite comprehensive, and in the end it reproduces the real behaviour of the application, mostly because we write a lot of end-to-end tests. Even if it's not the best way to see the benefits of Fullstaq Ruby and Jemalloc, it is the simplest one we could do and the closest one to the real usage of our particular application.

There are two important results from this test:

  1. Indeed, memory consumption is lower;

  2. All of our tests are still green.

Both are good signs we should give Fullstaq Ruby a try in production.

Fullstaq Ruby in Production

We swapped Ruby distributions without any issues and then compared the memory usage for a 1-week period.

We are using only standard CloudWatch with the basic metrics collected by the CloudWatch Agent. The comparison was very simple: we looked at one-week graphs, before and after the switch, of the total percentage of memory used on a single application server. We are using Nginx with Passenger.

The results are curious. For one, there was no huge reduction in memory consumption.

Per-minute average, blue is before, green is after

There are two visible differences though:

  1. The memory consumption of Ruby 2.6 with Jemalloc seems more predictable, at least by looking at the graph;

  2. There are no sudden spikes and it never hit our alert threshold. If you look at the blue graph, it triggered alerts a couple of times that week.

Per-hour average, blue is before, green is after

Looking at per-hour averages for the same time periods is even more awkward:

Again, there is no noticeable reduction, but there is seemingly improved stability. If anything, memory consumption seems to have increased.
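"Improved stability" is judged here purely by eye. One simple way to put a number on it, assuming you can export the raw data points from your monitoring system, is to compare the standard deviation (or the coefficient of variation) of the two periods. The sketch below is not something we ran for this article. The function name spread is hypothetical, and the two sample arrays are invented for illustration only.

```ruby
# Mean, population standard deviation and coefficient of variation
# of a list of numeric samples (hypothetical helper).
def spread(samples)
  raise ArgumentError, "need at least one sample" if samples.empty?

  mean = samples.sum(0.0) / samples.size
  variance = samples.sum(0.0) { |x| (x - mean)**2 } / samples.size
  stddev = Math.sqrt(variance)
  { mean: mean, stddev: stddev, cv: mean.zero? ? 0.0 : stddev / mean }
end

# Invented memory-used percentages, NOT real measurements from mkdev
spiky  = [40, 42, 71, 41, 43, 69]
steady = [50, 51, 49, 50, 52, 48]

{ "spiky" => spiky, "steady" => steady }.each do |name, data|
  s = spread(data)
  puts format("%-6s mean %.1f%%, stddev %.1f, cv %.2f", name, s[:mean], s[:stddev], s[:cv])
end
```

A lower standard deviation with a similar or even higher mean would match what the graphs suggest: no reduction in memory usage, but a steadier pattern.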

Conclusion after production usage

Some of you are probably thinking: "this is the most stupid way to collect and compare metrics". I totally agree. In retrospect, we could have collected more metrics and given you more data to look at. But we were so excited to try out Fullstaq Ruby that we just dropped it into production and did the most basic comparison afterwards.

It is fascinating how different the graphs are, though. One would expect the second graph to more or less replicate the first one, just with lower values. But what we got is a totally different memory consumption pattern. Is it because of Jemalloc? Is it because of the switch from 2.5 to 2.6? We'll never know!

Should you use Fullstaq Ruby?

If you are running Ruby applications in production on servers, I don't see a reason not to switch to Fullstaq Ruby. It's Open Source, it's still the same Ruby, but packaged for server usage, it's easy to install and update, and it "just works". Your application will consume less memory and it will be much easier for you to upgrade Ruby versions in the future, and those upgrades will arrive faster (at least for now, the latest SCL Ruby is 2.6 and the latest Fullstaq Ruby is 2.7).

There are plans for container-oriented versions of Fullstaq Ruby, though nothing stops you from using it in containers already today. That's what we do at mkdev.

Fullstaq Ruby deserves your attention and is a very important project for the Ruby community. Give it a fair try. And if you are using it already, I would love to hear about your experiences in the comments section below.