Introduction to MongoDB

Illustration of a large, segmented bookshelf filled with various items such as books, folders, a fishbowl, a TV, decorative objects, and a sleeping cat.

In this article I want to tell you what MongoDB is and whether you should try it out.

Let me start by describing my experience with it. The first time I tried it was about 4 years ago, when it was trendy and everybody wanted to give it a shot, because NoSQL was what every startup needed! (Not really, but we’ll talk about that later.) I didn’t get far with it at that point: I played around with it a little and even used it for an MVP, but not for any serious project. Then I forgot about it for some time. A year and a half ago, however, I got a job at a company where MongoDB was used actively. There was a huge microservice architecture, and one of the microservices wasn’t really ‘micro’ and used this database. That’s when I experienced it for real. I truly enjoyed the fact that everything could be changed easily, but I also had bursts of anger, because some of the surprises in how it works were a bit too surprising. Now I have another job and use Mongo sensibly, working with it almost every day.

Well, let’s discuss now why you might need MongoDB for your project.

1. It’s schemaless

This one is an obvious advantage. If you work at a startup with a vague business model and you know that the project and its data model are going to change a lot in the pre-market stage, give NoSQL (MongoDB in particular) a shot. In MongoDB there is no need to create tables, change their schemas, run migrations or think much about data types up front, unlike in my favourite PostgreSQL. It’s way easier to create new collections (MongoDB’s counterpart of tables) and to add or delete fields. It’s so easy that, when you use Mongoid (an object-document mapper for MongoDB, similar to an ORM), you just add a line like field :text, type: String to the model. And that’s it: when you add some data, the new entry in your database is going to have that field. If you add the data without an ORM, you don’t even need that line, just add whatever you’d like to.

But be careful, because there’s also a dark side. You cannot be absolutely sure that a particular document has a particular field (NB: in MongoDB the entries in a collection are called documents). In other words, if you didn’t have a field called ‘text’ in the beginning, then added some entries, and then added this field and some more entries, the old entries are not going to have that field. Mongoid helps here: it pretends that the field already exists for the old entries and simply returns an empty (nil) value for them.
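To make the dark side concrete, here is a minimal sketch in plain JavaScript, with an in-memory array standing in for a collection. The posts array, the readText helper and the sample values are made up for illustration; no MongoDB server or driver is involved. It shows old documents that lack the text field and one way to read them safely. In the mongo shell, the $exists query operator can be used to find such documents.

```javascript
// Hypothetical in-memory "collection": plain objects stand in for documents.
const posts = [
  { _id: 1, title: 'First post' },                       // written before `text` existed
  { _id: 2, title: 'Second post' },                      // same
  { _id: 3, title: 'Third post', text: 'Hello, Mongo' }, // written after `text` was added
];

// Illustrative helper: read a field that may be missing, falling back to null,
// similar in spirit to what a mapper does for fields declared in the model.
function readText(doc) {
  return 'text' in doc ? doc.text : null;
}

// Find the documents that do not have the field at all.
const missingText = posts.filter((doc) => !('text' in doc));

console.log(posts.map(readText));            // text values, null where the field is absent
console.log(missingText.map((d) => d._id));  // ids of the old documents
```

The point of the helper is that the fallback lives in one place, so the rest of the code never has to guess whether an old document carries the field.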

2. Horizontal partitioning is easy.

Horizontal partitioning (sharding) is what you need when the amount of data you have to store is bigger than the drive of a single server. Easy horizontal scaling is the signature feature of most NoSQL databases; beyond that, they mostly compete on reliability, write speed and response time. As I’m writing this article, PostgreSQL doesn’t have a built-in option for sharding data across several servers. There are some third-party solutions (here I can’t help but mention the company Citus, as they’re doing a great job). The same thing is easy to set up in MongoDB, and there are a whole lot of articles about it. Replication and sharding work well together there. Moreover, the balancer automatically decides which shard stores which documents, and you can set the rules for that based on one field or a set of fields (the shard key).
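The idea of spreading documents across shards by a key can be sketched in plain JavaScript. This is only an illustration of hash-based routing under made-up assumptions (three shards, a userId field as the key, an FNV-1a hash chosen for brevity). It is not how MongoDB’s balancer is implemented: MongoDB splits the key space into chunks and moves those between shards, rather than applying a simple modulo.

```javascript
// Illustrative only: route documents to shards by hashing a key field.
const SHARD_COUNT = 3; // assumption for the example

// FNV-1a 32-bit hash of a string (a simple, well-known non-cryptographic hash).
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Pick a shard index from the value of the key field.
function shardFor(doc, shardKey) {
  return fnv1a(String(doc[shardKey])) % SHARD_COUNT;
}

// Distribute some hypothetical documents.
const shards = Array.from({ length: SHARD_COUNT }, () => []);
for (let id = 1; id <= 9; id++) {
  const doc = { userId: `user-${id}`, visits: id * 10 };
  shards[shardFor(doc, 'userId')].push(doc);
}

console.log(shards.map((s) => s.length)); // number of documents per shard
```

Because the shard is derived from the key alone, the same key always lands on the same shard, which is what lets a router find a document without asking every server.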

3. It has awesome data aggregation.

SQL is very powerful and everybody’s familiar with it, so I’m not going to cast stones at it. But the way you retrieve data in MongoDB definitely deserves high praise. There is a map-reduce function, grouping by multiple conditions, reshaping documents on the fly, sampling random documents, and sorting. So it has basically everything you can get from an SQL database, expressed as a pipeline of stages and, in my opinion, with more legible syntax.

Here’s an example from the docs:

sql

SELECT cust_id,
       SUM(price) as total
FROM orders
WHERE status = 'A'
GROUP BY cust_id
HAVING total > 250

mongo

db.orders.aggregate( [
   { $match: { status: 'A' } },
   {
     $group: {
        _id: "$cust_id",
        total: { $sum: "$price" }
     }
   },
   { $match: { total: { $gt: 250 } } }
] )

The MongoDB version is a little bit longer, but it is structured more sensibly: each stage of the pipeline does one thing, in the order it is written. If you’ve just moved from an SQL database, you’ll need some time to get used to it. But once you get into it, you won’t want to go back.
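To see what each stage of that pipeline does, the same steps can be replayed in plain JavaScript on an in-memory array. This is a sketch for intuition only: the sample orders are invented, and the code imitates $match, $group and the second $match with ordinary array methods rather than calling MongoDB.

```javascript
// Invented sample data standing in for the `orders` collection.
const orders = [
  { cust_id: 'A123', price: 200, status: 'A' },
  { cust_id: 'A123', price: 100, status: 'A' },
  { cust_id: 'B212', price: 300, status: 'D' },
  { cust_id: 'B212', price: 50, status: 'A' },
  { cust_id: 'C300', price: 260, status: 'A' },
];

// Stage 1, like { $match: { status: 'A' } }: keep only matching documents.
const matched = orders.filter((o) => o.status === 'A');

// Stage 2, like $group with $sum: one output document per cust_id.
const totals = new Map();
for (const o of matched) {
  totals.set(o.cust_id, (totals.get(o.cust_id) || 0) + o.price);
}
const grouped = Array.from(totals, ([_id, total]) => ({ _id, total }));

// Stage 3, like { $match: { total: { $gt: 250 } } }: filter the groups (SQL's HAVING).
const result = grouped.filter((g) => g.total > 250);

console.log(result);
```

Each stage takes the output of the previous one as its input, which is why the filter on total has to come after the grouping.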

4. It’s designed for denormalization.

In MongoDB we usually store data the way we’d like to read it. In SQL databases you always have to pay attention to how the data is modeled (a little sidenote: you should always use your brain and not treat the word ‘always’ as a rule of thumb), so that when all the tables are normalized, even badly written queries can still retrieve the data. In MongoDB, if the format of the data or the place where it’s stored is not convenient for you, you can move it or duplicate it to wherever you need it without shame, as this database is schemaless. In other words, you might have the same field with the same data in different collections. Or maybe two fields in one collection and, on top of that, one more field that is the combination of the first two. But you shouldn’t abuse it. If you multiply fields in the documents without any limits, it will become more and more difficult for a developer to understand the data entities, and every copy has to be updated when the original changes.
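As a sketch of what such duplication looks like, the plain JavaScript below keeps a customer’s name both in a customers array and inside each order, plus a combined fullName field. All names and values are hypothetical, and the arrays only stand in for collections. The renameCustomer helper shows the price of denormalization: the copies have to be updated together.

```javascript
// Hypothetical collections, modeled as arrays of plain objects.
const customers = [
  { _id: 1, firstName: 'Ada', lastName: 'Lovelace', fullName: 'Ada Lovelace' },
];
const orders = [
  // customerName duplicates data from `customers`, so an order can be shown without a lookup.
  { _id: 10, customerId: 1, customerName: 'Ada Lovelace', total: 42 },
  { _id: 11, customerId: 1, customerName: 'Ada Lovelace', total: 7 },
];

// Changing the source data means touching every duplicated copy as well.
function renameCustomer(customerId, firstName, lastName) {
  const fullName = `${firstName} ${lastName}`;
  for (const c of customers) {
    if (c._id === customerId) Object.assign(c, { firstName, lastName, fullName });
  }
  let updatedOrders = 0;
  for (const o of orders) {
    if (o.customerId === customerId) {
      o.customerName = fullName;
      updatedOrders++;
    }
  }
  return updatedOrders;
}

console.log(renameCustomer(1, 'Ada', 'King'));  // number of order copies rewritten
console.log(orders.map((o) => o.customerName)); // names as stored on the orders
```

Reads get cheaper because an order already carries the name, while writes get more expensive; that trade-off is the thing to weigh before duplicating a field.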

5. Simple index types.

Index names in MongoDB are really clear, and there are almost no pitfalls when putting them to use. For instance, if in PostgreSQL you have a B-tree index on one field and a GiST index on another, the planner may end up using only one of them for a query that filters on both fields. There are fewer such surprises in MongoDB.

Conclusion

MongoDB is obviously no silver bullet. There might be some unexpected hidden problems. One of them is that it doesn’t stop slow queries by itself, so they keep running until you kill them manually. You also might experience low performance when you run a count query on a large collection. But those are things you face once and then just keep in mind, as they don’t really hold you back. I don’t encourage all of you to move from SQL databases to MongoDB, but if you are facing a dilemma and trying to choose a database for a new project, consider using MongoDB. If it’s still tough for you to experiment with a NoSQL solution, you can give PostgreSQL a shot. It is under constant development, and it has JSON fields (which makes it partly a NoSQL solution). There’s a lot of documentation and the performance is high. You won’t be disappointed. But this is a topic for another discussion.