How to understand the code of the Aliens and refactor properly
Introduction: Who is our code for?
‘Nothing is more permanent than a temporary solution.’
— A proverb
Any code exists so that some program could execute it. It doesn’t matter to that program how you wrote the code, as long as it contains no mistakes. If you agree with that, you are among the many who misunderstand the whole concept of programming. Let me try to explain.
What, in your opinion, do all programs have in common? Let’s set aside such obvious things as being written by people in some programming language and consisting of symbols from that language. One more thing remains. What, besides the language, distinguishes the text of a program from the text of a book? Yep, books are for reading, not for executing. But there’s one more difference. Any program must be open to change so that it can still be used in the future! This rule applies to anything from trendy mobile apps to core elements of operating systems. It seems fair to say that code written by one programmer is also intended for other programmers (or for the same programmer in the future). Do you think I remember anything about the code I wrote five years ago? I remember nothing even about the project I did half a year ago!
The implementation of your program should be clear not only to you while you’re working on it, but to anybody else who might come across it
This article is not enough to cover such complex topics as refactoring and code architecture. Plenty of fascinating books have been written about them. We’re just going to point out the directions you can move in to improve your own code or the code you inherited from other developers.
Part 1: Naming
‘The world was so recent that many things lacked names, and in order to indicate them it was necessary to point.’
— Gabriel García Márquez
One of my friends once told me a rather funny story. He said that in companies that care a great deal about security, the developers don’t know the purpose of the business logic they implement and don’t have to know what their program does. To be precise, they must NOT know it. But how can you create a program without understanding what it does? For exactly this purpose the company hires external analysts, whose access level is higher than that of the on-site developers.
They have tables that map the real names to the corresponding names in the code. They consult these tables and assign tasks to the developers accordingly. For example: “Write a function ‘func_352’ that takes the number ‘DEP20’, multiplies it by the coefficient ‘CF1850’ and saves the result to the database field ‘FIELD255’.” What does ‘func_352’ actually do? It takes the amount of the deposit, multiplies it by the annual interest rate and saves the result to the user’s balance. But the developer knows nothing about that while writing the function.
This example is quite simple. But can you imagine a function with dozens of variables of this kind, and a program with thousands of them? Development would move about as fast as writing a story in Hungarian while on the phone with a friend who has a Hungarian-English dictionary. Developers working with someone else’s badly named code are just as confused.
In programming literature you can find a whole lot of recommendations on how to name software objects of different kinds. The main idea behind all of them is that names should make sense.
The function name should show what it does. The variable name should show what it stores.
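To make the contrast concrete, here is a minimal Python sketch of the deposit example. The readable names (`calculate_annual_interest`, `deposit_amount`, `annual_interest_rate`) are my own illustrative choices, not names from any real system, and saving to the database is left out so that the sketch runs on its own.

```python
from decimal import Decimal


# Obfuscated version: the reader cannot tell what is being computed.
def func_352(dep20: Decimal, cf1850: Decimal) -> Decimal:
    return dep20 * cf1850


# Meaningful version: the names carry the business logic.
def calculate_annual_interest(deposit_amount: Decimal,
                              annual_interest_rate: Decimal) -> Decimal:
    """Return the interest earned on a deposit over one year."""
    return deposit_amount * annual_interest_rate


# Hypothetical values: a deposit of 1000 at a 5% annual rate.
interest = calculate_annual_interest(Decimal("1000"), Decimal("0.05"))
```

Both functions compute exactly the same thing; only the second one can be understood without the analysts’ lookup table.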
You can quite easily pick a name for such a simple function as the one which calculates the user’s balance. But what should you do if your function does way more operations? This is what our next part is about.
Part 2: Single responsibility principle
‘Simple is better than complex.’
‘Complex is better than complicated.’
— Zen of Python
Imagine a miserable employee of some company who needs to issue a statement of transfer and acceptance of goods, drive to the warehouse, pick up the goods from the warehouse, sign some documents, drive to the store, display the goods on the store shelves, order the goods from the supplier and pay for them, and all of that should be done today.
A lot of people work like that, and there’s nothing wrong with it. But what if your business grows and you now have ten stores instead of one? Would you hire ten such employees? At this point it makes more sense to hire an accountant, a driver, a merchandiser and a shop assistant.
The same happens in programming. If your program is small, the do-everything approach above is absolutely fine, since it takes almost no planning. But as the program grows, as every program does, its use cases get more complex too. This is when you start asking yourself questions like “What should I name this 200-line function so that its name actually means anything? How about ‘DO ALL’?”
Dividing such a function into smaller functions, each with its own specific purpose, seems like a good idea. I understand that this advice sounds obvious, but believe me, some professional developers don’t follow it. What is the reason? Are they too lazy to refactor their code? Or maybe it’s the same as with foreign languages: we learn the rules for years, but everything flies out of our heads when it comes to talking in real life. The reason doesn’t matter much. Let’s just say those programmers are too careless. Shall we continue?
There’s a wonderful rule:
If you can’t pick a name for a function, it needs to be rewritten (divided into subfunctions)
And one more:
One function — one purpose
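Here is a small sketch of what the two rules look like in practice. The task (turning a comma-separated string of prices into a report line) and all the function names are made up for illustration. The point is only that every function has one purpose and a name that is easy to pick.

```python
def parse_prices(raw: str) -> list[float]:
    """Turn a comma-separated string into a list of prices."""
    return [float(item) for item in raw.split(",") if item.strip()]


def total_price(prices: list[float]) -> float:
    """Add up the prices."""
    return sum(prices)


def format_total(total: float) -> str:
    """Render the total for display."""
    return f"Total: {total:.2f}"


def build_price_report(raw: str) -> str:
    """Coordinate the steps; the parsing, maths and formatting live elsewhere."""
    return format_total(total_price(parse_prices(raw)))
```

If all of this lived in one function, the only honest name for it would be something like ‘do_all’. Split this way, each piece can be tested and changed on its own.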
Part 3: Abstraction layers
‘The more an artist uses these abstracted forms, the deeper and more confidently will he advance into the kingdom of the abstract.’
— Wassily Kandinsky
Any IT product, be it a website, a mobile app or an electronic wristwatch, is a clear example of multi-layered abstractions. For example, the buttons you see on a website call functions in the code over the Internet. But this is only the beginning. After that, the code written by the website developer uses the libraries of the language the website was written in, e.g. Python. Python in turn calls libraries written in a lower-level language; for CPython, the reference implementation, that language is C. Those libraries rely on the components of the operating system, which are typically written mostly in C with some assembly. These components are reached through system calls, and eventually the request gets to the CPU as machine instructions, zeros and ones, which become the electric signals that perform the calculation. After that, the result travels back through the same steps in reverse order.
This is a very simplified picture of what happens when we ask Google what two plus two equals. In practice, there are far more stages. They were created by many professionals: physicists, hardware engineers, operating system programmers, creators of programming languages, website developers. Each of them strove to make things easier for those who would later build on their work. Nowadays making a website is a piece of cake precisely because 99.9% of the work has already been done for us.
The same concepts apply when we work with the code inside a project. You need to divide the code according to its abstraction layers so that development doesn’t become too difficult. Each component should ‘think’ at the level of the layer it works in. The function that saves the user’s e-mail to the database doesn’t need to know where this e-mail comes from: a form on the website, a mobile app or a social network. That is the responsibility of the higher layers. The same function doesn’t need to know where exactly the data ends up, either. That is the responsibility of the lower layers, which know the location and the type of the database in use. Clearly separating abstraction layers is highly beneficial: it makes your development process easier and faster.
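The e-mail example could be sketched roughly like this. Every name here (`EmailStorage`, `InMemoryEmailStorage`, `save_user_email`, `handle_web_form`) is hypothetical, and the ‘database’ is just a dictionary so that the sketch can run without any setup.

```python
from typing import Protocol


class EmailStorage(Protocol):
    """Lower layer: knows where and how the data is stored."""

    def save(self, user_id: int, email: str) -> None: ...


class InMemoryEmailStorage:
    """Stand-in for a real database, used so the sketch runs on its own."""

    def __init__(self) -> None:
        self.emails: dict[int, str] = {}

    def save(self, user_id: int, email: str) -> None:
        self.emails[user_id] = email


def save_user_email(storage: EmailStorage, user_id: int, email: str) -> None:
    """Middle layer: knows neither where the e-mail came from nor where it goes."""
    storage.save(user_id, email.strip())


def handle_web_form(storage: EmailStorage, form: dict[str, str]) -> None:
    """Higher layer: knows that the e-mail arrived from a website form."""
    save_user_email(storage, int(form["user_id"]), form["email"])
```

Swapping the dictionary for a real database, or the web form for a mobile app, would touch only one layer and leave `save_user_email` as it is.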
Imagine your colleague telling you: ‘We need to send notifications to the mobile apps, but I’ve done it already. Just call my function and pass it the message text and the user’s ID.’ Sounds good, doesn’t it? The developer separated notification sending from the rest of the app’s logic. Now the app has a function dedicated to notifications, and anyone can use it.
And now try to imagine another situation. Your colleague tells you: ‘We need to send notifications to the mobile apps. I’ve already done it for payment notifications for the user named Bill. Now we need to do the same for John. Look at the function I wrote and write yours using mine as an example, but change the text and the user’s ID.’ Doesn’t sound so good anymore. In this case, you need to examine your colleague’s code, copy parts of it, and adapt them to your purposes. That takes far more time and inevitably leads to code duplication.
Do not mix concrete things with abstract ones
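The two conversations with the colleague translate into code roughly as follows. This is only a sketch: the function names, the user IDs and the message text are invented, and a plain list stands in for a real push service.

```python
sent_messages: list[tuple[int, str]] = []  # stand-in for a real push service


def send_notification(user_id: int, text: str) -> None:
    """Abstract version: works for any user and any message."""
    sent_messages.append((user_id, text))


def send_payment_notification_to_bill() -> None:
    """Concrete version: the user and the text are baked in, so it cannot be reused."""
    sent_messages.append((1, "Your payment has been received"))


# Reusing the abstract function for John (hypothetical ID 2) is one call, no copying.
send_notification(2, "Your payment has been received")
```

To notify John with the second function you would have to copy it and edit the copy, which is exactly the situation the rule warns about.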
If you’re working on the code alone, you need to keep in mind the abstraction layers as well, as it facilitates your future work. There’s a concept of architecture design that requires your code to be ready for any changes your client might ask for. Dividing software according to the abstraction layers is one of the ways to do that with minimal losses. You need notifications for an iPhone? No probs, we’ve done it for an Android already. Now we just need to connect the notifications module with the new platform, and, since we have the notifications already, we don’t have to work on them. This is how it works.
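One possible way to keep the notifications module independent of the platform is a small registry of senders, sketched below. The registry, the function names and the string results are assumptions made for the example. A real sender would call the platform’s push service instead of returning a string.

```python
from typing import Callable

# A sender takes a user ID and a text; here it returns a string instead of
# contacting a real push service, so that the sketch is self-contained.
Sender = Callable[[int, str], str]
platform_senders: dict[str, Sender] = {}


def register_platform(name: str, sender: Sender) -> None:
    """Connect one platform to the notifications module."""
    platform_senders[name] = sender


def notify(platform: str, user_id: int, text: str) -> str:
    """Platform-independent entry point used by the rest of the app."""
    return platform_senders[platform](user_id, text)


register_platform("android", lambda user_id, text: f"android:{user_id}:{text}")
# Supporting a new platform means adding one sender; notify() stays unchanged.
register_platform("ios", lambda user_id, text: f"ios:{user_id}:{text}")
```

The code that decides when and what to notify never learns which platforms exist. It only calls `notify`.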
Part 4: DRY (Don’t repeat yourself)
‘At least buy him a sweatshirt. I can’t look at him. He’s not a child anymore. He just can’t walk around naked. At least buy him a sweatshirt. I can’t look at him. He’s not a child anymore. He just can’t walk around naked.’
— ‘Chekhov's Motifs’ directed by Kira Muratova
DRY is a programmer’s golden rule. Its significance can hardly be overstated. Many programming techniques, such as abstraction, inheritance, composition, the single responsibility principle, design patterns, and to a large extent the whole OOP (object-oriented programming) paradigm, help you avoid duplication in code. This task might seem trivial at first, but it can turn out to be very difficult. Let’s look at how it works and what kinds of duplication there are.
There is an anti-pattern called the magic number. Let’s say the price variable is multiplied by 0.13 and the result is written to the tax variable. Here we can easily guess that 0.13 is a tax rate of 13%: knowing the income tax rate, we understand what this number next to the price variable means. Nevertheless, it is a magic number. First of all, if you don’t know the tax rate in that specific country, you’ll never work out where the number comes from. Secondly, the income tax rate may change, and then you would need to rewrite every piece of code that contains this number.
Such magic numbers tend to be duplicated in the code many times. Even if the number appears only once in your code now, it’s very likely to be repeated at some point in the future. So why not just make it a named variable right away?!
No value in your code should be duplicated
There are, however, some exceptions. If the same value has a different meaning in a different place, you can duplicate it.
For example, 0.13 might be a tax rate in one place and a fee on the goods in another. In that case the number is the same, but the variables should be different.
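A minimal sketch of both points, the named value and the exception to the rule. The constant and function names are illustrative, and 0.13 is simply the rate used in the example above.

```python
# Before: what does 0.13 mean, and how many places use it?
#     tax = price * 0.13

INCOME_TAX_RATE = 0.13  # defined once; a rate change is a one-line edit
GOODS_FEE_RATE = 0.13   # same number today, but a different meaning


def calculate_tax(price: float) -> float:
    return price * INCOME_TAX_RATE


def calculate_goods_fee(price: float) -> float:
    return price * GOODS_FEE_RATE
```

If the tax rate changes tomorrow, the fee stays untouched, which would not be the case if both calculations shared one constant just because the numbers happened to match.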
Duplicate code blocks
Many professional developers run into this problem. Imagine that you need to create a procedure similar to one you already have. It’s easier to just copy the code block than to turn it into a separate function and call it wherever you need it. But this approach can cause so many problems! Any change to the block might have to be made in two, three or more places you copied the code to. And if you turn it into a separate function, besides getting rid of duplicates, you gain some less obvious but significant advantages.
Following the single responsibility principle means turning a distinct code block into a function of its own. Since such a block usually has a specific purpose, it makes sense to separate it rather than mix it with the logic of the caller.
Turning a code block into a function gives you an opportunity to name it, which makes the code easier to write. The name should describe what the function does, and that in turn makes your code more readable.
Do not duplicate the code blocks! Make each of them a separate function
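As a sketch, suppose two procedures both need to tidy up a user’s name before using it. The scenario and the names (`normalize_name`, `register_user`, `greet_user`) are invented for the example.

```python
def normalize_name(raw_name: str) -> str:
    """The single place that defines how a name is cleaned up."""
    return " ".join(raw_name.split()).title()


def register_user(raw_name: str) -> str:
    # Without normalize_name, the clean-up expression would be pasted here...
    return f"Registered {normalize_name(raw_name)}"


def greet_user(raw_name: str) -> str:
    # ...and here, and every later change would have to be made twice.
    return f"Hello, {normalize_name(raw_name)}!"
```

The extracted block also got a name, so a reader of `register_user` no longer has to decode what the expression does.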
Duplicate behavior is the most difficult kind of duplication to spot. You need to be an experienced developer to avoid it. The ‘template method’ design pattern is a great way to deal with it, so take some time to read about it. I don’t plan to explain the implementation of this pattern thoroughly in this article. I’m just going to explain why duplicate behavior is problematic and how you can get rid of it.
Each task we create might have several stages. We define these stages according to the single responsibility principle. Every time you create one more task, try to remember whether any of its stages resemble stages of the previous tasks. If they do, you can define an abstract base class for your task classes and specify some of the stages and their execution order in it. For instance, you can create a common base class for a task that saves data to a file and a task that stores data in the database.
|Task|Stage 1|Stage 2|Notification|
|DB|Connect to the database|Create a record in the database|Send a message that the record was created|
|File|Open file for writing|Write text in the file||
Here the two tasks share not only a common behavior strategy but also one common stage. By implementing the behavior of these tasks in an abstract base class, you can move the code of the common stages into it, as with the ‘Notification’ stage in the table above. This makes sharing code very convenient. Moving the common stages into the base class also makes it easier to create new procedures.
If you see that the tasks you create have common behavior, try to specify it in the parent (base) class of these tasks
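Here is a rough sketch of the template method idea applied to the DB and file tasks. The class and method names are my own, and the stages only append lines to a log instead of touching a real database or file. Following the statement that notification is the common stage, I assume both tasks send the same kind of message at the end.

```python
from abc import ABC, abstractmethod


class SaveTask(ABC):
    """Base class: fixes the order of the stages and owns the shared one."""

    def run(self, data: str) -> list[str]:
        log: list[str] = []
        self.prepare(log)
        self.write(data, log)
        self.notify(log)  # common stage, implemented once for all tasks
        return log

    @abstractmethod
    def prepare(self, log: list[str]) -> None: ...

    @abstractmethod
    def write(self, data: str, log: list[str]) -> None: ...

    def notify(self, log: list[str]) -> None:
        log.append("message sent: data saved")


class DatabaseSaveTask(SaveTask):
    def prepare(self, log: list[str]) -> None:
        log.append("connected to the database")

    def write(self, data: str, log: list[str]) -> None:
        log.append(f"created record: {data}")


class FileSaveTask(SaveTask):
    def prepare(self, log: list[str]) -> None:
        log.append("opened file for writing")

    def write(self, data: str, log: list[str]) -> None:
        log.append(f"wrote text: {data}")
```

Adding a third task means writing only the stages that differ. The order of execution and the notification come from the base class for free.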
Conclusion
‘God is in the details.’
— Ludwig Mies van der Rohe and others
If you are here, then by reading this article you’ve discovered some basic things to pay attention to while analyzing your code. Analyzing the code and rewriting parts of it are essentially what refactoring consists of. One of its main aims is to make your code more readable and thus more logical and of higher quality.
It doesn’t matter whose code you’re working on, yours or someone else’s. Invest some time in refactoring; it’ll pay off, I promise. When your app starts expanding, refactoring becomes essential. Changing separate parts of the code often leads to changes in other parts and sometimes in the whole structure. That’s why refactoring is and always will be a part of the working process. In this article I tried to give you several directions to move in so that you can refactor properly, reduce the need for refactoring, and make it easier to analyze someone else’s code in the future.
But knowing just these rules is never enough. The perfect code takes a lot of time to become, well, perfect and work as required. Even the most experienced developers rewrite their functions over and over again until the final version ends up stable.
Not so long ago I had a conversation with an artist who had painted a stunning portrait of my friend. I was absolutely fascinated by it, so I asked him how he had achieved such a likeness. The answer was simple: ‘I just kept repainting her face again and again. Three, four, five times, until I was satisfied with the result.’
Aim to keep things simple and meaningful while writing your code
Use the techniques described in this article. Invent your own. Help others make their code more readable, since a bystander can sometimes judge better. Treat your code as a work of art and it will work well!