DRY (Don’t Repeat Yourself) Principle suggests every piece of knowledge, whether it is in code or in documentation#, must have a single, unambiguous, authoritative representation within a system. [The Pragmatic Programmer](lit/@Hunt1999)❌
Such principle is necessary because programmers are always need to adapt to new requirements and knowledges in order to develop or maintain their software. This could resulted in undesired duplication of code or documentation inside the source code repository which makes maintenance more difficult.
There are 4 common types of duplications (4Is) that DRY principle is aimed to solved:
Imposed Duplication
There could be a time that a system that uses different components in different programming language, which seems to be forced on the developers to duplicate information. If they need to share the same data structure, often time we need to duplicate that data structure, and write them into different programming language.
Let’s say that the software composed of server side and client side. Server is implemented in C, whereas client is developed under JavaScript. The Message
data structure would need to be represented in both languages. This is, unfortunately, a duplication of knowledge in order to have the components within the system to communicate with each other.
There is a solution for it: 202207132124#, or simple filter. They often use database scheme or some sort of metadata (preferably in [](@Hunt1999.md#The%20Power%20of%20Plain%20Text%20Noted%7Cplain%20text)❌) to generate the share structure into different forms (programming languages). In this way, in the above example, the programmers don’t need to maintain the Message
data structure in two different programming language. Instead, they could just look at the database schema or the metadata and do their necessary maintenance there.
The practice can be implemented into documentation as well. Let’s say that the manual for the program needed to be in Markdown and PDF, and additionally be put up onto a website. An automated script, could be accompanied by code generator, should be used to maintained them instead of using manual labour which is tedious and prone to human errors. [The Pragmatic Programmer](lit/@Hunt1999)❌ Treat them as a view of the knowledge, not merely as distinct entities that have no relationship with each other.
Pushing forward, we can even do the same thing to tests or codes! Instead of writing them in bare programming languages, we could write them in documentation! This is known as #literate-programming where codes will be documented extensively without the sacrifice of the documentation quality.
That being said, this doesn’t mean that you should comment everywhere. Remember: Bad code requires lots of comments. Reserve comments to high-level explanations in header files such as the structure of the infrastructure or the interface issues. Makes the code self-documenting if possible, don’t use extensive comments to elaborate them as they will be outdated inevitably. If that is not possible, use the comments to note down its implementation or the code itself and restrict them in the source files.
Inadvertent Duplication
Duplication can be a result from mistakes in the design where developers don’t necessarily aware that they are duplicating the same knowledge.
Let’s say that there are two classes, which contain the same attribute. If changes happen to that attribute due to the business logic, then these two classes must adapt accordingly. Clearly, such duplication is not desirable. There are more non-obvious duplication such as calculatable attribute which could be ignored by most developers.
To minimise the occurrence of this kind of duplication, we need to normalise the data or attribute according to the business logic and requirements. If performance matters, often time a cache is needed, we have to make sure that the impact due to the changes of data (usually calculatable attributes) localised within the function or class so that the outside world doesn’t need to worry about such violation.
Note: The reason of using getter and setter is that they allow functionality expansion such as caching in the future.
Impatient Duplication
Some developers would find that copying codes elsewhere from Internet or within the organisation can be a good idea for shorting the developing time of the program. However, this is not always the case. The copied code itself could be burry for many years to come, and at that time, the code would be impossible to be understood as there are little context around it.
Instead, we need to understand what the code does, how could it fail and why should we use it. If the code doesn’t seems self-documenting, note the information in comments or in documentation depending on its properties (implementation or interface). Shortcuts make for long delays.
Interdeveloper Duplication
Working with multiple people or even multiple teams can be a nightmare to the maintenance of the program as there could be multiple instances of duplication undetected by the radar.
To avoid that, [The Pragmatic Programmer](lit/@Hunt1999)❌ suggested to have a clear design, a strong technical project leader and a well-understood division of responsibilities (everyone know their part and how to do it) within the design. The authors also encourage the idea of module reusability.
A friendly environment must be set up in order to encourage the reuse of modules. Put those modules into a central place in the repository (like directory named with utils
or mods
). A dedicated project librarian who will be in charge on facilitating knowledge exchange between different people or teams would be ideal.
Draft and practice a policy to have people read other’s source code and documentation informally (chatroom, mailing list) or during code reviews.