DRY Principle

The DRY (Don’t Repeat Yourself) principle states that every piece of knowledge, whether in code or in documentation#, must have a single, unambiguous, authoritative representation within a system. [The Pragmatic Programmer](lit/@Hunt1999)

Such a principle is necessary because programmers constantly need to adapt to new requirements and knowledge in order to develop or maintain their software. This can result in unwanted duplication of code or documentation inside the source code repository, which makes maintenance more difficult.

There are four common types of duplication (the 4 Is) that the DRY principle aims to solve:

Imposed Duplication

A system sometimes uses different components written in different programming languages, which seems to force developers to duplicate information. If the components need to share the same data structure, we often have to duplicate that data structure and write it in each language.

Let’s say that the software is composed of a server side and a client side. The server is implemented in C, whereas the client is developed in JavaScript. The Message data structure would need to be represented in both languages. This is, unfortunately, a duplication of knowledge, made so that the components within the system can communicate with each other.

There is a solution for it: 202207132124#, or a simple filter. These tools often use a database schema or some sort of metadata (preferably in [plain text](@Hunt1999.md#The%20Power%20of%20Plain%20Text%20Noted%7Cplain%20text)) to generate the shared structure in different forms (programming languages). This way, in the above example, the programmers don’t need to maintain the Message data structure in two different programming languages. Instead, they can look at the database schema or the metadata and do the necessary maintenance there.
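As a minimal sketch of the idea, the Python script below reads a plain-text schema for the Message structure and emits both a C struct and a JavaScript class from it. The schema format, the field names (`id`, `sender`, `body`) and the type mappings are assumptions made for illustration only; they do not come from the book.

```python
# Hypothetical plain-text schema: one "name type" pair per line.
SCHEMA = """\
# Message fields
id int
sender string
body string
"""

# Assumed type mappings for this sketch; a real generator would cover more types.
C_TYPES = {"int": "int ", "string": "char *"}
JS_DEFAULTS = {"int": "0", "string": '""'}


def parse_schema(text):
    """Return a list of (field_name, field_type) tuples, skipping blanks and comments."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, ftype = line.split()
        fields.append((name, ftype))
    return fields


def emit_c(struct_name, fields):
    """Render the fields as a C struct definition."""
    body = "\n".join(f"    {C_TYPES[ftype]}{name};" for name, ftype in fields)
    return f"typedef struct {{\n{body}\n}} {struct_name};\n"


def emit_js(class_name, fields):
    """Render the fields as a JavaScript class with default values."""
    body = "\n".join(
        f"    this.{name} = {JS_DEFAULTS[ftype]};" for name, ftype in fields
    )
    return f"class {class_name} {{\n  constructor() {{\n{body}\n  }}\n}}\n"


if __name__ == "__main__":
    fields = parse_schema(SCHEMA)
    print(emit_c("Message", fields))
    print(emit_js("Message", fields))
```

Adding a field to the schema and rerunning the script updates both outputs, so the schema stays the only place where the structure is maintained.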

The practice can be applied to documentation as well. Let’s say that the manual for the program needs to be available in Markdown and PDF, and additionally be published on a website. An automated script, possibly accompanied by a code generator, should be used to maintain these formats instead of manual labour, which is tedious and prone to human error. [The Pragmatic Programmer](lit/@Hunt1999) Treat them as views of the same knowledge, not as distinct entities that have no relationship with each other.
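A rough sketch of such a script, assuming the manual is kept as a small list of (title, text) sections: two functions render that one source as Markdown and as HTML. The `MANUAL` content and the function names are hypothetical, and PDF output is left out because it would need a third-party tool.

```python
import html

# Hypothetical single source of truth for the manual.
MANUAL = [
    ("Installation", "Run the installer and follow the prompts."),
    ("Usage", "Pass the input & output paths as arguments."),
]


def to_markdown(sections):
    """Render the sections as a Markdown view."""
    return "\n".join(f"## {title}\n\n{text}\n" for title, text in sections)


def to_html(sections):
    """Render the same sections as an HTML view, escaping special characters."""
    parts = [
        f"<h2>{html.escape(title)}</h2>\n<p>{html.escape(text)}</p>"
        for title, text in sections
    ]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(to_markdown(MANUAL))
    print(to_html(MANUAL))
```

Both outputs are derived from `MANUAL`, so a correction made there reaches every view the next time the script runs.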

Pushing further, we can even do the same thing with tests or code! Instead of writing them in a bare programming language, we could write them in the documentation! This is known as #literate-programming, where code is documented extensively without sacrificing the quality of the documentation.
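Python’s standard `doctest` module gives a small taste of this idea: the examples in a docstring serve as documentation and as tests at the same time. It is a lightweight relative of literate programming rather than the full technique, and the `word_count` function below is a hypothetical example.

```python
import doctest


def word_count(text):
    """Count the whitespace-separated words in ``text``.

    The examples below are read by humans and executed by doctest:

    >>> word_count("don't repeat yourself")
    3
    >>> word_count("")
    0
    """
    return len(text.split())


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    print(doctest.testmod())
```

If the behaviour of `word_count` changes, the examples fail, so the documentation cannot silently drift away from the code.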

That being said, this doesn’t mean that you should comment everywhere. Remember: bad code requires lots of comments. Reserve comments in header files for high-level explanations, such as the structure of the infrastructure or interface issues. Make the code self-documenting if possible; don’t use extensive comments to elaborate on it, as they will inevitably become outdated. If that is not possible, use comments to note down the implementation or the code itself, and restrict them to the source files.

Inadvertent Duplication

Duplication can result from mistakes in the design, where developers aren’t necessarily aware that they are duplicating the same knowledge.

Let’s say that there are two classes which contain the same attribute. If that attribute changes due to the business logic, then both classes must adapt accordingly. Clearly, such duplication is not desirable. There are also less obvious duplications, such as calculable attributes (values that can be derived from other attributes but are stored separately), which most developers tend to overlook.

To minimise the occurrence of this kind of duplication, we need to normalise the data or attributes according to the business logic and requirements. If performance matters, a cache is often needed; in that case we have to make sure that the impact of data changes (usually on calculable attributes) is localised within the function or class, so that the outside world doesn’t need to worry about such a violation.
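One possible illustration, assuming a hypothetical `Line` class whose length can be calculated from its two end points: the length is cached for performance, but the cache and its invalidation stay inside the class, behind accessors, so callers never see the duplicated value.

```python
import math


class Line:
    """A line segment whose length is derived from its end points."""

    def __init__(self, start, end):
        self._start = start
        self._end = end
        self._length = None  # cache; None means "not computed yet"

    @property
    def start(self):
        return self._start

    @start.setter
    def start(self, point):
        self._start = point
        self._length = None  # invalidate the cached length

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, point):
        self._end = point
        self._length = None  # invalidate the cached length

    @property
    def length(self):
        # Recompute only when an end point has changed since the last call.
        if self._length is None:
            self._length = math.dist(self._start, self._end)
        return self._length


if __name__ == "__main__":
    line = Line((0, 0), (3, 4))
    print(line.length)
    line.end = (6, 8)
    print(line.length)
```

Because callers only go through `start`, `end` and `length`, the cache could be removed or replaced later without touching any code outside the class.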

Note: The reason for using getters and setters is that they allow functionality to be extended in the future, such as adding caching.

Impatient Duplication

Some developers find that copying code from elsewhere, whether from the Internet or from within the organisation, is a good way to shorten the development time of a program. However, this is not always the case. The copied code could stay buried for many years, and by then it would be impossible to understand as there is little context around it.

Instead, we need to understand what the code does, how it could fail and why we should use it. If the code doesn’t seem self-documenting, note the information in comments or in documentation depending on its nature (implementation or interface). Shortcuts make for long delays.

Interdeveloper Duplication

Working with multiple people or even multiple teams can be a nightmare for the maintenance of a program, as there can be multiple instances of duplication that go undetected.

To avoid that, [The Pragmatic Programmer](lit/@Hunt1999) suggests having a clear design, a strong technical project leader and a well-understood division of responsibilities (everyone knows their part and how to do it) within the design. The authors also encourage module reusability.

A friendly environment must be set up in order to encourage the reuse of modules. Put those modules in a central place in the repository (such as a directory named utils or mods). Ideally, a dedicated project librarian would be in charge of facilitating knowledge exchange between different people or teams.

Draft and practise a policy of having people read each other’s source code and documentation, either informally (chat room, mailing list) or during code reviews.

Links to this page
  • The Pragmatic Programmer

    An active code generator is often used as a bridge between two disparate environments to avoid violating the 202206171004#. Typically, a schema serves as the input for the code generator, which produces output in two different programming languages or forms according to the given arguments. When the schema changes, the result reflects the change. Therefore an active code generator, in contrast to its passive counterpart, needs to be run more than once, ideally during the build process.

    Different views of the documentation (website, Markdown, PDF) should be maintained by an automated script, in adherence to the 202206171004#.

    If a receiver has to handle more events than it needs, or even all events, coupling increases because this violates object encapsulation (the receiver must have knowledge# about many objects in order to interact with them). This should be avoided.

    A specification should not and will not capture every detail and nuance of a system. Attempting to do so only restricts the creativity of the programmers instead of assisting them. A specification should be treated as a view of the requirements, just like the code implementation, adhering to the DRY Principle# design principle.

    Stick to the practice of 202206171004#, # and decoupling# to reduce the number of irreversible decisions.

    Comments# should be used to describe why something is done: its purpose, its goal, its engineering trade-offs, which alternatives were discarded, etc. Don’t document how it is done; that violates the 202206171004#, as the code itself should already explain it.

  • Refactoring: Improving The Design of Existing Code

    That being said, having more classes is not always a good sign. If a change requires edits all over the place (in different classes), move the necessary methods and/or fields into a single class so that all changes happen in only one class. Parallel inheritance hierarchies can be a source of duplication# too, which we can solve by moving the duplicated methods and/or fields up the hierarchy.

    The goal of refactoring is to make the program easy to read, have all logic specified in one and only one place (DRY Principle#), prevent changes from altering existing behaviour, and allow only simple conditional logic.

    The obvious one is duplicated# code. If the same code structure appears in more than one place, extract it into a new method and replace each occurrence with a call to that method. If it appears in sibling subclasses, extract it into a new method and pull it up to the parent class. If the code is similar but with slightly different steps, form a template method: the similar parts stay in the template method, and the differing parts go into new methods that the template method calls. This way, the template method is easier to read and the implementation details are encapsulated in other methods. If two methods do the same thing but one does it in a simpler way, adopt the simpler method over the complex one. If the duplicated code is in unrelated classes, extract it into a class and make it a component of both classes. That being said, which class the code or method should belong to depends heavily on the situation.

    Without refactoring, the code loses its structural cohesion as people change it for various purposes. The harder it is to see the design in code, the harder it is to preserve it, and the more rapidly the structure of the code decays. Refactoring helps code retain its shape (p. 55). Poorly designed code usually has duplications#, where different pieces of code do the same thing. Refactoring can remove such duplications. A good design has code that says everything once and only once.

  • Refactoring

    Refactoring is the process of rewriting, reworking and re-architecting a program’s codebase without changing its observable behaviour. It is often used to prevent or correct code duplication#, preserve the orthogonality# of the software, update the codebase in response to changed requirements or a better understanding of the underlying technology, or optimise. The ultimate goal of refactoring is to make the software easy to read, have all logic specified in one and only one place (no duplications#), prevent changes from altering existing behaviour, and allow only simple conditional logic.

  • Publish/Subscribe Protocol

    Note: Avoid having a subscriber that has to handle more events than it needs, or even all events. Avoiding this decreases coupling, since the subscriber then does not need knowledge# about so many objects in order to interact with them.

  • Prototyping

    Note: When building a prototype for an architecture, we should inspect how the system hangs together as a whole at a higher level. Focus on the major components’ responsibilities and the collaboration between them to see whether they are well defined and appropriate. Investigate the coupling of the architecture (is it minimised?). Find out whether there are potential sources of duplication. Are the interface definitions and constraints acceptable? Pay attention to each module’s access (the access path and the access itself) to the data it needs during program execution.

  • Documentation Guide
    Describe how things are done by function code (adhering to #202206171004)
  • Code Generator

    An active code generator is often used as a bridge between two disparate or different environments to avoid violating the #202206171004. It typically takes a schema, often written in a relatively simple configuration language# or just plain text#, as input, and produces the defined form in two different languages (SQL and #cpp, for example). If the schema changes, the generator produces output that reflects the change.

#oop #functional-programming #documentation #Imposed #Inadvertent #Impatient #Interdeveloper #literate-programming