CPU
We could use perf
and flame graph which visualise the CPU time usage by the program in its lifetime to investigate whether there are opportunities in improving CPU time.
Be aware that when measuring the program execution time or CPU cycle, it will inevitably introduce biases into the measurement since the program or code that used for the measurement will need the time for themselves too.
Memory
For JavaScript and its variants, we could use the inspect option provided by node to visualise the memory usage by the program in its lifetime to investigate whether there are opportunities in improving memory usage.
Sometime, improving memory usage could drastically improve the performance of one program.
Layout and Multithreading
The layout of a program could affect its own performance regardless of its own code. This means that the link order (changes of function address) and environment variable size (moves of program stack) could be arbitrary for one architecture which resulted in more or less cache conflict. Therefore, we need to eliminate the layout differences by using a stabiliser, which randomises the memory layout of the program, analyse it using 202205072243# and use causal profiler such as coz
especially for programs that utilise parallelism#.
While using causal profiler, doing a virtual speed-up for the program (slow down every component except for the component that is aimed to speed-up). Measures which speed-up of a component is significant.
To-do: Learn more about casual profiler and stabiliser. Read the paper Layout Biases Measurement.
Note on -O3
The option -O3
from most C/C++ compilers does not actually optimise the program that much. Its effect on the program’s performance is not significant.