There are a lot of optimizations you can do without going down to assembly level, just by writing your C code differently.
For example,
unsigned char c; for (c = 0; c < 10; ++c) { /* do something */ }
generates much better Z80 code than
int n; for (n = 0; n < 10; n++) { /* do something */ }
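To compare the two loops yourself, here is a rough sketch with the same work done by both counter types. The function names, the COUNT macro and the summing body are just my placeholders for "do something". Assuming SDCC, something like `sdcc -mz80 -S file.c` should leave you an assembly listing to diff, but I haven't measured the output here, so take it as a test harness and not as a result.

```c
/* Two loops doing identical work; only the counter type differs.
   Names and the summing body are illustrative placeholders. */
#define COUNT 10

/* 8-bit counter: small enough to live in a single Z80 register. */
unsigned int sum_with_uchar(const unsigned char *data)
{
    unsigned int total = 0;
    unsigned char c;

    for (c = 0; c < COUNT; ++c) {
        total += data[c];
    }
    return total;
}

/* int counter: at least 16 bits wide, so on Z80 it needs a
   register pair and 16-bit compares. */
unsigned int sum_with_int(const unsigned char *data)
{
    unsigned int total = 0;
    int n;

    for (n = 0; n < COUNT; n++) {
        total += data[n];
    }
    return total;
}
```

Both functions return the same value for the same input, so any difference in the listing comes from the counter type alone.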
But it's not always obvious which code optimizes better. For example, stack variables are terrible on Z80, so statics or globals generally feel like a good choice. But using them may also prevent the compiler from keeping those values in registers.
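A minimal sketch of that trade-off, so the two variants can be compiled side by side. The function names and the XOR checksum are made up for illustration. Which one wins depends on the compiler and version, so I'm not claiming either is faster. One thing that is certain: the static version is not reentrant, so it's unsafe to call from both an interrupt handler and the main code.

```c
/* Variant 1: automatic locals. The compiler may keep them in
   registers, or spill them to the stack if it runs out. */
unsigned char checksum_auto(const unsigned char *buf, unsigned char len)
{
    unsigned char sum = 0;
    unsigned char i;

    for (i = 0; i < len; ++i) {
        sum ^= buf[i];
    }
    return sum;
}

/* Variant 2: static locals. They live at fixed addresses, so no
   stack-relative access is needed, but the function is no longer
   reentrant and the compiler may have to go through memory. */
unsigned char checksum_static(const unsigned char *buf, unsigned char len)
{
    static unsigned char sum;
    static unsigned char i;

    sum = 0;
    for (i = 0; i < len; ++i) {
        sum ^= buf[i];
    }
    return sum;
}
```

Both variants compute the same checksum, so the generated assembly can be compared directly.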
Then there are the different compilers. Code that suits z88dk may be poor for SDCC. Speaking of which, SDCC is supposed to be an optimizing compiler. But it supports many processors, and doesn't seem to optimize very well for Z80. There is also a Z80-only variant, ZSDCC. Has anyone tried that one on MSX?
Well, the reason I'm writing this is that I tried to google for instructions on how to write code that optimizes well with SDCC (Z80) and found almost nothing. I guess I'll just have to write tests myself and learn.
Found this article at least: https://github.com/Fabrizio-Caruso/8bitC/blob/master/8bitC_E...