OK, this is a bit of a nerdy* subject but I was giving the strlen() function from the standard C library some thought.
If one were to implement it on Z80 the natural solution would be to use the cpir instruction with some wrapper code:
; size_t strlen(const char *s); strlen: xor a ;5 ld c,a ;5 ld b,a ;5 cpir ;23/18 ld hl,-1 ;11 sbc hl,bc ;17 ret ;11
Execution time will be 72+n*23 cycles where n is the length of the string in characters.
So, the natural question is: Could you make it any faster?
Loop unrolling is usually the way to go, but with conditional code it seldom gives you an advantage since the branch often can be incorporated in the loop.
Unrolled code would look something like this:
REPT 16 cp (hl) ; 8 jr z,.strend ; 8/13 inc hl ; 7 ENDR
You still get 23 cycles per character but larger code size and at some point you still have to loop.
However, if you look through the Z80 instruction set you will notice that the fastest conditional branch in the Z80 instruction set isn't jr at 8 cycles but the rarely used conditional return at 6 cycles.
What happens if we use it for an unrolled loop?
strlen: xor a ;5 ld c,l ;5 ld b,h ;5 cp (hl) ;8 call nz,loop ;18/11 sbc hl,bc ;17 ret ;11 N EQU 16 loop: REPT N inc hl ;7 cp (hl) ;8 ret z ;6/12 ENDR jp loop ;11
OK, so figuring out how many cycles it will cost for each length is a bit trickier here.
The first thing to notice is the conditional call. It costs as much as a non-conditional call but gives an early exit for 0-length string.
At 62 cycles it is actually faster than the cpir version, but that is not really surprising since we never enter any loop.
What about a string that is only one character long?
Well, you get some call overhead and end up at 96 cycles instead of the 95 cycles of the cpir version.
After that you get an additional cost of 21 cycles per character. That is 2 cycles faster than cpir.
To make it faster you need to recuperate the cost of the jp instruction.
That means that as long as you repeat 6 or more times the loop expanded version will be faster than the cpir based one.
There is one thing that seems a bit iffy about the loop expansion.
At the end of it you have a conditional return followed by a non-conditional jump.
For strings lengths that ends at that particular character that is the fastest option. For longer strings you would want to adjust it somewhat.
strlen: xor a ;5 ld c,l ;5 ld b,h ;5 cp (hl) ;8 call nz,loop ;18/11 sbc hl,bc ;17 ret ;11 N EQU 16-1 loop: REPT N inc hl ;7 cp (hl) ;8 ret z ;6/12 ENDR inc hl ;7 cp (hl) ;8 jp nz,loop ;11 ret ;11
This version costs a bit more if the string ends at the same time as the loop unrolling but it costs 6 cycles less per loop compared to previous version.
I don't really think anyone needs to quickly do a strlen on many long strings on a Z80 and I can't really think of any other function where this is useful.
I just didn't want to be the only one to waste some time on this.
* Don't say I didn't warn you.