Assembler Optimizer

Page 10/20

By santiontanon

Paragon (1027)

04-07-2020, 09:46

I haven't yet updated the release, but I just pushed a commit where I started adding support for some of the sjasm syntax using -dialect sjasm (only the bare minimum I needed to make it work on the Metal Gear disassembly here: https://github.com/GuillianSeed/MetalGear ). 51 patterns matched, saving 52 bytes. Only a little bit, but not bad for the first set of patterns (which have not changed for a while!).

It is interesting, as each codebase seems to find one corner case or another that I didn't have covered. For example, sjasm uses "#" as a keyword, and I had never thought "#" would be used for anything other than hex constants!

So, still not all of them addressed, but walking down your list TheNestruo ;)

I might dedicate one or two more days to the parser to make it more robust and make sure what is implemented works correctly, and then I will move on to expanding the pattern database, to see how far this first pattern approach can be pushed.
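
To make the "patterns matched / bytes saved" idea concrete, here is a minimal sketch of a single pattern-based rewrite in Python. The pattern (a `call` immediately followed by `ret` becomes a `jp`) and the function name `optimize` are only illustrative assumptions; this is not taken from mdl's actual pattern database or code.

```python
def optimize(lines):
    """Apply one hypothetical peephole pattern: 'call X' + 'ret' -> 'jp X'.

    Returns the rewritten lines and the number of bytes saved.
    Assumes one instruction per line and labels on their own lines.
    """
    out, saved = [], 0
    i = 0
    while i < len(lines):
        cur = lines[i].split()
        nxt = lines[i + 1].split() if i + 1 < len(lines) else []
        # Only unconditional calls: 'call nz,X' has a comma in its operand.
        if (len(cur) == 2 and cur[0] == "call" and "," not in cur[1]
                and nxt == ["ret"]):
            out.append(f"jp {cur[1]}")
            saved += 1  # call (3 bytes) + ret (1 byte) -> jp (3 bytes)
            i += 2
        else:
            out.append(lines[i])
            i += 1
    return out, saved
```

For example, `optimize(["ld a,1", "call print", "ret"])` returns `(["ld a,1", "jp print"], 1)`. Even a pattern this simple has a catch: it is only safe if the called routine does not inspect its return address on the stack.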

By Manuel

Ascended (16702)

04-07-2020, 11:36

I also wonder how well this could be used as a post-processor for the output of C compilers.

By geijoenr

Master (166)

04-07-2020, 12:49

I think register allocation algorithms are a complex topic; see the paper below, which is related to the allocator used in SDCC:
register allocator for Z80 research paper

Finding an improved version seems quite an undertaking.

SDCC peephole optimization for z80 is pretty bad, but is customizable; this is used in Fuzix to improve code generation:

Fuzix peep for z80

By salutte

Expert (72)

04-07-2020, 13:53

SDCC assembly is, surprisingly, optimized at a very low level, but it is slow because of "strategic" decisions: bad register allocation choices, local variables in the stack, integer promotion to 16 bits forced by the C standard, abuse of index registers, etc.

Hence, I think that optimization patterns that would match hand-made assembly would not match sdcc-generated assembly.

I think this is also the reason why the peep-hole optimizer is so under-utilized, despite having such a powerful syntax... it's just hard to find patterns for sdcc that do not break compatibility.

By hit9918

Prophet (2895)

04-07-2020, 15:49

"register allocation", the SDCC example that I posted did not run out of registers, but it uses 3 address registers for one pointer variable!
it is missing a special feature. take "p->x", and say p is in HL: now some INC HL is needed to get at the x of the struct, and then HL has a different value than the p variable! for that to be possible, the compiler needs a special feature.

using (iy+offset) is easier because IY keeps the same value as the C pointer. but SDCC still doesn't do this (it did INC IY in the example that I posted).

using (iy+offset) could give this nice, simple code:

;while(1) {
loop:
;p->x += p->dx;
	ld  a,0(iy)
	add 2(iy)
	ld  2(iy),a
	ld  a,1(iy)
	adc 3(iy)
	ld  3(iy),a
	jr  loop
;}
--
139 cycles
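
For reference, the 139 figure is consistent with MSX timing, where each M1 (opcode fetch) cycle costs one extra wait state on top of the plain Z80 T-states. A minimal Python sketch of the sum; the table layout and the helper name `msx_cycles` are just assumptions for illustration:

```python
# Plain Z80 T-states and number of M1 (opcode fetch) cycles per instruction.
# IY-prefixed instructions fetch two opcode bytes, hence two M1 cycles.
Z80_TIMING = {
    "ld a,(iy+d)": (19, 2),
    "add a,(iy+d)": (19, 2),
    "adc a,(iy+d)": (19, 2),
    "ld (iy+d),a": (19, 2),
    "jr e": (12, 1),
}

def msx_cycles(instructions):
    """Sum T-states, adding one wait state per M1 cycle (MSX behaviour)."""
    return sum(t + m1 for t, m1 in (Z80_TIMING[i] for i in instructions))

# The loop above, written with standard operand forms.
loop = [
    "ld a,(iy+d)", "add a,(iy+d)", "ld (iy+d),a",
    "ld a,(iy+d)", "adc a,(iy+d)", "ld (iy+d),a",
    "jr e",
]
print(msx_cycles(loop))  # 6 * 21 + 13 = 139
```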

By santiontanon

Paragon (1027)

04-07-2020, 19:38

SDCC uses some "funny" assembler syntax :) "ld a,1(iy)" instead of "ld a,(iy+1)", and hex values as labels. But I think it should not be hard to adapt the parser to accept this syntax if some flag, like "-dialect sdcc", is set. Optimal register allocation might not be achievable in the immediate future, but I'll add it to the todo list to consider later. Thanks for the references @geijoenr!!
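
As a rough idea of what such a dialect adaptation involves, here is a minimal Python sketch that rewrites SDCC-style indexed operands into the more common form. The function name `sdcc_to_standard` and the regular expression are assumptions for illustration only; this is not how mdl's parser is implemented, and it handles only simple numeric or symbolic offsets.

```python
import re

# Matches an offset directly followed by (ix) or (iy), e.g. "1(iy)" or "-3(ix)".
# The lookbehind keeps the match from starting in the middle of a word.
_INDEXED = re.compile(r"(?<![\w)])(-?\w+)\((ix|iy)\)", re.IGNORECASE)

def sdcc_to_standard(line):
    """Rewrite 'offset(iy)' operands as '(iy+offset)'; other text is untouched."""
    def repl(match):
        offset, reg = match.group(1), match.group(2)
        sign = "" if offset.startswith("-") else "+"
        return f"({reg}{sign}{offset})"
    return _INDEXED.sub(repl, line)
```

With this, `sdcc_to_standard("ld a,1(iy)")` gives `"ld a,(iy+1)"` and `sdcc_to_standard("ld a,-3(ix)")` gives `"ld a,(ix-3)"`. Other SDCC quirks, such as the implicit accumulator in "add 2(iy)" or hex values used as labels, would need separate handling.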

By santiontanon

Paragon (1027)

05-07-2020, 23:44

I just pushed a new release (alpha v4): https://github.com/santiontanon/mdlz80optimizer/releases/tag...

This release fixes many, many things and adds better support for Glass/asMSX/sjasm syntax (sjasm support is only barebones for now, though). For example, I was now able to verify that the symbol tables generated by mdl exactly match those generated by Glass in some more complex projects like "threed".

But the most exciting news is that there is now support for integrating mdl into IDEs/text editors! (thanks to theNestruo for this!). For example, here're a couple of screenshots of it working in VSCode and Sublime Text. This is already being very useful for me, so, I hope it can be useful for others too. If you give it a try and find any issues by all means let me know. We just added support for this yesterday, and I bet there's LOTS of rough edges still!

By ARTRAG

Enlighted (6405)

06-07-2020, 09:38

Very good! I don't use them, but I see it can be handy if you want to manually accept or reject the suggested optimizations.

By ARTRAG

Enlighted (6405)

06-07-2020, 10:14

A nice modern superoptimizer
https://arxiv.org/abs/1711.04422
https://github.com/google/souper

Sadly, it works on the intermediate representation (IR) in LLVM.
IR is language independent, but I do not think it can be translated to Z80 asm easily.

By Metalion

Paragon (1155)

06-07-2020, 12:13

I would definitely be interested if it included a speed optimization option.
I don't care about saving 1 or 2 bytes here and there, I'm more interested in gaining speed.

Sometimes a gain in size means a gain in speed, but not always.
The classic example is multiple OUTIs instead of one OTIR, but there are other cases.

For example:

jr  address ; 2 bytes, 13 t-states
jp  address ; 3 bytes, 11 t-states

you lose 1 byte but you gain 2 t-states.
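
A size/speed switch for this particular case could be sketched as follows in Python. The figures are the ones quoted above; the function name `pick_jump` and the "size"/"speed" goal names are hypothetical and not an existing option of any tool.

```python
# Size and speed figures as quoted above (MSX t-states).
JUMPS = {
    "jr": {"bytes": 2, "tstates": 13},
    "jp": {"bytes": 3, "tstates": 11},
}

def pick_jump(goal, displacement):
    """Pick 'jr' or 'jp' for an unconditional jump.

    goal is "size" or "speed"; displacement is the distance in bytes from
    the instruction following the jump to the target.
    """
    # jr can only reach targets within a signed 8-bit displacement.
    if not -128 <= displacement <= 127:
        return "jp"
    key = "bytes" if goal == "size" else "tstates"
    return min(JUMPS, key=lambda op: JUMPS[op][key])
```

Here `pick_jump("size", 10)` returns `"jr"` and `pick_jump("speed", 10)` returns `"jp"`, while a target out of jr range always gets `"jp"`.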
