Assembler Optimizer

Page 7/20
By Metalion

Paragon (1155)

01-07-2020, 20:30

Do you optimize on size ? or on speed ?
Or is it an option to choose from ?

By santiontanon

Paragon (1027)

01-07-2020, 20:59

The optimizer currently only prints "size" in the output, but internally it could check for both (the internal data structures contain both size and timing information, although only the size info is currently being used).

All the optimizations currently included are ones that improve both size and speed. But what I would like to implement is that via input flags you can tell the optimizer to activate optimizations that favor size over speed, or vice versa.
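A minimal sketch of how such flags could select patterns, assuming each pattern records both a byte delta and a T-state delta. The names `Pattern`, `enabled`, `PATTERNS` and the mode strings are hypothetical and not MDL's actual API; the cycle counts are the standard Z80 figures, without the extra M1 wait state that MSX machines add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    bytes_saved: int    # positive means smaller code
    tstates_saved: int  # positive means faster code


def enabled(pattern, mode="both"):
    """Decide whether a pattern is active for a given (hypothetical) mode."""
    if mode == "both":
        # Neither metric may get worse.
        return pattern.bytes_saved >= 0 and pattern.tstates_saved >= 0
    if mode == "size":
        return pattern.bytes_saved > 0
    if mode == "speed":
        return pattern.tstates_saved > 0
    raise ValueError(f"unknown mode: {mode}")


PATTERNS = [
    # ld a,0 is 2 bytes / 7 T-states; xor a is 1 byte / 4 T-states.
    Pattern("ld a,0 -> xor a", bytes_saved=1, tstates_saved=3),
    # jp nn is 3 bytes / 10 T-states; jr e is 2 bytes / 12 T-states when taken
    # (and only usable when the target is within relative-jump range).
    Pattern("jp nn -> jr e", bytes_saved=1, tstates_saved=-2),
    Pattern("jr e -> jp nn", bytes_saved=-1, tstates_saved=2),
]

for mode in ("both", "size", "speed"):
    print(mode, [p.name for p in PATTERNS if enabled(p, mode)])
```

With this split, the "both" mode corresponds to what the optimizer does today, and the other two modes only add the patterns that trade one metric for the other.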

By theNestruo

Master (158)

01-07-2020, 22:17

santiontanon wrote:

But what I would like to implement is that via input flags you can tell the optimizer to activate optimizations that favor size over speed, or vice versa.

I would go for (what I think is) an easier idea: allow a parameter to choose which patterns.txt files to apply. You can have default.txt (the current pbo-patterns.txt), size.txt and speed.txt embedded in the classpath. The option "-use default,size" will apply default.txt+size.txt, the option "-use speed" will apply speed.txt only, and the option "-use default,speed,./my-secr3t-optimizations.txt" will apply default.txt+speed.txt+a user-provided pbo-patterns file.
I think it is easy to implement (once you can parse one file, you can parse any of them), extensible, makes it easy to test additional pbo-patterns.txt files, and can even be used as a prioritization system (deciding the order in which the patterns are applied if found in more than one file).
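To make the proposal concrete, here is a minimal sketch of how the suggested "-use" value could be resolved into an ordered list of pattern sources. Everything here is an assumption for illustration: the `-use` option is only a proposal in this post, and `resolve_use_option`, `merge_patterns` and the "first file wins" rule are hypothetical names and choices.

```python
# Names of the pattern files assumed to be bundled with the tool.
EMBEDDED = {"default", "size", "speed"}


def resolve_use_option(value):
    """Turn 'default,speed,./mine.txt' into an ordered list of sources."""
    sources = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item in EMBEDDED:
            sources.append(("embedded", item + ".txt"))
        else:
            # Anything else is treated as a path to a user-provided file.
            sources.append(("file", item))
    return sources


def merge_patterns(pattern_sets):
    """Merge dicts of name -> pattern body; earlier files take priority."""
    merged = {}
    for patterns in pattern_sets:
        for name, body in patterns.items():
            merged.setdefault(name, body)
    return merged


print(resolve_use_option("default,speed,./my-secr3t-optimizations.txt"))
```

Because the list keeps the order given on the command line, the same mechanism doubles as the prioritization system mentioned above.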

But, again, writing an idea in a forum is quick; doing the actual code is... "less quick" hahaha

By santiontanon

Paragon (1027)

02-07-2020, 01:09

Yeah, that actually sounds like a better option! Should be quite trivial to add an input flag to specify the optimizer files. Hopefully I can have some updates tomorrow or so with some of these functionalities in.

By pgimeno

Master (191)

02-07-2020, 09:31

santiontanon wrote:

All the optimizations currently included are ones that improve both size and speed. But what I would like to implement is that via input flags you can tell the optimizer to activate optimizations that favor size over speed, or vice versa.

Via "pragma" comments? That would be nice, so that the programmer can choose the "time-critical" and "size-critical" sections of the code. Or even what sections to not optimize at all.

In the VDP test, for example, I have some routines dedicated exclusively to introducing a delay of a known number of CPU cycles. It would be disastrous if any of these were optimized (unless there are timing-preserving optimizations, but that sounds like an atypical use case).

By santiontanon

Paragon (1027)

02-07-2020, 11:03

(apologies in advance for the long message ;) )

Alright, I finally got the parser to recognize basic Glass and asMSX syntax, so I can test the optimizer on some of your projects! No great optimizations yet, as the set of patterns is limited, but at least I can finally test it on projects other than my own :) (which helped me fix lots of bugs, haha)

Here are some examples!

When running it in theNestruo's World Rally (with this commandline):

java -jar mdl.jar msx-wrally/src/rally.asm -dialect asmsx -I msx-wrally/ -warn-off-labelnocolon -po

I get this output:

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 260: 1 bytes saved
    cp 1
Replaced by:
    dec a

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 1351: 1 bytes saved
    srl a
    srl a
    srl a
Replaced by:
    rrca
    rrca
    rrca
    and 31

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 896: 1 bytes saved
    ld b, NUM_HIGH_SCORES  ; para saber cuándo dejar de comparar
    ld c, HI_SCORES_SIZE  ; para saber cuántos bytes mover
Replaced by:
    ld bc, HI_SCORES_SIZE + 256 * NUM_HIGH_SCORES

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 1044: 1 bytes saved
    ld b, NUM_HIGH_SCORES  ; para saber cuándo dejar de comparar
    ld c, HI_SCORES_SIZE  ; para saber cuántos bytes mover
Replaced by:
    ld bc, HI_SCORES_SIZE + 256 * NUM_HIGH_SCORES

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 1300: 1 bytes saved
    ld a, SPAT_END
    ld (hl), a
Replaced by:
    ld (hl), SPAT_END

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_page0.asm, line 51: 1 bytes saved
    sla a  ; deja un pixel de margen
Replaced by:
    add a, a

PatternBasedOptimizer: 6 patterns applied, 6 bytes saved

Only 6 bytes saved, but hey, again, this is only the beginning :)
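As a sanity check on the least obvious substitution above (three `srl a` replaced by three `rrca` plus `and 31`), the following sketch models only the A register and compares both sequences for every 8-bit value. The helper names are made up for this example. Note that it does not model flags: `and` always clears carry, while `srl a` leaves the last shifted-out bit there, so the substitution is only safe when the carry flag is not used afterwards.

```python
def srl(a):
    """srl a: logical shift right, bit 7 becomes 0."""
    return (a >> 1) & 0xFF


def rrca(a):
    """rrca: rotate A right, bit 0 wraps around into bit 7."""
    return ((a >> 1) | ((a & 1) << 7)) & 0xFF


def original(a):
    for _ in range(3):
        a = srl(a)
    return a


def replacement(a):
    for _ in range(3):
        a = rrca(a)
    return a & 31  # and 31 clears the three bits that wrapped around


mismatches = [a for a in range(256) if original(a) != replacement(a)]

# srl a is a 2-byte (CB-prefixed) opcode; rrca is 1 byte; and n is 2 bytes.
ORIGINAL_BYTES = 3 * 2
REPLACEMENT_BYTES = 3 * 1 + 2

print("mismatches:", mismatches)
print("bytes saved:", ORIGINAL_BYTES - REPLACEMENT_BYTES)
```

The byte counts match the "1 bytes saved" reported by the optimizer for that substitution.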

I also tried it with Grauw's 3d engine project, when running it like this (I had to change the value of the R800 constant in the code beforehand, as I do not have support for R800 at the moment):

java -jar mdl.jar threed/src/COM.asm -I threed/lib/neonlib/src -I threed/gen -dialect glass -po

I get this output:

PatternBasedOptimizer substitution in threed/lib/neonlib/src/VDP.asm, line 21: 2 bytes saved
    cp 1
    jr c, MSX1
    jr z, MSX2
Replaced by:
    cp 1 + 1
    jr c, MSX2

PatternBasedOptimizer substitution in threed/src/Application.asm, line 97: 1 bytes saved
    ld a, 0
Replaced by:
    xor a

PatternBasedOptimizer substitution in threed/lib/neonlib/src/Memory.asm, line 321: 1 bytes saved
    ld a, 0
Replaced by:
    xor a

PatternBasedOptimizer substitution in threed/lib/neonlib/src/VDP.asm, line 167: 1 bytes saved
    ld a, 0  ; set HR line 0
Replaced by:
    xor a

PatternBasedOptimizer substitution in threed/src/Application.asm, line 236: 4 bytes saved
    ld ix, Application_points
Replaced by:

PatternBasedOptimizer substitution in threed/src/Application.asm, line 243: 4 bytes saved
    ld ix, Application_edges
Replaced by:

PatternBasedOptimizer: 6 patterns applied, 13 bytes saved

Again, only a few bytes saved, but hey, it's working :) Of course, you can get the optimizer to directly generate the optimized assembler output for you (with the -asm flag, which generates a single asm file with your whole project, with all macros resolved and all optimizations applied, ready to be assembled).

I verified that all the proposed optimizations are actually safe in those codebases, so I'm quite happy about it, as some involve nested macros that were tricky to parse correctly.

I also added the functionality to prevent optimizations (if you add ; mdl:no-opt to any line, it'll prevent any optimization, and if you don't like that pragma code, you can change it with a commandline flag). And also I added the flag to specify which optimization pattern file to use.
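A rough sketch of how a no-optimization pragma like this can be honored, assuming the optimizer simply skips any line whose comment contains the pragma text. The functions `is_protected` and `optimizable_lines` and the `pragma` parameter are hypothetical, and the sketch ignores `;` characters inside string literals, which a real parser would have to handle.

```python
def is_protected(line, pragma="mdl:no-opt"):
    """True if the comment part of an assembler line contains the pragma."""
    _, sep, comment = line.partition(";")
    return bool(sep) and pragma in comment


def optimizable_lines(lines, pragma="mdl:no-opt"):
    """Indexes of the lines the optimizer would be allowed to touch."""
    return [i for i, line in enumerate(lines) if not is_protected(line, pragma)]


source = [
    "    ld a, 0",
    "    nop  ; mdl:no-opt (part of a timed delay loop)",
    "    sla a",
]
print(optimizable_lines(source))
```

Passing a different `pragma` value mirrors the idea of changing the pragma text from the command line.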

I'm leaving it here for today, but will continue during the weekend. My next task is to have an option to generate the output we were discussing above so that it can be parsed easily by VSCode/Sublime plugins, and after that I'll go back to improving the optimizer.

Latest version in github: https://github.com/santiontanon/mdlz80optimizer/releases/tag...

By santiontanon

Paragon (1027)

02-07-2020, 10:52

btw, the latest version should just require Java 8 now instead of 12 (I hope :) )

By Grauw

Ascended (9181)

02-07-2020, 19:53

Very cool! I must try it on my other projects :).

Btw if you need newer than Java 8 I would use Java 11 which is also LTS.

If you have replacements like this:

PatternBasedOptimizer substitution in msx-wrally//src/rally_rom_code.asm, line 1044: 1 bytes saved
    ld b, NUM_HIGH_SCORES  ; para saber cuándo dejar de comparar
    ld c, HI_SCORES_SIZE  ; para saber cuántos bytes mover
Replaced by:
    ld bc, HI_SCORES_SIZE + 256 * NUM_HIGH_SCORES

Does it also work if b and c are assigned in reverse order, or if there is some non-related code in-between (let’s say a nop)?

By santiontanon

Paragon (1027)

02-07-2020, 20:17

Currently pattern matching is very limited, so I have two patterns: one for b,c->bc and another for c,b->bc. If there is something in between, it will currently not catch it. But that's a good point. I just added an item to my to-do list to figure out whether there is an easy way to allow for that!
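For illustration only, this is roughly what handling both orders amounts to, written as a standalone sketch rather than in MDL's pattern-file format (which is not shown in this thread). The function `merge_ld_bc` and its regular expression are hypothetical; it only accepts two adjacent lines with a simple constant or number as operand and rejects register and memory operands.

```python
import re

# Matches "ld b, X" or "ld c, X" where X is a single identifier or number,
# optionally followed by a comment.
LD_IMM = re.compile(r"^\s*ld\s+([bc])\s*,\s*(\w+)\s*(?:;.*)?$", re.IGNORECASE)
REGISTERS = {"a", "b", "c", "d", "e", "h", "l", "ixh", "ixl", "iyh", "iyl"}


def merge_ld_bc(first, second):
    """Return a merged 'ld bc, ...' line, or None if the pair does not match."""
    m1, m2 = LD_IMM.match(first), LD_IMM.match(second)
    if not m1 or not m2:
        return None
    operands = {m1.group(1).lower(): m1.group(2), m2.group(1).lower(): m2.group(2)}
    if set(operands) != {"b", "c"}:
        return None  # e.g. two loads into the same register
    if any(op.lower() in REGISTERS for op in operands.values()):
        return None  # register-to-register loads are not constants
    # B is the high byte of BC, hence the factor of 256.
    return f"ld bc, {operands['c']} + 256 * {operands['b']}"


print(merge_ld_bc("    ld b, NUM_HIGH_SCORES", "    ld c, HI_SCORES_SIZE"))
print(merge_ld_bc("    ld c, HI_SCORES_SIZE", "    ld b, NUM_HIGH_SCORES"))
```

Two 2-byte loads become one 3-byte load, which is where the single saved byte comes from. Allowing unrelated instructions in between would additionally require checking that they neither read nor write B and C, which this sketch does not attempt.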

As for trying it on other projects, I must warn you that I only added support for the Glass syntax used in the "threed" project; if other syntax is used that I don't support yet, it will fail. But if you try it, do let me know, and I can add support for any additional syntax needed! :)

By Grauw

Ascended (9181)

02-07-2020, 20:39

Is there a way to specify include paths?

Edit: Never mind, I see you use it above, it is missing from the readme though :).

[grauw] ~/Development/vgmplay % java -jar ../mdlz80optimizer/target/mdlz80optimizer-0.2-jar-with-dependencies.jar src/COM.asm -I lib/neonlib/src -I lib/gunzip/src -dialect glass -po -popotential
ERROR: expression failed to parse with token list: [,, 0]
ERROR: Cannot parse line lib/gunzip/src/deflate/Alphabet.asm, 26: 		ds Alphabet_MAX_CODELENGTH * 2, 0
ERROR: Problem including file at src/COM.asm, 55: 	INCLUDE "deflate/Alphabet.asm"