Assembler Optimizer

Page 24/26
17 | 18 | 19 | 20 | 21 | 22 | 23 | | 25 | 26

By Grauw

Ascended (9377)

12-10-2020, 01:20

Uninteresting wrote:

I believe that well over ten years ago suffix arrays were becoming preferable to suffix trees in some fields like bioinformatics, for reasons like lower memory usage. They represent mostly the same thing, but I tried googling for suffix arrays in relation to compilers and found nothing of interest. Maybe you'd have better luck, but I suppose suffix trees are easier abstractions to handle.

Implementing algorithms from the original articles tends to be harder than checking some online lecture notes. I think the latter is what I did when I implemented Ukkonen's algorithm in my previous life. (Plus I once implemented an algorithm from the original mid-80s article, only to find the algorithm didn't work and a fix was released in a later issue of the journal.)

Hehe, well, I managed to make my way through. But thanks for the tip, I'll broaden my search a bit :). I notice that finding the right keywords (like "procedural abstraction" and "code compaction" rather than "compression") also makes a lot of difference in what you can find. Among others I found a survey of different techniques and a thesis about code compaction (not read yet), which seemed easier to read than papers like Ukkonen's, so there are differences there too.

I think indeed if lowering memory use of the compressor was a goal, suffix arrays would be interesting. I’m sure they see many applications in databases, search engines, DNA analysis, etc.

However since I’m not working with big data, I haven’t looked into them much since it seemed to me working with the more explicit tree structure is more convenient. Just like working with a tree structure is generally more convenient than a tree postfix-encoded as a sequence. (Though I found several articles about detecting repeats in ordered trees by postfix encoding them and feeding them into a suffix tree.)

As long as the algorithms are O(n) or O(n log n) it’s probably good enough for my purposes.
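(For reference, the kind of repeat detection being discussed falls out of a suffix array almost directly. A minimal Python sketch, using a naive suffix sort that is fine for small inputs — real suffix array constructions run in O(n log n) or O(n):)

```python
def longest_repeat(s):
    # Naive suffix array: sort suffix start positions lexicographically.
    # Good enough for small inputs; linear-time constructions exist.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    best = ""
    # The longest repeated substring is the longest common prefix of
    # two lexicographically adjacent suffixes.
    for a, b in zip(sa, sa[1:]):
        x, y = s[a:], s[b:]
        n = 0
        while n < min(len(x), len(y)) and x[n] == y[n]:
            n += 1
        if n > len(best):
            best = x[:n]
    return best

print(longest_repeat("banana"))  # -> ana
```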

Btw I found a project called Squeeze, referenced in a paper, which does code compaction.

https://www2.cs.arizona.edu/projects/squeeze/

By santiontanon

Paragon (1103)

12-10-2020, 07:28

Oh wow, didn't realize there's so much literature on the topic! In one of the papers on the Squeeze project they claim a 30% reduction in code size on average, which is quite impressive! (MDL gets 1% at best!). I suspect savings from binaries directly written in assembler will be lower, as there'll be much less boilerplate code than in those generated from C/C++, but still!

By Grauw

Ascended (9377)

17-10-2020, 01:23

I implemented a first version which just does a single repetition elimination pass on the “stream of commands” type music files generated by my gngplay project. The repeat finding is still very basic and not optimised, though it runs pretty quickly. But I wanted to share my first results of running it on the Ghosts ’n Goblins music files:

Track Original Compressed Reduction
 01      132       73      45%
 02      657      471      28%
 03      286      206      28%
 04     2052     1355      34%
 05     1863      923      50%
 06      486      329      32%
 07      413      249      40%
 08     1817     1088      40%
 09      763      513      33%
 10     2130     1206      43%
 11     2742     1707      38%
 12     1214      704      42%
 13     3449     2131      38%
 14      392      304      22%
 15     2269     1595      30%
 16      831      471      43%
 17      304      206      32%
 18      653      491      25%
 19     3782     2076      45%
 20      519      397      24%
 21     1722     1284      25%
 22      473      339      28%
 23      209      169      19%
 24      890      465      48%
 25      272      230      15%
 26       80       70      12%
 27      170      108      36%

Total  30570    19161      37%

Note:

  1. The sizes are expressed as nr. of commands
  2. Call / return command overhead for repeats is not counted
  3. It goes only 1 level deep, repeats inside repeats are not found

So although I did not count certain factors contributing to final file size, there’s also a lot of missed opportunity still. I also haven’t iterated on various strategies to improve the compression, like different pattern search ordering or reverse prefix trees.

These results won’t directly translate to assembler files that have already been procedurally abstracted by the programmer.
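(A rough sketch of what a single greedy repetition-elimination pass like this looks like, in Python. This is purely illustrative and not the actual implementation: the repeat search here is naive O(n³) where a suffix tree makes it linear, and the `("CALL", name)` pseudo-commands and `sub0` naming are made up for the example.)

```python
def compress(seq, min_len=2):
    # Repeatedly factor out the longest non-overlapping repeat into a
    # "subroutine" and replace its occurrences with CALL pseudo-commands.
    subs = {}
    next_id = 0
    while True:
        best = None
        n = len(seq)
        # Naive repeat search, longest first; a suffix tree does this in O(n).
        for length in range(n // 2, min_len - 1, -1):
            seen = {}
            for i in range(n - length + 1):
                key = tuple(seq[i:i + length])
                if key in seen and i >= seen[key] + length:  # non-overlapping
                    best = key
                    break
                seen.setdefault(key, i)
            if best:
                break
        if best is None:
            break
        name = f"sub{next_id}"
        next_id += 1
        subs[name] = list(best)
        # Replace every non-overlapping occurrence with a CALL.
        out, i = [], 0
        while i < len(seq):
            if tuple(seq[i:i + len(best)]) == best:
                out.append(("CALL", name))
                i += len(best)
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, subs

print(compress(list("abcabcabc")))
```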

By Grauw

Ascended (9377)

17-10-2020, 02:40

I note from the above table that generally speaking the size reduction improves as the files get bigger. So I concatenated the data of all the tracks together and it gets a little smaller; 30570 -> 18216 commands, a reduction of 40%. Mh. Longer music means more repetition I guess, and the tracks don’t share too much between each other.

My current implementation functionally matches the “LFS” algorithm in the paper Simple Linear-Time Off-Line Text Compression by Longest-First Substitution. They also describe an improvement called “LFS2” which is what I describe in my third point above. For their test corpus they get a 54% size reduction for LFS and 80% (another 55%) for LFS2.

So extrapolating this to my results: if the size reduction for the music files is 37%, then for the LFS2 algorithm I expect an additional size reduction of about 37%, or a 60% reduction in total. Looking at the sizes of the repeats found, with several large ones that are internally still uncompressed, such a further reduction would be in line with my expectations.

p.s. The MFFS (Re-pair) method it mentions works bottom-up from shortest rather than longest matches, see here (chapter 7). It does not use a suffix tree but is its own stand-alone algorithm, so it may be simpler to implement with comparable results.
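(Re-pair really is simple to sketch: repeatedly replace the most frequent adjacent symbol pair with a fresh grammar rule. A minimal Python illustration — the `R0`, `R1` rule names are invented for the example, and the pair count here ignores overlaps, so it is only approximate:)

```python
from collections import Counter

def repair(seq):
    # Re-pair, minimally: keep replacing the most frequent adjacent
    # pair of symbols with a new nonterminal until no pair repeats.
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nt = f"R{next_id}"
        next_id += 1
        rules[nt] = list(pair)
        # Replace left-to-right, skipping overlaps.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

print(repair(list("abababab")))  # -> (['R1', 'R1'], {'R0': ['a', 'b'], 'R1': ['R0', 'R0']})
```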

By santiontanon

Paragon (1103)

18-10-2020, 02:52

This is very interesting! Thanks for the update, Grauw! I have not yet had a chance to read through the algorithm in detail, but from the explanations you've been putting here, it seems closely related to adaptive dictionary compression methods like LZW, right? Trying to find common subsequences. So, in addition to using it for compressing code, I wonder if this could also be used as a compression algorithm for data rather than commands.

Also, I'm curious to see how the call/ret overhead would play with the numbers. However, even if the compression goes down from 35% to 20% or even 10% when including that overhead, that's still much better than the current levels of compression MDL can do, which I'd say are in the single-digit percentages. So, this is all very promising!

By Grauw

Ascended (9377)

18-10-2020, 05:46

Yeah LZ78 (which LZW is based on) is related, although from what I understand it’s less optimal than LFS2 or Re-pair because the dictionary itself contains duplication due to the way it is constructed (it is not irreducible).

These off-line dictionary compression techniques (which form a context-free grammar) apply to any type of data, but due to their hierarchical nature they are also suitable for code. They do not need a decompression buffer and can be processed straight from ROM, while on-line LZ77-based algorithms require RAM because the dictionary is built on the fly.
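(The “no decompression buffer” property can be illustrated with a tiny player: walking the grammar depth-first and emitting terminals as you go is exactly what a call/return interpreter does on the Z80, needing only call-stack depth proportional to the nesting. A Python sketch, with a made-up example grammar:)

```python
def play(body, rules, emit):
    # Depth-first walk of the grammar, emitting terminal commands as we
    # go. No decompression buffer needed: rules can live in ROM, and the
    # only state is the call stack.
    for sym in body:
        if sym in rules:
            play(rules[sym], rules, emit)  # "call" into the subroutine
        else:
            emit(sym)                      # terminal: execute the command

# Hypothetical grammar: R1 -> R0 R0, R0 -> a b
rules = {"R0": ["a", "b"], "R1": ["R0", "R0"]}
out = []
play(["R1", "R1", "c"], rules, out.append)
print("".join(out))  # -> ababababc
```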

Some ret overhead and call stack depth can be reduced by eliminating tail recursion. Ordering subroutines such that they follow a tail recursion allows you to eliminate tail jumps as well. The opportunities for tail call optimisation might be improved by reversing the input before compression.
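(That tail-call transformation is a peephole over the subroutine bodies: a call in final position becomes a jump, eliminating the ret and one stack level, like the Z80 `call x / ret` → `jp x` rewrite. A hypothetical sketch, assuming subroutines stored as lists of `("CALL", target)`-style command tuples:)

```python
def optimize_tail_calls(subs):
    # Peephole pass: a CALL as the last command of a subroutine can
    # become a JP (jump), since the callee's RET then returns straight
    # to our caller -- saving the RET and one stack level.
    for body in subs.values():
        if body and isinstance(body[-1], tuple) and body[-1][0] == "CALL":
            body[-1] = ("JP", body[-1][1])
    return subs

subs = {"sub0": [("LD", 1), ("CALL", "sub1")], "sub1": [("LD", 2)]}
optimize_tail_calls(subs)
print(subs["sub0"][-1])  # -> ('JP', 'sub1')
```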

About the percentages, note that my input data contains ample repetition, while program code already has repetitions removed by the programmer as subroutines, plus repeats that conditionally branch or unbalance the call stack must be split or rejected. Nevertheless I think typical code still contains repetitions in a bunch of places.

p.s. Given a program execution trace, suffix trees can also be used to identify hot code paths.

By Giangiacomo Zaffini 2

Master (233)

24-10-2020, 14:48

Hi optimizers!
Back in July I hinted in a reply to this thread that GNU binutils for z80 is usable. So, as an exercise, I ported cobinee's wonderful (SCC sound driver powered) LR to GNU assembly for z80. Here is a .zip of the ported program, built on a Lubuntu laptop with GNU binutils 2.35.

I know this thread has gone cold by now, but better late than never. :P

By santiontanon

Paragon (1103)

25-10-2020, 02:20

Oh, interesting! Other than the assembler, which GNU tools support z80? Can you use gprof or gdb with z80 binaries? (And if you can, do you run them directly on MSX or on the Linux/Mac/Windows side?)

By Giangiacomo Zaffini 2

Master (233)

25-10-2020, 16:11

@santiontanon: The GNU as cross-assembler for z80 does not support compiling with `-pg`, and the gprof cross-profiler would need a way to push data from target to host, such as a semihosting mechanism. So it is not enabled yet.
gdb for z80 is not usable yet either; Sergey Belyashov a.k.a. b-s-a is actively contributing, among other volunteers.
Ciao,
Giacomo

By Grauw

Ascended (9377)

06-11-2020, 22:32

Old numbers without repeats-in-repeats on the left (slightly updated), and numbers that also include compressing repeats inside repeats on the right.

Track Original Compressed Reduction  Compressed Reduction
 01      132       68      48%           58      56%
 02      657      456      31%          318      52%
 03      286      195      32%          140      51%
 04     2052     1335      35%          935      54%
 05     1863      910      51%          514      72%
 06      486      317      35%          226      53%
 07      413      241      42%           95      77%
 08     1817     1072      41%          484      73%
 09      763      495      35%          364      52%
 10     2130     1190      44%          519      76%
 11     2742     1676      39%         1085      60%
 12     1214      695      43%          319      74%
 13     3449     2116      39%         1247      64%
 14      392      282      28%          230      41%
 15     2269     1578      30%         1221      46%
 16      831      456      45%          126      85%
 17      304      198      35%          175      42%
 18      653      476      27%          366      44%
 19     3782     2069      45%          742      80%
 20      519      382      26%          328      37%
 21     1722     1250      27%         1024      41%
 22      473      325      31%          245      48%
 23      209      157      25%          120      43%
 24      890      461      48%          166      81%
 25      272      216      21%          188      31%
 26       80       64      20%           58      27%
 27      170      100      41%           90      47%

Total  30570    18780      39%        11383      63%

Note:

  1. The sizes are expressed as nr. of commands
  2. Call / return command overhead for repeats is not counted

So this matches my earlier prediction of “an equal additional size reduction”.

When I zip the original files it also gives a 63% size reduction (from 60k to 22k). So on the surface the compression is comparable. But of course I didn’t count the cost of the call / return commands, and zip also applies Huffman compression to further reduce the size. Still I think it’ll get pretty close and be a good compression.
