Pointless ASM

Page 1/2
| 2

By NYYRIKKI

Enlighted (6091)

28-05-2012, 21:39

Today I was thinking about a general 8-bit * 8-bit multiplication routine... I'm probably the 1000th person to do that or so, but I came up with this 223-clock-cycle version that needs 291 bytes of memory. Since I wrote it, I may as well share it, although it might be a bit pointless.

It is neither the fastest nor the smallest, but somewhere in between... At least it looks funny :P

        DB #FE  ;BLOAD (BASIC binary file) header
        DW BEGIN,END-1,START ;start, end, execution address

        ORG #C000

BEGIN:	; 16*16 table: byte at BEGIN+row*16+col = row*col (BEGIN must be 256-byte aligned)
	DB #00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00,#00
	DB #00,#01,#02,#03,#04,#05,#06,#07,#08,#09,#0A,#0B,#0C,#0D,#0E,#0F
	DB #00,#02,#04,#06,#08,#0A,#0C,#0E,#10,#12,#14,#16,#18,#1A,#1C,#1E
	DB #00,#03,#06,#09,#0C,#0F,#12,#15,#18,#1B,#1E,#21,#24,#27,#2A,#2D
	DB #00,#04,#08,#0C,#10,#14,#18,#1C,#20,#24,#28,#2C,#30,#34,#38,#3C
	DB #00,#05,#0A,#0F,#14,#19,#1E,#23,#28,#2D,#32,#37,#3C,#41,#46,#4B
	DB #00,#06,#0C,#12,#18,#1E,#24,#2A,#30,#36,#3C,#42,#48,#4E,#54,#5A
	DB #00,#07,#0E,#15,#1C,#23,#2A,#31,#38,#3F,#46,#4D,#54,#5B,#62,#69
	DB #00,#08,#10,#18,#20,#28,#30,#38,#40,#48,#50,#58,#60,#68,#70,#78
	DB #00,#09,#12,#1B,#24,#2D,#36,#3F,#48,#51,#5A,#63,#6C,#75,#7E,#87
	DB #00,#0A,#14,#1E,#28,#32,#3C,#46,#50,#5A,#64,#6E,#78,#82,#8C,#96
	DB #00,#0B,#16,#21,#2C,#37,#42,#4D,#58,#63,#6E,#79,#84,#8F,#9A,#A5
	DB #00,#0C,#18,#24,#30,#3C,#48,#54,#60,#6C,#78,#84,#90,#9C,#A8,#B4
	DB #00,#0D,#1A,#27,#34,#41,#4E,#5B,#68,#75,#82,#8F,#9C,#A9,#B6,#C3
	DB #00,#0E,#1C,#2A,#38,#46,#54,#62,#70,#7E,#8C,#9A,#A8,#B6,#C4,#D2
	DB #00,#0F,#1E,#2D,#3C,#4B,#5A,#69,#78,#87,#96,#A5,#B4,#C3,#D2,#E1

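START:
       LD HL,TEST
       LD (#F39A),HL ;DEFUSR = TEST
       RET
TEST:                ;USR(x) returns HIGH(x)*LOW(x)
       LD HL,#F7F9 ;high byte of the USR() integer argument (DAC+3)
       LD A,(HL)
       DEC L       ;HL = #F7F8, low byte of the argument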

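MULT8:
       ; 8bit * 8bit multiplication: (HL)*A
       ; 16bit out: (HL) = low byte, (HL+1) = high byte
       ; Changes: AF,DE,BC (HL preserved, input byte at (HL) overwritten)
       ; (HL) = Hi1:Lo1, A = Hi2:Lo2 (high:low nibbles)

       LD D,BEGIN/256   ;D = page of the nibble table
       RLD              ;A = Hi2:Hi1, (HL) = Lo1:Lo2
       LD C,(HL)        ;C = index of Lo1*Lo2
       EX DE,HL
       LD L,A
       LD B,(HL)        ;B = Hi1*Hi2
       LD L,C
       LD C,(HL)        ;C = Lo1*Lo2
       EX DE,HL
       RLD              ;A = Hi2:Lo1, (HL) = Lo2:Hi1
       LD E,A
       LD A,(DE)        ;A = Hi2*Lo1
       LD E,(HL)
       EX DE,HL
       ADD A,(HL)       ;A = Hi2*Lo1 + Hi1*Lo2 (low 8 bits, CY = bit 8)
       EX DE,HL
       LD (HL),A
       SBC A,A
       AND 16           ;A = #10 if the middle sum overflowed, else 0
       RLD              ;A = middle sum >> 4, (HL) = (middle sum << 4) AND #F0
       LD D,A
       LD A,C
       ADD A,(HL)
       LD (HL),A        ;low byte of the result
       LD A,B
       ADC A,D          ;high byte = Hi1*Hi2 + (middle sum >> 4) + carry
       INC HL
       LD (HL),A
       DEC HL
       RET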
END:
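
To check the nibble juggling without an MSX, the listing can be mirrored in a small Python model. This is a hypothetical host-side sketch, not part of the original post: `TABLE`, `rld` and `mult8` are illustrative names, `rld` assumes the documented Z80 RLD behaviour (A's low nibble and the two nibbles of (HL) rotate left, A's high nibble is untouched), and flags other than carry are ignored.

```python
# Hypothetical host-side model of MULT8 (not from the original post).

# 16*16 nibble table: index (row << 4) | col holds row * col, like the DB lines.
TABLE = [(i >> 4) * (i & 0x0F) for i in range(256)]


def rld(a, m):
    """Z80 RLD on A and the byte at (HL); returns (new A, new (HL))."""
    new_a = (a & 0xF0) | (m >> 4)               # A low nibble <- (HL) high nibble
    new_m = ((m << 4) & 0xF0) | (a & 0x0F)      # (HL) high <- (HL) low, low <- old A low
    return new_a, new_m


def mult8(x, y):
    """x is the byte at (HL), y is A; returns the 16-bit product."""
    a, m = rld(y, x)                # A = Hi2:Hi1, (HL) = Lo1:Lo2
    b = TABLE[a]                    # B = Hi1*Hi2
    c = TABLE[m]                    # C = Lo1*Lo2
    a, m = rld(a, m)                # A = Hi2:Lo1, (HL) = Lo2:Hi1
    total = TABLE[a] + TABLE[m]     # ADD A,(HL): 9-bit middle sum
    carry = total >> 8
    m = total & 0xFF                # LD (HL),A
    a = 0x10 if carry else 0        # SBC A,A / AND 16
    a, m = rld(a, m)                # A = total >> 4, (HL) = (total << 4) & 0xF0
    d = a                           # LD D,A
    low = c + m                     # LD A,C / ADD A,(HL)
    high = (b + d + (low >> 8)) & 0xFF   # LD A,B / ADC A,D
    return (high << 8) | (low & 0xFF)


if __name__ == "__main__":
    # Exhaustive check of all 65536 input pairs against ordinary multiplication.
    assert all(mult8(x, y) == x * y for x in range(256) for y in range(256))
    print(mult8(0x12, 0x34))        # 18 * 52 = 936
```

The exhaustive loop is cheap on a PC and confirms that the routine only needs the 9-bit middle sum plus one carry into the high byte.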

By ARTRAG

Enlighted (6976)

29-05-2012, 00:57

I think you are the 1000000th person who did that...
So what is the idea behind it?

By NYYRIKKI

Enlighted (6091)

29-05-2012, 02:01

The idea was that if I use nibbles instead of bits, I don't need to loop and add so much. Only a few simple 8/9-bit additions of table values are enough... Well... it ended up being long code anyway...

The math is (Hi/Lo = high/low nibble of each factor):
B=Hi1*Hi2
C=Lo1*Lo2
result=(Hi1*Lo2+Hi2*Lo1)*16+BC   (where BC = B*256+C)
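
The formula can be checked exhaustively with a few lines of Python. This is a sketch of the arithmetic only (the name `nibble_mult` is hypothetical); it shows that each 4x4-bit product fits in a byte and that only the middle term needs 9 bits.

```python
def nibble_mult(x, y):
    """x * y for two bytes, built only from 4-bit * 4-bit products (0..225)."""
    hi1, lo1 = x >> 4, x & 0x0F
    hi2, lo2 = y >> 4, y & 0x0F
    b = hi1 * hi2                    # B = Hi1*Hi2, fits in a byte
    c = lo1 * lo2                    # C = Lo1*Lo2, fits in a byte
    bc = (b << 8) | c                # BC as a 16-bit register pair
    middle = hi1 * lo2 + hi2 * lo1   # at most 450, i.e. a 9-bit sum
    return middle * 16 + bc


# Every pair of bytes gives the same result as ordinary multiplication.
assert all(nibble_mult(x, y) == x * y for x in range(256) for y in range(256))
```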

By Heca

Rookie (21)

29-05-2012, 23:26

Your algorithm is bad in many ways.
Why don't you look for knowledgeable sources instead?
http://baze.au.com/misc/z80bits.html
EDIT: I say it's bad because you're an experienced developer and you shouldn't be that lazy.

By NYYRIKKI

Enlighted (6091)

30-05-2012, 05:55

The problem is not that I don't know; I just wanted to test a new approach -> it didn't work too well... This is why the title says "Pointless ASM"...

By Manuel

Ascended (19677)

30-05-2012, 10:11

Those who never try, will never fail, but will also never succeed!

By ARTRAG

Enlighted (6976)

30-05-2012, 15:15

Interesting:

- the unrolled 8x8-bit multiplication (1.1 at z80bits) is 33 bytes and costs between 247 and 284 CPU cycles (MSX wait states included)
- the code posted by NYYRIKKI costs 223 CPU cycles (MSX wait states included) and 35 bytes + 256 bytes of table

So NYYRIKKI's code is faster (still, the unrolled multiplication at z80bits is the best trade-off between speed and size that I know of).
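
For comparison, the classic approach handles one bit of the multiplier per step. The Python below is a generic shift-and-add sketch, not the z80bits 1.1 code itself (`shift_add_mult8` and `additions_needed` are hypothetical names). On a Z80 each step would roughly be a doubling plus a conditional add, which is presumably why an unrolled version's timing varies with the number of set bits in the multiplier.

```python
def shift_add_mult8(x, y):
    """8-bit * 8-bit -> 16-bit by shift-and-add, multiplier bits MSB first."""
    result = 0
    for bit in range(7, -1, -1):
        result = (result << 1) & 0xFFFF        # double the partial result
        if (y >> bit) & 1:
            result = (result + x) & 0xFFFF     # add the multiplicand for a set bit
    return result


def additions_needed(y):
    """Number of conditional adds actually taken: one per set bit of y."""
    return bin(y).count("1")
```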

By WORP3

Paladin (864)

30-05-2012, 16:57

As pointless as it may seem, it's the journey to the solution that counts ;)
It's always easy to dismiss someone's code when you're not making any effort yourself!

I like the attempt, and who knows, someone may actually use it for some undefined reason :D

By Heca

Rookie (21)

30-05-2012, 22:30

ARTRAG, to be fair you should compare against a table-driven approach. IIRC, the table-of-squares (quarter-square) algorithm is definitely faster.
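
The table-driven method usually meant here is the quarter-square trick: x*y = floor((x+y)^2/4) - floor((x-y)^2/4), which holds for integers because x+y and x-y always have the same parity. A minimal Python sketch with a 511-entry table (the names `QSQ` and `quarter_square_mult` are hypothetical):

```python
# QSQ[n] = floor(n*n / 4) for n = 0..510 (x + y can reach 510 for two bytes).
QSQ = [n * n // 4 for n in range(511)]


def quarter_square_mult(x, y):
    """x * y for two bytes via two table lookups and one subtraction."""
    return QSQ[x + y] - QSQ[abs(x - y)]
```

On a Z80 the entries (up to 65025) would need 16 bits each, so such a table costs about 1 KB, in exchange for two lookups and one 16-bit subtraction.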

By JohnHassink

Ambassador (5684)

31-05-2012, 07:20

Manuel wrote:

Those who never try, will never fail, but will also never succeed!

That is so zen! I love it. :)

By hit9918

Prophet (2932)

31-05-2012, 18:24

Even when something doesn't work out, it can contain a piece of the puzzle for future hacking.

The Z80 is the hardest CPU when you want speed.
Many registers, each one special; sometimes you have to try 8-bit vs 16-bit ops.
It takes a lot of puzzle-solving :P

I found the function HL = HL + DE * A useful, i.e. with an addition built in.
Load HL with the array base address, DE with the index, A with the element size.
When enumerating charset VRAM addresses, though, DE actually holds the element size of 2K, which needs 16 bits, and A holds the index.
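
A sketch of what HL = HL + DE * A computes, written in Python as a model with 16-bit wraparound (the function name and the example base address are hypothetical). It walks the bits of A and adds the matching power-of-two multiple of DE, which is the usual shift-and-add way to build such a routine:

```python
def hl_plus_de_times_a(hl, de, a):
    """Model of HL = HL + DE * A on 16-bit registers (results wrap at 64K)."""
    for bit in range(8):
        if (a >> bit) & 1:
            hl = (hl + de) & 0xFFFF   # add the current multiple of DE
        de = (de << 1) & 0xFFFF       # next power-of-two multiple of DE
    return hl
```

For example, stepping through 2K charset banks from a hypothetical base of #0000, `hl_plus_de_times_a(0x0000, 0x0800, 2)` gives `0x1000`.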
