suggestion on optimization

Page 4/5

By Grauw

Ascended (8388)

08-04-2019, 21:29

ricbit wrote:

This was a mix of superoptimization and coding by hand (superopt currently can't deal with the zero flag).

I remember the superoptimizador… that thing is cool! I imagine that on today’s PCs, and with multithreading (?), it runs a lot quicker now? I liked that overview page you used to have, with some example optimisation problems passed through it. Would be neat to see that resurrected and expanded!

By ricbit

Champion (437)

08-04-2019, 21:37

That page still lives: Superopt

These days we have not only more threads but also more memory, so I could cache intermediate results. Guess I'll get back to that project; it may be more useful now.

By ARTRAG

Enlighted (6238)

08-04-2019, 23:21

Hmm, I get a mess... did I apply your patch correctly?


	struct sat
y		db	0
x		db	0
f		db	0
c		db	0
	ends


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;
;	plot enemies and bullets if visible in the current SAT in ram
;
;	depends on xmap,ymap

_plot_enemy:

	ld	iy,(alt_ram_sat)
	ld	ix,enemies 
	ld	bc,(max_enem + max_plyr_bullets + max_enem_bullets)*256+0
	
	ld	hl,-128
	ld	de,(ymap)
	and a
	sbc	hl,de
	ld	(tempy),hl

	ld	hl,(xmap)
	ld	de,-32
	add	hl,de
	ld	(tempx),hl

.npc_loop1:
	bit 0,(ix+enemy_data.status)
	jp	z,.invisible

	ld	l,(ix+enemy_data.y+0)
	ld	h,(ix+enemy_data.y+1)
	ld	de,(tempy)
	
	add	hl,de			; hl = enemy.y - (ymap + 128)
	ld	de,128+16		; hl = enemy.y - (ymap + 128) + 128 + 16 >=0 
	add	hl,de			; hl = enemy.y - ymap + 16 >=0
	jr	nc,.invisible	; !(-16 <= enemy.y - ymap < 128)

	ld	a,l
	add	a,64-16			; a = enemy.y - ymap + 64	
	ld	(iy+sat.y+0),a
	ld	(iy+sat.y+4),a	; not needed if single layer but in this way it is overall faster 
	
	ld	l,(ix+enemy_data.x+0)
	ld	h,(ix+enemy_data.x+1)
	ld	de,(tempx)
						; CF is reset by previous add
	sbc hl,de			; hl = enemy.x + 32 - xmap < 0
	jp	m,.invisible	; hl <0  <==> dx = enemy.x - xmap < -32
	
    ld   a, l                      ; 5
    sub  32                        ; 8
    ld   e, a                      ; 5 
    ld   a, h                      ; 5 
    sbc  a, 0                      ; 8
    jr   c,.has_ec                 ; 13/8
    jr   nz,.invisible             ; 13/8

    ld   l, e                      ; 5

.has_ec:
    and  128                       ; 8
    or   (ix+enemy_data.color)     ; 21
    ld   e, a                      ; 5
	
	ld	a,(ix+enemy_data.frame)
	ld	(iy+sat.x),l				; write X
	ld	(iy+sat.f),a				; write shape
	ld	(iy+sat.c),e				; write colour
	ld	(ix+enemy_data.plane),c		; save SAT plane
	inc c
	set 7,(ix+enemy_data.status)	; set it as visible
	cp	16*4						; hard coded in the SPT
	jp	nc,.two_layers

.one_layer:

	ld	e,sat
	add iy,de
	jp 	.next
	
.invisible:
	res 7,(ix+enemy_data.status)	; set it as invisible
		
.next:
	ld	de,enemy_data
	add ix,de
	djnz	.npc_loop1

	ld	a,c
	ld	(alt_visible_sprts),a
	ret
	
.two_layers:
	
	ld	(iy+sat.x+4),l				; second layer X
	add	a,4
	ld	(iy+sat.f+4),a				; second layer shape
	ld	a,e
	and 0xF0
	inc	a							; second layer is always black
	ld	(iy+sat.c+4),a	
	inc c
	ld	e,2*sat
	add iy,de
	jp 	.next

By ricbit

Champion (437)

09-04-2019, 00:24

I think the first jump before the patch should be jp c instead of jp m, also the code expects to have xmap-32 in tempx, instead of 32-xmap. (I was working from your code before bore's patch).

By santiontanon

Paladin (824)

09-04-2019, 18:10

And speaking of the superoptimizador: the link you posted has a description and examples of it, but I do not see the actual optimizer. Is it available online anywhere? I have been considering creating something like that myself for a while! :D

By hit9918

Prophet (2866)

09-04-2019, 19:04

But for RAM operations, the accumulator (A) is better than HL!

 21   ld   l,(ix+enemy_data.y+0)
 21   ld   h,(ix+enemy_data.y+1)
 22   ld   de,(tempy)

 12   add  hl,de                   ; hl = enemy.y - (ymap + 128)
 11   ld   de,128+16               ; hl = enemy.y - (ymap + 128) + 128 + 16 >=0
 12   add  hl,de                   ; hl = enemy.y - ymap + 16 >=0
  8   jr   nc,.invisible           ; !(-16 <= enemy.y - ymap < 128)
 --
107   cycles (16-bit version)

 14   ld   a,(tempy+0)
 21   add  a,(ix+enemy_data.y+0)
  5   ld   e,a
 14   ld   a,(tempy+1)
 21   adc  a,(ix+enemy_data.y+1)
  8   jr   nz,.invisible           ; high byte not 0 => outside 8-bit window
  5   ld   a,e
  8   cp   128+16
  8   jr   nc,.invisible
 --
104   cycles (8-bit version)

What's more, HL is still free to use! That can save a dozen cycles somewhere else.

The 16-bit version looks like it is faster and was easier to develop, but the opposite is the case.
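
To compare the two listings above without running them on hardware, the arithmetic can be modelled in a few lines of Python. This is only a sketch under assumptions: the function names are made up for illustration, coordinates are taken as small non-negative 16-bit values, and the 8-bit model assumes that tempy is precomputed as 16 - ymap rather than -128 - ymap. The post does not state that last point; it is simply the value under which the "high byte must be zero" test appears to give the same window as the 16-bit carry trick.

```python
MASK16 = 0xFFFF

def visible_reference(enemy_y, ymap):
    """Plain statement of the window taken from the code comments."""
    return -16 <= enemy_y - ymap < 128

def visible_16bit(enemy_y, ymap):
    """Model of the HL version: visible iff the second 16-bit add carries."""
    tempy = (-128 - ymap) & MASK16        # value stored in tempy by _plot_enemy
    hl = (enemy_y + tempy) & MASK16       # add hl,de (carry ignored)
    return hl + (128 + 16) > MASK16       # add hl,de / jr nc,.invisible

def visible_8bit(enemy_y, ymap):
    """Model of the A version, ASSUMING tempy holds 16 - ymap (not stated in the post)."""
    tempy = (16 - ymap) & MASK16
    low = (tempy & 0xFF) + (enemy_y & 0xFF)                  # add a,(ix+y+0)
    e = low & 0xFF                                           # ld e,a
    carry = low >> 8
    high = ((tempy >> 8) + (enemy_y >> 8) + carry) & 0xFF    # adc a,(ix+y+1)
    if high != 0:                                            # jr nz,.invisible
        return False
    return e < 128 + 16                                      # cp 128+16 / jr nc,.invisible

def all_agree(limit=400):
    """Exhaustively compare the three predicates on a small coordinate range."""
    return all(
        visible_reference(y, m) == visible_16bit(y, m) == visible_8bit(y, m)
        for y in range(limit)
        for m in range(limit)
    )
```

Calling all_agree() compares the three predicates over every pair of coordinates below the limit. With the original tempy value (-128 - ymap) the 8-bit model would select a different window, so the precomputed constant has to change together with the loop body.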

By ricbit

Champion (437)

09-04-2019, 19:22

santiontanon wrote:

And speaking of the superoptimizador: the link you posted has a description and examples of it, but I do not see the actual optimizer. Is it available online anywhere? I have been considering creating something like that myself for a while! :D

Sure, it's on my GitHub.

By ARTRAG

Enlighted (6238)

10-04-2019, 08:58

Your work on optimization via brute-force search is fascinating.
How do you describe the algorithm you want the program to encode?

By ricbit

Champion (437)

11-04-2019, 02:32

The desired function is encoded using standard C:

unsigned char final2 (unsigned char a, unsigned char h) {
  signed short x = (signed short)(((unsigned short)h << 8) | a);
  return x > 256+32;
}
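
A spec like this is small enough to be checked exhaustively: there are only 256 × 256 input pairs. The sketch below is not ricbit's tool, just a hedged illustration of the idea in Python. final2 is a port of the C function above, while candidate and equivalent are hypothetical names invented here; candidate is a hand-written byte-wise version standing in for whatever instruction sequence a search might propose.

```python
def final2(a, h):
    """Port of the C spec: read h:a as a signed 16-bit value and test x > 256+32."""
    x = (h << 8) | a
    if x >= 0x8000:          # reinterpret as signed short
        x -= 0x10000
    return 1 if x > 256 + 32 else 0

def candidate(a, h):
    """Hypothetical byte-wise candidate, written by hand for illustration."""
    if h >= 0x80:            # negative high byte: x < 0
        return 0
    if h >= 2:               # x >= 512
        return 1
    return 1 if (h == 1 and a > 32) else 0

def equivalent(spec, cand):
    """Brute-force check: compare spec and candidate on all 65536 inputs."""
    return all(
        spec(a, h) == cand(a, h)
        for a in range(256)
        for h in range(256)
    )
```

With these definitions, equivalent(final2, candidate) holds, while a sloppier candidate such as "return 1 whenever h >= 1" is rejected because it disagrees with the spec for h = 1 and a <= 32.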

By ARTRAG

Enlighted (6238)

11-04-2019, 22:09

ricbit wrote:

I think the first jump before the patch should be jp c instead of jp m, also the code expects to have xmap-32 in tempx, instead of 32-xmap. (I was working from your code before bore's patch).

I've tested your patch with xmap-32 in tempx and with jp c.
It seems to work when X is in 128-256, but not outside that interval.
If I use jp m, it seems to work for x<256 but not for larger values.
