Bresenham's Line Algorithm in Screen 2

By ARTRAG

Enlighted (6977)

24-10-2009, 07:28

Hi,
is there already an asm implementation optimized for Screen 2 that exploits the fact that I can move up or down in the screen by adding or subtracting 32*8 to the current VRAM position?

Those I have seen on the web rely on a generic x,y plot-point function,
which results in a high overhead...
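
For reference, the layout in question can be sketched as follows. This is a minimal Python model, not MSX code. It assumes the pattern generator table starts at VRAM offset 0 and that the name table holds the identity mapping, so the pattern table behaves like a 256x192 bitmap. The function names are made up for illustration.

```python
def screen2_offset(x, y):
    """Byte offset of pixel (x, y) inside the pattern generator table.

    Assumption: identity name table, i.e. character cell (col, row)
    uses pattern number row * 32 + col, with 8 bytes per pattern.
    """
    return (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)


def screen2_mask(x):
    """Bit mask of pixel x inside its byte (leftmost pixel is bit 7)."""
    return 0x80 >> (x & 7)


# Adding 32*8 = 256 moves exactly one character row (8 pixel rows) down.
assert screen2_offset(100, 58) - screen2_offset(100, 50) == 32 * 8
# One pixel row down inside the same character cell is just +1.
assert screen2_offset(100, 51) - screen2_offset(100, 50) == 1
```

Under these assumptions the 32*8 step covers 8 pixel rows at once, while single-row moves need a different increment depending on the position inside the character cell.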

By ARTRAG

Enlighted (6977)

24-10-2009, 08:33

TI-8x calculators have true bitmap graphics at 1 bpp.
Screen 2 is a bit messy compared with pure bit fields, because it is a character mode.
Ultra-optimized line algorithms for TI-8x calculators need a lot of work to be adapted to Screen 2,
to take character boundaries into account.

I am not willing to reinvent the wheel; maybe what I need is already out there.

PS
This seems the fastest I've found, but it is for the TI and assumes a true bitmap screen 96 pixels wide.

; Very Fast Line Drawing Routine - Improved version 2002/05/19
;
; by Patai Gergely
;
; patai.gergely@freemail.hu
; http://www.extra.hu/cobb (note: there is no TI stuff here)
; http://eclipse.sch.bme.hu/~cobb
;
; You are free to use the DrawLine routine in your own programs as
; long as you are kind enough and give credit. If you feel that you
; can make a better line drawing routine, I would be glad to hear
; from you, because I'm convinced that the current version cannot
; be made any faster without applying space-consuming modifications
; (unrolling, tables...). If you have any questions, I'm virtually
; always available via e-mail.

.NOLIST

_getkey		.equ	4CFEh
SAVESSCREEN	.equ	8265h

.org 9327h

 ld hl,SAVESSCREEN
 call ClearScreen
 ld de,$0000		; Hint: top left corner is at 0,0 and the bottom right at 95,63
 ld hl,$1008
 ld ix,SAVESSCREEN
 call DrawLine
 ld hl,SAVESSCREEN
 call FlipScreen
 call _getkey
 ret

ClearScreen:		; Clearing the virtual screen at HL (~6500 cycles)
 di			; Interrupts would be surprised by the new position of the stack...
 ld (DL_VLinc+1),sp	; Backing up SP to a safe place
 ld de,768
 add hl,de
 ld sp,hl
 ld hl,0
 ld b,48
CS_loop:
 push hl		; 16 bytes are cleared in 8*11+13=101 cycles
 push hl		; That would be 16*21=336 with LDIR
 push hl
 push hl
 push hl
 push hl
 push hl
 push hl
 djnz CS_loop
 ld sp,(DL_VLinc+1)
 ret

FlipScreen:		; Copies the screen at HL to the LCD in ~52000 cycles (ION FastCopy)
 di
 ld a,$80
 out ($10),a
 ld de,755
 add hl,de
 ld a,$20
 ld c,a
 inc hl
 dec hl
FS_column:
 ld b,64
 inc c
 ld de,-767
 out ($10),a
 add hl,de
 ld de,10
FS_inner:
 add hl,de
 inc hl
 inc hl
 inc de
 ld a,(hl)
 out ($11),a
 dec de
 djnz FS_inner
 ld a,c
 cp $2c
 jp nz,FS_column
 ret

DrawLine:		; This routine draws an unclipped line on an IX-pointed screen from (d,e) to (h,l)
 ld a,h			; Calculating delta X and swapping points if negative
 sub d			; (Lines are always drawn from left to right)
 jp nc,DL_okaydx
 ex de,hl
 neg
DL_okaydx:
 push af		; Saving DX (it will be popped into DE below)
 ld b,0			; Calculating the position of the first pixel to be drawn
 ld c,d			; IX+=D/8+E*12 (actually E*4+E*4+E*4)
 srl c
 srl c
 srl c
 add ix,bc
 ld c,e
 sla c
 sla c
 add ix,bc
 add ix,bc
 add ix,bc
 ld a,d			; Calculating the starting pixel mask
 ld c,$80
 and 7
 jp z,DL_okaymask
DL_calcmask:
 srl c
 dec a
 jp nz,DL_calcmask
DL_okaymask:
 ld a,l			; Calculating delta Y and negating the Y increment if necessary
 sub e			; This is the last instruction for which we need the original data
 ld hl,12
 jp nc,DL_okaydy
 ld hl,-12
 neg
DL_okaydy:
 pop de			; Recalling DX
 ld e,a			; D=DX, E=DY
 cp d
 jp c,DL_horizontal	; Line is rather horizontal than vertical
 ld (DL_VLinc+1),hl	; Modifying y increment
 push ix		; Loading IX to HL for speed; we don't need the old value of HL any more
 pop hl
 ld b,e			; Pixel counter
 inc b
 srl a			; Setting up gradient counter (A=E/2)
 ld (DL_HLinc+1),sp	; Backing up SP to a safe place
 di			; Interrupts are undesirable when we play around with SP :)
DL_VLinc:
 ld sp,0		; This value is replaced by +/- 12
DL_Vloop:
 ex af,af'		; Saving A to alternative register
 ld a,(hl)
 or c			; Writing pixel to current position
 ld (hl),a
 ex af,af'		; Recalling A (faster than push-pop, and there's no need for SP)
 add hl,sp
 sub d			; Handling gradient
 jp nc,DL_VnoSideStep
 rrc c			; Rotating mask
 jp nc,DL_VnoByte	; Handling byte boundary
 inc hl
DL_VnoByte:
 add a,e
DL_VnoSideStep:
 djnz DL_Vloop
 ld sp,(DL_HLinc+1)
 ret
DL_horizontal:
 ld (DL_HLinc+1),hl	; Modifying y increment
 push ix		; Loading IX to HL for speed; we don't need the old value of HL any more
 pop hl
 ld b,d			; Pixel counter
 inc b
 ld a,d			; Setting up gradient counter
 srl a
 ld (DL_VLinc+1),sp	; Backing up SP to a safe place
 di			; Interrupts again...
DL_HLinc:
 ld sp,0		; This value is replaced by +/- 12
DL_Hloop:
 ex af,af'		; Saving A to alternative register
 ld a,(hl)
 or c			; Writing pixel to current position
 ld (hl),a
 ex af,af'		; Recalling A
 rrc c			; Rotating mask
 jp nc,DL_HnoByte	; Handling byte boundary
 inc hl
DL_HnoByte:
 sub e			; Handling gradient
 jp nc,DL_HnoSideStep
 add hl,sp
 add a,d
DL_HnoSideStep:
 djnz DL_Hloop
 ld sp,(DL_VLinc+1)
 ret

.end
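
To make the control flow of the listing easier to follow, here is a rough Python model of DrawLine. It is a sketch, not a cycle-accurate translation: the names draw_line and get_pixel are hypothetical, and the buffer is assumed to be a 768-byte, 96x64, 1 bpp bitmap with 12 bytes per row, as in the listing.

```python
ROW_BYTES = 12  # 96 pixels / 8, the constant 12 in the listing


def draw_line(buf, x0, y0, x1, y1):
    """Model of DrawLine on a 96x64, 1 bpp buffer (bytearray of 768)."""
    if x1 < x0:  # lines are always drawn from left to right
        x0, y0, x1, y1 = x1, y1, x0, y0
    dx = x1 - x0
    dy = abs(y1 - y0)
    y_inc = ROW_BYTES if y1 >= y0 else -ROW_BYTES
    addr = x0 // 8 + y0 * ROW_BYTES  # the only x,y to address conversion
    mask = 0x80 >> (x0 & 7)

    if dy >= dx:  # mostly vertical, like DL_Vloop
        err = dy // 2
        for _ in range(dy + 1):
            buf[addr] |= mask
            addr += y_inc
            err -= dx
            if err < 0:  # side step: one pixel to the right
                mask >>= 1
                if mask == 0:  # crossed a byte boundary
                    mask = 0x80
                    addr += 1
                err += dy
    else:  # mostly horizontal, like DL_Hloop
        err = dx // 2
        for _ in range(dx + 1):
            buf[addr] |= mask
            mask >>= 1
            if mask == 0:  # crossed a byte boundary
                mask = 0x80
                addr += 1
            err -= dy
            if err < 0:  # side step: one row up or down
                addr += y_inc
                err += dx


def get_pixel(buf, x, y):
    """Read back one pixel, using the full address conversion."""
    return bool(buf[x // 8 + y * ROW_BYTES] & (0x80 >> (x & 7)))


screen = bytearray(768)
draw_line(screen, 0, 0, 16, 8)  # same endpoints as the demo at the top
```

The point to notice is that the address is computed once, and every later pixel is reached with an increment of 1 or +/- 12. That incremental stepping is the part that has to change for Screen 2.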

By PingPong

Enlighted (4156)

24-10-2009, 14:08

A good line-drawing algorithm for MSX should take into account whether the byte address is changing. Instead of accessing the same VRAM byte again and again to alter single bits, it should buffer that byte in RAM and access VRAM only when needed.

For example, assume one draws a horizontal line of 96 px:
A normal algorithm will access 12 bytes of VRAM 8 times each, to set all bits to '1'.
An optimized MSX version should realize that the byte address changes only every 8 accesses, and access VRAM only 12 times.
Maybe the Bresenham counters could be used to detect when the byte address changes.
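
The buffering idea can be illustrated with a small Python sketch. It only counts accesses and is not MSX code. The helper names are hypothetical, and the Screen 2 offsets assume a pattern table at offset 0 with an identity name table.

```python
def screen2_offset(x, y):
    """Byte offset of pixel (x, y), assuming an identity name table."""
    return (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)


def buffered_writes(pixels):
    """Merge consecutive pixels that fall into the same byte.

    pixels: (offset, mask) pairs in drawing order.
    Returns the (offset, merged_mask) pairs that would reach VRAM.
    """
    writes = []
    current, acc = None, 0
    for offset, mask in pixels:
        if offset != current:  # byte address changed: flush the buffer
            if current is not None:
                writes.append((current, acc))
            current, acc = offset, 0
        acc |= mask
    if current is not None:  # flush the last byte
        writes.append((current, acc))
    return writes


# Horizontal line of 96 pixels on row 0.
line = [(screen2_offset(x, 0), 0x80 >> (x & 7)) for x in range(96)]
writes = buffered_writes(line)
print(len(line), len(writes))  # 96 pixel plots collapse to 12 byte writes
```

Each flushed byte would still need a read-OR-write on VRAM if the existing screen content has to be preserved, so the saving is in the number of bytes touched, not in the cost per byte.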

By ARTRAG

Enlighted (6977)

24-10-2009, 14:45

This is a second-order effect compared with the fact that standard drawing algorithms need an x,y to VRAM address conversion for every point on the line.

By Yukio

Paragon (1540)

24-10-2009, 14:56

I think this is one of the factors that "slow down" MSX1 games. Drawing lines is a little easier in pure bitmap modes (like those of the MSX2 VDP), and hardware line drawing (a form of hardware acceleration) would certainly be cool! After all, it worked for hardware sprites, even if on MSX they cannot cover an entire line in most modes (except maybe Screen 3) ...

Pattern mode is good for pre-calculated stuff, not so good for real-time computation! "Sequential" access to the RAM (through I/O ports) is not so bad either ...

By ARTRAG

Enlighted (6977)

25-10-2009, 11:48

Come on, anyone willing to share his screen 2 code?
In exchange I could release something nice
dunno, Raytracing demo sources, MOAM development tools...
er... no, not the latter yet
:-)

By PingPong

Enlighted (4156)

25-10-2009, 14:18

ARTRAG: you've got an email

By Heca

Rookie (21)

25-10-2009, 15:08

At first glance, I'd say the TI-8x algorithm is pretty good.
It doesn't seem so difficult to adapt.

Replace all the constant 12 values with 32. It should work as-is.

The only part you can optimize is drawing mostly horizontal lines (starting at DL_horizontal):
when HL doesn't change, you don't have to reload the corresponding byte from the VDP.

By ARTRAG

Enlighted (6977)

25-10-2009, 15:18

Heca, sorry, Screen 2 isn't a true bitmap mode; it has patterns to be considered.
Each pattern takes 8 bytes and is 8x8 pixels.
To draw a horizontal line you need to access VRAM addresses

vram, vram+8, vram+16, vram+24, etc.

On the TI-8x the same line would need to access addresses

ram, ram+1, ram+2, ram+3, etc.
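
The two address sequences above can be reproduced with a short Python sketch. The function names are hypothetical. The TI layout follows the D/8+E*12 formula from the listing, and the Screen 2 layout assumes a pattern table at offset 0 with an identity name table.

```python
def ti_offset(x, y):
    """Linear 96-pixel-wide bitmap: 12 bytes per pixel row."""
    return x // 8 + y * 12


def screen2_offset(x, y):
    """Screen 2 pattern table: 8 bytes per 8x8 cell, 32 cells per row."""
    return (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)


# First byte of four consecutive 8-pixel groups on row 0.
xs = range(0, 32, 8)
ti_bytes = [ti_offset(x, 0) for x in xs]       # [0, 1, 2, 3]
s2_bytes = [screen2_offset(x, 0) for x in xs]  # [0, 8, 16, 24]
```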

By Heca

Rookie (21)

25-10-2009, 15:52

OK, but this algorithm already handles patterns.
Instead of 8x8 patterns, it handles 8x1 patterns.

So, to adapt what I said:

Replace all the constant 12 values with 32*8=256
Replace all the (inc HL) instructions with HL=HL+8

(HL=HL+8 becomes tricky because there are no free registers left to do it fast)

By ARTRAG

Enlighted (6977)

25-10-2009, 16:44

No way, my friend.
The arrangement in VRAM is totally different.

Depending on the conditions

y&7!=0
and
x&7!=0

the code has to decide whether to add to the VRAM address

+/- 1
or
+/- (256-8)
or
nothing

Do you see?
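
One possible reading of the cases above, written as a Python sketch rather than Z80 code. It assumes a pattern table at offset 0 with an identity name table, and the helper names are hypothetical. In this reading a vertical step that crosses a character row totals 249, i.e. the usual +1 followed by a 256-8 correction, and a horizontal step that crosses a byte adds 8, a case the list above does not spell out.

```python
def screen2_offset(x, y):
    """Full x,y conversion, used here only as the reference."""
    return (y >> 3) * 256 + (x >> 3) * 8 + (y & 7)


def step_down(offset):
    """One pixel row down. offset & 7 equals y & 7 in this layout."""
    if offset & 7 != 7:
        return offset + 1            # still inside the same character
    return offset + 1 + (256 - 8)    # wrap to the character row below


def step_up(offset):
    """One pixel row up."""
    if offset & 7 != 0:
        return offset - 1
    return offset - 1 - (256 - 8)    # wrap to the character row above


def step_right(offset, mask):
    """One pixel to the right; the address changes only on a byte boundary."""
    mask >>= 1
    if mask == 0:
        return offset + 8, 0x80      # next character cell
    return offset, mask              # "nothing" is added to the address


# Walk ten pixels down the leftmost column, starting at (0, 0).
offset = screen2_offset(0, 0)
column = [offset]
for _ in range(9):
    offset = step_down(offset)
    column.append(offset)
# column is [0, 1, 2, 3, 4, 5, 6, 7, 256, 257]
```

With steps like these, the full conversion is needed only for the first pixel, which is the property the TI routine relies on.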
