VDP timings

By pitpan

Prophet (3152)

22-06-2003, 17:34

Hi!

I'd like to know whether this RAM-to-VRAM copy routine for MSX1 (that is, the TMS9918 and compatibles) can be improved. My goals are speed as well as flexibility.

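; Parameters:
; HL: source address (RAM)
; DE: destination address (VRAM)
; B: number of 8-byte blocks to be copied
RAM2VRAM:
ld a,e
di
out (99h),a ; VRAM address, low byte
ld a,d
or a,40h ; set bit 6 = VRAM write access
out (99h),a ; VRAM address, high byte
ei
ld c,98h ; C = VDP data port for OUTI
LOOP:
ld d,b ; save block counter (OUTI decrements B)
outi
outi
outi
outi
outi
outi
outi
outi
ld b,d ; restore block counter
djnz LOOP ; next 8-byte block
ret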

It's simple and quite useful. Its goal is to copy as much data as possible during V-blank while staying reasonably flexible.
The main loop takes 4x2 + 16x8 + 13x1 = 149 T-states (LD D,B and LD B,D at 4 T each, eight OUTIs at 16 T each and a taken DJNZ at 13 T). So it takes 149 T-states to copy a full 8-byte block and restart the loop.
The V-blank period is about 4300 microseconds long (at 60 Hz), which at 3.58 MHz is roughly 15,390 T-states. So, if my timings and calculations are right, the maximum number of 8-byte blocks this routine can copy is 15,390 / 149 = about 103. Rounding down to 100 to allow for the setup at the start of the routine, that is only 100x8 = 800 bytes of data copied from RAM to VRAM.
Any other interesting ideas? I know this routine is fast enough to copy a whole name table (768 bytes) in Graphic mode I or II (SCREEN 1 or SCREEN 2), but it's not a very high bandwidth.
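The arithmetic above can be checked with a small Python model (Python is used here only as a calculator; the helper names are hypothetical). It assumes the standard 3.579545 MHz MSX clock and the 4300 us V-blank figure from the post. The optional `msx_m1_wait` flag reflects the fact that MSX machines add one wait state to every M1 (opcode fetch) cycle, so OUTI, which has two opcode bytes, really costs 18 T-states rather than 16.

```python
# Rough V-blank budget for the 8-byte-block copy loop.
# Assumptions: 3.579545 MHz Z80 clock, 4300 us V-blank (figure from the post).
Z80_HZ = 3_579_545
VBLANK_US = 4300

# instruction -> (plain Z80 T-states, number of M1 opcode fetches)
TIMINGS = {
    "ld r,r": (4, 1),
    "outi": (16, 2),          # ED-prefixed, so two opcode fetches
    "djnz taken": (13, 1),
}


def loop_tstates(instrs, msx_m1_wait=False):
    """T-states for one loop pass; msx_m1_wait adds 1 T per M1 cycle."""
    total = 0
    for name, count in instrs:
        t, m1 = TIMINGS[name]
        total += count * (t + (m1 if msx_m1_wait else 0))
    return total


def blocks_per_vblank(tstates_per_block):
    budget = Z80_HZ * VBLANK_US // 1_000_000   # about 15392 T-states
    return budget // tstates_per_block


# ld d,b + 8 x outi + ld b,d + djnz
ORIGINAL = [("ld r,r", 2), ("outi", 8), ("djnz taken", 1)]

t = loop_tstates(ORIGINAL)
print(t, blocks_per_vblank(t), 8 * blocks_per_vblank(t))  # 149 103 824
print(loop_tstates(ORIGINAL, msx_m1_wait=True))           # 168
```

With the wait states included, the same loop costs 168 T-states, which under these assumptions leaves room for only about 91 blocks (728 bytes). That is slightly less than a full 768-byte name table, so the plain Zilog figures may be somewhat optimistic on real hardware.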

Any suggestions?

Kind regards,

Edu R.

By pitpan

Prophet (3152)

22-06-2003, 19:51

WYZ suggested the following modification on the HispaMSX mailing list:

; Parameters:
; HL: source address (RAM)
; DE: destination address (VRAM)
; B: number of 8-byte blocks to be copied
RAM2VRAM:
ld a,e
di
out (99h),a
ld a,d
or a,40h
out (99h),a
ei
ld c,98h
LOOP:
ld d,b
outi ; now only 7xOUTI
outi
outi
outi
outi
outi
outi
ld b,d ; B = block counter again
outi ; 8th byte; also decrements the block counter, Z set when it hits 0
jp nz,LOOP ; faster than DJNZ: 10T instead of 13T
ret

So the main loop now takes 4 + 7x16 + 4 + 16 + 10 = 146 T-states. Correct me if I'm wrong; my documentation on opcode timings is not very good. That allows 105 8-byte blocks (840 bytes) to be copied during the v-blank period.
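WYZ's trick is that after LD B,D the eighth OUTI decrements the block counter itself and sets Z when it reaches zero, so JP NZ can replace DJNZ. The register-level model below (plain Z80 timings, hypothetical `wyz_copy` helper, VDP not modelled) suggests it sends exactly 8 bytes per block at 146 T-states each, which agrees with the figures above.

```python
# Register-level model of WYZ's loop. Assumption: only B, D and the Z flag
# matter; the VDP itself is not modelled. Timings are plain Z80 T-states.

def wyz_copy(blocks):
    """Return (bytes_sent, tstates) for an initial B of 1..255."""
    b, sent, t = blocks, 0, 0
    while True:
        d = b                       # ld d,b          4 T
        t += 4
        for _ in range(7):          # 7 x outi        16 T each
            b = (b - 1) & 0xFF
            sent += 1
            t += 16
        b = d                       # ld b,d          4 T
        t += 4
        b = (b - 1) & 0xFF          # 8th outi decrements the block counter
        sent += 1
        t += 16
        t += 10                     # jp nz,LOOP      10 T, taken or not
        if b == 0:                  # OUTI set Z because B reached 0
            return sent, t


print(wyz_copy(1))    # (8, 146)
print(wyz_copy(105))  # (840, 15330)
```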

By Grauw

Ascended (10578)

22-06-2003, 21:58

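; Parameters:
; HL: source address (RAM)
; DE: destination address (VRAM)
; BC: number of bytes to be copied
RAM2VRAM:
ld a,e
di
out (99h),a
ld a,d
or a,40h
ei ;EI takes effect after the next instruction, so the OUT is still protected
out (99h),a

dec bc ;Number of loops is in BC
inc c ;Calculate 'fast loop' value
inc b
ld a,c
neg ;Jump value inside outi array
and 16-1
add a,a ;Each OUTI is 2 bytes
ld (SelfM+1),a ;Patch the JR displacement (self-modifying: code must be in RAM)
ld a,b ;A = outer loop count
ld b,c ;B = inner count (0 means 256)
ld c,98h
SelfM:
jr $ ;Displacement patched above: skips 0..15 OUTIs
Loop:
REPEAT 16
outi
ENDR
jp nz,Loop
dec a
jp nz,Loop
ret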

With this you can copy any number of bytes from RAM to VRAM. Only the setup before the first pass takes some extra time; the remaining iterations are very fast. The loop can also easily be unrolled further by replacing both 16s with, for example, 32, which makes it a bit faster still.
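To see how the setup splits a byte count into a jump offset, an inner count in B and an outer count in A, here is a Python model of the routine (hypothetical helpers `grauw_setup` and `grauw_run`; 8-bit registers modelled as integers masked to 0..255). Only the number of OUTIs is modelled, not their timing.

```python
# Model of Grauw's jump-into-the-OUTI-array trick. Assumptions: 8-bit
# registers wrap mod 256, REPEAT expands to `unroll` 2-byte OUTIs, and
# only the OUTI count is modelled (not the VDP or the T-states).

def grauw_setup(n, unroll=16):
    """Mirror the setup code: return (skip, inner B, outer A)."""
    bc = (n - 1) & 0xFFFF               # dec bc
    c = ((bc & 0xFF) + 1) & 0xFF        # inc c   -> low byte of n
    b = ((bc >> 8) + 1) & 0xFF          # inc b   -> outer loop count
    a = (-c) & 0xFF                     # ld a,c / neg
    a &= unroll - 1                     # and 16-1
    disp = a + a                        # add a,a -> JR displacement
    return disp // 2, c, b              # skip, then ld a,b / ld b,c


def grauw_run(n, unroll=16):
    """Count the OUTIs executed for a byte count n (1..65535)."""
    pos, b, a = grauw_setup(n, unroll)  # jr lands `pos` OUTIs into the array
    outis = 0
    while True:
        while pos < unroll:             # run to the end of the OUTI array
            b = (b - 1) & 0xFF
            outis += 1
            pos += 1
        pos = 0
        if b != 0:                      # jp nz,Loop (Z from the last OUTI)
            continue
        a = (a - 1) & 0xFF              # dec a
        if a == 0:                      # jp nz,Loop not taken -> ret
            return outis


print(grauw_setup(300))   # (4, 44, 2)
print(grauw_run(300))     # 300
```

For 300 bytes the JR skips 4 OUTIs, so the first pass writes 12 bytes. B then reaches zero after two more full passes (44 bytes in total), and the single remaining outer iteration writes the last 256.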

Check out http://map.tni.nl/?p=articles/fast_loops.html

~Grauw

By pitpan

Prophet (3152)

22-06-2003, 23:06

...The problem is that I really need 8-byte granularity, not more, not less.
Anyway, the basic idea of using a series of OUTIs instead of OTIR was taken from your excellent article about fast loops in MAP, as you pointed out.

Kind regards,

Ed Robsy

By NYYRIKKI

Enlighted (5918)

26-06-2003, 17:23

Here is my suggestion:

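ld a,b ; A = number of 8-byte blocks
di
ld c,#99 ; VDP control port
out (c),e ; VRAM address, low byte
set 6,d ; write access
out (c),d ; VRAM address, high byte
ei
dec c ; C = #98, VDP data port
loop:
outi
outi
outi
outi
outi
outi
outi
outi
dec a
jp nz,loop ; 8x16 + 4 + 10 = 142 T per block
ret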

... and if you want some more speed you can expand the loop like this:

loop:
outi
outi
outi
outi
outi
outi
outi
outi
dec a
ret z
outi
outi
outi
outi
outi
outi
outi
outi
dec a
ret z
outi
outi
outi
outi
outi
outi
outi
outi
dec a
jp nz,loop
ret
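For comparison with the 149 and 146 T-state versions, here is the per-block cost of both of NYYRIKKI's loops (plain Z80 timings; RET Z costs 5 T-states when not taken; the one-off exit path is ignored).

```python
# Per-block cost of NYYRIKKI's two loops, in plain Z80 T-states.
# Assumption: the one-off exit path (RET taken) is ignored.
OUTI, DEC_A, JP_NZ, RET_Z_NOT_TAKEN = 16, 4, 10, 5
BUDGET = 3_579_545 * 4300 // 1_000_000   # T-states in a 4300 us V-blank

simple = 8 * OUTI + DEC_A + JP_NZ                  # 142 T per 8-byte block
middle = 8 * OUTI + DEC_A + RET_Z_NOT_TAKEN        # 137 T (1st and 2nd block)
unrolled = (2 * middle + simple) / 3               # 3rd block ends in jp nz

print(simple, middle, round(unrolled, 2))          # 142 137 138.67
print(BUDGET // simple)                            # 108
```

The three-block unrolled version averages about 138.7 T-states per block, against 142 for the simple one.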

By NYYRIKKI

Enlighted (5918)

26-06-2003, 21:23

Grauw, you can still optimize your routine by at least 8 T-states... :)

Change:

dec bc ;Number of loops is in BC
inc c ;Calculate 'fast loop' value
inc b
ld a,c
neg ;Jump value inside outi array

TO:

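dec bc ;Number of loops is in BC
inc b ;Calculate 'fast loop' value
ld a,c
cpl ;Jump value inside outi array (CPL of low(BC-1) = NEG of low(BC))
;NB: C is no longer incremented, so the later LD B,C must be followed by INC B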
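If I read the change correctly, CPL of low(BC-1) gives exactly the same jump value as INC C / NEG. However, C itself now stays one lower, so the later LD B,C loads one less into the inner counter. The integer model below uses hypothetical helpers `run_array` and `cpl_variant` and assumes 8-bit wraparound. In that model, B never reaches zero at the end of an array pass, so the loop does not terminate. Adding an INC B after LD B,C restores the correct count, which leaves a net saving of 4 T-states rather than 8.

```python
# Integer model of the CPL change. Assumptions: 8-bit registers wrap mod 256,
# the rest of Grauw's routine is unchanged, only OUTIs are counted.

def low(x):
    return x & 0xFF


def run_array(skip, b, a, unroll=16, limit=1 << 17):
    """Run the OUTI array from `skip`; return the OUTI count, or None if it
    exceeds `limit` (more than any 16-bit count could need)."""
    outis, pos = 0, skip
    while outis <= limit:
        while pos < unroll:
            b = low(b - 1)              # OUTI decrements B
            outis += 1
            pos += 1
        pos = 0
        if b == 0:                      # Z from the last OUTI
            a = low(a - 1)              # dec a
            if a == 0:
                return outis            # ret
    return None


def cpl_variant(n, inc_b_after_ld_b_c=False, unroll=16):
    bc = (n - 1) & 0xFFFF               # dec bc
    outer = low((bc >> 8) + 1)          # inc b, later ld a,b
    c = low(bc)                         # C is no longer incremented
    skip = low(~c) & (unroll - 1)       # ld a,c / cpl / and 16-1
    b = low(c + 1) if inc_b_after_ld_b_c else c   # ld b,c (+ inc b)
    return run_array(skip, b, outer, unroll)


print(cpl_variant(300))                           # None: never terminates
print(cpl_variant(300, inc_b_after_ld_b_c=True))  # 300
```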

If memory is not an issue and you need to move really large blocks at maximum speed, here is an extreme version of Grauw's routine :) (I don't really think it's useful, but anyway...)

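; Parameters:
; HL: source address (RAM)
; DE: destination address (VRAM)
; BC: number of bytes to be copied

LD A,C ; A = low byte of count
EXX ; use the shadow registers for the address math
LD H,0
LD DE,LOOP
NEG ; A = number of OUTIs to skip
LD L,A
ADD HL,HL ; each OUTI is 2 bytes
ADD HL,DE
PUSH HL ; entry point inside the OUTI array
EXX
DEC BC ; so a multiple of 256 bytes doesn't get an extra pass
LD A,B
INC A ; A = number of passes through the array
DI
LD C,#99
OUT (C),E
SET 6,D
OUT (C),D
EI
DEC C ; C = #98
RET ; "returns" into the OUTI array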

LOOP:
REPEAT 256
OUTI
ENDR
DEC A
JP NZ,LOOP
RET
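The PUSH/RET entry can be checked the same way with a count-only model (hypothetical `extreme_run`, 8-bit wraparound assumed). The first pass writes low(BC) bytes. When that low byte is 0, however, the first pass already writes a full 256 bytes, so taking the pass count as B+1 copies 256 bytes too many. Deriving the pass count from BC-1 instead, as Grauw's routine does with DEC BC / INC B, gives the right count in the model.

```python
# Count model for the PUSH/RET version. Assumptions: each OUTI is 2 bytes,
# LOOP is the first of 256 OUTIs, registers wrap mod 256, and only the
# number of OUTIs is modelled.

def low(x):
    return x & 0xFF


def extreme_run(n, dec_bc_first=False):
    """OUTIs executed for a byte count n (1..65535)."""
    skip = low(-low(n))                  # ld a,c / neg -> OUTIs to jump over
    entry_offset = 2 * skip              # add hl,hl: byte offset from LOOP
    high = ((n - 1) if dec_bc_first else n) >> 8
    a = low(high + 1)                    # ld a,b / inc a
    outis = 256 - entry_offset // 2      # first, partial pass
    a = low(a - 1)                       # dec a
    while a != 0:                        # jp nz,LOOP: full 256-OUTI passes
        outis += 256
        a = low(a - 1)
    return outis


print(extreme_run(300), extreme_run(256), extreme_run(256, dec_bc_first=True))
# 300 512 256
```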

By Grauw

Ascended (10578)

29-06-2003, 18:46

NYYRIKKI wrote: "Grauw, you can still optimize your routine by at least 8 T-states... :)"
Heh, it was a quickie I wrote in Notepad... I never really tried or used it, so forgive me :). Still, I think I can use this to improve the examples in the MAP article a little as well.

NYYRIKKI wrote: "If memory is not an issue and you need to move really large blocks at maximum speed, here is an extreme version of Grauw's routine :) (I don't really think it's useful, but anyway...)"
Cool. But yes, in my routine you can unroll it up to 128 OUTIs, and that's already pushing it speed-wise (there isn't much more to gain). By the way, if you're into wasting memory, you might as well align the OUTI array on a 256-byte boundary, which should make some of the calculations a little faster :). Still, it's a nice way of jumping into the OUTI array :).
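One caveat on unrolling the JR-based version: the patched byte is JR's signed displacement, which only reaches +127 bytes forward. A 128-OUTI array would need displacements up to 254, which the CPU would read as a backward jump. The check below (hypothetical helper `jr_reaches`) suggests 64 OUTIs is the limit while the entry is a JR. Going further would mean entering through a JP or a PUSH/RET to a computed address, as in NYYRIKKI's 256-OUTI version.

```python
# Which unroll factors still fit the `jr $` entry? Assumptions: each OUTI
# is 2 bytes and the patched byte is JR's signed displacement (-128..127).

def jr_reaches(unroll):
    max_disp = 2 * (unroll - 1)          # largest value 'and'/'add a,a' yield
    return max_disp <= 127               # JR can only jump +127 bytes forward


print([u for u in (16, 32, 64, 128) if jr_reaches(u)])   # [16, 32, 64]
```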

~Grauw