wait 15 T-States between reading and writing to Vram

Page 1/6

By norakomi

Paragon (1156)

09-01-2022, 11:28

Hello all!

Got a question about the mandatory wait states between VRAM reads and writes.
I read that you have to wait 15 T-states between reads and writes (on MSX2, screen 5, outside VBlank and with the screen enabled).

On http://map.grauw.nl/articles/vdp_tut.php I found this piece of code:
SetVdp_Write:
    rlc h
    rla
    rlc h
    rla
    srl h
    srl h
    di
    out (#99),a
    ld a,14 + 128
    out (#99),a
    ld a,l
    nop
    out (#99),a
    ld a,h
    or 64
    ei
    out (#99),a
    ret

So I see there is a nop placed to wait between the two out (#99),a instructions.
The ld a,l is only 4 T-states and the nop is also 4 T-states.
That means there are 8 T-states in total between the two out (#99),a instructions.
But we need 15 T-states...

That's when I realised that out (#99),a actually takes 3 machine cycles:
    out (#99),a    3 cycles    11 T-states (4,3,4)
The first cycle (4 T-states) and the second cycle (3 T-states) set the address, while only the third cycle (4 T-states) actually writes the data.

So if you have
    out (#99),a    3 cycles    11 T-states (4,3,4)
    ld a,l         1 cycle      4 T-states (4)
    nop            1 cycle      4 T-states (4)
    out (#99),a    3 cycles    11 T-states (4,3,4)

you do actually have exactly 15 T-states between the writes:
ld a,l (4) + nop (4) + the first two cycles of out (#99),a (4,3) = 4 + 4 + 4 + 3 = 15 T-states.

I finally understand the required nop instruction now.
Please correct me if I'm wrong.

I wanted to optimise my own code and removed the nop (because I hadn't done the math and didn't understand it properly).
On blueMSX it worked fine, but on openMSX it was a mess.

I realise now that the required wait states are not emulated by blueMSX.

ok, fine... Now comes my real question:

Up until now I wrote my Spat (sprite attribute table) like this:
ld hl,spat ;sprite attribute table
ld c,$98 ; port to write to
call outix128 ; 32 (sprites) * 4 bytes

This is all well and good. Every sprite has 4 bytes in the SPAT:
y, x, pattern number, color code

I realised I never change the 3rd and 4th bytes (I set them only once at initialisation), so to speed up my routine I removed the 3rd and 4th byte of each sprite from my SPAT in RAM and changed the routine into this:
ld hl,spat ;sprite attribute table
ld c,$98 ; port to write to
outi|outi|in a,($98)|in a,($98)
outi|outi|in a,($98)|in a,($98)
outi|outi|in a,($98)|in a,($98)
... 32 times

So basically the first outi writes the y coordinate, the second outi writes the x coordinate, and then I use in a,($98) twice ONLY to increment the VRAM pointer. I have no interest in the value that is returned in a.

Once again... this piece of code works fine in blueMSX, but openMSX says: Nah ah!!

So my question is: how exactly should I write the above code?
Should it be this:
outi|outi|nop|in a,($98)|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|in a,($98)
... 32 times

or this:
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
... 32 times

and why ?

Both of the above examples work fine in openMSX, but the first one doesn't respect 15 T-states between the two in a,($98) instructions...

Please advise.

By aoineko

Paragon (1209)

09-01-2022, 12:20

You're not using the right timings for your calculation.
The Z80 in an MSX has extra wait states (NOP is 5cc and OUT (n),A is 12cc, for example).
Check: http://map.grauw.nl/resources/z80instr.php

OUTI is 18cc, so your problem is not related to the 15cc limit of screen mode 5.
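
For reference, the MSX figures follow from the standard Z80 T-state counts plus one extra wait state for every M1 (opcode fetch) cycle. A small Python sketch of that arithmetic; the table and the function name are made up for illustration, and only instructions that appear in this thread are listed:

```python
# Standard Z80 timings: (T-states, number of M1 opcode-fetch cycles).
# ED-prefixed instructions such as OUTI have two M1 cycles.
Z80_TIMING = {
    "nop":       (4, 1),
    "ld a,l":    (4, 1),
    "out (n),a": (11, 1),
    "in a,(n)":  (11, 1),
    "outi":      (16, 2),
}

def msx_cycles(instr):
    """Cycles on MSX: Z80 T-states plus one wait state per M1 cycle."""
    t_states, m1_cycles = Z80_TIMING[instr]
    return t_states + m1_cycles

for name in Z80_TIMING:
    print(f"{name:10s} {msx_cycles(name)}cc")
```

The results line up with the values in the instruction table linked above for these instructions.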

By gdx

Enlighted (6641)

09-01-2022, 12:41

Don't trust emulators (especially blueMSX and anything other than openMSX); trust the articles on Grauw's site.

By aoineko

Paragon (1209)

09-01-2022, 14:04

I don't understand why you use IN in your SAT copy routine.
If you absolutely do not want to copy the 4th byte, you could do something like this (untested):

   ld C, #0x98
   ld HL, #DATA_ADDR
.rept 31
   outi
   outi
   outi
   inc HL
.endm
   outi
   outi
   outi

EDIT: Not correct because we need to increment VRAM counter as well.

But for the little bit of cycle time you'd save, personally I'd use a much shorter version that copies the entire SAT at once (untested):

   ld C, #0x98
   ld HL, #DATA_ADDR
   ld B, #128 ; 32*4
   otir       ; the Z80 mnemonic is OTIR, there is no OUTIR

By norakomi

Paragon (1156)

09-01-2022, 13:16

I use the IN to increase the VRAM pointer, since for each sprite I don't need to out byte 3 and byte 4.
My SPAT (in RAM) is 32*2 bytes (only y + x).
So my remaining questions are:
1. What is the best way to wait in between two IN instructions?
Is it: |in a,($98)|nop|in a,($98) or is it |in a,($98)|nop|nop|in a,($98) or is there a better/faster way ?
2. Should I add wait instructions between an OUTI and an IN a,($98) (and vice versa) ?

By aoineko

Paragon (1209)

09-01-2022, 15:35

When you write (or read) sequentially to VDP, the destination address in VRAM is auto-incremented.

IN A,(nn) is used to read data from VDP; I don't see what you want to achieve with this instruction.

And I don't see a solution that copies only the first 3 bytes of each SAT entry faster than copying everything at once.

   ld C, #0x98
   ld HL, #DATA_ADDR
   ld B, #128 ; 32*4
   otir       ; the Z80 mnemonic is OTIR, there is no OUTIR

Or 127 OUTI. ^^
In both cases, you are above the 15cc limit.
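
A quick sanity check of that claim, as a Python sketch. The OTIR figures (23cc per repeating iteration, 18cc for the last one) are the standard Z80 21/16 T-states plus two M1 wait states; they are not quoted in this thread, and the constant and function names are only illustrative.

```python
LIMIT = 15         # minimum spacing between VRAM accesses discussed for screen 5
OUTI = 18          # MSX cycles per OUTI
OTIR_REPEAT = 23   # MSX cycles per OTIR iteration while B != 0
OTIR_LAST = 18     # MSX cycles for the final OTIR iteration
BYTES = 128        # 32 sprites * 4 bytes

def total_cycles(per_byte, last, count):
    """Cycles to send `count` bytes when the final byte costs `last`."""
    return per_byte * (count - 1) + last

otir_total = total_cycles(OTIR_REPEAT, OTIR_LAST, BYTES)
outi_total = total_cycles(OUTI, OUTI, BYTES)

# Each loop iteration (or unrolled OUTI) performs one write, so the
# spacing between writes equals the per-byte cost.
print("otir:", OTIR_REPEAT, "cc between writes,", otir_total, "cc total")
print("outi:", OUTI, "cc between writes,", outi_total, "cc total")
print("both respect the limit:", OTIR_REPEAT >= LIMIT and OUTI >= LIMIT)
```

The unrolled version trades code size for speed; the loop costs a few extra cycles per byte but stays well clear of the limit.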

EDIT: Oh... OK, I see. You use dummy IN to increment VRAM address (nice trick).

.rept 31
    outi       ; 18cc
    outi       ; 18cc
    in a,($98) ; 12cc
    nop        ; 5cc
    in a,($98) ; 12cc
    nop        ; 5cc
.endr
    outi       ; 18cc
    outi       ; 18cc

Should work

By norakomi

Paragon (1156)

09-01-2022, 16:36

This code isn't working. It ruins all the sprites in the SPAT, showing total garbage on screen:
.rept 31
outi ; 18cc
outi ; 18cc
in a,($98) ; 12cc
nop ; 5cc
in a,($98) ; 12cc
nop ; 5cc
.endr

This however works fine:
.rept 31
outi ; 18cc
outi ; 18cc
nop ; 5cc
in a,($98) ; 12cc
nop ; 5cc
in a,($98) ; 12cc
.endr

So you DO need a nop between an outi and a in a,($98):
outi ; 18cc
nop ; 5cc
in a,($98) ; 12cc

But you do NOT need a nop between an in a,($98) and an outi:
in a,($98) ; 12cc
outi ; 18cc

And this I don't understand. Can anyone verify/explain this to me?
(I just want to make sure I'm doing this correctly and won't run into strange random timing issues later on.)

By Metalion

Paragon (1639)

09-01-2022, 17:14

norakomi wrote:

I use the IN to increase VRAM pointer

I don't know where you got that idea... You have set up the VDP for a write operation, so I think that "IN" instruction will NOT increase the VRAM pointer.

By Arjan

Paladin (787)

09-01-2022, 19:01

10 SCREEN 5
20 VPOKE 0,10:VPOKE 0,20:VPOKE 0,30
30 VPOKE 0,10
40 A=INP(&H98)
50 SCREEN 0
60 PRINT A

Prints 20 on my real NMS8250.

By Grauw

Ascended (10862)

09-01-2022, 19:07

norakomi wrote:

On http://map.grauw.nl/articles/vdp_tut.php I found this piece of code:

SetVdp_Write:
    rlc h
    rla
    rlc h
    rla
    srl h
    srl h
    di
    out (#99),a
    ld a,14 + 128
    out (#99),a
    ld a,l
    nop
    out (#99),a
    ld a,h
    or 64
    ei
    out (#99),a
    ret

So I see there is a nop placed, to wait in between the two out (#99),a instructions.

That nop there is unnecessary, it remains in that article from a time when I didn’t know better. VDP register access never needs any waits. Only VRAM access via I/O port 98H needs waits. I see the text already explains this, but I’ve also corrected the code example.

gdx wrote:

Don't trust emulators (especially blueMSX and anything other than openMSX); trust the articles on Grauw's site.

Actually, trust testing on real machines :). But openMSX is also quite trustworthy in this regard: their VDP timing implementation is thankfully quite accurate, and the two excellent VRAM timing articles on the MAP were written by openMSX developers. Although I'm not sure how accurately the timing of interleaved IN and OUT on port 98H is emulated.

Anyway I recommend reading the VRAM timing articles: part 1, part 2.

By Grauw

Ascended (10862)

09-01-2022, 19:36

norakomi wrote:

So you DO need a nop between an outi and a in a,($98):

    outi       ; 18cc
    nop        ; 5cc
    in a,($98) ; 12cc

But you do NOT need a nop between an in a,($98) and an outi:

    in a,($98) ; 12cc
    outi       ; 18cc

And this I don't understand. Can anyone verify/explain this to me?

Look at when precisely the I/O is done within the instruction. For in and out (12 cycles), the /IORQ signal is active in cycles 9-12. For outi (18 cycles) the /IORQ signal is active in cycles 15-18. In other words, the actual I/O is the very last thing these instructions do.

So if you leave out the nop in the first example, the time between the actual I/O requests (/IORQ signal) is 12 cycles. Adding the nop there makes it 17 (12 + 5). Whereas in the second example the time between I/O requests is 18 cycles.
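
The same bookkeeping can be applied to a whole instruction sequence. Below is a rough Python sketch, assuming (as described above) that the port access is the very last thing each I/O instruction does, and using the MSX cycle counts quoted earlier in the thread. The names `io_gaps`, `failing` and `working` are made up for illustration.

```python
# MSX cycle counts quoted in this thread (M1 wait state included).
CYCLES = {"outi": 18, "in a,($98)": 12, "nop": 5}
VRAM_IO = {"outi", "in a,($98)"}   # instructions that access port 98H
LIMIT = 15                         # minimum spacing discussed for screen 5

def io_gaps(sequence):
    """Return the cycle counts between consecutive VRAM accesses.

    A gap is the sum of every instruction after the previous access,
    up to and including the next I/O instruction.
    """
    gaps, elapsed, seen_io = [], 0, False
    for instr in sequence:
        elapsed += CYCLES[instr]
        if instr in VRAM_IO:
            if seen_io:
                gaps.append(elapsed)
            seen_io, elapsed = True, 0
    return gaps

# One loop iteration plus the first outi of the next one.
failing = ["outi", "outi", "in a,($98)", "nop", "in a,($98)", "nop", "outi"]
working = ["outi", "outi", "nop", "in a,($98)", "nop", "in a,($98)", "outi"]

for name, seq in (("failing", failing), ("working", working)):
    gaps = io_gaps(seq)
    print(name, gaps, "ok" if min(gaps) >= LIMIT else "too fast")
```

In the failing order the outi followed directly by in a,($98) gives a 12-cycle gap, while in the working order every gap is 17 or 18 cycles.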
