# Wait 15 T-states between reading and writing to VRAM


Hello all!

I've got a question about the mandatory wait states between reads and writes.
I read that you have to wait 15 T-states (on MSX2, screen 5, while not in VBlank and while the screen is enabled) in between reads and writes.

On http://map.grauw.nl/articles/vdp_tut.php I found this piece of code:
SetVdp_Write:
rlc h
rla
rlc h
rla
srl h
srl h
di
out (#99),a
ld a,14 + 128
out (#99),a
ld a,l
nop
out (#99),a
ld a,h
or 64
ei
out (#99),a
ret

So I see there is a nop placed to wait in between the two out (#99),a instructions.
The ld a,l is only 4 T-states and the nop is also 4 T-states.
That means there are 8 T-states in total in between the two out (#99),a instructions.
But we have to have 15 T-states.....

That's when I realised that out (#99),a actually consists of 3 machine cycles:
out (#99),a 3 cycles 11 T-states (4,3,4)
The first cycle (4 T-states) and the second cycle (3 T-states) set the address, while only the third cycle (4 T-states) actually writes the data.

So if you have
out (#99),a 3 cycles 11 Tstates(4,3,4)
ld a,l 1 cycle 4 Tstates(4)
nop 1 cycle 4 Tstates(4)
out (#99),a 3 cycles 11 Tstates(4,3,4)

You do actually have exactly 15 T-states in between the write instructions.
ld a,l (4) + nop (4) + the first two cycles of out (#99),a (4,3) = 4 + 4 + 4 + 3 = 15 T-states.

I finally understand the required nop instruction now.
Please correct me if I'm wrong.

I wanted to optimise my own code, and removed the nop (because I didn't do the math, and didn't understand it properly).
On BlueMsx it worked fine, but then on OpenMsx it was a mess.

I realise that the required wait states are not emulated on Bluemsx.

ok, fine... Now comes my real question:

Up until now I wrote my Spat (sprite attribute table) like this:
ld hl,spat ;sprite attribute table
ld c,$98 ; port to write to
call outix128 ; 32 (sprites) * 4 bytes

now this is all good and well. Every sprite has 4 bytes in the spat:
y, x, pattern number, color code

I realised I never use the 3rd and 4th bytes (I set them only once at initialisation), so I tried to speed up my routine: I removed the 3rd and 4th byte of each sprite from my spat in RAM and changed the routine into this:
ld hl,spat ;sprite attribute table
ld c,$98 ; port to write to
outi|outi|in a,($98)|in a,($98)
outi|outi|in a,($98)|in a,($98)
outi|outi|in a,($98)|in a,($98)
... 32 times

So basically the first outi writes the y coordinate, the second outi writes the x coordinate, and then I use in a,($98) twice ONLY to increment the VRAM pointer. I have no interest in the value that is returned in a.

Once again... this piece of code works fine in BlueMsx, but OpenMsx says: Nah ah !!

So my question is: how exactly should I write the above code?
Should it be this:
outi|outi|nop|in a,($98)|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|in a,($98)
... 32 times

or this:
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
outi|outi|nop|in a,($98)|nop|nop|in a,($98)
... 32 times

and why?

Both of the above examples work fine in OpenMsx, but the first one doesn't respect 15 T-states between the two in a,($98) instructions...


You aren't using the right timings for your calculation.
The Z80 in an MSX has extra wait states (`NOP` is 5cc and `OUT (n),A` is 12cc, for example).
Check: http://map.grauw.nl/resources/z80instr.php

`OUTI` is 18cc, so your problem is not related to the 15cc limit of screen mode 5.
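
As a rough illustration of where those MSX figures come from: the MSX hardware adds one wait state to every M1 (opcode fetch) cycle, so an instruction costs its standard Z80 T-states plus one cycle per M1 cycle. The Python sketch below is not from the thread; the table and the `msx_cycles` helper are made up for illustration, and only a handful of instructions are listed.

```python
# Standard Z80 T-states and number of M1 (opcode fetch) cycles per instruction.
# ED-prefixed instructions such as OUTI fetch two opcode bytes, hence two M1 cycles.
Z80_TIMING = {
    "nop":       (4, 1),
    "ld a,l":    (4, 1),
    "out (n),a": (11, 1),
    "in a,(n)":  (11, 1),
    "outi":      (16, 2),
}


def msx_cycles(instruction):
    """Standard T-states plus one wait state per M1 cycle (MSX behaviour)."""
    t_states, m1_cycles = Z80_TIMING[instruction]
    return t_states + m1_cycles


for name in Z80_TIMING:
    print(f"{name:10s} {msx_cycles(name)}cc")
```

For anything not in this small table, the instruction list linked above remains the reference.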

Don't trust emulators (especially BlueMSX and any emulator other than OpenMSX); trust the articles on Grauw's site.

I don't understand why you use `IN` in your SAT copy routine.
If you absolutely do not want to copy the 4th byte, you could do something like this (untested):

```
    ld C, #0x98
.rept 31
    outi
    outi
    outi
    inc HL
.endm
    outi
    outi
    outi
```

EDIT: Not correct because we need to increment VRAM counter as well.

But for the little bit of cycle time you'd save, I'd personally use a much shorter version that copies the entire SAT at once (untested):

```
    ld C, #0x98
    ld B, 128 ; 32*4
    otir
```
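
To put a number on that "little bit of cycle time", here is a hedged Python sketch (not from the thread) that totals the MSX cycle counts for three ways of sending the SAT. The `outi`, `in` and `nop` costs are the ones quoted in this thread; the block-output loop cost of 23cc per repeated byte and 18cc for the last one is an assumption (standard 21/16 T-states plus two M1 wait states). All function names are made up for illustration, and register setup is ignored.

```python
# MSX cycle counts quoted in this thread.
OUTI, IN_A, NOP = 18, 12, 5
# Assumption: the block-output loop costs 23cc while it repeats and 18cc on the final byte.
OTIR_REPEAT, OTIR_LAST = 23, 18

SPRITES = 32
BYTES_PER_SPRITE = 4


def cost_otir(byte_count):
    """Looped copy: every byte but the last pays the repeat cost."""
    return (byte_count - 1) * OTIR_REPEAT + OTIR_LAST


def cost_unrolled_outi(byte_count):
    """One outi per byte, fully unrolled."""
    return byte_count * OUTI


def cost_dummy_reads(sprites):
    """Per sprite: outi, outi, nop, in, nop, in (y and x written, two bytes skipped)."""
    return sprites * (OUTI + OUTI + NOP + IN_A + NOP + IN_A)


total_bytes = SPRITES * BYTES_PER_SPRITE
print("looped copy   :", cost_otir(total_bytes))
print("unrolled outi :", cost_unrolled_outi(total_bytes))
print("dummy reads   :", cost_dummy_reads(SPRITES))
```

Under these assumptions the dummy-read variant saves only a couple of cycles per sprite compared with unrolled `outi`, while the looped copy is the slowest of the three but by far the smallest.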

I use the IN to increase VRAM pointer, since for each sprite I don't need to out byte 3 and byte 4
My spat (in ram) is 32*2 bytes (only y + x)
So my remaining questions are:
1. What is the best way to wait in between two IN instructions?
Is it: |in a,($98)|nop|in a,($98) or is it |in a,($98)|nop|nop|in a,($98) or is there a better/faster way?
2. Should I add wait instructions between an OUTI and an IN a,($98) (and vice versa)?

When you write (or read) sequentially to VDP, the destination address in VRAM is auto-incremented.

`IN A,(nn)` is used to read data from VDP; I don't see what you want to achieve with this instruction.

And I don't see a solution that copies only the first 3 bytes of each SAT entry faster than copying everything at once.

```
    ld C, #0x98
    ld B, 128 ; 32*4
    otir
```

Or 127 `OUTI`. ^^
In both cases, you are above the 15cc limit.

EDIT: Oh... OK, I see. You use dummy `IN` to increment VRAM address (nice trick).

```
.rept 31
    outi        ; 18cc
    outi        ; 18cc
    in a,($98)  ; 12cc
    nop         ; 5cc
    in a,($98)  ; 12cc
    nop         ; 5cc
.endr
    outi        ; 18cc
    outi        ; 18cc
```

Should work

This code isn't working. It ruins all the sprites in the spat, showing total garbage on screen:
.rept 31
outi ; 18cc
outi ; 18cc
in a,($98) ; 12cc
nop ; 5cc
in a,($98) ; 12cc
nop ; 5cc
.endr

This however works fine:
.rept 31
outi ; 18cc
outi ; 18cc
nop ; 5cc
in a,($98) ; 12cc
nop ; 5cc
in a,($98) ; 12cc
.endr

So you DO need a nop between an outi and an in a,($98):
outi ; 18cc
nop ; 5cc
in a,($98) ; 12cc

But you do NOT need a nop between an in a,($98) and an outi:
in a,($98) ; 12cc
outi ; 18cc

And this I don't understand. Can anyone verify/explain this to me?
(I just wanna make sure I'm doing this correctly, and not run into strange random timing issues later on)

norakomi wrote:

I use the IN to increase VRAM pointer

I don't know where you got that idea... You have set up the VDP to a write operation, so I think that that "IN" instruction will NOT increase VRAM pointer.

```
10 SCREEN 5
20 VPOKE 0,10:VPOKE 0,20:VPOKE 0,30
30 VPOKE 0,10
40 A=INP(&H98)
50 SCREEN 0
60 PRINT A
```

Prints 20 on my real NMS8250.

norakomi wrote:

On http://map.grauw.nl/articles/vdp_tut.php I found this piece of code:

```
SetVdp_Write:
    rlc h
    rla
    rlc h
    rla
    srl h
    srl h
    di
    out (#99),a
    ld a,14 + 128
    out (#99),a
    ld a,l
    nop
    out (#99),a
    ld a,h
    or 64
    ei
    out (#99),a
    ret
```

So I see there is a nop placed, to wait in between the two out (#99),a instructions.

That `nop` there is unnecessary, it remains in that article from a time when I didn’t know better. VDP register access never needs any waits. Only VRAM access via I/O port 98H needs waits. I see the text already explains this, but I’ve also corrected the code example.

gdx wrote:

Don't trust emulators (especially BlueMSX and any emulator other than OpenMSX); trust the articles on Grauw's site.

Actually, trust testing on real machines :). But openMSX is also quite trustworthy in this regard: their VDP timing implementation is thankfully quite accurate, and the two excellent VRAM timings articles on the MAP are by the hands of openMSX developers. Although I’m not sure how accurately the timing of interleaving IN and OUT to port 98H is emulated.

Anyway, I recommend reading the VRAM timing articles: part 1, part 2.

norakomi wrote:

So you DO need a `nop` between an `outi` and an `in a,($98)`:

```
    outi        ; 18cc
    nop         ; 5cc
    in a,($98)  ; 12cc
```

But you do NOT need a `nop` between an `in a,($98)` and an `outi`:

```
    in a,($98)  ; 12cc
    outi        ; 18cc
```

And this I don't understand. Can anyone verify/explain this to me?

Look at when precisely the I/O is done within the instruction. For `in` and `out` (12 cycles), the /IORQ signal is active in cycles 9-12. For `outi` (18 cycles) the /IORQ signal is active in cycles 15-18. In other words, the actual I/O is the very last thing these instructions do.

So if you exclude the `nop` in the first example, the time between the actual I/O requests (/IORQ signal) is 12 cycles. Adding the `nop` there makes it 17 (5 for the `nop` plus 12 for the `in`). Whereas in the second example the time between I/O requests is 18 cycles.
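
The rule described here (the port access is the very last thing each instruction does, so the spacing between two accesses is the length of everything after the first access up to and including the instruction that performs the second) can be checked mechanically. Below is a minimal Python sketch of that model, not taken from the thread: the helper names are made up, the cycle counts are the ones quoted above, and it assumes every `outi` and `in` targets port 98h. It only checks spacing; it says nothing about whether a dummy `in` advances the VRAM pointer on real hardware.

```python
# MSX cycle counts as quoted in this thread (they include the M1 wait state).
CYCLES = {"outi": 18, "in": 12, "nop": 5}
# Assumption: every outi/in in the sequence accesses VRAM through port 98h.
PORT_ACCESS = {"outi", "in"}
MIN_GAP = 15  # required spacing in screen 5 during active display, per the thread


def io_gaps(loop_body):
    """Return (first, second, cycles) for each pair of successive port accesses.

    The access is modelled as happening at the very end of its instruction.
    """
    accesses = sum(1 for name in loop_body if name in PORT_ACCESS)
    gaps = []
    previous = None
    elapsed = 0
    # Walk the body twice so the gap that wraps into the next repetition is included.
    for name in loop_body * 2:
        elapsed += CYCLES[name]
        if name in PORT_ACCESS:
            if previous is not None:
                gaps.append((previous, name, elapsed))
            previous = name
            elapsed = 0
    return gaps[:accesses]


def too_fast(loop_body):
    """Pairs of accesses that are closer together than MIN_GAP cycles."""
    return [gap for gap in io_gaps(loop_body) if gap[2] < MIN_GAP]


garbage = ["outi", "outi", "in", "nop", "in", "nop"]   # the variant that showed garbage
working = ["outi", "outi", "nop", "in", "nop", "in"]   # the variant that worked

for label, body in (("garbage", garbage), ("working", working)):
    print(label, io_gaps(body), "too fast:", too_fast(body))
```

With these numbers the only pair that falls below 15 cycles is the `outi` followed directly by `in`, which matches what was observed in openMSX.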
