VATT - VRAM Access Timing Tester

Page 1/16

By aoineko

Paragon (1138)

06-04-2023, 13:58

Hello,
When I discovered that the VRAM access timings given on the MSX Assembly Page website, which I used as a reference to build my MSXgl library, did not match what I observed on my real MSX, I wanted to know more.
So I created a tool that runs a series of VRAM copy functions of different speeds in each of the MSX display modes.
The results confirm that the reference does not seem to be correct under certain conditions.

Example with an FS-A1 emulated on openMSX (this emulator and Emulicious are not totally accurate, but that is another story):

Before sharing the tool with the community so that we can test it on as many real MSX as possible, I would like to confirm the method with you to verify that the numbers are correct.

The method is simple:
- Write a series of 256 characters (empty circles) to a fixed address in VRAM with a slow copy function (29 t-states).
- Then copy a series of 256 characters (filled circles) to the same address with the function whose speed is being tested. For example, the fastest function is just a sequence of 256 out (n),a, each of which is supposed to take 12 t-states.
- Count (with a slow read function) the number of filled circles compared to the number of empty circles and compute a percentage.
- (optional) Repeat this test 16 times and keep the min, max and average values.
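
The counting and aggregation steps could be sketched in C roughly as follows. This is only an illustration: the character codes, the `vatt_*` names and the struct layout are hypothetical, and the slow VRAM read that fills `readback` is left out.

```c
#include <assert.h>
#include <stdint.h>

#define TEST_LEN    256   /* characters written per pass */

/* Hypothetical character codes; the real tool's values are not given here. */
#define CHAR_EMPTY  'o'
#define CHAR_FILLED '@'

/* Running min/max/sum of the success percentage over several passes. */
typedef struct {
    uint8_t  min;
    uint8_t  max;
    uint16_t sum;
    uint8_t  runs;
} vatt_stats;

/* Percentage (0..100) of the TEST_LEN bytes read back that are filled circles. */
uint8_t vatt_percent(const uint8_t *readback)
{
    uint16_t filled = 0;
    for (uint16_t i = 0; i < TEST_LEN; i++)
        if (readback[i] == CHAR_FILLED)
            filled++;
    return (uint8_t)((filled * 100u) / TEST_LEN);
}

void vatt_stats_init(vatt_stats *s)
{
    s->min = 100;
    s->max = 0;
    s->sum = 0;
    s->runs = 0;
}

/* Record the result of one pass. */
void vatt_stats_add(vatt_stats *s, uint8_t percent)
{
    assert(percent <= 100);
    if (percent < s->min) s->min = percent;
    if (percent > s->max) s->max = percent;
    s->sum += percent;
    s->runs++;
}

/* Integer average of all recorded passes. */
uint8_t vatt_stats_average(const vatt_stats *s)
{
    assert(s->runs > 0);
    return (uint8_t)(s->sum / s->runs);
}
```

With 16 passes the sum stays far below the 16-bit limit, so no wider type is needed for the accumulator.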

Here are the different functions that can be tested with their corresponding speed (I would like to check this point):

;---------------------------------
; Function Fill_12
; ---------------------------------
_Fill_12::
	.rept	256
	out	(0x98), a
	.endm
	ret
;---------------------------------
; Function Fill_14
; ---------------------------------
_Fill_14::
	ld	c, #0x98
	.rept	256
	out	(c), a
	.endm
	ret
;---------------------------------
; Function Fill_17
; ---------------------------------
_Fill_17::
	.rept	256
	out	(0x98), a
	nop
	.endm
	ret
;---------------------------------
; Function Fill_19
; ---------------------------------
_Fill_19::
	ld	c, #0x98
	.rept	256
	out	(c), a
	nop
	.endm
	ret
;---------------------------------
; Function Fill_20
; ---------------------------------
_Fill_20::
	.rept	256
	out	(0x98), a
	or	#0
	.endm
	ret
;---------------------------------
; Function Fill_22
; ---------------------------------
_Fill_22::
	.rept	256
	out	(0x98), a
	nop
	nop
	.endm
	ret
;---------------------------------
; Function Fill_29
; ---------------------------------
_Fill_29::
	ld	b, #0
fill29:
	out	(0x98), a
	or	#0
	djnz	fill29
	ret
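
One way to cross-check the labels is to add up the per-instruction costs of one repetition of each body. The sketch below assumes standard Z80 timings plus the one wait state per M1 cycle that MSX machines add; the enum and function names are only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* T-states per instruction on MSX: Z80 timing plus one wait state per M1 cycle
   (assumed values, to be checked against an instruction table). */
enum {
    T_OUT_N_A = 12,  /* out (n),a : 11 + 1 */
    T_OUT_C_A = 14,  /* out (c),a : 12 + 2 (the ED prefix adds a second M1 cycle) */
    T_NOP     = 5,   /* nop       :  4 + 1 */
    T_OR_N    = 8,   /* or n      :  7 + 1 */
    T_DJNZ    = 14   /* djnz, branch taken : 13 + 1 */
};

/* Sum the cost of one repetition of a fill routine's body. */
int body_cycles(const int *instr, size_t count)
{
    int total = 0;
    assert(instr != NULL);
    for (size_t i = 0; i < count; i++)
        total += instr[i];
    return total;
}
```

With these assumed costs the unrolled variants add up to the numbers in their names (for example out (n),a + or n gives 20), while the loop body of Fill_29 (out (n),a + or n + djnz) adds up to 34, so that label may be worth double-checking.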

Does this sound right to you?
The idea was to have speeds below and above the different timings given on MSX Assembly Page:

// Screen	VDP		MSX1		MSX2/2+
// mode		mode		cycles		cycles
//--------------------------------------------------------------------------
// 0,W40	T1		12		20
// 0,W80	T2				20
// 1		G1		29		15
// 2		G2		29		15
// 3		MC		13		15
// 4		G3				15
// 5		G4				15
// 6		G5				15
// 7		G6				15
// 8		G7				15

Some questions:
- How should we arrange the sprites to be in the worst case of VDP resources usage?
- Is testing only VRAM filling relevant for testing access times or do we risk missing other problems?

I just thought I should make sure we're out of the V-Blank when testing. I'll fix that tonight...


By DarkSchneider

Paragon (1030)

06-04-2023, 16:32

There are multiple posts about the same thing. And yes, it seems that on a real MSX the timing is a bit slower. If you want to copy RAM<->VRAM, the best and safest option currently is OTIR/INIR, which takes 21 cycles (22 on MSX with the M-cycle) and doesn't fail.
Anyway, the full test should be run for each mode, as it seems that in bitmap modes you can use that table, in pattern modes (e.g. Screen 4) you have to use the OTIR/INIR timing, and in text modes (especially Screen 0) perhaps something slower.

By aoineko

Paragon (1138)

06-04-2023, 18:15

Indeed, the test is done for each screen mode (in the 2nd screenshot, each line is a mode).
The plain vertical bars represent the expected speed limit of each screen mode (according to MAP) and are adjusted depending on whether the program is running on MSX1 or above, to match the difference between VDP generations.

By Grauw

Ascended (10821)

06-04-2023, 18:39

The bars on your screenshot don't match the chart you quoted... The chart says T1 and T2 need 20 cycles while G1 needs 15, but your screenshot has the 20 cycle marker for T1 and G1 while T2 gets the 15 cycle marker.

By aoineko

Paragon (1138)

06-04-2023, 19:12

You are right. I'll fix that.

More importantly, do you think the test (filling VRAM at different speeds) is relevant for determining the access speed limit?
Are the function timing calculations correct?
How can I create a worst-case scenario for access timing?
Be outside V-Blank and display all 32 sprites at 32x32 pixels?

By Grauw

Ascended (10821)

06-04-2023, 19:51

The function timings seem correct. You can test the 18 cycle case using ret nc and the 21 cycle case using inc hl to wait, see the Z80 + M1 column here, click on the header to sort it: http://map.grauw.nl/resources/z80instr.php.

Note that systems with the Toshiba T9769 MSX-ENGINE add 1 or 2 extra cycles on every VDP I/O, so be wary when interpreting results on these machines. Perhaps you can test for it and adjust the displayed cycle counts, to remove any doubt about what hardware is in the machine the user reports they are using. You can actually use this extra wait to your advantage to test the 13, 15 and 16 cycle cases.
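
The adjustment itself is a simple addition, sketched below. The function name is hypothetical, and how to detect the engine and its delay is not covered here; the delay is simply passed in.

```c
#include <assert.h>
#include <stdint.h>

/* Effective spacing between VDP writes once the engine's extra I/O wait is added.
   io_delay: 0 for machines without the extra wait, 1 or 2 for a T9769
   (per the description above). */
uint8_t effective_cycles(uint8_t nominal, uint8_t io_delay)
{
    assert(io_delay <= 2);
    return (uint8_t)(nominal + io_delay);
}
```

With a delay of 1, the 12 and 14 cycle routines land on 13 and 15; with a delay of 2, the 14 cycle routine lands on 16.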

Sprite count should not matter, they are read from VRAM regardless of whether they occur on a line. However if the timings do not match our current understanding then honestly all bets are off about the cause… so who knows, maybe it does matter after all?

Personally, I think I once hit an access time limitation when I was outputting too quickly while a VDP command was executing. VRAM accesses and VDP commands contend for access slots. In theory the VRAM access should take priority, but I think this is not always the case and there is a window where the access slot is already reserved for the VDP command.

By Pencioner

Scribe (1610)

06-04-2023, 19:50

Quote from the wiki:
The differences between all the versions of the T9769 chip are not yet known, //... skipped ...//. One difference is that the VDP I/O delay is different for these versions. For the C revision it is 1 cycle and for the others it is 2 cycles.

From here:
https://www.msx.org/wiki/Toshiba_T9769

I have a PHC-35J and it uses one of those chips, and the Unknown Reality demo doesn't work properly in some scenes. Could the chip be detected somehow, so that affected machines don't report incorrect measurements?

By aoineko

Paragon (1138)

07-04-2023, 02:03

Here are the results for my real Panasonic FS-A1.

Default (sprites + display):

No sprites:

Display disabled:

First conclusions:
- By default, the results almost perfectly match the expectations (perhaps with a larger number of iterations we would get a perfect match).
- Disabling sprites seems to have an effect, but it is almost within the margin of error, except for G4 and G6 (the two 4-bit color modes) where the change is significant.
- Disabling the display does not match the expectation (no limit in any screen mode): the limit disappears only in the MSX2 bitmap modes (G4-G7). In the other modes, the gain seems almost nil.

You can download the program here: VATT 0.2 (both ROM or .COM)

Here is the same machine emulated with openMSX:

If I disable screen display, all modes are marked as "OK" for any speed functions (which does not match the observed result).

Disclaimer:
- I haven't yet implemented the feature that makes sure I'm testing outside of the V-Blank period (so the results are potentially a bit biased).
- Some screen modes don't display sprites (G5 and G7, I think). I'm not sure why (everything seems OK on the data side), but this may bias the results a bit for the test without sprites.

I have no real MSX1 at home so I can't test TMS9918 (or clones) behaviors.

By Bengalack

Paladin (802)

07-04-2023, 16:57

I appreciate the work aoineko!

My only problem with the info on the map page is this:

(This issue has also been discussed here: https://www.msx.org/forum/msx-talk/development/wait-15-t-sta... which turned into an issue on GitHub.)

By aoineko

Paragon (1138)

07-04-2023, 18:11

Yes, I know this very interesting thread but I wanted... numbers. ^^

Once I have validated that VATT is accurate, we will be able to know the VRAM access speed limit of each screen mode for each VDP generation under various conditions.
I'll add an option to force the test to run inside or outside the V-Blank period, and a x128 iteration test that will be slow but really accurate.

By Bengalack

Paladin (802)

07-04-2023, 18:15

IIRC, when testing, I was nowhere near "no speed limit", as stated on that page. I had the same limits in VBLANK (I think I was only testing sprites at the time, but I can't remember fully).
