wait 15 T-States between reading and writing to Vram

Page 4/6
1 | 2 | 3 | | 5 | 6

By Bengalack

Hero (578)

30-01-2022, 09:54

In an attempt to save a few cycles I've jumped on the bandwagon :) and looked into this. There seems to be little to gain while the screen is on, but I wanted to do something during vertical blank, immediately at the start of the interrupt. From what I can see, norakomi's question was not about vblank but about when the screen is on, so this is slightly off to the side but still very much related.

Both the wiki and the MAP pages say that there is no speed limit in this case. Result:

It works in openMSX 17.
It doesn't work on physical machines.

Tested on a Panasonic FS-A1WSX (T9769C chipset) and a modded SVI-738 MSX2.

My test: updating the sprite attribute table from a list of bytes holding y/x values, like this: y, x, y, x, y, x ...
Code: an unrolled macro, like this:

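.macro writeSpriteAttrTableRamToVramMasksOne
	outi				; write y: (hl) -> port (c), hl++, b--
	outi				; write x the same way
	in a, ( VDPIO )		; dummy read, meant to step past the next attribute byte
	in a, ( VDPIO )		; dummy read, meant to step past the last attribute byte
.endm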

==>

.rept 8
	writeSpriteAttrTableRamToVramMasksOne
.endm

Unless I have misunderstood something or done something wrong, openMSX does not emulate this 100% accurately, and there actually is a speed limit here.

By Manuel

Ascended (18783)

30-01-2022, 10:49

Did you emulate the same machine as the real machine you tested on? Can you share something executable that can be easily tested?

By Bengalack

Hero (578)

30-01-2022, 11:53

Manuel wrote:

Did you emulate the same machine as the real machine you tested on?

Yes. During this development, I have tested on these machines in openMSX:

* Sanyo_phc-70FD2 (+2 wait states)
* Panasonic_FS-A1WSX (+1 wait state); this is the one I also tested on physically
* Philips_NMS_8250

All of the above work in openMSX. I don't have an openMSX machine "profile" for my modded SVI-738 MSX2.

Manuel wrote:

Can you share something executable that can be easily tested?

I know, I know, to fix this we must have an isolated test case! I don't have one, and it may be a bit of work to set up, which will take some time. My current results come from Lilly's Saga, which comes with a lot of code that is not ready for sharing at this point in time. Assuming I'm not mistaken here and this (edge) case really is an error, how interesting is it for the openMSX team to fix it? Knowing your priorities would help me decide how to prioritise providing an isolated test case.

By Manuel

Ascended (18783)

30-01-2022, 12:00

It is always interesting to fix things that do not match real hardware behaviour. But I can't promise that it will be fixed within a given time. It may take a lot of time to find out exactly what is wrong, and it is very well possible that no one has that time for a (long) while.
All I can say is that a test case makes the probability that it will be investigated a LOT higher.

By hit9918

Prophet (2923)

30-01-2022, 22:17

A copy of the SAT takes 3.2% cpu. And then with the two INs it is 16% faster. You spend huge effort trying to save 16% of 3.2% cpu.
So this thread is practically saying the typical "the MSX VDP is slow", for something that took 0.5% cpu... But the time is not taken by the VDP but by some unknown code.
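
The figures above can be reproduced with a little arithmetic. The sketch below is not from the thread. It assumes MSX timings of 18 T-states for OUTI and 12 for IN A,(n) (the Z80 figures of 16 and 11 plus one wait state per M1 cycle), a 32-sprite table of 4 bytes each, and a 50 Hz frame of 313 lines at 228 CPU cycles per line.

```python
# Assumed MSX Z80 timings in T-states (datasheet value plus M1 wait states).
OUTI = 16 + 2      # two M1 cycles (ED prefix), so two wait states
IN_A_N = 11 + 1    # one M1 cycle, so one wait state

# Assumed 50 Hz frame: 313 lines of 228 CPU cycles each.
FRAME_PAL = 313 * 228

SPRITES = 32

# Every attribute byte written with OUTI.
full_copy = SPRITES * 4 * OUTI
# y and x written with OUTI, the other two bytes stepped over with IN.
masked_copy = SPRITES * (2 * OUTI + 2 * IN_A_N)

full_share = full_copy / FRAME_PAL
saving = 1 - masked_copy / full_copy
saved_share = (full_copy - masked_copy) / FRAME_PAL

print(f"full copy:   {full_copy} T-states, {full_share:.1%} of a frame")
print(f"masked copy: {masked_copy} T-states, {saving:.1%} fewer")
print(f"net gain:    {saved_share:.2%} of a frame")
```

Under these assumptions the full copy costs about 3.2% of a frame, the version with two INs is about 16.7% shorter, and the net gain is about 0.54% of a frame, in line with the numbers quoted above. At 60 Hz (262 lines) the shares would be somewhat higher.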

By santiontanon

Paragon (1636)

30-01-2022, 23:27

It's not my game, so I'm not sure. But 0.5% cpu might mean having one or two extra enemies/items in the game without slowdowns, which has a real impact on level design and gameplay. So I think all these optimization efforts are always worth it :)

By hit9918

Prophet (2923)

31-01-2022, 01:27

But something got broken. With this "optimization" of copying only 2 of 4 bytes, pattern animation is lost.
When you need 0.5% cpu, why not search in the other 97% (!)? Why insist on searching in that tiny 3% and then break something?

By santiontanon

Paragon (1636)

31-01-2022, 03:09

Haha, yeah, you have a point. I'm sure Bengalack has searched everywhere else too. But I'm putting words in his mouth, so I'll let him comment ;)

By PingPong

Prophet (3889)

31-01-2022, 12:37

Effectively...
And again the usual mantra: "VDP is slow".
The VDP is not that slow. In most operations the real overhead is the VRAM address pointer setup, which takes a lot of ceremony, especially on MSX2 machines.
For contiguous block operations (VRAM operations tend to be block operations) the speed, even in the active area, is not that bad.
A typical OTIR takes the same amount of time as an LDIR, and thanks to the VDP's auto-increment feature it can free up some Z80 registers and increments for other things.
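
The OTIR/LDIR comparison can be checked against the standard Z80 instruction timings. This is a sketch, not from the thread. It assumes 21 T-states per repeated byte and 16 for the final byte for both instructions, plus two M1 wait states per iteration on MSX; `block_cost` is a hypothetical helper name.

```python
# Z80 datasheet timings: (T-states while repeating, T-states on the last byte).
TIMINGS = {
    "LDIR": (21, 16),
    "OTIR": (21, 16),
}
# Assumed: ED-prefixed opcodes have two M1 cycles, each with one MSX wait state.
MSX_M1_WAITS = 2


def block_cost(instr: str, count: int, m1_waits: int = MSX_M1_WAITS) -> int:
    """Return the T-states for one block instruction moving `count` bytes."""
    if count < 1:
        raise ValueError("count must be at least 1")
    repeat, last = TIMINGS[instr]
    return (count - 1) * (repeat + m1_waits) + (last + m1_waits)


for name in TIMINGS:
    print(name, block_cost(name, 128))
```

Both come to 2939 T-states for a 128-byte block, so the per-byte cost is identical. What differs is the VRAM address setup that has to precede the OTIR, which this sketch does not count.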

By Grauw

Ascended (10581)

31-01-2022, 13:26

There is still a speed limit; it's just lower than the fastest the Z80 can do (10 cycles).

But this looks to me more specific than just a wrongly documented speed limit during vertical blank. It seems like it may be caused by the interplay between mixed OUTs and INs to the VRAM transfer port. I think there is not much information known about how these relate to each other timing-wise, how they’re handled by the VDP precisely.

Can you still reproduce the issue if you replace the INs by OUTs?

Also, are you certain the transfer always starts and ends during vertical blanking? It's quite sensitive to timing. If it's occasionally pushed out of the vertical blank period by a long music player frame in the interrupt handler, or by interrupts that are kept disabled for too long in the main loop, that would cause issues.
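
The last question can be turned into a rough budget check. The sketch below is not from the thread and rests on labeled assumptions: 228 CPU cycles per line, a 313-line (50 Hz) or 262-line (60 Hz) frame, 212 active lines, borders counted as part of the blank period, and the delay measured from the vertical blank interrupt. `vblank_budget`, `fits_in_vblank` and the example delays are hypothetical.

```python
CYCLES_PER_LINE = 228   # assumed CPU cycles per scanline
ACTIVE_LINES = 212      # assumed active display height (192 in some modes)


def vblank_budget(total_lines: int, active_lines: int = ACTIVE_LINES) -> int:
    """CPU cycles between the end of one active area and the start of the next."""
    return (total_lines - active_lines) * CYCLES_PER_LINE


def fits_in_vblank(delay: int, transfer: int, total_lines: int) -> bool:
    """True if a transfer starting `delay` cycles after the vertical blank
    interrupt and lasting `transfer` cycles ends before active display resumes."""
    return delay + transfer <= vblank_budget(total_lines)


# 32 sprites at 2 OUTI + 2 IN each (18 + 18 + 12 + 12 T-states, assumed).
transfer = 32 * 60

print(vblank_budget(313), vblank_budget(262))
print(fits_in_vblank(delay=500, transfer=transfer, total_lines=262))
print(fits_in_vblank(delay=10000, transfer=transfer, total_lines=262))
```

With these assumptions the budget is 23028 cycles at 50 Hz and 11400 at 60 Hz. The 1920-cycle transfer fits easily when it starts promptly, but a start delayed by about 10000 cycles would push it into the active area at 60 Hz. Treat the budget as an upper bound, since the VDP may start restricting access slightly before the first visible line.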
