MSX and VRAM access speed: is it really so bad? What do you think?

By PingPong

Prophet (3885)

10-01-2022, 20:38

How many times have we heard that the MSX has slow VRAM access? A lot. It is one of the most overrated issues when talking about speed in graphics management. Well, I do not think it is as slow as most people say, at least on MSX2 or higher machines.
Let's examine the real scenarios:
a) VRAM I/O tends to be block operations on contiguous addresses (the majority of cases): copy a block, copy a tile to RAM, copy a group of tiles from RAM to VRAM, etc. In those situations one typically uses OUTI or OTIR, which run at the same speed as LDI or LDIR. One could also use OUT (n),A or OUT (C),r during VBLANK. Let's examine the time they take.
OUTI and LDI take 18 cycles: about 5 µs. OTIR and LDIR take 23 cycles per repeated byte: about 6.4 µs.
OUT (n),A takes 12 cycles: about 3.4 µs.

By comparison, a standard LD A,(HL) takes roughly 2 µs.
However, since the majority of I/O is performed with OUTI / OTIR, the speed is the same as for block operations on Z80 memory.
b) The I/O operations are on non-contiguous addresses: here the difference is much larger, possibly an order of magnitude. In this case VRAM access speed clearly suffers.
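
To put numbers on the cycle counts in scenario (a), here is a minimal Python sketch. It assumes the standard MSX Z80 clock of about 3.58 MHz and the commonly cited MSX timings with one wait state per M1 cycle (18 cycles for the non-repeating OUTI/LDI, 23 per repeated byte for OTIR/LDIR). The names `CPU_HZ`, `CYCLES`, `microseconds` and `bytes_per_frame` are made up for this illustration.

```python
# Assumed clock: the standard MSX Z80 frequency of about 3.58 MHz.
CPU_HZ = 3_579_545

# Assumed per-byte cycle counts, including the MSX wait state on M1 cycles.
CYCLES = {
    "OUTI": 18,
    "LDI": 18,
    "OTIR": 23,
    "LDIR": 23,
    "OUT (n),A": 12,
    "LD A,(HL)": 8,
}


def microseconds(cycles: int) -> float:
    """Convert a cycle count into microseconds at the assumed clock."""
    return cycles / CPU_HZ * 1e6


def bytes_per_frame(cycles: int, fps: int = 60) -> int:
    """Upper bound on bytes moved in one frame if the CPU did nothing else.

    Ignores loop overhead, interrupts and any VDP access-rate limit.
    """
    return int(CPU_HZ / fps / cycles)


if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        print(f"{name:10s} {cycles:2d} cycles  "
              f"{microseconds(cycles):5.2f} us  "
              f"{bytes_per_frame(cycles):5d} bytes/frame")
```

The point of the comparison is that the I/O block instructions cost exactly as much per byte as their memory-to-memory counterparts, so under these assumptions a sequential VRAM transfer is not slower than a RAM-to-RAM copy.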

However, taking scenario (a) as an example, the data throughput I see is not so different from that of other machines.
For example:
- take the C64 in uncontended scenarios: a memory write (not page 0) takes 4 cycles, so about 4 µs. That is the same ballpark as a Z80 in an MSX (here you also need to increment counters and addresses, which the VDP does for you in block moves, and this takes time too)
- ZX: it does have direct access but also suffers from memory contention, which can be estimated at 2-3 cycles on average, so 0.5 to 1 µs, leading to an LD A,(HL) requiring 9-10 T-states.
- Amstrad CPC was similar (even though its CPU runs at a full 4 MHz)

- there are systems where you can only access vram during vblank which limits the data throughput a lot.

I do not consider the MSX2 the machine with the fastest VRAM access in the world, but I think the situation is less dramatic than most people say.
I am not considering the turbo R here: it is a different situation. The CPU speed should have been matched with a faster VDP; unfortunately this was not done...

By thegeps

Paladin (1020)

11-01-2022, 00:34

When writing sequentially I see no gap. And when writing sequentially during vblank, even better.

By PingPong

Prophet (3885)

11-01-2022, 00:59

Yes. For example, most ZX Spectrum coders really complain about the speed when they convert existing ZX games to MSX, but they do not realize that it is not VRAM I/O speed that slows down the FPS. It is the larger color table, which is 6144 bytes versus the 768-byte attribute area on the ZX; in practice the MSX has nearly double the amount of work to do.
If the game does not update the color table, the speed is almost identical.
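
The "nearly double" figure can be checked with a little arithmetic. The sketch below assumes the MSX side is Screen 2, with a 6144-byte pattern table next to the 6144-byte color table, and that a full screen update rewrites both areas on each machine. The constant names are made up for this illustration.

```python
# ZX Spectrum screen: bitmap plus attribute area.
ZX_BITMAP_BYTES = 6144
ZX_ATTRIBUTE_BYTES = 768

# MSX Screen 2 (assumption): pattern table plus color table.
MSX_PATTERN_BYTES = 6144
MSX_COLOR_BYTES = 6144

zx_total = ZX_BITMAP_BYTES + ZX_ATTRIBUTE_BYTES
msx_total = MSX_PATTERN_BYTES + MSX_COLOR_BYTES

# How much more data the MSX moves for a full update, patterns and colors.
full_update_ratio = msx_total / zx_total

# If the color data is left untouched, both machines move the same amount.
pattern_only_ratio = MSX_PATTERN_BYTES / ZX_BITMAP_BYTES

if __name__ == "__main__":
    print(f"ZX full update:  {zx_total} bytes")
    print(f"MSX full update: {msx_total} bytes")
    print(f"ratio: {full_update_ratio:.2f}, pattern only: {pattern_only_ratio:.2f}")
```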

By gdx

Enlighted (5368)

11-01-2022, 01:48

What is slow is to define an address because it is broken down into 3 parts, one of which is in a register. We must also take a few precautions to avoid accessing VRAM too quickly. And finally, the graphics commands are quite slow compared to other machines of the time. This can be seen in games that do animations in demos with copies. Even with a small image, you can easily see the copies being made. To avoid this, you have to alternate between two pages, because the copy is too slow to be synchronized to the VBlank. This is why it is quite difficult to scroll the screen horizontally. It took a while to arrive on MSX2, while it was already widespread on several machines, even the Famicom / NES. They had to add a register on the MSX2+ to make it less laborious.

That said, overall it wasn't a bad VDP. While slow for some things, it does a lot of things that others weren't doing.

By PingPong

Prophet (3885)

11-01-2022, 15:53

gdx wrote:

What is slow is to define an address because it is broken down into 3 parts, one of which is in a register.

I agree. They should have redefined the interface for setting the VRAM pointer in a clearer, more concise way, for example by removing the additional register for 16K pages and embedding the read/write bit in the address, so that the operation is implied without wasting time on something like "OR 0x40".
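
For readers who have not done it by hand, this is roughly what the three-part address setup looks like. The Python sketch below only computes the four bytes a program would send to the VDP control port for a 17-bit VRAM address, assuming the usual V9938 sequence: A16-A14 go into register 14, then the low byte, then A13-A8 with bit 6 set for a write. The function name `vram_address_bytes` is made up for this illustration.

```python
def vram_address_bytes(addr: int, write: bool = True) -> list[int]:
    """Return the bytes to send, in order, to the VDP control port.

    Assumed V9938 sequence for a 17-bit address:
      1. A16..A14 as the data byte for register 14
      2. 0x80 | 14 to select register 14
      3. A7..A0
      4. A13..A8, with bit 6 set for write (the "OR 0x40" step)
    """
    if not 0 <= addr < 0x20000:
        raise ValueError("VRAM address must fit in 17 bits")
    page = (addr >> 14) & 0x07      # the 16K page, stored in register 14
    low = addr & 0xFF               # A7..A0
    high = (addr >> 8) & 0x3F       # A13..A8
    if write:
        high |= 0x40                # bit 6 marks the access as a write
    return [page, 0x80 | 14, low, high]


if __name__ == "__main__":
    print([hex(b) for b in vram_address_bytes(0x1A345, write=True)])
```

Four control-port writes plus the bit twiddling are needed before the first data byte moves, which is why random access costs so much more than a sequential block transfer.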

gdx wrote:

And finally, the graphics commands are quite slow compared to other machines of the time.

What machines? As far as I know, no other 8-bit machine of the era featured a blitter or graphics coprocessor.

About slowness: yes, the distinction between logical and byte operations should not have been there.
It would be very simple, for example, to switch between logical and byte operation based on the transparent bit or the IMP operation code. The VDP should always operate in byte mode, masking out the unwanted bits of the pixels: always a read followed by a second read and then a write, skipping the second read for the IMP logical operation. That is already there. It is a matter of mapping the many available commands onto a smaller set (internally the VDP would work as usual), and adding some masking logic and a byte cache to avoid wasting memory accesses when the byte address is unchanged between contiguous pixels (which is relatively easy even without a byte cache).

Quote:

...quite difficult to scroll the screen horizontally.

I think they did think about it, and I am almost sure that an in-depth analysis of the V9938 die could reveal some surprises about the horizontal scroll register (maybe an earlier, unfinished and non-working implementation, disabled because of time to market).

By ARTRAG

Enlighted (6832)

12-01-2022, 07:36

Having dedicated ports for 16-bit addresses for reading and writing would have helped coders a lot.
I would also have had distinct ports for the high and low bytes, so that a single byte could be written when only the high or the low part of the address changes, and two different internal registers for reading and writing, so as to be able to read from one address and write to another without setting new addresses.
But you know, things went differently.