Direct Video Memory Access (DVMA) for V9938

Page 2/6

By st1mpy

Paladin (849)

21-10-2010, 21:06

Very cool! Almost as impressive as the wooden joystick you made (on your web site).

I'd like to see its capabilities (demonstration software showing what it can do). A YouTube video would be nice. (Is there one for ADVRAM as well?)

By PingPong

Prophet (3885)

21-10-2010, 21:11

Nice. The one thing that comes to my mind is: what if this had been implemented 20 years ago as a new standard for MSX?
OK, that's another fan of direct VRAM access. As many people have said countless times, even a Z80 working directly on VRAM would make no practical difference. The Z80 has about the same processing power as an MSX2 VDP.

Of course, 20 years later, with hardware speed improving, one could think that a super CPU can outperform the VDP. True.
But improvements are not limited to CPUs; they also apply to ICs like the VDP.

So one could actually have a super VDP, without needing to load a generic CPU with the job of handling graphics.

So the right question should have been: 'what would have been possible if VDP speed, 20 years ago, had been 3-4 times faster?'

The answer is quite simple: 'a lot of things' :P

By Eugeny_Brychkov

Paragon (1184)

22-10-2010, 00:30

@PingPong:
z80 working directly on vram could not make any difference practically

Let me show you the difference

Accessing through port

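(4)	di			; no interrupts while the VRAM address is being set
(7)	ld	c,99h		; C = VDP control port
(12)	out	(c),l		; VRAM address, low byte
(8)	set	6,h		; bit 6 set = write access
(12)	out	(c),h		; VRAM address, high byte plus write flag
(4)	nop			; short delay for the VDP
(11)	out	(98h),a		; write the data byte through the data port
(4)	ei

Total: 62 T-states (13 bytes), plus problems with the NOP duration if a Z80B is overclocked to 7 MHz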

Direct access

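(10)	ld	de,4000h	; offset of the mapped memory window
(4)	xor	a		; clears carry for ADC (and sets A to 0)
(15)	adc	hl,de		; HL = VRAM address + window offset
(7)	ld	(hl),a		; write the byte straight into the window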

Total: 36 T-states (7 bytes, including the mapped memory window offset)
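
To make the comparison concrete, the two totals can be re-added from the per-instruction figures in the listings above. This is only a small Python sketch of the arithmetic. The list and function names are made up for illustration, the per-instruction byte sizes are the standard Z80 encodings (the post gives only the byte totals), and the T-states are the nominal figures as quoted, with no allowance for any extra wait states a real machine may insert.

```python
# (T-states, bytes, instruction), taken from the two listings above.
PORT_ACCESS = [
    (4, 1, "di"),
    (7, 2, "ld c,99h"),
    (12, 2, "out (c),l"),
    (8, 2, "set 6,h"),
    (12, 2, "out (c),h"),
    (4, 1, "nop"),
    (11, 2, "out (98h),a"),
    (4, 1, "ei"),
]
DIRECT_ACCESS = [
    (10, 3, "ld de,4000h"),
    (4, 1, "xor a"),
    (15, 2, "adc hl,de"),
    (7, 1, "ld (hl),a"),
]


def totals(listing):
    """Return (total T-states, total bytes) for one listing."""
    t_states = sum(t for t, _, _ in listing)
    size = sum(b for _, b, _ in listing)
    return t_states, size


port_t, port_bytes = totals(PORT_ACCESS)
direct_t, direct_bytes = totals(DIRECT_ACCESS)

print(f"port access:   {port_t} T-states, {port_bytes} bytes")
print(f"direct access: {direct_t} T-states, {direct_bytes} bytes")
print(f"ratio: {port_t / direct_t:.2f}x")
```

For a single isolated byte write, the port sequence costs a little under twice the direct one.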

super cpu can outperform the vdp. True.
A machine with a super CPU or a super VDP is not an MSX machine (MSX1, MSX2, MSX2+, MSX turbo R, as defined by the standard). It would be a PC with an AMD Athlon and an NVIDIA card.

Finally, we are not talking here about the CPU doing the VDP's work, but about how the CPU can interact better with the VDP.

By Eugeny_Brychkov

Paragon (1184)

22-10-2010, 00:35

@Asuka:
I'd like to see its capabilities (a demonstration software to show what it can do). A YouTube video would be nice.
I have made a "rotating cube" demo, but it only proves that the hardware works. The overall speed of the rotating cube is completely "eaten" by the calculation of the points/lines (or screen addresses). So you will notice nothing extraordinary in a YouTube video, except that it works.

I think I need another demo...

By flyguille

Prophet (3028)

22-10-2010, 00:41

So OK, the normal way is available, and the fast way to access VRAM too...
So it is still within the standard, but with extras.

PingPong, when you have direct VRAM access, your game routines can be written more efficiently, gaining performance.

Because it is one thing to structure a game thinking that you must do everything sequentially, avoiding by all means resetting the address register,

and another to structure the same game knowing that random reads/writes will not cost you speed.

I am sure that more and more games with large graphics refreshes are doable in a much smarter way.

Like what? Like those routines that check what has not changed, so as not to redraw it. Those rely heavily on random VRAM access without losing speed.

Or quickly updating sprite patterns, to avoid using several sprite pattern numbers for the same sprite object.

So, a lot of improvements.

The only thing I can imagine not being fast is the handling of bitmap copy commands.

But maybe you can set up a VDP command to do half of the copy while the Z80 does the other half, doubling the speed (FPS).

By PingPong

Prophet (3885)

22-10-2010, 10:26

Let me show you the difference

Accessing through port

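(4)	di			; no interrupts while the VRAM address is being set
(7)	ld	c,99h		; C = VDP control port
(12)	out	(c),l		; VRAM address, low byte
(8)	set	6,h		; bit 6 set = write access
(12)	out	(c),h		; VRAM address, high byte plus write flag
(4)	nop			; short delay for the VDP
(11)	out	(98h),a		; write the data byte through the data port
(4)	ei

Total: 62 T-states (13 bytes), plus problems with the NOP duration if a Z80B is overclocked to 7 MHz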

Direct access

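(10)	ld	de,4000h	; offset of the mapped memory window
(4)	xor	a		; clears carry for ADC (and sets A to 0)
(15)	adc	hl,de		; HL = VRAM address + window offset
(7)	ld	(hl),a		; write the byte straight into the window

Total: 36 T-states (7 bytes, including the mapped memory window offset)

Finally, we are not talking here about the CPU doing the VDP's work, but about how the CPU can interact better with the VDP.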

@Eugeny: Let me explain.

If one makes an absolute comparison, that's true, and it is even worse than in your example. This is much the same as what I thought some time ago. But the real world is different.

You must focus on what kind of operations you do with VRAM. There are basically two kinds:

1) Random access. You use this for pixel drawing, like PSET, lines, or circles.
In this kind of use, the time spent mapping a logical pixel to a VRAM address is nothing compared to the time spent calculating the line itself (look at Bresenham's line drawing algorithm, for example). So the difference between the two approaches to VRAM access is tiny.

2) Block move access. You use this for software sprites, scrolling, and background animations. In this mode you usually set up the address pointer, then read or write some contiguous bytes. Here the VDP way is better: you do not need to code the instruction that updates the VRAM pointer. The VDP does this for you.
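
For the random-access case in point 1, the split between "calculating the line" and "mapping a pixel to a VRAM address" can be sketched as follows. This is an illustration only, not code from the thread: it assumes a SCREEN 5 style layout (4 bits per pixel, two pixels per byte, 128 bytes per line), and the function names are hypothetical. The point is that the address mapping is one multiply-and-add per pixel, while the Bresenham loop carries the per-pixel decision logic.

```python
BYTES_PER_LINE = 128  # assumed SCREEN 5 layout: 256 pixels wide, 2 pixels per byte


def vram_address(x, y):
    """Map a logical pixel to the VRAM byte that holds it (assumed layout)."""
    return y * BYTES_PER_LINE + x // 2


def bresenham(x0, y0, x1, y1):
    """Integer Bresenham line: return the list of (x, y) pixels on the line."""
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    points = []
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:      # step along x
            err += dy
            x0 += sx
        if e2 <= dx:      # step along y
            err += dx
            y0 += sy
    return points


line = bresenham(0, 0, 5, 2)
addresses = [vram_address(x, y) for x, y in line]
print(line)
print(addresses)
```

Each pixel still needs a read-modify-write of its byte in a 4-bit mode, which neither access method avoids.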

In this situation, what matters most is data rate. Now, let's compare the power. (I consider a pure MSX, no overclocking, since otherwise it's not an MSX anyway, as you said.)

Z80 doing LDIR to move one VRAM region to another: about 21 T-states per byte, about 170 KB/s.
VDP doing a high-speed move (a rectangle move, more suitable for animations than LDIR): about 170 KB/s, with sprites and active area enabled, at a 50 Hz frame rate.

They are the same.
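
The LDIR figure can be checked with a line of arithmetic. The sketch below assumes the standard MSX Z80 clock of 3,579,545 Hz and the nominal 21 T-states per repeated LDIR byte, with no extra wait states; the clock value and the names are not from the post. The VDP figure is taken as stated above and is not derived here.

```python
CPU_HZ = 3_579_545        # assumed: standard MSX Z80 clock, no overclocking
LDIR_T_PER_BYTE = 21      # nominal T-states per repeated LDIR byte


def bytes_per_second(clock_hz, t_states_per_byte):
    """Sustained copy rate for a loop costing t_states_per_byte per byte."""
    return clock_hz / t_states_per_byte


ldir_rate = bytes_per_second(CPU_HZ, LDIR_T_PER_BYTE)
per_frame_50hz = ldir_rate / 50

print(f"LDIR: about {ldir_rate / 1000:.0f} KB/s")
print(f"LDIR: about {per_frame_50hz:.0f} bytes per 50 Hz frame")
```

At roughly 3.4 KB per frame, neither side can rewrite a large bitmap screen within a single frame, which is the bandwidth limit the argument turns on.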

Most fans of direct VRAM access always forget one thing:

For the majority of uses (2), the I/O port based approach is not bad. It's the VRAM timing that is BAD!

You achieve better speed because you are using VRAM with a shorter access time than the original. If you were constrained to the RAM of 20 years ago, you could probably still have built a memory-mapped scheme, but the delay imposed on the CPU when accessing VRAM without disturbing the VDP would be so high that the final result would be no different (in terms of speed) from the MSX way of accessing VRAM.

Remember: you go slow because the VRAM is slow, not because the VDP itself is slow. In the blanked area, even a Z80 at 8 MHz could put the MSX1 VDP in trouble. But in the active area, where the VDP squeezes every drop out of the VRAM access slots, even a Z80 at 3.5 MHz will have to wait a bit.

So the real problem is VRAM bandwidth. Improve this, and you will not need mapped VRAM access in 90% of situations.

As an example, the V9990 uses a similar port-based scheme for VRAM access (it works like the TMS one). Prodatron, with such a scheme, is able to play three videos concurrently, each at 6 fps, in a screen 7-like mode on an MSX turbo R.
That is thanks to the faster VRAM interface, even with port-based I/O.

Surely not bad.

Obviously, having a memory-mapped VRAM scheme as well is somewhat better, but not soooo required... IMHO

By PingPong

Prophet (3885)

22-10-2010, 10:50


PingPong, when you have direct VRAM access, your game routines can be written more efficiently, gaining performance.
Because it is one thing to structure a game thinking that you must do everything sequentially, avoiding by all means resetting the address register,
and another to structure the same game knowing that random reads/writes will not cost you speed...

That's the usual argument. The problem is that most programmers tend to use any system like another (they program an MSX like a ZX Spectrum, to be clear). Then they complain about poor performance.

The problem is always solved by designing the game FOR THE SYSTEM, rather than emulating one system on another.
There are a lot of ZX conversions where the VRAM is managed in Z80 RAM (as on the Speccy) and then uploaded every frame to the VDP VRAM. Of course, this is the best way to get poor performance on MSX.

Some games, instead, are converted taking advantage of MSX specifics (like name-table-driven graphics). That way a scroll routine needs to move 768 bytes instead of 6144. Even with the MSX I/O-based approach, the gain in speed is huge.
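
The two byte counts follow directly from the screen geometry. As a quick sketch (assuming a 256x192 screen, 8x8 tiles, and one bit per pixel for the bitmap data, which is what the 768 and 6144 figures imply; the names are illustrative):

```python
WIDTH, HEIGHT = 256, 192   # assumed screen size in pixels
TILE = 8                   # assumed tile size (8x8 pixels)

# One byte per tile position in the name table.
name_table_bytes = (WIDTH // TILE) * (HEIGHT // TILE)

# One bit per pixel for the full pixel data.
bitmap_bytes = WIDTH * HEIGHT // 8

ratio = bitmap_bytes // name_table_bytes

print(name_table_bytes, bitmap_bytes, ratio)
```

Scrolling by rewriting tile numbers therefore moves one eighth of the data that a full pixel upload would.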

I am sure that more and more games with large graphics refreshes are doable in a much smarter way.

I am also sure that more and more games could have been better if one had used MSX-specific features instead of relying on brute-force RAM-to-VRAM uploads ;-)

Like what? Like those routines that check what has not changed, so as not to redraw it. Those rely heavily on random VRAM access without losing speed.

Not sure what you mean: you use this approach to avoid data I/O.
So on average it also works with the I/O port based approach, and because of the overhead of setting the VRAM pointer, the gain should be even greater than with a memory-mapped scheme.


Or quickly updating sprite patterns, to avoid using several sprite pattern numbers for the same sprite object.

If you mean hardware sprites, you can do this directly during VBL, without needing to save and restore the background. So a bunch of OTIRs, faster than any LD A,(HL) / AND something / LD (HL),A / INC HL / DJNZ loop...

IMHO the MSX does not lack a better VRAM I/O interface; it lacks pure I/O speed, because of slow VRAM.

Of course, in scenarios with heavy random VRAM access you surely lose a lot of speed, because of the need to set the infamous VRAM pointer (and the lack of separate read and write pointers is also a big problem, for me).
But I think this overhead is negligible compared to the time you spend sending the actual data to, or reading it from, VRAM.
Compare the time you need to upload a single 16x16 pixel pattern (aligned) from VRAM to RAM.

You probably need some extra time on MSX, but not soo much.
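
As a rough check of this overhead argument, one can compare the cost of setting the VRAM pointer once against the cost of streaming the pattern data. The sketch below is only an estimate under stated assumptions, none of which come from this post: a 16x16 one-bit pattern stored as 32 contiguous bytes, the pointer set-up taken from the port listing quoted earlier in the thread (62 T-states minus the 11-T-state data write), and the data moved with a block I/O instruction such as OTIR or INIR at the nominal Z80 timing of 21 T-states per repeated byte and 16 for the last one.

```python
SETUP_T = 62 - 11        # pointer set-up: the quoted port listing without its data OUT
PATTERN_BYTES = 32       # assumed: 16x16 pixels at 1 bit per pixel, contiguous
BLOCK_IO_T = 21          # nominal T-states per repeated OTIR/INIR byte
BLOCK_IO_LAST_T = 16     # nominal T-states for the final byte


def block_transfer_t(n_bytes):
    """T-states for one block I/O instruction moving n_bytes (n_bytes >= 1)."""
    return (n_bytes - 1) * BLOCK_IO_T + BLOCK_IO_LAST_T


transfer_t = block_transfer_t(PATTERN_BYTES)
overhead = SETUP_T / (SETUP_T + transfer_t)

print(f"set-up: {SETUP_T} T-states, transfer: {transfer_t} T-states")
print(f"pointer set-up share: {overhead:.1%}")
```

Under these assumptions the set-up is a small fraction of the total. The share grows quickly if the data is not contiguous and the pointer has to be set again for every row.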

By Eugeny_Brychkov

Paragon (1184)

22-10-2010, 11:27

PingPong, thank you for your thoughts. I see your denial, but do you have anything to propose to make things better in the present time (not 20 years ago)? By the way, if you do not like the idea or the hardware, you are free not to use it.

By PingPong

Prophet (3885)

22-10-2010, 11:46

PingPong, thank you for your thoughts. I see your denial, but do you have anything to propose to make things better in the present time (not 20 years ago)? By the way, if you do not like the idea or the hardware, you are free not to use it.
Nothing against the idea of new hardware, even for the MSX1 VDP (which is the one most affected by VRAM speed issues). So your idea is welcome, for sure!

Can you illustrate (from an electronic point of view) how this works?

(signals, address decoding, waits, and so on?)
Thanks

By DD

Expert (88)

22-10-2010, 11:52

Maybe you can use existing ADVRAM software as a demo as well; as far as I understand, the idea is the same: VRAM in the Z80 address space. I hope there is already a good amount of software, but from searching the internet I doubt it. Schematics have been asked for twice. I don't know if you want to keep the design to yourself, but in that case you'll also have to write the software yourself, or cooperate with a good programmer you know well. It's hard for hardware to break through without any software. Anyway, I really hope this direct-VRAM concept will have more success than ADVRAM!
