3D raycasting

Page 11/16

By wolf_

Ambassador_ (10037)

19-11-2011, 14:49

What's the bottleneck in this whole engine btw? The CPU with its calculations or the VDP with its drawing?

By ARTRAG

Enlighted (6891)

19-11-2011, 16:46

Neither the CPU nor the VDP by themselves.
The real show stopper is the delay the CPU incurs each time it accesses the VDP ports.
Actually, each time you send a byte to the VDP over the I/O ports, you have to wait 52 extra cycles after the last I/O command, or the R800 will be halted by the circuitry on the TR main board.

The hard work (done by Wouter) was to code scalers where there are always at least 52 cycles of delay between any two VDP I/O instructions.

It is sad to say, but the MSX TR had no chance of success: it probably failed because of this design flaw.
All the speed of the R800 is totally lost when accessing the VDP.

By Lord_Zett

Paladin (807)

19-11-2011, 17:08

Now put some great gfx on the walls and it will look great.

By wolf_

Ambassador_ (10037)

19-11-2011, 18:00

Is there a lot to gain in performance (apart from the palette) when using a G9k? And how about an MSX2 with a G9k, what kind of framerate would that give?

By ARTRAG

Enlighted (6891)

19-11-2011, 18:39

I do not know the G9k, so I have no idea.
From what I've read there is no delay in accessing its registers, so potentially the bandwidth to VRAM should be 5 times larger (using an R800, I mean).
Using a Z80 with a G9k makes no sense, as the Z80 would not benefit from the extra I/O speed.
The extra speed in drawing boxes (commands are faster) would be totally nullified by the cost of raycasting and of the multiplications used in the scaler.

PS
The code has been updated to fix a small bug.
I've also raised the number of sprites in the level to 16.

By msd

Paragon (1508)

19-11-2011, 19:15

@ARTRAG: Unless you have a Z80 running faster than 3.58 MHz

By ARTRAG

Enlighted (6891)

19-11-2011, 19:22

@ARTRAG: Unless you have a Z80 running faster than 3.58 MHz

How much faster?
In the case of mz3d, the multiplications and divisions would kill even a Z80 at 7 MHz.

By msd

Paragon (1508)

19-11-2011, 19:43

Using a Z80 with a G9k makes no sense, as the Z80 would not benefit from the extra I/O speed.
It only matters for this... and yes, it will not be fast enough :P

By wouter_

Champion (492)

20-11-2011, 09:58

What's the bottleneck in this whole engine btw? The CPU with its calculations or the VDP with its drawing?
The speed of the current engine is only for a small part limited by the CPU speed. The main bottleneck is the drawing of the walls (and sprites). But here it's not the VDP that is the bottleneck (we're sending lots of small VDP commands and most of the time such a command is already finished before we can send the next one, except when you come very close to a wall). Nor is the CPU the bottleneck, instead it's the delays introduced by the S1990 when the R800 and the V9958 are communicating too fast.

The hard work (done by Wouter) was to code scalers where there are always at least 52 cycles of delay between any two VDP I/O instructions.
Actually it's not about always having more than 52 cycles between VDP IO operations. It's about having to do less work 'outside' VDP operations. If the S1990 detects an IO to the VDP that is less than 62 cycles after a previous IO, it will stall the R800. The IO itself takes 10 cycles, so that means there is room for 52 other cycles to do useful work. So what I tried to do is move as much work as possible from before the 1st or after the 2nd IO instruction to between the 2 IO instructions.
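
The arithmetic above can be sketched as a small cost model. This is only an illustration in Python, not code from the engine: it takes the 62-cycle spacing and the 10-cycle IO cost from this post and assumes the spacing is measured from the start of one IO to the start of the next. The function names are made up for the example.

```python
IO_CYCLES = 10     # cost of the IO instruction itself (figure from the post)
MIN_SPACING = 62   # minimum spacing between two VDP IOs before the S1990 stalls (from the post)


def stall_cycles(work_between):
    """Cycles the R800 sits stalled before the 2nd IO, given the useful work done in between."""
    return max(0, MIN_SPACING - IO_CYCLES - work_between)


def total_cycles(before, between, after):
    """Total time for the sequence: work, IO, work, IO, work (simplified model)."""
    return (before + IO_CYCLES + between + stall_cycles(between)
            + IO_CYCLES + after)


# The same 60 cycles of useful work, arranged in two ways:
print(total_cycles(30, 0, 30))  # all work outside the IO pair: 52 cycles are lost to the stall
print(total_cycles(4, 52, 4))   # work moved between the IOs: no stall at all
```

In this model the first arrangement takes 132 cycles and the second 80, which is why moving work into the gap between the two IO instructions pays off.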

Is there a lot to win (apart from the palette) in performance when using a G9k?
The S1990 does not introduce stalls when the R800 and the V9990 communicate. So there you gain a lot of speed. In the (few) cases where we have to wait for a V9958 command to finish, the corresponding waiting time for V9990 will also be less. So it's indeed very likely a gfx9000 version will be faster (but I don't dare to estimate how much faster).

And how about MSX2 ...
We currently make heavy use of the R800 multiplication instructions. Though only in the texture map routines (sprites and walls). The actual '3d' part of the code also uses multiplications of course, but that code is not that critical for speed. _Maybe_ it's possible to rewrite the texture map routines to not rely on multiplication so much (likely it will be slower on R800, but possibly better suited for Z80).

Any help to work on a gfx9000 and/or MSX2 port would be welcome of course ;-)

By RetroTechie

Paragon (1563)

20-11-2011, 10:48

The hard work (done by Wouter) was to code scalers where there are always at least 52 cycles of delay between any two VDP I/O instructions.
Actually it's not about having always more than 52 cycles between VDP IO operations. It's about having to do less work 'outside' VDP operations. If the S1990 detects an IO to the VDP that is less than 62 cycles after a previous IO, it will stall the R800. The IO itself takes 10 cycles, so that means there is room for 52 other cycles to do useful work. So what I tried to do is move as much work from before the 1st or after the 2nd IO instruction to between the 2 IO instructions.

Or read that as: distribute work in time, such that VDP access & CPU work interleave nicely (with 62 cycles or more between VDP I/O's). Access VDP -> fetch data/do calculation -> access VDP -> fetch data/do calculation -> access VDP etc.

We currently make heavy use of the R800 multiplication instructions (..) _Maybe_ it's possible to rewrite the texture map routines to not rely on multiplication so much
Well, if it's really performance-critical, you could re-code byte*byte multiplications as table lookups, and throw memory at the problem. If you need a byte*byte calculation (with a 16-bit result), you'd need a table of 256*256*2 bytes = 128K. That's a lot of memory and not a very elegant solution (perhaps not even that fast), but possibly faster than shifting bits and adding intermediate totals on a Z80.
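
To make the table idea concrete, here is a minimal sketch in Python (not Z80 code, and not taken from the engine). `MUL_TABLE` and `mul8` are hypothetical names; the index is simply the two operand bytes glued together, and how such a table would be spread over mapper pages on a real MSX is left out.

```python
from array import array

# One 16-bit entry for every pair of 8-bit operands:
# 256 * 256 entries * 2 bytes = 131072 bytes = 128K.
MUL_TABLE = array("H", (a * b for a in range(256) for b in range(256)))


def mul8(a, b):
    """Multiply two unsigned bytes with a single table lookup instead of shift-and-add."""
    return MUL_TABLE[(a << 8) | b]


print(len(MUL_TABLE) * 2)  # table size in bytes, counting 2 bytes per entry
print(mul8(200, 255))
```

The lookup itself is a single indexed read; the cost has moved entirely into memory, which is the trade-off described above.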
