Chibi Akumas Episode 2: Confrontation! [game for CPC then MSX2]

Page 5/9

By keith56

Master (162)

04-12-2017, 22:38

I've got the game running on the MSX2 now in an early state.
Chibi Akumas uses two screen buffers and redraws the whole screen every frame. It seems the CPC is much faster at 'flood filling' the backgrounds than the MSX2. The MSX2 catches up a bit on the sprite drawing, but it's still slightly slower than the CPC overall. Of course there may be some optimizations I can make, but at this stage it's not the CPU during the game logic that's slowing things down, it's the time the VDP takes to do the work: the CPU is stuck waiting for the VDP to finish the last fill/sprite job it was given. To be honest, based on my previous testing, I was expecting it to be much worse.

I don't think it's a big problem because, while it's a bit slower, it's still perfectly playable.

I have a V9990 version, which is extremely fast - far faster than the CPC version - I'm going to have to slow it down, because it's too fast to be playable.

I'll be posting pictures soon once I'm happy things are as good as I can make them.

By Grauw

Ascended (10605)

04-12-2017, 22:33

I see! Didn’t know that…

On MSX, the M1 wait is there because the instruction fetch memory access is really short (barely 1½ cycles), so the system inserts the wait to give RAM/ROM chips more headroom to respond, as described on page 30 of the Z80 CPU manual.

The running theory, the one mentioned in the CPU manual, is that it allows manufacturers of machines and cartridges to use cheaper memories. However, since on MSX the memory does not connect directly to the CPU, I think it's there to allow a certain number of logic ICs in between (which I think delay the signal as well?), for expanded slots and external slots.
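To put a number on the M1 wait: a rough sketch, assuming the usual rule of thumb that MSX adds one T-state to every M1 (opcode fetch) cycle, so prefixed instructions with two M1 cycles pay twice. The function name and the table below are only illustrative; the base timings are the standard Z80 ones.

```python
def msx_t_states(z80_t_states, m1_cycles):
    """Return the T-state count on MSX, assuming one wait per M1 cycle."""
    return z80_t_states + m1_cycles


# (standard Z80 T-states, number of M1 cycles) for a few common instructions
EXAMPLES = {
    "NOP": (4, 1),
    "LD A,(HL)": (7, 1),
    "OUT (n),A": (11, 1),
    "OUTI": (16, 2),         # ED-prefixed, so two opcode fetches
    "LD A,(IX+d)": (19, 2),  # DD-prefixed, so two opcode fetches
}

for name, (t_states, m1_cycles) in EXAMPLES.items():
    print(f"{name:12} Z80: {t_states:2}  MSX: {msx_t_states(t_states, m1_cycles):2}")
```

Short instructions are hit hardest in relative terms: a 4-cycle instruction becomes 25% slower, a 19-cycle one only about 10%.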

By Grauw

Ascended (10605)

04-12-2017, 23:14

@keith: Nice to hear your progress, and that the gap between MSX2 and CPC is not as big as it initially seemed to be! Down to only “slightly slower”. Maybe you can squeeze out some further performance by improving CPU-VDP parallelism?

I wonder, is it faster on turboR? And will you backport the V9990 code to CPC? I’m surprised it’s so much faster on V9990 btw, I thought with so many things going on on screen, the CPU would be the bottleneck as much as the VDP.

By keith56

Master (162)

05-12-2017, 00:31

There are certainly some improvements that can be made, but one limitation is how I'm coding the game. I'm not writing an MSX version of the game, I'm writing a game that can compile on Spectrum, CPC or MSX, so 90% of the code is identical. It's already taking a lot of work to develop code that works on so many hardware configurations, so I don't want to go crazy and completely rewrite huge chunks of the code to get a 3% speed increase. I think the main problem is the fill rate of the MSX2, and the CPU code is already so optimized that further CPU tuning wouldn't make any real impact.

I've done some early testing (this morning!!), the TurboR doesn't make it noticeably faster at all! I didn't think it was working, but the continue timer was going super fast, so I think it was - is there any way in the openMSX debugger to see which CPU is in use?

Has anyone done any speed testing on the V9990? - at a guess I reckon it's about 10x faster than the MSX2 VDP
It would in theory be possible to backport to the CPC, but my reason for migrating to other platforms was to increase the popularity of the game, and I don't think my time is well spent on a niche CPC-V9990 port... the MSX2-V9990 port is almost identical to the regular MSX2 one, so its development is almost free.

By ARTRAG

Enlighted (6846)

05-12-2017, 09:23

The VDP is definitely the bottleneck. How do you use it to fill the screen? Do you clear the whole screen with one command? Have you considered erasing each object selectively?

By keith56

Master (162)

05-12-2017, 09:47

I delete the screen with 3-6 commands. If you look at the first post, you'll see the background has a 'gradient effect' and a 'tiled parallax', so I'm doing all these separately. The MSX2 is too slow for the gradient, so I'm just doing a solid color fill (HMMV), and the bitmap tile I'm doing as 2x128px wide sprites (HMMM), so the effect is as close as I can get to the CPC.

If I just make the background solid black then the game runs about 20%-30% faster, but looks lousy... I don't think that deleting each object selectively is going to work with up to 60 sprites and 300 bullets onscreen :-P
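For readers who haven't used the command engine: a solid fill like the HMMV one mentioned above comes down to loading a handful of VDP registers. The sketch below only packs the register values, and it is not keith56's code. It assumes the layout I recall from the V9938 data book (R#36-37 DX, R#38-39 DY, R#40-41 NX, R#42-43 NY, R#44 CLR, R#45 ARG, R#46 CMD, with HMMV encoded as C0h) and a 4bpp mode such as SCREEN 5, where one byte holds two pixels. The function name is made up for illustration.

```python
import struct

HMMV = 0xC0  # high-speed fill command code (as recalled from the V9938 data book)


def hmmv_register_block(dx, dy, nx, ny, colour_byte):
    """Pack the values for R#36..R#46 of a rectangle fill.

    dx, dy      -- top-left corner in pixels
    nx, ny      -- width and height in pixels
    colour_byte -- fill byte; in a 4bpp mode this covers two pixels
    """
    arg = 0  # ARG = 0: fill to the right and downwards, main VRAM (assumption)
    # Four 16-bit little-endian values followed by CLR, ARG and CMD
    return struct.pack("<HHHHBBB", dx, dy, nx, ny, colour_byte, arg, HMMV)


# A full 256x192 fill with colour 1 in both nibbles
block = hmmv_register_block(0, 0, 256, 192, 0x11)
print(block.hex(" "))
```

On real hardware these 11 bytes would be streamed to the VDP (typically through the indirect register port), and writing the last one starts the command. The fill then runs on its own while the CPU is free to do something else.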

By Grauw

Ascended (10605)

05-12-2017, 11:26

keith56 wrote:

I've done some early testing (this morning!!), the TurboR doesn't make it noticeably faster at all!

Interesting! And maybe a little expected if the VDP (or the communication with the VDP) is the bottleneck, although I would still expect some improvement. Make sure it is in turbo mode though... if you boot from a cartridge or a DOS1 disk, it's not in turbo mode by default.

keith56 wrote:

Has anyone done any speed testing on the V9990? - at a guess I reckon it's about 10x faster than the MSX2 VDP. It would in theory be possible to backport to the CPC, but my reason for migrating to other platforms was to increase the popularity of the game, and I don't think my time is well spent on a niche CPC-V9990 port... the MSX2-V9990 port is almost identical to the regular MSX2 one, so its development is almost free.

Ah ok, since the V9990 is going to be (or is already?) available for CPC I thought maybe you might do it :). Yeah, I think it's 10-20x faster, both because the VDP command engine runs many times faster and because I/O access to the V9990 is not throttled on turboR.

By Manuel

Ascended (18876)

05-12-2017, 22:16

keith56 wrote:

is there any way in the openMSX debugger to see which CPU is in use?

You can run this command in the openMSX console: get_active_cpu

By keith56

Master (162)

05-12-2017, 22:48

get_active_cpu was exactly what I needed! Thanks!... rather embarrassingly I forgot openMSX had a console!... the Windows debugger is so good I've not had need for it
I see the Z80 kicks back in when the firmware is disk loading, then the R800 re-enables itself automatically when the loading is finished - quite interesting!

By ARTRAG

Enlighted (6846)

05-12-2017, 23:23

keith56 wrote:

I delete the screen with 3-6 commands. If you look at the first post, you'll see the background has a 'gradient effect' and a 'tiled parallax', so I'm doing all these separately. The MSX2 is too slow for the gradient, so I'm just doing a solid color fill (HMMV), and the bitmap tile I'm doing as 2x128px wide sprites (HMMM), so the effect is as close as I can get to the CPC.

If I just make the background solid black then the game runs about 20%-30% faster, but looks lousy... I don't think that deleting each object selectively is going to work with up to 60 sprites and 300 bullets onscreen :-P

If you issue the 3-6 large copy commands one after the other, the 2nd command has to wait for the end of the 1st, the 3rd for the end of the 2nd and so on, wasting CPU time polling the CE flag. If this is the case, have you considered spreading the copy commands apart in order to minimize the time the CPU has to wait for the VDP?
If you manage to interleave the VDP commands with CPU processing, you can avoid waiting for the VDP and greatly improve the speed. Sometimes it is convenient to use the CPU to plot bullets or simple lines, so that you do not have to wait for the VDP while it is doing large copies or fills (the CPU and the VDP copy engine can both access the VRAM in parallel, with a small speed penalty for the copy engine).
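A toy timing model makes the difference concrete. This is only a sketch under simplifying assumptions: time is in arbitrary units, the VDP runs one command at a time, the CPU blocks (polling CE) only when it wants to issue a command while the previous one is still running, and the small slowdown of the copy engine during CPU VRAM access is ignored. The function name and the numbers are invented for illustration.

```python
def frame_time(schedule):
    """Simulate one frame.

    schedule is a list of ("vdp", duration) and ("cpu", duration) steps,
    in the order the program executes them.
    """
    cpu_clock = 0       # current time as seen by the CPU
    vdp_busy_until = 0  # time at which the running VDP command finishes
    for kind, duration in schedule:
        if kind == "vdp":
            # The CPU must wait for CE to clear before issuing a new command
            cpu_clock = max(cpu_clock, vdp_busy_until)
            vdp_busy_until = cpu_clock + duration
        else:
            # Plain CPU work runs in parallel with whatever the VDP is doing
            cpu_clock += duration
    # The frame is done when both the CPU and the VDP have finished
    return max(cpu_clock, vdp_busy_until)


# Three fills of 4 units each and three chunks of game logic of 3 units each
back_to_back = [("vdp", 4), ("vdp", 4), ("vdp", 4),
                ("cpu", 3), ("cpu", 3), ("cpu", 3)]
interleaved = [("vdp", 4), ("cpu", 3), ("vdp", 4),
               ("cpu", 3), ("vdp", 4), ("cpu", 3)]

print(frame_time(back_to_back), frame_time(interleaved))
```

With these made-up numbers the back-to-back order takes 17 units and the interleaved order takes 12, because the CPU only idles for the remainder of a command instead of its full length. How much a real game gains depends on how evenly the CPU work can be split between the commands.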
