WebMSX running SymbOS!

Page 4/5
1 | 2 | 3 | | 5

By ppeccin

Champion (375)

23-05-2018, 23:00

tfh wrote:

Manuel seems a bit focused on comparing WebMSX to existing actual hardware. I consider it an MSX in software, with its own specs and features. It's a different concept.

Yes, WebMSX is an imaginary MSX machine, not existing hardware.
Yes, it's not 100% accurate compared to a real HW machine. Far from it.

It does not try to be. There is a trade-off being made here. It's simple: we give up some aspects to make others possible.

By Grauw

Ascended (9817)

23-05-2018, 23:10

ppeccin wrote:

Your problem (this one you explained) is actually not related to Turbo at all.
Your problem is evidenced by the accuracy limitation of WebMSX. As of now, the VDP impl. on WebMSX is only line-accurate. This means that each line of the VDP is generated at once, at a specific point inside each line. This happens "instantaneously" from the CPU point of view. That is why we have some problems related to screen splits and tight timings related to HR and line INTs in WebMSX.

This problem can happen even at the normal 1x CPU clock. It's not related to the Turbo modes.
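To make the "line-accurate" limitation concrete, here is a toy Python model (not WebMSX's actual code). The 228 Z80 cycles per line follow from the VDP's 1368 cycles per line at 21.48 MHz versus the 3.58 MHz Z80; `RENDER_POINT` is a made-up value, and the function name is hypothetical. It shows that a register write takes effect for a whole line or not at all, depending only on which side of the render point it falls.

```python
# Toy model of a line-accurate VDP (illustrative only, not WebMSX's code).
CYCLES_PER_LINE = 228  # Z80 cycles per scanline at the 1x clock (1368 VDP cycles / 6)
RENDER_POINT = 200     # hypothetical: cycle within each line at which it is rendered


def first_line_showing(write_cycle: int) -> int:
    """Return the first scanline whose output reflects a VDP register write
    made at absolute CPU cycle `write_cycle` (cycle 0 = start of line 0)."""
    line, offset = divmod(write_cycle, CYCLES_PER_LINE)
    # The whole line is produced at RENDER_POINT: a write before that point
    # changes the entire current line, a write after it only later lines.
    return line if offset < RENDER_POINT else line + 1


# Two writes only 30 cycles apart end up affecting different lines.
print(first_line_showing(180), first_line_showing(210))  # prints "0 1"
```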

I don’t think that’s the case, the line accuracy is a different thing and actually works fine for my code since I don’t do any visible splits during line display. From my tests (which I’ve done quite extensively) the reason turbo fails is because the code simply completes its I/O access to the VDP unrealistically fast.

I don’t necessarily mind; the fix is easy after all: simply don’t run the game in turbo mode. The game doesn’t benefit from turbo anyway, as it’s vsynced at 60 fps.

However, what I don’t like is the thought that someone tries to run my game in turbo mode, sees it doesn’t work well, and then concludes that I wrote crappy code without turbo CPUs in mind, when in fact I went to great lengths and did quite some testing (also in WebMSX and openMSX) to get it to work right and stable at different CPU speeds.

Yes, there is of course some timing dependency when doing screen splits, and at some point I do need to make some assumptions, but I took care to make only those assumptions that would hold true for any fully standards-compliant MSX with a turbo CPU.

By ppeccin

Champion (375)

23-05-2018, 23:10

It's strange. Your code should work for anything that is FASTER, if you sync to HR, even at 100 MHz with a faster bus. You only need to do your stuff INSIDE the horizontal border, before the usable part of the next line begins and before the next HR, right?

What goes wrong if your code runs too fast?

By ppeccin

Champion (375)

23-05-2018, 23:24

Oh... If your code syncs to HR, and then counts on CPU cycles to wait for the NEXT horizontal border (not the one right after HR), that is the problem. In that case you should sync to a second HR, right?

That, I think, is the case of Nemesis Enhanced.

By Grauw

Ascended (9817)

23-05-2018, 23:42

A perfect HR sync would poll twice, to check for the low to high transition or vice versa.

However, sometimes I really need to cram four VDP register writes into one line, and then on a standard Z80 I only have time to poll HR once, to check that it’s high. Since that poll is preceded by four VDP register writes (8 I/O operations), and the maximum I/O speed is limited by both the standard MSX bus and the VDP, I feel it is a realistic assumption that by the time the next line’s poll is done, it is safely out of the previous HR period. But at higher emulation speeds the I/O completes before that HR period ends, so the code thinks it’s already on the next line.
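A rough timing sketch of that single-poll assumption, in Python. `HR_LENGTH` is a hypothetical HR window length, and the ~18 cycles per OUTI (16 T-states plus the MSX M1 wait states) is an assumption; the function name is made up for illustration. It shows how eight OUTIs that outlast the HR window at 1x can finish inside it when the CPU runs faster and I/O is not slowed down.

```python
# Toy timing model of a single "wait until HR is high" poll (illustrative only).
HR_LENGTH = 50   # hypothetical: length of the HR-high window, in 1x (3.58 MHz) cycles
OUT_CYCLES = 18  # assumption: one OUTI at 1x, including the MSX M1 wait states


def single_poll_is_safe(n_outs: int, speed: float, io_at_bus_speed: bool,
                        detect_offset: float = 0.0) -> bool:
    """After HR was seen high `detect_offset` cycles into the HR window and
    `n_outs` OUTIs were executed, return True if a single poll for HR high
    will wait for the *next* line's HR window rather than the current one.

    speed: CPU clock multiplier (1.0 = standard 3.58 MHz).
    io_at_bus_speed: True if each OUTI is stretched back to 1x timing
    (a simplification; real machines only stretch the bus cycle itself).
    """
    per_out = OUT_CYCLES if io_at_bus_speed else OUT_CYCLES / speed
    elapsed = detect_offset + n_outs * per_out
    # Still inside the same HR window: the poll returns at once and the code
    # wrongly concludes that a new line has started.
    return elapsed >= HR_LENGTH


# Four register writes = 8 OUTIs, as in the post.
for speed in (1, 2, 4):
    print(speed, single_poll_is_safe(8, speed, io_at_bus_speed=False))
```

With these made-up numbers the poll is safe at 1x and 2x but not at 4x unless I/O is stretched back to bus speed; the exact threshold depends entirely on the real HR length.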

Additionally, there’s the case of the start of the line interrupt: the Z80 simply is not fast enough to respond within one line, so the time it takes me to select status register 1, test FH, restore the status register and call the line ISR depends on the CPU speed, and if the CPU is too fast the split will occur a line too soon. There is no real opportunity for me to sync on HR in between. However, here too I do a fair amount of VDP I/O, so again I rely on VDP I/O being limited by the bus speed.

The latter case is more sensitive than the former btw (and harder to address), and I think this is actually where it fails at 4x emulation speed.
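The line-interrupt case can be sketched the same way. This is a hypothetical Python model with invented cycle counts: the plain CPU work before the split shrinks with the turbo multiplier, while VDP I/O on the same path is assumed to stay at bus speed. Without I/O to anchor the timing, a faster CPU moves the split a line earlier.

```python
# Toy model: on which line does a split done by the line ISR take effect?
CYCLES_PER_LINE = 228  # Z80 cycles per scanline at the 1x clock


def split_line(int_line: int, int_offset: float, cpu_cycles: float,
               speed: float, io_cycles: float = 0.0) -> int:
    """Return the scanline on which the split write lands.

    int_line, int_offset: line (and cycle within it) where the line interrupt fires.
    cpu_cycles: 1x cycles of plain CPU work before the split (interrupt entry,
        selecting S#1, testing FH, restoring the status register, calling the
        ISR); this part gets shorter as the CPU gets faster.
    io_cycles: 1x cycles of VDP I/O on that path, assumed to stay at bus speed.
    """
    elapsed = int_offset + cpu_cycles / speed + io_cycles
    return int_line + int(elapsed // CYCLES_PER_LINE)


# Hypothetical numbers: pure CPU work shifts the split at 4x ...
print(split_line(100, 0, 300, 1), split_line(100, 0, 300, 4))  # prints "101 100"
# ... while a path padded with bus-speed I/O stays on the same line.
print(split_line(100, 0, 200, 1, 180), split_line(100, 0, 200, 4, 180))  # prints "101 101"
```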

ppeccin wrote:

Oh... If your code syncs to HR, and then counts on CPU cycles to wait for the NEXT horizontal border (not the one right after HR), that is the problem. In that case you should sync to a second HR, right?

No, no such CPU cycle counting. I don’t attempt to time register changes within the HR period. Instead I carefully assign a couple of VDP operations to each line, and do HR syncs in between, so that e.g. the display page and vertical scroll are set while blanking is enabled, the horizontal scroll is set on a line where I only use a single colour so you won’t notice it, and palette colours are set on the line before they are used.

Take a look at the split code for the top of the dialogue box border:

VDPRegister_SetDI_M  ; status register 2
Split_PollHREnd_M
VDPRegister_SetDI_M  ; palette index
VDPRegister_SetDI_M  ; disable display
VDPRegister_SetDI_M  ; disable sprites
Split_PollHR_M
VDPRegister_SetDI_M  ; pattern name base
VDPRegister_SetDI_M  ; vertical offset 1
inc c
PaletteColor_SetDI_M  ; color 10
dec c
VDPRegister_SetDI_M  ; enable display
Split_PollHR_M
VDPRegister_SetDI_M  ; horizontal offset (characters)
VDPRegister_SetDI_M  ; horizontal offset (dots)
inc c
PaletteColor_SetDI_M  ; color 11
PaletteColor_SetDI_M  ; color 12
dec c
Split_PollHR_M
inc c
PaletteColor_SetDI_M  ; color 13
PaletteColor_SetDI_M  ; color 14
PaletteColor_SetDI_M  ; color 15
dec c
Split_PollHR_M
ld a,(hl)
ld (VDP_MIRROR_8 + 23),a
Split_PollHRStart_M
VDPRegister_SetDI_M  ; vertical offset 2
VDPRegister_SetDI_M  ; palette index
inc c
PaletteColor_SetDI_M  ; color 10
PaletteColor_SetDI_M  ; color 11
dec c

The macros VDPRegister_SetDI_M and PaletteColor_SetDI_M are just two OUTIs each. Split_PollHRStart_M and Split_PollHREnd_M poll S#2 twice, once for HR being low and once for it being high (or vice versa), to detect the edge transition so that it’s certain a new line has started. Where it goes awry is Split_PollHR_M: it only checks for HR being high, and assumes that all the I/O instructions in between are slowed down enough that it doesn’t need to check for HR being low first.
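The difference between the level check and the edge checks can be modelled in a few lines of Python. The functions below are hypothetical stand-ins for the macros, operating on a list of successive S#2 readings rather than real port reads; which edge each two-step macro waits for is inferred from its name. HR is bit 5 of S#2 on the V9938/V9958. The toy loops assume the wanted level eventually appears in the list.

```python
# Toy model of level vs. edge polling of the HR flag (illustrative only).
HR_BIT = 0x20  # HR is bit 5 of the V9938/V9958 status register S#2


def hr_high(status2: int) -> bool:
    return bool(status2 & HR_BIT)


def wait_for(samples, i, level):
    """Index of the first S#2 reading at or after i whose HR flag equals level."""
    while hr_high(samples[i]) != level:
        i += 1
    return i


def poll_hr(samples, i):
    # Like Split_PollHR_M: level check only.
    return wait_for(samples, i, True)


def poll_hr_start(samples, i):
    # Like Split_PollHRStart_M (presumably): wait for low, then for high.
    return wait_for(samples, wait_for(samples, i, False), True)


def poll_hr_end(samples, i):
    # Like Split_PollHREnd_M (presumably): wait for high, then for low.
    return wait_for(samples, wait_for(samples, i, True), False)


# Successive S#2 reads: still in the previous HR, then display, then the next HR.
reads = [0x20, 0x20, 0x00, 0x00, 0x20, 0x20]
print(poll_hr(reads, 0), poll_hr_start(reads, 0), poll_hr_end(reads, 0))  # prints "0 4 2"
```

If fast I/O leaves the code still inside the previous HR period, the level check returns immediately (index 0), while the edge check correctly waits for the next HR (index 4).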

By ppeccin

Champion (375)

23-05-2018, 23:49

I mean "count on" as in "rely on", not really count cycles. :-)

But oh... I see! So, does it work for 2x and 3x Turbo on WebMSX?
Because Nemesis Enhanced fails (a line too early) even on 2x.

And I have seen tons of problems and visual artifacts on several games when run on real machines with turbo ON.
But I agree, on WebMSX those problems will get worse.

What I may be able to do is at least make I/Os happen only on the 1x clock "boundaries" (the BUS clock), to increase the "turbo accuracy". It will not be perfect, and not real "wait states", but code that relies on the VDP access speed would benefit at least somewhat.

What do you think?
I want to do what I can so your games don't look crappy.
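A minimal sketch of that alignment idea, assuming an integer turbo multiplier so that every N-th CPU cycle coincides with a bus clock edge. The function name is hypothetical and this is not how WebMSX implements anything. The example also shows why alignment alone is not a wait state: two accesses can still land only one bus cycle apart.

```python
# Toy model of aligning turbo-speed I/O to 1x bus clock edges (illustrative only).
def align_to_bus(cpu_cycle: int, multiplier: int) -> int:
    """Delay an I/O access issued at turbo CPU cycle `cpu_cycle` to the next
    bus clock edge, assuming an integer multiplier so that every
    `multiplier`-th CPU cycle coincides with an edge of the 1x bus clock."""
    return -(-cpu_cycle // multiplier) * multiplier  # round up to a multiple


# Two OUTs issued 6 CPU cycles apart at 4x end up only 4 CPU cycles
# (1 bus cycle) apart after alignment, far closer than at a real 1x clock.
a, b = align_to_bus(10, 4), align_to_bus(16, 4)
print(a, b, (b - a) // 4)  # prints "12 16 1"
```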

By Grauw

Ascended (9817)

24-05-2018, 00:16

To be honest, I doubt many MSX screen splits take as much care as I do here; for many existing MSX games you already see split artifacts on real MSX turbo CPUs, e.g. Quarth. And I don’t really mind that WebMSX isn’t perfect in its (non-default) turbo modes either; if anything it’s a nice challenge or test case, as long as it’s not mutually exclusive with a real MSX.

And I do admit that an MSX with both the Z80 and the VDP in an FPGA could access the VDP at full speed on the internal bus (like the turboR accesses internal memory at high speed too), while maintaining a standards-compliant external bus (OCM doesn’t do this properly, btw), so I just drew the line somewhere at what I consider a real MSX.

But still, the idea that someone playing with the turbo settings might think I did not take care of turbo CPUs at all stings a little. It’s a bit of a silly ego thing ;) rather than an actual practical issue.

Making I/O happen on the bus clock is an improvement, I think, but probably not enough. Off the top of my head, IORQ should remain active for ~3 bus clock cycles and MREQ for ~2 bus clock cycles. Adding a modest wait to VDP I/O would also improve compatibility. Still, while that might help my rather carefully crafted splits, most others probably rely much more on CPU speed.
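As a back-of-the-envelope sketch of what those strobe lengths imply, this Python helper computes how many extra CPU wait states a turbo CPU would need so that IORQ or MREQ lasts the stated number of bus cycles. The `native_cycles` argument (how long the CPU holds the strobe by itself, in its own clock) is a hypothetical parameter; the 3 and 2 used below are rough guesses, not datasheet values.

```python
import math

IORQ_BUS_CYCLES = 3  # per the post: IORQ active for ~3 bus clock cycles
MREQ_BUS_CYCLES = 2  # ... and MREQ for ~2 bus clock cycles


def wait_states(bus_cycles: int, multiplier: float, native_cycles: int) -> int:
    """Extra CPU wait states needed so that a strobe the CPU would normally hold
    for `native_cycles` of its own clock lasts at least `bus_cycles` cycles of
    the 3.58 MHz bus clock when the CPU runs at `multiplier` times that speed."""
    needed = math.ceil(bus_cycles * multiplier)
    return max(0, needed - native_cycles)


# Hypothetical native strobe lengths: 3 CPU cycles for I/O, 2 for memory.
print(wait_states(IORQ_BUS_CYCLES, 2, 3), wait_states(MREQ_BUS_CYCLES, 4, 2))  # prints "3 6"
```

The cost grows linearly with the multiplier, which is why stretching every memory access (not just I/O) quickly eats the benefit of a turbo CPU unless there is fast internal memory.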

A wait controller itself would, I guess, not be too difficult to make, but to really benefit from the turbo CPU you also need an internal bus with full-speed memory, like on the turboR; otherwise the CPU would be slowed down constantly on every instruction fetch and memory access. That’s where it gets a bit complex and affects the machine architecture. Only respecting the IORQ speed may be a simpler compromise (turbo mod kits do this), though it is not standards-compliant.

So I think improvements in this regard would be interesting, but not essential, at least for me; I just wanted to note, in response to Manuel’s question, that IMO it is not a fully realistic emulation of what a real MSX with a turbo CPU would do :)

By ppeccin

Champion (375)

24-05-2018, 00:26

Quote:

I wanted to at least note in response to Manuel’s question that IMO it is not a fully realistic emulation of what a real MSX with turbo CPU would do :)

Yes, I understood that. Many things in WebMSX are not 100% realistic, at least not when compared to existing HW. The Turbo feature is for sure one of the least realistic. It's there for the fun of playing with it.

But do real machines with Turbo take all the necessary measures to slow down access to the VDP, so that the same code designed to run at 1x would run at a 3x CPU clock without trouble? Automatically?
I really don't know exactly how they do that.

By Grauw

Ascended (9817)

24-05-2018, 01:07

All MSX machines with turbo, as well as the mod kits, slow down access to the VDP I/O ports. Without the slowdown there are many VRAM corruption issues due to too-fast access to port 98H. You can actually test this in openMSX. Also, I think the rate at which the VDP can accept register I/O requests is not much higher than the bus speed either, but maybe it’s got more tolerance there.

Most if not all turbo mod kits also slow down the CPU clock for all I/O and not just the VDP, because not all cartridges can handle I/O at higher speeds. Most simply divide the clock fed to the CPU by 2 while IORQ is active (or maybe a little longer?); at least for 7 MHz this seems to be sufficient. They do nothing for MREQ, and feed the high-frequency signals & clock to the external bus, so they are not compliant with the standard bus (and thus still have compatibility problems).
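A small Python sketch of why halving the clock during IORQ seems to be enough at about 7 MHz but not beyond. It assumes, hypothetically, that the CPU holds IORQ for about 3 of its own clock cycles, and compares the result with the ~3 bus cycles mentioned earlier in the thread.

```python
NATIVE_IORQ_CYCLES = 3  # assumption: CPU cycles IORQ stays active without waits


def iorq_bus_cycles(multiplier: float, divide_during_io: int = 2) -> float:
    """Length of IORQ, in 3.58 MHz bus cycles, on a kit that divides the CPU
    clock by `divide_during_io` while IORQ is active."""
    cpu_clock_during_io = multiplier / divide_during_io  # relative to 1x
    return NATIVE_IORQ_CYCLES / cpu_clock_during_io


# ~7.16 MHz (2x) gives back a full-length strobe; ~10.7 MHz (3x) falls short.
print(iorq_bus_cycles(2), iorq_bus_cycles(3))  # prints "3.0 2.0"
```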

The turboR aligns all I/O to the bus clock, as well as all accesses to external memory, and inserts waits so that the timing complies with the standard bus. Additionally, it inserts (rather excessive) waits for VDP I/O if accesses are less than 27 bus cycles apart. I think it’s a smart design, with compatibility as good as it gets for such a fast CPU; a template for how a proper turbo machine should be designed, imo.
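The minimum-spacing rule can be sketched as a simple scheduler. This is a hypothetical Python model of the behaviour described above, not the turboR's actual logic: request times are in bus cycles and assumed sorted, and each VDP access is held back until at least 27 bus cycles after the previous one.

```python
MIN_VDP_GAP = 27  # per the post: turboR waits if VDP I/O is < 27 bus cycles apart


def schedule_vdp_io(requests):
    """Given bus-cycle times at which the CPU *wants* to access the VDP,
    return the times the accesses actually happen when each one is held
    back until at least MIN_VDP_GAP bus cycles after the previous one."""
    actual = []
    for t in requests:
        if actual and t < actual[-1] + MIN_VDP_GAP:
            t = actual[-1] + MIN_VDP_GAP  # insert wait states
        actual.append(t)
    return actual


print(schedule_vdp_io([0, 5, 40, 41]))  # prints "[0, 27, 54, 81]"
```

Accesses that are already far enough apart pass through unchanged, so code written for a 1x clock sees roughly the spacing it was designed for, at the cost of slowing down bursts of VDP I/O.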

I don’t know much about the Panasonic MSX2+ machines. I do know that the V9958 wait function is never used (except in a Brazilian mod). I think they do add delays for the VDP, maybe for I/O too, but I have a feeling they don’t touch MREQ, since afaik there’s no separation between internal and external memory in those machines. But because the turbo is fairly modest (1.5x), it’s probably within tolerances and doesn’t cause compatibility issues, and I think they do supply a 3.58 MHz clock signal.

I think the OCM & Zemmix Neo add no delays at all; afaik their highest turbo mode even feeds a 10 MHz clock to the cartridge slot, something my Yamaha SFG-05 absolutely does not like (and I’m actually a bit scared of using it with that turbo mode, for fear of overclocking it so much that it breaks).

I think a general solution that makes all splits designed for 1x work on turbo CPUs is not possible. The best you could do, I think, is add a VDP wait, then test a lot of games and find the wait amount that makes most of their splits work. Maybe that’s how the turboR engineers arrived at their 27-bus-cycle wait, even though it’s far more than is actually needed for the V9938 itself, especially for register access. It doesn’t seem like a particularly great way to pick a wait value, though.

By ppeccin

Champion (375)

24-05-2018, 01:48

Thanks for the explanation. Really nice.
