Grauw’s RPG in development

Page 16/25

By Manuel

Ascended (18719)

10-08-2019, 15:49

I apologize.

I guess I got frustrated seeing so many unfinished projects. I had no right to say that spoiling your fun.

By Grauw

Ascended (10564)

10-08-2019, 15:54

Thanks, appreciated :).

I definitely know what you mean, so that’s why, despite having a few too many projects that I’m working on, I try to get at least something out of the door every now and then. Though for this one it’s tricky (more work than I thought), but hopefully a development topic like this one is also entertaining.

By ToriHino

Paladin (759)

10-08-2019, 16:20

The drop shadow indeed improves grounding of the character (if that's even a term :P ). The flickering to get translucency is not ideal, I think it will get really annoying when playing for a longer time.

And good to hear you picked up working on this again!

By Grauw

Ascended (10564)

10-08-2019, 16:22

I agree the drop shadow is important for grounding, so I should just make one which doesn’t flicker.

By Grauw

Ascended (10564)

10-08-2019, 16:51

In other news, the past couple of days I also experimented with using the CPU to draw the tiles rather than VDP copy commands.

Background info: while scrolling I draw a 256x4 row or a 4x256 column of tiles in the borders of the screen. The display area is 248x243, so I have 8 pixels of border on the right and 13 pixels of border at the bottom. The scroll is 2 pixels at a time at 60 fps, and internally the game engine ticks at 30 fps, so this is why I draw 4 pixel wide strips. Drawing one 256x4 row of tiles with VDP commands takes roughly 5 ms. The CPU is waiting for the VDP for at least part of that time.
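
As a rough check of the numbers above, the strip width and the share of a frame taken by the 5 ms draw can be worked out as follows. This is only an illustrative sketch: the variable names are made up here, and the byte count assumes a 4 bits per pixel mode such as screen 5 (2 pixels per byte).

```python
# Figures taken from the post; names are illustrative only.
SCROLL_PX_PER_FRAME = 2   # scroll step per displayed frame
DISPLAY_FPS = 60          # display refresh rate
ENGINE_FPS = 30           # game engine tick rate
ROW_WIDTH_PX = 256        # width of one drawn row
DRAW_MS = 5.0             # approximate time to draw one 256x4 row

# The engine only draws once per tick, so it must cover every
# pixel scrolled during the frames that make up one tick.
frames_per_tick = DISPLAY_FPS // ENGINE_FPS
strip_px = SCROLL_PX_PER_FRAME * frames_per_tick

# Assumption: 4 bits per pixel, so 2 pixels per byte.
strip_bytes = ROW_WIDTH_PX * strip_px // 2

# Fraction of one 60 fps frame spent drawing a row.
frame_ms = 1000.0 / DISPLAY_FPS
draw_fraction = DRAW_MS / frame_ms

print(strip_px, strip_bytes, round(draw_fraction, 2))
```

So one row is 512 bytes, and drawing it takes a bit under a third of a single 60 fps frame.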

I had been wondering how fast it would be if I drew with the CPU, so I made an experiment with that. It’s not finished but representative. Turns out that it takes about the same amount of time, also roughly 5 ms. The OUTIs themselves outpace the VDP, but including the tile lookups it ends up around the same.

Using the CPU has the benefits that 1. it frees up the VDP to execute other tasks like tile animations, 2. I could use an infinite number of tiles rather than the 256 that fit in one screen page, and 3. I don’t need to keep the tile set in VRAM.

The first point is relevant to make it easier for me to use the VDP for other things. Although the VDP is far from fully busy, it is challenging to dispatch commands to the VDP with the right timing so that the CPU doesn’t have to wait for it, so any extra breathing room is still welcome.

The third point is relevant because I’m having an internal struggle whether I should use screen 11. Then I wouldn’t have room in VRAM for the tile set. I keep considering it because the 16 colour palette is restricting, in screen 11 you can make more colourful graphics, nice 15-bit RGB too. It would show off the MSX2+ better, however it also makes things harder and breaks palette effects.

I also got some numbers on the performance difference between using a 32K screen mode (screen 5-6) and a 64K mode (screen 7-12). Currently the scrolling code takes about 33% of the frame time, when I change to a 64K mode, drawing twice the amount of bytes, this increases to 50%, in both the VDP and CPU drawing cases.

For now I’m going to stick with screen 5 and VDP copies for drawing tiles, it seems the more time efficient option at this point, and also less restricting in terms of VRAM usage and performance. Learning to use the 16 colours of screen 5 optimally is also important after all. But it’s good to know that it performs about the same, this knowledge can be useful in the future.

By Grauw

Ascended (10564)

10-08-2019, 16:53

Lastly I have been thinking about the dialogue boxes that pop up as you walk on a trigger. Currently their height fits 3 lines of text, which is what I want for dialogues. However the 232x36 background clear VDP command is a bit heavy, making things a bit more difficult with the budgeting of frame time. Although it works right now it could get in the way of implementing tile animations, and then I would have to do complicated things like reducing or delaying the animations when the dialogue pops up.

So I’ve been thinking; what if I make these dialogue boxes just high enough to fit one line of text, and then if there’s additional text you have to press a button and it will grow to its full size for the continuation of the dialogue, while locking the player in place like RPGs usually do. When the player can’t move I got lots of CPU and VDP time due to not having to scroll, and with a smaller box while moving the clear will be 3x smaller.

Since it takes a couple of seconds to draw 3 lines of text anyway, just a single line is also more suited for quick reading while passing by.

By DarkSchneider

Paladin (942)

10-08-2019, 17:16

Grauw wrote:

I also got some numbers on the performance difference between using a 32K screen mode (screen 5-6) and a 64K mode (screen 7-12). Currently the scrolling code takes about 33% of the frame time, when I change to a 64K mode, drawing twice the amount of bytes, this increases to 50%, in both the VDP and CPU drawing cases.

That gives us an idea of the overhead, that is, the time taken by issuing the commands themselves (the CPU setting the parameters).
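
One way to read the overhead out of those two percentages is a simple linear model: frame time = fixed overhead + a part proportional to the number of bytes drawn. The sketch below assumes that model holds, which the thread does not confirm; the function name is hypothetical.

```python
def split_overhead(t_single, t_double):
    """Split a measurement into fixed and size-dependent parts.

    Assumes t = fixed + variable * n, with t_single measured at n = 1
    and t_double measured at n = 2 (twice the bytes).
    """
    variable = t_double - t_single   # cost added by the extra bytes
    fixed = t_single - variable      # what remains is size-independent
    return fixed, variable

# 33% of the frame in a 32K mode, 50% in a 64K mode (from the quote).
fixed, variable = split_overhead(33, 50)
print(fixed, variable)
```

Under that assumption roughly half of the 33% would be overhead that does not depend on the amount of data drawn.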

If you are going to use the CPU, could you please try this approach? Create buffers in RAM the size of the border areas you update (are they updated partially when moving?). Then fill them using the CPU, that is, using the CPU's "copy" instructions (unrolled LDI/LDD). Then copy each buffer from RAM to VRAM in a single operation using HMMC (High speed move CPU to VRAM).

It would be interesting to know whether working with the CPU on RAM and reducing the number of commands to execute (or the number of times the VRAM destination registers are set) pays off or not. On a faster CPU probably, but what about on a Z80A?

By Grauw

Ascended (10564)

10-08-2019, 18:02

DarkSchneider wrote:

If you are going to use the CPU, could you please try this approach? Create buffers in RAM the size of the border areas you update (are they updated partially when moving?). Then fill them using the CPU, that is, using the CPU's "copy" instructions (unrolled LDI/LDD). Then copy each buffer from RAM to VRAM in a single operation using HMMC (High speed move CPU to VRAM).

It would be interesting to know whether working with the CPU on RAM and reducing the number of commands to execute (or the number of times the VRAM destination registers are set) pays off or not. On a faster CPU probably, but what about on a Z80A?

I didn’t try it, but I can do some maths:

I would need to replace the OUTIs with LDIs, which take the same amount of time. I would only have to set the VRAM address once instead of 64 times, one set takes 48 cycles so this saves 3024 cycles. However I would also need to add a 512x OUTI to flush the buffer to VRAM. This takes 9216 cycles. So it doesn’t seem there is a performance win.
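
The arithmetic in that estimate can be spelled out as below. The 18 cycles per OUTI is not stated in the post; it is the value implied by 9216 / 512 and is treated as an assumption here, as is the 3.579545 MHz Z80A clock used for the conversion to milliseconds.

```python
# Figures from the post; names are illustrative only.
ADDRESS_SETS = 64          # VRAM address sets per row (16 tiles x 4 lines)
ADDRESS_SET_CYCLES = 48    # cost of one address set
STRIP_BYTES = 512          # bytes in one 256x4 row

# Assumption: 18 cycles per OUTI, as implied by 9216 / 512.
OUTI_CYCLES = 18
# Assumption: standard MSX Z80A clock.
CPU_HZ = 3_579_545

# With a RAM buffer the address is set once instead of 64 times.
saved = (ADDRESS_SETS - 1) * ADDRESS_SET_CYCLES

# But the buffer then has to be flushed to VRAM with one OUTI per byte.
flush = STRIP_BYTES * OUTI_CYCLES

net_extra = flush - saved
net_extra_ms = net_extra / CPU_HZ * 1000

print(saved, flush, net_extra, round(net_extra_ms, 2))
```

On these figures the buffered version costs about 6192 cycles more per row, not fewer.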

I think the RAM buffer is mostly useful when you need pure random access where not every value is updated.

For context, the code looks something like this:

; de = vram start address
; ix = pointer to first tile
MapView_DrawTileRow:
    set 6,d
    res 7,d
    ld bc,16 << 8 | 98h
Loop:
    ld l,(ix)  ; get tile data bank & address
    inc ix
    ld h,(ix)
    inc ix
    ld a,(hl)
    ld (ROM_BANK_A000_CHANGE),a
    inc hl
    ld a,(hl)
    add a,iyl  ; offset y
    inc hl
    ld h,(hl)
    ld l,a

    res 7,e    ; 1st tile line
    res 0,d
    inc c
    di
    out (c),e
    ei
    out (c),d
    dec c
    REPT 8
    outi
    ENDM

    set 7,e    ; 2nd tile line
    inc c
    di
    out (c),e
    ei
    out (c),d
    dec c
    REPT 8
    outi
    ENDM

    res 7,e    ; 3rd tile line
    set 0,d
    inc c
    di
    out (c),e
    ei
    out (c),d
    dec c
    REPT 8
    outi
    ENDM

    set 7,e    ; 4th tile line
    inc c
    di
    out (c),e
    ei
    out (c),d
    dec c
    REPT 8
    outi
    ENDM

    ld a,e     ; x += 16
    rlca
    add a,16
    rrca
    ld e,a
    djnz Loop
    ret
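
Two details of the listing are easy to miss, so here is a small Python model of them. This is my reading of the code rather than something stated in the post: the rlca / add / rrca sequence advances the low 7 bits of E by 8 bytes (16 pixels at 2 pixels per byte) while leaving bit 7 untouched, and because every OUTI also decrements B, the loop counter drops by 33 per pass yet still reaches zero after exactly 16 passes.

```python
def rlca(a):
    """8-bit rotate left, bit 7 wraps to bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def rrca(a):
    """8-bit rotate right, bit 0 wraps to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF

def advance_x(e):
    """Model of: ld a,e / rlca / add a,16 / rrca / ld e,a."""
    return rrca((rlca(e) + 16) & 0xFF)

def count_iterations(b=16, outis_per_pass=32):
    """Count loop passes when OUTI and DJNZ both decrement B."""
    passes = 0
    while True:
        b = (b - outis_per_pass) & 0xFF  # 4 lines x 8 OUTIs
        passes += 1
        b = (b - 1) & 0xFF               # djnz decrements, loops if non-zero
        if b == 0:
            return passes

print(hex(advance_x(0x80)), hex(advance_x(0xF8)), count_iterations())
```

The second call shows the x offset wrapping around within the line instead of carrying into bit 7.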

p.s. For drawing the column I would set up a 4x256 HMMC and wouldn’t need to re-set the VRAM address pointer all the time, however then I need 16 add hl,nn’s for every tile (unless I duplicate all the tiles in ROM with a rotated version), so it performs about the same.

By Grauw

Ascended (10564)

11-08-2019, 01:25

I added a tile animation if you walk a bit to the right in the WebMSX demo here.

I really like the potential of this, it can add a lot of life to the environment! Add in some particle effects too, and it can become really cool! Imagine a campfire made with tile animations, and then a smoke particle effect billowing from it which the player can walk behind.

It’s currently an object evaluated continuously, so I can’t have too many of them. To really scale this up (think water animation), it would need a pool of animator units, activating animations when they come into view. I could add “sleep” and “wake” events to tiles, but worst case I would need to call 31 of them in a frame, which is a bit much.

I wonder, should I model these as objects or as tiles? Is the tile map a dumb thing, with all “live” things in the world done as objects drawn on top of it, or do I give tiles a lot of power and let them modify themselves? I think there are a lot of games which do the former, maybe it’s simpler like that.

Additionally performance kinda tanks when I have several animating tiles, because the CPU has to wait for the VDP. There’s quite some spare VDP time, but the challenge is to utilise it fully. I reckon it’s going to need some kind of draw queue to submit to, but unfortunately the V9958 does not have a command completion interrupt.

So I wonder what is the best way. I have two ideas: 1. litter non-blocking draw-from-queue calls throughout the code, or 2. make a lot of line splits (let’s say every 16 lines) and call draw-from-queue there. What do you think?
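
To make the "non-blocking draw-from-queue" idea concrete, here is a minimal sketch. Everything in it is hypothetical: FakeVdp stands in for the real chip and simply stays busy for a given number of ticks, and DrawQueue.poll is the call that would be sprinkled through the code (idea 1) or invoked from line splits (idea 2).

```python
from collections import deque

class FakeVdp:
    """Stand-in for the VDP: busy for a fixed number of ticks per command."""
    def __init__(self):
        self.busy_ticks = 0
        self.executed = []

    def is_busy(self):
        return self.busy_ticks > 0

    def start(self, command, cost):
        self.executed.append(command)
        self.busy_ticks = cost

    def tick(self):
        if self.busy_ticks:
            self.busy_ticks -= 1

class DrawQueue:
    """Holds pending commands until the VDP is free to take them."""
    def __init__(self, vdp):
        self.vdp = vdp
        self.pending = deque()

    def submit(self, command, cost):
        self.pending.append((command, cost))

    def poll(self):
        """Never waits: starts the next command only if the VDP is idle."""
        if self.vdp.is_busy() or not self.pending:
            return False
        command, cost = self.pending.popleft()
        self.vdp.start(command, cost)
        return True

vdp = FakeVdp()
queue = DrawQueue(vdp)
queue.submit("animate tile A", 2)
queue.submit("animate tile B", 1)
queue.poll()   # starts A
queue.poll()   # VDP busy, returns immediately
```

The difference between the two ideas is only where poll gets called from; the queue itself would look the same.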

Either approach will benefit from moving tile drawing to the CPU btw.

By DarkSchneider

Paladin (942)

11-08-2019, 10:22

By points:

- Dialogue box: stop all the other drawing while it appears. Nobody will mind, and it feels natural.

- Objects or tiles: if you want something specific to this game only, easy and direct to handle, use tiles. On the other hand, if you plan to extend it in the future, or want something more flexible, use objects. I'd go for objects directly.

- Command queue: since this is about VDP time, synchronise on the VDP, that is, use line interrupts. If you measure how long the VDP takes to copy a tile, in display lines, while it is rendering the visible area, you have the number of lines to leave between calls. Note that this could be measured at startup, for example while showing the logo or start screen. Execute a tile copy command at, say, line 0 (so the VDP is drawing the screen), with sprites on and whatever else you normally do (maybe copying with the CPU at the same time). Then, starting from a low number of lines, check the command finished flag after that many lines, increase the number until the flag reports completion, and add 1 for safety. It is similar to the CPU speed check for PSG samples you mentioned.
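
The calibration loop described in the last point could look roughly like this. It is a sketch only: is_finished_after is a hypothetical callback that, on real hardware, would start the copy, wait the given number of lines, and then test the command-executing flag in the VDP status registers; here it is replaced by a simple stand-in.

```python
def calibrate_lines(is_finished_after, start=1, limit=256, margin=1):
    """Find how many lines to leave between VDP commands.

    is_finished_after(n) must report whether a test command started
    at a known line has completed n lines later (hypothetical hook).
    """
    for lines in range(start, limit + 1):
        if is_finished_after(lines):
            return lines + margin   # add one line for safety
    raise RuntimeError("command never finished within the limit")

# Stand-in for the hardware: pretend a tile copy needs 5.3 lines,
# so the finished flag is first seen after 6 whole lines.
def fake_probe(lines, copy_duration=5.3):
    return lines >= copy_duration

spacing = calibrate_lines(fake_probe)
print(spacing)
```

With the stand-in above the result is 7 lines between calls: 6 measured plus 1 for safety.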

And concerning CPU copies to VRAM, can OUTIs be issued back to back without glitches? I remember a recent post about glitches when reading or writing VRAM. What about in turbo mode?
I have been reviewing the documentation, and it is not clear at all.
https://www.konamiman.com/msx/msx2th/th-4a.txt, at point 2.2 section 4, talks about "access of continuous memory in VRAM".
At point 1.3 it says:

Quote:

It is generally recommended that BIOS be used for I/O operations for purposes of compatibility. However, the screen display often requires high speed, so these I/O ports are capable of accessing MSX-VIDEO directly.

Well, but on BIOS we have:

Quote:

SETRD (0050H) *1
Function: sets VRAM address to VDP and enables it to be read. This is used to read data from the sequential VRAM area by using the address auto-increment function of VDP. This enables faster readout than using RDVRM in a loop. This is for TMS9918, so only the 14 low order bits of VRAM address are valid. To use all bits, call NSETRD.

And this is for MSX1! And we already know that on every MSX1 this method causes glitches. ??
The MSX-Video manual also talks about timings, but we don't know which CPU the machine uses or anything else about it. So that is really aimed at engineers and manufacturers rather than at programmers.
There is no mention at all of how rendering the screen affects CPU access to VRAM, or of what we now know as "memory slots" (not mentioned in the documentation).
At https://www.konamiman.com/msx/msx2th/th-4b.txt, point 6.6, we only have:

Quote:

MSX-VIDEO performs various screen management duties in addition to executing the specified commands. Sometimes the command execution speed seems to be a bit slow because of this. Thus, by discarding these operations, the speed of the command executions can be made faster. This can be done using the following method.

That is, only for VDP commands, as the CPU is supposed to have priority and the VDP has the finish flag.

How the hell is this computer designed, that you don't know what result you will get even when following the documentation?
