How to solve r#24 and the stupid Y magic value

Page 1/2

By PingPong

Prophet (3513)

27-12-2018, 13:34

I need to do a fast write of the 128-byte SAT.
Normally, with the SAT in RAM, I can just issue an otir or 128 outi instructions.

However, I must account for the r#24 value by adjusting each Y value in the SAT according to r24, to keep the sprites at the same Y position.
For now I can simply add an offset to the Y value like this:
    ld a,(hl)
    add a,d         ; d = offset
    out (98h),a     ; Y, written to the VDP data port
    inc hl          ; step past the Y byte already sent
    outi            ; X
    outi            ; pattern number
    outi            ; unused (color) byte

However, when a+d equals the magic value (216 in sprite mode 2) the sprites disappear, so to correct the problem I am forced to do something like this:

    ld a,(hl)
    add a,d
    cp magicvalue
    jr nz,noadjust
    inc a
noadjust:
    out (98h),a

This increases the SAT write time by adding cp+jr+inc, about 24 T-states per sprite! Multiplied by 32 sprites, that gives 768!

Does anyone know a better method to avoid this stupid bottleneck?

By Grauw

Ascended (8905)

27-12-2018, 14:47

There’s an extensive topic on the subject here.

The fastest way is to avoid that line to begin with.

The sprites in my RPG (wip) are set up so that each object can decide for themselves what coordinates to set. Rather than write to a temporary SAT buffer, their DrawSpriteAttributes methods are called in sequence and out(c) the coordinates to the VDP directly.

I mostly use the “only use even y-coordinates” approach (option 2), for the main player sprite and rain effect. For the mana effect I do check for line 216 (option 1) because it moves so slowly, though alternatively I could’ve only made them spawn at positions where they would never reach line 216.

By PingPong

Prophet (3513)

28-12-2018, 00:14

Thanks Grauw, but I've seen that thread and the workarounds are worse than my solution. They require some extra logic, or doubling the patterns (which are already quite a scarce and valuable resource), or using the CPU to move data around.
That's worse than adjusting the Y value on the fly during the SAT upload, IMHO.
So my best alternative is to use odd Y coordinates and steps of 2 pixels for scrolling and sprite movement.

It's incredible how hard such a stupid feature is to work around, and how much of a headache it is to address.

By Grauw

Ascended (8905)

28-12-2018, 00:47

PingPong wrote:

Thanks Grauw, but I've seen that thread and the workarounds are worse than my solution.

It all depends on your situation, what resources you have available, how your scrolling is done, what visual compromises you can accept, and how you can flex your game design. The thread offers different angles to approach the problem. If the first option is the only feasible one for you of those mentioned there, then I don’t have any other ideas.

Because the possible solutions for this problem are tied so much into the game design, how you scroll, etc., it may help for you to elaborate a bit on that, then maybe some solution custom tailored for your specific case could come to mind.

p.s. I said “only use even y-coordinates” because sprites are y-offset by one, so they align to even screen lines. The coordinates themselves must of course then actually be odd.
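
A minimal sketch of why odd coordinates are safe, assuming A holds a sprite's Y coordinate and D holds an even scroll offset (the register roles and the `or 1` are illustrative only, not taken from any actual game code):

```asm
; Assumption: A = sprite Y coordinate, D = scroll offset (even).
    or 1            ; force bit 0 on, so the coordinate is odd
    add a,d         ; odd + even is still odd, even after wrapping at 256
    out (98h),a     ; the value written is odd, so it can never be 216
```

This only holds as long as both the scroll offset and the sprite movement change in steps of 2.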

p.p.s. The comparison + relative jump overhead is 21 cycles (no matter which branch is taken).
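
For reference, one way to arrive at that figure, assuming standard MSX timing where every M1 cycle gets one extra wait state (the `cp 216` form stands in for the `cp magicvalue` of the first post):

```asm
; Cycle counts assume MSX timing (Z80 T-states plus one wait per M1 cycle).
    cp 216              ;  8 cycles
    jr nz,noadjust      ; 13 cycles when taken, 8 when not taken
    inc a               ;  5 cycles, only reached when the jump is not taken
noadjust:
; Y + offset is not 216:  8 + 13     = 21 cycles
; Y + offset is 216:      8 + 8 + 5  = 21 cycles
```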

By PingPong

Prophet (3513)

28-12-2018, 00:53

Well, all the solutions make you pay too much for the kind of problem I need to address.
For example:
- a custom pattern requires sacrificing a line of the sprite, not to mention the need to check the Y position, which is the kind of check I wished to avoid. Plus it is feasible only for the main character.
- blitting a custom pattern requires 16*18 T-states, a waste of time similar to what I'm trying to avoid.
- using the adjust registers involves using the VDP to do a kind of copy operation, which is worse than ever: you sacrifice computational power... and you have shaking borders... :-( :-( :-(
...
The only viable solution is the even/odd Y coordinate trick.

Does anyone know how this is handled in games like Zanac, Aleste, Space Manbow etc.?

By Grauw

Ascended (8905)

28-12-2018, 03:39

It goes down to 18 cycles (584 total) if you store the value 216 in a register:

    ld e,216
...
    ld a,(hl)
    add a,d
    cp e
    jr nz,noadjust
    inc a
noadjust:
    out (98h),a

Do it in 15 cycles (488 total) by changing the behaviour a bit, offsetting by 1 between lines 216-255:

    ld e,216
...
    ld a,(hl)
    add a,d
    sub e
    ccf
    adc a,e
    out (98h),a
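
To see what the carry does here, a short trace of the three added instructions for a few sample values (the sample values are chosen for illustration only):

```asm
; On entry: a = Y + offset (after add a,d), e = 216.
    sub e           ; a = a - 216, carry set when a was below 216
    ccf             ; carry is now set when a was 216 or above
    adc a,e         ; a = original a, plus 1 when it was in 216..255
; a = 215: sub e -> 255, carry 1; ccf -> carry 0; adc a,e -> 215
; a = 216: sub e ->   0, carry 0; ccf -> carry 1; adc a,e -> 217
; a = 255: sub e ->  39, carry 0; ccf -> carry 1; adc a,e ->   0 (wraps)
```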

By Grauw

Ascended (8905)

28-12-2018, 01:45

PingPong wrote:

(3) - blitting a custom pattern requires 16*18 T-states, a waste of time similar to what I'm trying to avoid.

You can just select a different pattern index right? You have 64 available. But this option is more suitable if you want to avoid the visual disturbance rather than optimal performance.

PingPong wrote:

(4) - using the adjust registers involves using the VDP to do a kind of copy operation, which is worse than ever: you sacrifice computational power... and you have shaking borders... :-( :-( :-(

You can scroll 39 lines without line 217 ever coming into view, without shaking borders. And if the name table is already repainted every once in a while anyway (e.g. for animations or horizontal scrolling), you might as well do it scrolling vertically without losing frame budget.

Not to say that makes these useful for you, just that they can be viable in certain situations. :)

PingPong wrote:

Does anyone know how this is handled in games like Zanac, Aleste, Space Manbow etc.?

I think most of them probably just offset by one (option 1). But if Space Manbow does vertical scrolling the same way it does horizontal scrolling, it just avoids the line coming into view entirely (option 4). Good example of what I described above.

By Grauw

Ascended (8905)

28-12-2018, 04:26

Down to 13 cycles (454 total) if you calculate the pre-offset line in advance:

    ld a,216
    sub d
    ld e,a
    ld a,d
    add a,e
    inc a
    ld d,a
...
    ld a,(hl)
    sub e
    sub 1
    adc a,d
    out (98h),a
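
A short trace of the loop body may help; it assumes an offset of 10 purely as an example, which gives e = 206. Note that the setup above always leaves d = 217, whatever the offset is.

```asm
; Example values (assumption): offset = 10, so e = 216 - 10 = 206, d = 217.
    ld a,(hl)       ; a = Y from the buffer
    sub e           ; a = Y + offset - 216
    sub 1           ; carry set only when a was 0, i.e. Y + offset = 216
    adc a,d         ; a = Y + offset, plus 1 when it was exactly 216
; Y = 205: sub e -> 255; sub 1 -> 254, carry 0; adc a,d -> 215
; Y = 206: sub e ->   0; sub 1 -> 255, carry 1; adc a,d -> 217
; Y = 207: sub e ->   1; sub 1 ->   0, carry 0; adc a,d -> 217
```

Unlike the 15-cycle version, only the value 216 itself is changed here.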

Additionally, if you are using sprite mode 2, the fourth byte of each SAT entry is ignored so you can just OUT whatever, which saves 6 cycles each loop:

...
    inc hl       ; align buffer to use inc l
    outi
    outi
    out (98h),a

If that’s writing too fast for the V9938, you can put the ld a,(hl) of the next entry before the out (98h),a to introduce a delay. Probably a good idea.
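
Put together, the whole upload could look roughly like the sketch below. It is only a sketch under some assumptions that are not stated in the thread: HL points to a 128-byte SAT buffer in RAM, D holds the Y offset on entry, the VDP write address has already been set to the SAT in VRAM, and the label names are made up.

```asm
upload_sat:
    ld a,216
    sub d
    ld e,a          ; e = 216 - offset
    ld a,d
    add a,e
    inc a
    ld d,a          ; d = 217
    ld c,98h        ; VDP data port, used by outi
    ld b,96         ; 32 entries, B drops 3 per entry (outi, outi, djnz)
    ld a,(hl)       ; Y of the first entry
next_sprite:
    sub e
    sub 1
    adc a,d         ; Y + offset, bumped to 217 if it was 216
    out (98h),a     ; byte 0: Y
    inc hl
    outi            ; byte 1: X
    outi            ; byte 2: pattern number
    inc hl          ; skip the unused byte in the buffer
    ld a,(hl)       ; read the next Y early as a delay
                    ; (on the last pass this reads one byte past the buffer)
    out (98h),a     ; byte 3: ignored in sprite mode 2, any value will do
    djnz next_sprite
    ret
```

Since each outi also decrements B, the counter starts at 96 rather than 32, so that djnz reaches zero exactly after the 32nd entry.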

And of course it may be even faster to skip this temporary buffer entirely and just out these values straight from the sprite update code that populates it.

Also note that if sprites have the same y position (e.g. OR-sprites), you really only need to do the offset and line 216 check once. And if certain sprites move in 2 pixel increments (e.g. the player sprite) it doesn’t need to do the line 216 check at all. So it may be faster to do this in the sprite update code.

By bore

Expert (116)

28-12-2018, 09:22

Grauw wrote:

p.p.s. The comparison + relative jump overhead is 21 cycles (no matter which branch is taken).

Is this really desired?
I would think that the case with 216 is "rare".

Wouldn't it be better to optimize for the case where no adjustment is necessary?

    ld e,216
...
    ld a,(hl)
    add a,d
    cp e
    jr z,adjust
noadjust:
    out (98h),a
...
adjust:
    inc a
    jp noadjust

It costs 21 extra cycles when the match happens but saves 5 cycles for every sprite that doesn't need to be adjusted.
If you are looping through the sprites you might be able to duplicate the loop counter into the match-case and save some cycles there, but that is beside the point.

By PingPong

Prophet (3513)

28-12-2018, 10:21

Grauw wrote:

It goes down to 18 cycles (584 total) if you store the value 216 in a register:

    ld e,216
...
    ld a,(hl)
    add a,d
    cp e
    jr nz,noadjust
    inc a
noadjust:
    out (98h),a

Thanks, I already do that; the code I posted used cp n only for clarity (and I'm running out of registers).

Grauw wrote:

Do it in 15 cycles (488 total) by changing the behaviour a bit, offsetting by 1 between lines 216-255:

    ld e,216
...
    ld a,(hl)
    add a,d
    sub e
    ccf
    adc a,e
    out (98h),a

thx, Nice trick!

By PingPong

Prophet (3513)

28-12-2018, 10:24

Grauw wrote:

PingPong wrote:

(3) - blitting a custom pattern requires 16*18 T-states, a waste of time similar to what I'm trying to avoid.

You can just select a different pattern index right? You have 64 available. But this option is more suitable if you want to avoid the visual disturbance rather than optimal performance.

PingPong wrote:

(4) - using the adjust registers involves using the VDP to do a kind of copy operation, which is worse than ever: you sacrifice computational power... and you have shaking borders... :-( :-( :-(

You can scroll 39 lines without line 217 ever coming into view, without shaking borders. And if the name table is already repainted every once in a while anyway (e.g. for animations or horizontal scrolling), you might as well do it scrolling vertically without losing frame budget.

Not to say that makes these useful for you, just that they can be viable in certain situations. :)

PingPong wrote:

Does anyone know how this is handled in games like Zanac, Aleste, Space Manbow etc.?

I think most of them probably just offset by one (option 1). But if Space Manbow does vertical scrolling the same way it does horizontal scrolling, it just avoids the line coming into view entirely (option 4). Good example of what I described above.

Selecting a different pattern index, as you said, requires special handling and a little computational power.
Um, sorry, I forgot to mention that this is screen 5: there is no name table.
