wait 15 T-States between reading and writing to Vram

Page 5/6
1 | 2 | 3 | 4 | | 6

By Bengalack

Hero (578)

31-01-2022, 19:31

Grauw wrote:

It seems like it may be caused by the interplay between mixed OUTs and INs to the VRAM transfer port. I think there is not much information known about how these relate to each other timing-wise, how they’re handled by the VDP precisely.

Right. Actually I asked for this earlier, but no one answered :)

Grauw wrote:

Can you still reproduce the issue if you replace the INs by OUTs?

Let me test and come back on this. Anyways, my plan was to, at some point in time, set up a simple, isolated test case, so we can see this clearer.

Grauw wrote:

Also, are you certain the transfer always starts and ends during vertical blanking? It’s quite sensitive to timing: if it’s occasionally pushed out of the vertical blank period by a long music player frame in the interrupt handler, or by interrupts that are kept disabled for too long in the main loop, that would cause issues.

Yes. I am certain.

By Bengalack

Hero (578)

31-01-2022, 20:03

Bengalack wrote:
Grauw wrote:

Can you still reproduce the issue if you replace the INs by OUTs?

Let me test and come back on this

Ok, just verified that OUT behaves exactly like IN, in my case.

By hit9918

Prophet (2923)

31-01-2022, 20:51

The VDP has a byte buffer. After an OUT, that byte is still waiting to get written to VRAM. If you then do a quick IN, there is no extra logic or state bit that says "oh, I'm going to memorize this extra request for later". The plan for the things that are waiting gets destroyed.
So even if you don't care that the IN is reading trash, something has still been destroyed.

By hit9918

Prophet (2923)

31-01-2022, 21:11

The question is whether an IN puts the VDP into read mode. Maybe it does not go into read mode but just does an address register increment.
Then you can do it this way

outi
outi
nop
in
in
outi
outi
nop
in
in

Then maybe the IN gets the value from the byte buffer, the recently written OUT.

By Bengalack

Hero (578)

31-01-2022, 21:21

hit9918 wrote:

A copy of the SAT takes 3.2% CPU. And then with the two INs it is 16% faster. You spend huge effort trying to save 16% of 3.2% CPU.
So this thread is practically saying the typical "the MSX VDP is slow", for something that took 0.5% CPU... But the time is not taken by the VDP but by some unknown code.

I see your point wrt the numbers... But I guess, people are different? :)

Three things:
1. I just question why this doesn't work, when documentation says it should
2. Some of us are obsessed with slashing cycles (can't help it :)
3. If I can help getting the emulator better, I will try

If my post could be read as "VDP is slow", I'm sorry, it was not the intention.
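The figures quoted above can be sanity-checked with a few lines of C. This is only a sketch under assumptions that are not stated in the thread: 18 T-states for OUTI and 12 for IN A,(n) (Z80 timing plus the MSX M1 wait state), a 32-entry SAT, and a 50 Hz frame of roughly 3579545 / 50 = 71590 T-states. The function names are made up for illustration.

```c
#include <stdio.h>

/* Assumed MSX instruction timings in T-states (not taken from the thread). */
enum { T_OUTI = 18, T_IN_A_N = 12 };
enum { SAT_ENTRIES = 32 };

/* T-states to copy the whole SAT with four OUTIs per 4-byte entry. */
long sat_copy_all_outi(void) {
    return (long)SAT_ENTRIES * 4 * T_OUTI;
}

/* T-states when two of the four bytes are skipped with IN A,(n) instead. */
long sat_copy_two_in(void) {
    return (long)SAT_ENTRIES * (2 * T_OUTI + 2 * T_IN_A_N);
}

/* Whole-percent saving of the IN variant relative to the all-OUTI copy. */
long percent_faster(void) {
    long full = sat_copy_all_outi();
    return (full - sat_copy_two_in()) * 100 / full;
}

/* Share of one frame, in hundredths of a percent (integer division). */
long frame_share_bp(long tstates, long frame_tstates) {
    return tstates * 10000 / frame_tstates;
}

/* Print the budget for an assumed 50 Hz frame length. */
void print_sat_budget(void) {
    const long frame = 3579545L / 50;  /* assumed frame length in T-states */
    long saved = sat_copy_all_outi() - sat_copy_two_in();
    printf("full copy: %ld T-states\n", sat_copy_all_outi());
    printf("with INs:  %ld T-states (%ld%% faster)\n",
           sat_copy_two_in(), percent_faster());
    printf("saved:     %ld hundredths of a percent of a frame\n",
           frame_share_bp(saved, frame));
}
```

Under these assumptions the full copy lands near the quoted 3.2% of a frame and the saving near the quoted 0.5%; with a 60 Hz frame the shares would be somewhat larger.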

By Bengalack

Hero (578)

31-01-2022, 21:36

hit9918 wrote:

The question is whether an IN puts the VDP into read mode. Maybe it does not go into read mode but just does an address register increment.
Then you can do it this way

outi
outi
nop
in
in
outi
outi
nop
in
in

Then maybe the IN gets the value from the byte buffer, the recently written OUT.

I tested this on a physical a1-wsx. It does not work, just like the other variants.

I can add: if I put a NOP (or any 5-cycle op) before the second IN as well, then things work. And of course, this is the crux of the whole thing: according to the documentation, these two extra ops should not be required.
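To make the spacing in that sequence concrete, here is a rough C model of the gap between consecutive accesses to the VRAM port. It is a sketch only: the T-state counts (NOP 5, OUTI 18, IN A,(n) 12) are assumed MSX timings, and it simplifies by treating the port access as happening at the end of each I/O instruction, so the gap equals the length of the instructions executed since the previous access. All names are hypothetical.

```c
#include <stddef.h>

/* Instructions used in the sequence under discussion. */
typedef enum { OP_NOP, OP_OUTI, OP_IN_A_N } Op;

/* Assumed MSX T-state counts (Z80 timing plus M1 wait states). */
static int op_tstates(Op op) {
    switch (op) {
    case OP_NOP:    return 5;
    case OP_OUTI:   return 18;
    case OP_IN_A_N: return 12;
    }
    return 0;
}

/* NOP is the only instruction here that does not touch the VRAM port. */
static int op_touches_port(Op op) {
    return op != OP_NOP;
}

/* Smallest gap in T-states between two consecutive port accesses,
 * or -1 if the sequence contains fewer than two accesses.
 * Simplification: each access is placed at the end of its instruction. */
int min_port_gap(const Op *seq, size_t n) {
    int min_gap = -1;
    int since_last = 0;
    int seen_access = 0;
    for (size_t i = 0; i < n; i++) {
        since_last += op_tstates(seq[i]);
        if (!op_touches_port(seq[i]))
            continue;
        if (seen_access && (min_gap < 0 || since_last < min_gap))
            min_gap = since_last;
        seen_access = 1;
        since_last = 0;
    }
    return min_gap;
}
```

In this model the sequence outi, outi, nop, in, in has a smallest gap of 12 T-states (the back-to-back INs), while adding a NOP before the second IN raises it to 17. That is at least consistent with the 15 T-state figure in the thread title, though the model says nothing about what the VDP does internally.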

By hit9918

Prophet (2923)

31-01-2022, 22:10

Bengalack wrote:

I see your point wrt the numbers... But I guess, people are different? :)

Three things:
1. I just question why this doesn't work, when documentation says it should
2. Some of us are obsessed with slashing cycles (can't help it :)
3. If I can help getting the emulator better, I will try

If my post could be read as "VDP is slow", I'm sorry, it was not the intention.

Some things came across wrong. I am saying the MSX can do it, while the 2-byte RAM SAT is saying that there is no time to copy the 3rd byte, the P byte for pattern animation.

By Bengalack

Hero (578)

01-02-2022, 20:37

I've made a simple program that shows the behaviour. I used C (sdcc) and fusion-c to get the setup done with only a few lines. I also put a hook directly on 0x38 to remove any doubt as to where the cycles go. The normal build script in fusion-c makes a DOS .com file, i.e. RAM is already present in page 0 when you run it.

I put .c/.com-files here: https://drive.google.com/drive/folders/1ByJARr_sqUUB4XKAImWE...

What it does:
* It sets screen 4, page 0, 16x16 pixel sprites, white background. 32 sprite patterns are prepared. They are different, but only pattern 0 is used in the sprite attribute table, and it looks like a "full block". All 32 sprites are initially placed at position (0,0) and made black.
* The interrupt routine takes the first 8 sprites and attempts to put them at x=75, starting at the top of the screen and then every 16 pixels downwards. The interrupt routine can easily be switched between IN and OUT to advance the VRAM address pointer, but neither gives a better result than the other.

Result: It looks different on a physical machine than in the emulator, and hence I question the no/minimum delay when in vblank.
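For reference, the intended end state of the sprite attribute table described in the two bullets can be written out in plain C. This is a sketch, not part of the test program: `expected_sat` and `MOVED_SPRITES` are names made up here, and the 4-byte entry layout (Y, X, pattern, unused) follows the description above.

```c
#include <stdint.h>
#include <string.h>

enum {
    SAT_SPRITES   = 32,  /* entries in the sprite attribute table */
    SAT_ENTRY_LEN = 4,   /* Y, X, pattern, unused */
    MOVED_SPRITES = 8,   /* sprites repositioned by the interrupt routine */
    TARGET_X      = 75
};

/* Fill `sat` with the contents the test is meant to produce:
 * sprites 0-7 at x=75 and y=0,16,...,112, everything else left at zero. */
void expected_sat(uint8_t sat[SAT_SPRITES * SAT_ENTRY_LEN]) {
    memset(sat, 0, SAT_SPRITES * SAT_ENTRY_LEN);
    for (int i = 0; i < MOVED_SPRITES; i++) {
        sat[i * SAT_ENTRY_LEN + 0] = (uint8_t)(16 * i);  /* Y */
        sat[i * SAT_ENTRY_LEN + 1] = TARGET_X;           /* X */
        /* pattern and unused bytes stay 0 */
    }
}
```

Any VRAM dump that differs from this table after the interrupt has run indicates that some accesses were lost or landed at the wrong address.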

Here is the c-file with inline asm:

#include "fusion-c/header/msx_fusion.h"
#include "fusion-c/header/vdp_sprites.h"

#define COLOR_BLACK                             0x01
#define COLOR_WHITE                             0x0F

#define SCR4_PAGE0_SPRITE_ATTR_TABLE_ADDRESS    0x1E00
#define SCR4_PAGE0_SPRITE_COLOR_TABLE_ADDRESS   (SCR4_PAGE0_SPRITE_ATTR_TABLE_ADDRESS-512)
#define SCR4_PAGE0_SPRITE_PATTERN_TABLE_ADDRESS 0x3800

#define SPRITES_NUM                             32
#define SPRITE_PATTERN_BYTES                    32
#define SPRITE_COLOR_BYTES                      16
#define SPRITE_ATTR_ENTRY_LEN                   4

#define VDPIO                                   0x98
#define VDPPORT1                                0x99

const unsigned char yx_array[] = { 0, 75, 16, 75, 32, 75, 48, 75, 64, 75, 80, 75, 96, 75, 112, 75 };

// ---------------------------------
void putFirst8SpritesAtPos()
{
__asm

.macro macroSetVdpWrite ; writeaddress in AHL
    rlc     h
    rla
    rlc     h
    rla
    srl     h
    srl     h

    out     ( VDPPORT1 ), a     ; // address bits 14-16
    ld      a,#14 | #0x80       ; // register write to R#14
    out     ( VDPPORT1 ), a

    ld      a, l                ; // address bits 0-7
    out     ( VDPPORT1 ), a
    ld      a, h                ; // address bits 8-13

    or      #64                 ; // + write access
    out     ( VDPPORT1 ), a
.endm

.macro macroWriteSATEntry
    outi
    outi
    out     ( VDPIO ), a
    out     ( VDPIO ), a

    ; in a, ( VDPIO )
    ; in a, ( VDPIO )
.endm

    xor     a
    ld      hl, #SCR4_PAGE0_SPRITE_ATTR_TABLE_ADDRESS
    macroSetVdpWrite

    xor     a
    ld      c, #VDPIO
    ld      hl, #_yx_array
    macroWriteSATEntry
    macroWriteSATEntry
    macroWriteSATEntry
    macroWriteSATEntry

    macroWriteSATEntry
    macroWriteSATEntry
    macroWriteSATEntry
    macroWriteSATEntry

__endasm;
}

// ---------------------------------
void myInterrupt() __naked
{
__asm

    push    af
    push    bc
    push    hl
    
    xor     a                       ; // read status register 0 to make sure further processing happens
    out     ( VDPPORT1 ), a         ; // status register number
    ld      a, #0x8F                ; // VDP register R#15
    out     ( VDPPORT1 ), a         ; // out VDP register number
    in      a, ( VDPPORT1 )         ; // read VDP S#0
    
    call    _putFirst8SpritesAtPos  ; // our "main routine"
    
    pop     hl
    pop     bc 
    pop     af

    ei
    ret
    
__endasm;
}

// --------------------------------------
// MSX2 only. Assumes default palette.
// Interrupt is hi-jacked, so you need to
// reset after this to re-gain control
// --------------------------------------
void main(void) 
{
    SetColors( COLOR_BLACK, COLOR_WHITE, COLOR_WHITE );     // white background
    Screen( 4 );
    SetDisplayPage( 0 );                                    // prolly default, but anyway
    SetActivePage( 0 );                                     // prolly default, but anyway
    Sprite16();
    
    unsigned char i;
    
    for( i=0;i<SPRITES_NUM;i++ ) // Define 32 sprite patterns uniquely. Index/pattern #0 is a full, square block (16x16 pix)
        FillVram( SCR4_PAGE0_SPRITE_PATTERN_TABLE_ADDRESS+i*SPRITE_PATTERN_BYTES, 0xFF-i, SPRITE_PATTERN_BYTES );
    
    for( i=0;i<SPRITES_NUM;i++ ) // Set Color for all 16 lines in all 32 sprites to Black
        FillVram( SCR4_PAGE0_SPRITE_COLOR_TABLE_ADDRESS+i*SPRITE_COLOR_BYTES, COLOR_BLACK, SPRITE_COLOR_BYTES );
    
    for( i=0;i<SPRITES_NUM;i++ ) // Put all 32 sprites at pos (0,0), using pattern 0. "unused"-attribute is also set to 0
        FillVram( SCR4_PAGE0_SPRITE_ATTR_TABLE_ADDRESS+i*SPRITE_ATTR_ENTRY_LEN, 0, SPRITE_ATTR_ENTRY_LEN );

__asm
    di                                  
    ld      a, #0xC3                        ; // "jp" opcode
    ld      hl, #_myInterrupt
    ld      ( 0x0038 ), a                   ; // Short-circuit the system: Hi-jack the interrupt all-together
    ld      ( 0x0039 ), hl                  ; // to remove any doubts that we run our code in vblank
    ei
__endasm;

    while( 1 )                              // Loop forever
        Halt();
    
}

Edited twice: "ld a, #0x0F" was changed to "ld a, #0x8F ; // VDP register R#15" as I fooled around with both values, and made a mistake initially.
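As a cross-check of the address setup, the bit shuffling in macroSetVdpWrite can be mirrored in plain C. This is a minimal sketch assuming a 17-bit VRAM address; the struct and function names are hypothetical and do not come from fusion-c.

```c
#include <stdint.h>

/* Values sent to port 0x99 when setting up a VRAM write. The macro sends
 * r14, then 0x8E (14 | 0x80, a register write to R#14), then low, then high_wr. */
typedef struct {
    uint8_t r14;      /* address bits 14-16 */
    uint8_t low;      /* address bits 0-7 */
    uint8_t high_wr;  /* address bits 8-13 with bit 6 set for write access */
} VdpWriteSetup;

/* Split a 17-bit VRAM address the same way the rlc/rla/srl sequence does. */
VdpWriteSetup vdp_write_setup(uint32_t vram_addr) {
    VdpWriteSetup s;
    s.r14     = (uint8_t)((vram_addr >> 14) & 0x07);
    s.low     = (uint8_t)(vram_addr & 0xFF);
    s.high_wr = (uint8_t)(((vram_addr >> 8) & 0x3F) | 0x40);
    return s;
}
```

For the sprite attribute table at 0x1E00 this gives 0x00, 0x00 and 0x5E, which is what the macro outputs when entered with A=0 and HL=0x1E00.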

By Bengalack

Hero (578)

01-02-2022, 19:41

Here are some images of the results:

openmsx17 - and the way the code was intended to behave:

Real a1-wsx, "with two out-commands", (one other run gave a different output):

Real a1-wsx, "with two in-commands":

Real a1-wsx, "with two in-commands", another run:

Real svi-738, MSX2, "with two out-commands" (blinking):

Real svi-738, MSX2, "with two in-commands" (blinking):

By Grauw

Ascended (10581)

02-02-2022, 00:03

Great work, can you create an openMSX ticket for this on GitHub as well?

If you change the out (VDPIO),a to out (c),a, giving it two extra cycles, does the problem disappear? Just to give a ballpark idea of how big the timing error is.

As a confirmation, in the openMSX debugger I hacked in a change where the border colour is set to red during the I/O, and to blue otherwise, and indeed it happens neatly after the blanking starts. (Could be a nice addition to the test.)
