[VDP] V9938... Blinking "Text" in Screen 7...

Page 3/3
1 | 2 |

By NYYRIKKI

Enlighted (5366)

16-09-2019, 14:01

ducasp wrote:

Now I can go back and resume working on this blinking feature :P (I just needed page flipping, but you know, if you are going to paint a room, why not paint the whole house? oO )

Well... You said that, not me. :)

So, let's talk a bit about how we could improve the performance of this display driver...

I think you know this problem very well, but I'll open it up a bit for other people who may be interested in the subject:

As you may imagine, writing text to the screen is typically a very small routine with a tight loop that calls the print-character routine. Most of the time is therefore spent on actually outputting the characters and on checking all kinds of things, like "Should this character be sent to the printer?", in a tight loop while the rest of the program does nothing. You can see the slowdown if you, for example, compare the speed of "DIR" and "DIR >NUL". There is quite a remarkable speed difference.

When you write one 6*8 character directly to the screen in SCREEN 7, you need a minimum of 2 byte I/O for address setup and 3 byte I/O for data transfer for each line. This means printing one character takes 40 I/O operations. If you use HMMC, I believe you get very similar results: a minimum of 15 setup + 24 data = 39 I/O operations.

Thinking of direct access, this means that only 60% of the bandwidth to the VDP is actual data (24 of 40 I/O operations). If we could print, e.g., 2 characters at a time, we could go up to 75%, and if we could print 4 characters, the "useful data" would be around 85%. With HMMC the benefit would be pretty similar.
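The percentages above follow from simple counting. The Python model below (a host-side calculation, not MSX code) reproduces them under the post's own assumptions: 4 bits per pixel in SCREEN 7, so a 6-pixel glyph row is 3 bytes; 2 address-setup OUTs per pixel row for direct access; and a flat 15 setup OUTs per HMMC command. Printing n characters side by side is assumed to need only one address setup per pixel row.

```python
# I/O cost model for printing n 6x8 characters side by side in SCREEN 7.
# Assumptions (from the post): 3 data bytes per glyph row (6 px at 4 bpp),
# 2 OUTs of address setup per pixel row for direct access, 15 setup OUTs
# for one HMMC command covering the whole block.

ROWS = 8            # glyph height in pixel rows
BYTES_PER_ROW = 3   # 6 pixels, 2 pixels per byte

def direct_cost(n_chars: int) -> tuple:
    """Return (total_io, data_io) for writing via the data port row by row."""
    data = ROWS * BYTES_PER_ROW * n_chars
    setup = ROWS * 2            # one 2-byte address setup per pixel row
    return setup + data, data

def hmmc_cost(n_chars: int) -> tuple:
    """Return (total_io, data_io) when a single HMMC command covers the block."""
    data = ROWS * BYTES_PER_ROW * n_chars
    return 15 + data, data

def efficiency(cost: tuple) -> float:
    total, data = cost
    return data / total

for n in (1, 2, 4):
    print(n, f"direct {efficiency(direct_cost(n)):.1%}",
          f"HMMC {efficiency(hmmc_cost(n)):.1%}")
```

With four characters per setup the direct path reaches 96 data OUTs out of 112, i.e. about 85.7%, which matches the "around 85%" figure.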

So what could be done to improve the situation? -> Buffering.

If the actual print routine did little more than calculate the line on screen, detect "CLS" and write the characters to a buffer, it could be made really fast. This would have the benefit that things like the DIR mentioned above could execute much faster.

The downside is that it still needs to be drawn to the screen, and that would mean an interrupt-driven screen handler. The good thing, anyway, is that it could draw much more at a time, and sometimes it might skip parts of the input buffer completely... Say there is stuff in the buffer that has not been drawn yet, but then a "CLS" clears the screen: you only need to worry about the CLS and whatever comes after it... Or, e.g., if the DIR command has output more lines than fit on the screen, you don't need to draw the characters that have already scrolled off before the screen is updated.
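The skipping described above can be sketched as a buffer that, when it is finally drained, throws away everything before the last CLS and every line that would already have scrolled off. This is a minimal Python sketch, not the driver's code: the class and method names are hypothetical, CLS is assumed to be form feed (chr 12, the MSX clear-screen control code), lines are assumed to end with "\n", and cursor movement, colors and escape codes are ignored.

```python
# Sketch of "skip what will never be visible" for a buffered console.
# Hypothetical names; CLS = form feed (chr 12); "\n" ends a line.

CLS = "\x0c"

class ConsoleBuffer:
    def __init__(self, rows: int = 24):
        self.rows = rows          # screen height in text lines
        self.pending = []         # characters written but not yet drawn

    def write_char(self, ch: str) -> None:
        # The fast path: just append; all drawing work is deferred.
        self.pending.append(ch)

    def write_string(self, s: str) -> None:
        self.pending.extend(s)

    def take_visible(self) -> tuple:
        """Return (must_clear_screen, lines_to_draw) and empty the buffer."""
        text = "".join(self.pending)
        self.pending.clear()
        cut = text.rfind(CLS)
        must_clear = cut >= 0
        if must_clear:
            text = text[cut + 1:]     # anything before the last CLS is never seen
        lines = text.split("\n")
        # Only the last `rows` lines can still be on screen after scrolling.
        return must_clear, lines[-self.rows:]
```

For example, with a 3-line screen, writing "old\nstuff", a CLS and then "l1\nl2\nl3\nl4" leaves only a clear plus the lines "l2", "l3" and "l4" to draw. As ducasp points out later in the thread, this is exactly what an animation renderer does not want, so such a mode would have to be optional.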

This would benefit the blinking stuff too... Let's say there is a lot of stuff in the buffer, so before returning from the screen update you handle, e.g., the last 3 lines of text from the buffer that don't have blinking. Then, before you give the CPU back to the real program, you set the VDP command engine to copy this 480*24 screen block to the other page. You get multitasking, and when blinking characters are needed you are much more ready to put the other page in use, as it is already at least partially up to date.
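The "480*24 screen block" is 80 columns of 6-pixel glyphs by 3 text lines of 8 pixel rows. A small worked calculation of the corresponding HMMM copy parameters, as a Python sketch: it assumes text starts at x=0 on page 0 and that, in SCREEN 7, each page occupies 256 lines of the command engine's Y coordinate (so page p starts at y = 256*p). The function name is hypothetical.

```python
# Geometry of a page-to-page HMMM copy for a band of text rows in SCREEN 7.
# Assumptions: 6x8 glyphs, 80 columns starting at x=0, 4 bits per pixel,
# page p starts at Y = 256 * p in VDP command coordinates.

GLYPH_W, GLYPH_H = 6, 8
COLUMNS = 80
PAGE_LINES = 256
BITS_PER_PIXEL = 4

def band_copy(first_row: int, n_rows: int, src_page: int = 0, dst_page: int = 1) -> dict:
    """Return HMMM-style parameters (SX, SY, DX, DY, NX, NY) plus bytes moved."""
    nx = COLUMNS * GLYPH_W                 # 480 pixels wide
    ny = n_rows * GLYPH_H                  # 8 pixel rows per text row
    sy = src_page * PAGE_LINES + first_row * GLYPH_H
    dy = dst_page * PAGE_LINES + first_row * GLYPH_H
    return {"SX": 0, "SY": sy, "DX": 0, "DY": dy, "NX": nx, "NY": ny,
            "bytes": nx * ny * BITS_PER_PIXEL // 8}
```

So copying three text rows starting at text row 21 means moving 480x24 pixels, 5760 bytes, from Y=168 to Y=424; the VDP does this on its own while the CPU runs the program.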

Naturally, the downside of all this would be that you would get screen tearing during fast output, you would need to keep track of what has been drawn, what needs to be drawn etc., and all of this would mean allocating RAM for buffers and a huge amount of code for all kinds of bookkeeping.

... but why not paint the whole house? :P

By ducasp

Master (147)

17-09-2019, 01:30

This is surely something to keep in mind, but as this implementation started as, and is meant to be, an ANSI escape code animation renderer, I don't think discarding screen information is a nice thing to do, so I really wouldn't like to implement it. E.g., a telnet client and network adapter can download the entire HISPAMSX boot animation in less than 1 second, so this would result in a very rough-looking piece of animation (with no animation at all :P ). Buffering can bring considerable speed-ups when printing, but I believe this is something I would do in the string printing functionality, leaving the interrupt / buffering to the application using the library... Lots of stuff to think about, for sure! 8)

By NYYRIKKI

Enlighted (5366)

17-09-2019, 14:50

ducasp wrote:

This is surely something to keep in mind, but as this implementation started as, and is meant to be, an ANSI escape code animation renderer, I don't think discarding screen information is a nice thing to do, so I really wouldn't like to implement it. E.g., a telnet client and network adapter can download the entire HISPAMSX boot animation in less than 1 second, so this would result in a very rough-looking piece of animation (with no animation at all :P ). Buffering can bring considerable speed-ups when printing, but I believe this is something I would do in the string printing functionality, leaving the interrupt / buffering to the application using the library... Lots of stuff to think about, for sure! 8)

Yes, in this particular case it might not be the wanted behavior, but I can, however, imagine that this kind of functionality would be more than welcome for purposes like compiling your asm sources with Devpac2 or something similar. (Been there, suffered that. :) )

Compared to how hard this kind of buffering would be to implement, disabling such functionality would be super easy... e.g. from the BIOS point of view: remove "partial flush buffer" from H.TIMI, remove "flush buffer" from H.PINL, and then add "flush buffer" after the "write char to buffer" call. From the developer's point of view it would be ideal to just have these as public calls in the library, along with a "write string to buffer" call and some parameters to adjust the row count of "partial flush buffer". Naturally, whether to place these calls in hooks or use them directly should be left for the application developer to decide.
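The call layout suggested above can be modeled in a few lines. This Python sketch only simulates "drawing" by moving characters to a list; every name in it is hypothetical, since the post describes the calls in prose only. On MSX, `partial_flush` would be the thing hooked into H.TIMI and `flush` the thing hooked into H.PINL; with buffering disabled, `write_char` simply flushes right after every write, which is the "super easy" off switch.

```python
# Model of a display library with optional buffering (hypothetical names).
# "Drawing" is simulated by moving characters from `pending` to `drawn`.

class BufferedDisplay:
    def __init__(self, buffered=True, rows_per_partial_flush=3):
        self.buffered = buffered
        self.rows_per_partial_flush = rows_per_partial_flush
        self.pending = []   # characters written but not yet drawn
        self.drawn = []     # characters "drawn" to the screen, in order

    def write_char(self, ch):
        self.pending.append(ch)
        if not self.buffered:
            self.flush()    # buffering disabled: flush after every write

    def write_string(self, s):
        for ch in s:
            self.write_char(ch)

    def partial_flush(self, rows=None):
        """Draw pending text up to and including the n-th newline (timer-hook budget)."""
        budget = self.rows_per_partial_flush if rows is None else rows
        if budget <= 0:
            return
        count = 0
        for i, ch in enumerate(self.pending):
            if ch == "\n":
                count += 1
                if count == budget:
                    self._draw(i + 1)
                    return
        self._draw(len(self.pending))   # fewer rows pending than the budget

    def flush(self):
        """Draw everything pending (e.g. before waiting for input)."""
        self._draw(len(self.pending))

    def _draw(self, n):
        self.drawn.extend(self.pending[:n])
        del self.pending[:n]
```

The row budget of `partial_flush` is the adjustable parameter mentioned in the post: a small value keeps each interrupt short, a larger one catches up faster during heavy output.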

By hit9918

Prophet (2866)

17-09-2019, 15:46

NYYRIKKI wrote:

When you write one 6*8 character directly to the screen in SCREEN 7, you need a minimum of 2 byte I/O for address setup and 3 byte I/O for data transfer for each line. This means printing one character takes 40 I/O operations. If you use HMMC, I believe you get very similar results: a minimum of 15 setup + 24 data = 39 I/O operations.

But to get such a nice simple copy of the bytes, the font needs to have the RIGHT COLOR! And with 16 foreground and 16 background colors you would need 256 fonts!

The only way I see is that for every letter there is a cache value that tells which color it is in.
If the color has changed, then the slow code has to render the letter again in the new color.
But if the letter's color is the same as the current color, then it goes with the DREAM of a direct copy of the letter.

Further, the fastest is to have the font in VRAM, because of the PARALLELISM: while the HMMM is running, the CPU has time for all the rest.

But for the blinking feature you need two pages, and then, with the vertical scroll register, they roll into the font area.
So, set the whole font cache to invalid values and set the font base to a different VRAM location.
The printing that comes after that will recalculate the font cache.

Use a 16-bit word for each color cache entry; then you can store the "invalid" value. And as a bonus, the code is prepared for the future, for some truecolor and dithering.
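A Python model of the per-letter color cache described above, to make the hit/miss and invalidation logic concrete. Assumptions not in the post: colors are 4-bit, a cache entry packs them as (fg << 4) | bg, and 0xFFFF (which no real color pair can produce) is the "invalid" value that a 16-bit entry makes possible. The re-rendering of a glyph into its VRAM slot is only counted here; all names are hypothetical.

```python
# Per-glyph color cache with a 16-bit "invalid" sentinel (hypothetical names).
# A hit means the glyph already sits in VRAM in the right colors and can be
# copied with HMMM; a miss means the slow path must re-render it first.

INVALID = 0xFFFF

class GlyphColorCache:
    def __init__(self, n_glyphs: int = 256, font_base: int = 0):
        self.entries = [INVALID] * n_glyphs
        self.font_base = font_base    # VRAM address of the cached font
        self.renders = 0              # how often the slow path ran

    def prepare(self, glyph: int, fg: int, bg: int) -> bool:
        """Return True on a cache hit; on a miss, 're-render' and update the entry."""
        key = (fg << 4) | bg
        if self.entries[glyph] == key:
            return True
        self.renders += 1             # slow path: redraw glyph in the new colors
        self.entries[glyph] = key
        return False

    def relocate(self, new_font_base: int) -> None:
        """Font area got overwritten (e.g. by scrolling): move it, invalidate all."""
        self.font_base = new_font_base
        self.entries = [INVALID] * len(self.entries)
```

After `relocate`, every glyph misses once and is re-rendered at the new font base, which is the "printing that comes after that will recalculate the font cache" step.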

And about a buffered console: I wonder whether the jumpy UX console is the right thing. It is made for UX where you always somehow end up getting a pile of text printed and CTRL-C does not work!
It is SKIPPING some rendering work: no good for animation and no good for the DIR command.

By NYYRIKKI

Enlighted (5366)

17-09-2019, 23:26

hit9918: I don't think you got my point.
I don't believe in font caching in VRAM. I still believe reserving all 128kB for display data only is the better idea.
Doing full-color output does not need to be very CPU-intensive if you do a little precalculation. You need some delay between I/O anyway, so here is one rough, untested idea of how it could be done:


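	; I would prepare the used colors like this
	; (BYTE2 must directly follow BYTE1, see LD DE,(BYTE1) below):
BYTE1	DB B_COLOR * 17				; background color in both nibbles
BYTE2	DB ( F_COLOR XOR B_COLOR ) * 17		; fg XOR bg in both nibbles

	;... and I would make a table like this:

	ORG #C000
TABLE:  ; (#C000 - #C0FF), must start on a 256-byte boundary
	; 4 bytes per font row value. $ is the address of the current DB; its
	; low byte is a multiple of 4 (+0/+1/+2), so bits 7..2 equal those of
	; the font row. Each set bit becomes a #F0 (left) or #0F (right) mask.
	REPT 64
	DB (($ AND 128)/128)*#F0 + (($ AND 64)/64)*#0F	; pixels 0-1
	DB (($ AND 32 )/32 )*#F0 + (($ AND 16)/16)*#0F	; pixels 2-3
	DB (($ AND 8  )/8  )*#F0 + (($ AND 4 )/4 )*#0F	; pixels 4-5
	DB 0						; unused filler
	ENDR


	; ... then when outputting the font row to VRAM I would do something like:

OUTPUT_CHR_ROW:

	; A = bitmask of the font row like inside BIOS (bits 0 & 1 == 0)
	; VRAM write address is assumed to be set up already.

	LD DE,(BYTE1)		; E = BYTE1 (bg), D = BYTE2 (fg XOR bg)
	LD H,TABLE/256
	LD L,A			; HL -> 4-byte table entry for this font row
	LD A,(HL)
	AND D			; set pixels -> fg XOR bg, clear pixels -> 0
	XOR E			; XOR bg -> set pixels = fg, clear pixels = bg
	OUT (#98),A		; pixels 0-1
	INC L
	LD A,(HL)
	AND D
	XOR E
	OUT (#98),A		; pixels 2-3
	INC L
	LD A,(HL)
	AND D
	XOR E
	OUT (#98),A		; pixels 4-5
	RET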
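To check that the table and the AND/XOR trick really produce foreground pixels where the font row has set bits and background pixels elsewhere, here is a host-side Python model of the routine, compared against a plain pixel-by-pixel packing. Assumptions: SCREEN 7 stores the left pixel of each byte in the high nibble, and font rows use bits 7..2 as in the post. The function names are hypothetical.

```python
# Host-side model of TABLE and OUTPUT_CHR_ROW, checked against a naive packer.

def build_table() -> list:
    """Equivalent of the REPT 64 block at #C000 (4 bytes per font-row value)."""
    table = []
    for base in range(0, 256, 4):      # low byte of $ at the first DB
        # +1 / +2 on $ only touch bits 0-1, so testing `base` is equivalent
        for hi_bit, lo_bit in ((128, 64), (32, 16), (8, 4)):
            table.append((0xF0 if base & hi_bit else 0) | (0x0F if base & lo_bit else 0))
        table.append(0)                # the DB 0 filler
    return table

def output_chr_row(table: list, font_row: int, f_color: int, b_color: int) -> list:
    """The 3 bytes OUTPUT_CHR_ROW would send to the VDP."""
    e = b_color * 17                   # BYTE1
    d = (f_color ^ b_color) * 17       # BYTE2
    return [(table[font_row + k] & d) ^ e for k in range(3)]

def reference_row(font_row: int, f_color: int, b_color: int) -> list:
    """Straightforward packing: 6 pixels, left pixel in the high nibble."""
    pixels = [f_color if font_row & (128 >> k) else b_color for k in range(6)]
    return [(pixels[2 * k] << 4) | pixels[2 * k + 1] for k in range(3)]
```

Running both over every valid font row and all 256 color pairs gives identical bytes, so the per-byte cost is just one table read plus AND and XOR.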

By hit9918

Prophet (2866)

18-09-2019, 16:47

NYYRIKKI, yesterday you talked about removing the port #99 setup and HMMC setup overheads,
while today you post code that is slower than a simple copy.

And a console is code in page 3. Even a monochrome font takes too much RAM to store there, so you use the system font, and that needs a super slow RDSLT!!!

By NYYRIKKI

Enlighted (5366)

19-09-2019, 00:59

hit9918 wrote:

NYYRIKKI, yesterday you talked about removing the port #99 setup and HMMC setup overheads,
while today you post code that is slower than a simple copy.

In order to do that "simple copy" you would need to keep track of each already-drawn character's location, ASCII value, foreground color and background color in order to reuse it at another VRAM position. I'm just guessing here, but I think that seeking the correct combination is slower than uploading a new font from RAM. (Remember that you can't permanently hide any part of VRAM due to screen scroll.)

... but maybe I just don't get your idea here... In particular, I did not get this part of your idea at all:

Quote:

So, set the whole font cache to invalid values and set the font base to a different VRAM location.
The printing that comes after that will recalculate the font cache.

Use a 16-bit word for each color cache entry; then you can store the "invalid" value. And as a bonus, the code is prepared for the future, for some truecolor and dithering.

Quote:

And a console is code in page 3. Even a monochrome font takes too much RAM to store there, so you use the system font, and that needs a super slow RDSLT!!!

In the last post ducasp was talking about a telnet client. If you think about it a bit, you will realize that you can't even use the MSX system font, as the ANSI client expects a different-looking font in a different order. In general I feel you completely missed my "rough, untested idea" and saw only the parts you were not supposed to look at. I think I should stick to English when I try to express ideas or thoughts, instead of full code. :)

The point I was trying to make is that the table needs to be on a 256-byte boundary. Monochrome bitmap fonts take only 2kB in this format (256 characters * 8 bytes), so they fit more than nicely in page 3, and there is still almost 48kB of contiguous space below the table for the telnet program. If we talk about the final product as a display library, then yes, you need to move this stuff to another address. I thought this was obvious, but obviously not... Also, with OUT (#98),A I just meant "output to VRAM" and did not necessarily mean this literal port; it was just an example.

About the idea behind it: as a storage format I see no reason why this kind of table or the system-font bitmask style would be bad. (If you use the AND & XOR to calculate the colors into a separate table, you can switch to OUTI on output.) Especially if you output multiple letters in one go, having 1 byte that represents 3 bytes in VRAM seems to me an easy and reasonable way to go, since the next 3 bytes will be generated from a different letter anyway.
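The parenthetical variant, doing the AND & XOR once per color change into a separate table so that output becomes a plain copy, can be modeled like this in Python. It assumes, as before, left pixel in the high nibble and font rows in bits 7..2; the names are hypothetical, and `row_bytes` only shows which bytes three OUTIs per letter would send, not the VDP transfer itself.

```python
# Precompute a colored 256-byte table once per color change; afterwards each
# font-row byte maps directly to 3 ready-made VRAM bytes (OUTI x3 on the Z80).

FONT_GLYPHS, GLYPH_ROWS = 256, 8
FONT_BYTES = FONT_GLYPHS * GLYPH_ROWS      # 2048 bytes: the 2kB mono font

def mono_table() -> list:
    """Same layout as TABLE: 4 bytes per font-row value, last one unused."""
    t = []
    for base in range(0, 256, 4):
        for hi_bit, lo_bit in ((128, 64), (32, 16), (8, 4)):
            t.append((0xF0 if base & hi_bit else 0) | (0x0F if base & lo_bit else 0))
        t.append(0)
    return t

def colour_table(mono: list, fg: int, bg: int) -> list:
    """The AND/XOR step, done once per color change instead of per byte."""
    e, d = bg * 17, (fg ^ bg) * 17
    return [(x & d) ^ e for x in mono]

def row_bytes(coloured: list, font_rows: list) -> list:
    """VRAM bytes for one pixel line of several letters printed in one go."""
    out = []
    for r in font_rows:                    # one font-row byte per letter
        out.extend(coloured[r:r + 3])      # what three OUTIs would send
    return out
```

Printing several letters per pixel line this way keeps the 1-byte-to-3-bytes storage format while removing the per-byte AND/XOR from the inner loop.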

By Manuel

Ascended (15696)

21-09-2019, 19:43

Manuel wrote:

Ah, right, this thing: https://github.com/openMSX/openMSX/issues/1091

And thanks to ducasp, openMSX now also emulates that "Cadari" bit :)

By ducasp

Master (147)

22-09-2019, 01:34

Manuel wrote:
Manuel wrote:

Ah, right, this thing: https://github.com/openMSX/openMSX/issues/1091

And thanks to ducasp, openMSX now also emulates that "Cadari" bit :)

Wouter also helped with great ideas on how to improve resource usage, and with a simpler version of my algorithm. B-)
