3D raycasting

Page 3/16

By NYYRIKKI

Enlighted (5745)

27-04-2011, 20:25

When I started to optimize the routine I wrote, I actually ended up with exactly the same thing as hit9918 :)

I don't have a compiler or an MSX emulator on this machine, so I don't know whether this works or not...



	ORG #9000

	DB 0, low (16384),low (16384/2),low (16384/3) ... ,low (16384/254),low (16384/255)
	DB 0, high(16384),high(16384/2),high(16384/3) ... ,high(16384/254),high(16384/255)

; input HL -> texture column: 64 bytes
; input A = final size in [1...255]
; input DE = (X,Y) coords for the starting point where to plot the scaled column

scaler: ;#9200

	LD IXL,A
	LD B,A
	LD C,#9B
	LD A,#24
	OUT (#99),A	; value for R#17: indirect writes start at R#36, auto-increment on
	LD A,#91
	OUT (#99),A

	OUT (C),E
	OUT (C),0
	OUT (C),D
	OUT (C),0

	OUT (C),0
	OUT (C),0
	OUT (C),B
	OUT (C),0
	
	LD A,(HL)
	OUT (#9B),A

	OUT (C),0
	LD A,#F0
	OUT (#9B),A

	LD A,#AC
	OUT (#99),A	; value for R#17: point at R#44 (colour), auto-increment off
	LD A,#91
	OUT (#99),A

	LD L,B
	LD B,H ; (the texture must also start on a 256-byte boundary)
	LD H,#90
	LD E,(HL)
	INC H
	LD D,(HL)
	LD HL,0

CORE:
	ADD HL,DE
	LD C,H
	LD A,(BC)
	OUT (#9B),A

	DEC IXL
	JP NZ,CORE

	RET

By hit9918

Prophet (2901)

27-04-2011, 21:13

I thought the idea of HMMC feeding a column with the cpu does not work so easy?
Why not?

Sorry, I am an MSX1 programmer. ARTRAG said something I misinterpreted, but on rereading I see he was explaining something about HMMV.

Let me ask again to be clear:
Is there a mode where I can just do OUT (port),A and the VDP will paste a column of pixels fed from the CPU? Is there some cycle figure for how fast one can do this without wrecking the VDP?

Then, regarding the screen 8 and screen 5 scrollers: to use the full blitter bandwidth, draw the right-hand column with the CPU. Any time the blitter spends on that column is lost for the screen-to-screen copy.

By ARTRAG

Enlighted (6504)

27-04-2011, 23:38

Actually I have no experience with HMMC, so I cannot say whether the loop testing the CE bit is really needed or not.
This is a good reference on how the command works:

http://www.ccas.ru/brychkov/MSX/V9938_programmers_guide.html

but it does not give any hints about timings.

BTW, your idea of storing textures run-length encoded sounds very good for large upscaling.

My guess is that the best results can be achieved by using run-length encoded textures and HMMV for upscaling above a given pixel size, and CPU-to-VRAM copies with HMMC for smaller scales and downscaling.

If the textures have large constant runs along their columns, storing them twice, once as run-length data and once as a bitmap, could be acceptable.
Maybe, if the largest size at which run-length encoding becomes convenient is small, one could also consider storing the bitmap data pre-scaled.
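The RLE+HMMV idea can be sketched roughly as follows. This is only an illustration in Python, not code from the project: `rle_encode` and `scaled_fills` are hypothetical helper names, and each returned fill stands for one HMMV rectangle (colour, start row, height) that would be sent to the VDP.

```python
def rle_encode(column):
    """Collapse a texture column into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, length) for value, length in runs]


def scaled_fills(runs, final_size):
    """Map each run onto the scaled column as (value, y_start, height).

    Every entry corresponds to one fill command; runs that shrink to
    zero rows are dropped.
    """
    src_height = sum(length for _, length in runs)
    fills = []
    src_pos = 0
    for value, length in runs:
        y0 = src_pos * final_size // src_height
        src_pos += length
        y1 = src_pos * final_size // src_height
        if y1 > y0:
            fills.append((value, y0, y1 - y0))
    return fills


column = [5] * 32 + [9] * 32          # a 64-byte column with two flat runs
runs = rle_encode(column)             # [(5, 32), (9, 32)]
print(scaled_fills(runs, 128))        # two fills of 64 rows each
```

The number of fills depends only on the number of runs, not on the final size, which is why this looks attractive for large upscaling and unattractive for noisy textures.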

E.g. assume the textures are 64 bytes tall and that we use:
- RLE+HMMV for final scale > 64
- HMMC and bitmap data for final scale <= 64

Each column, scaled to every size between 1 and 64, would take 2080 bytes.
16 textures of 64x64 pixels would take about 1 Mbyte of ROM.

Quite a lot, but it could become almost acceptable if we assume a step of 2 in the scaling (this would lead to about 512 Kbytes) and/or smaller textures.
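A quick check of these figures, as a sketch. The 2080 bytes are the sum 1+2+...+64. The totals depend on how many byte columns each texture has, which is not stated here, so `columns_per_texture` is a hypothetical parameter: with 32 byte columns (two pixels per byte) the totals come out near 1 Mbyte and 512 Kbytes, while 64 byte columns would double them. "Step 2" is assumed to mean sizes 2, 4, ..., 64.

```python
def bytes_per_column(max_size=64, step=1):
    """Bytes needed to store one column pre-scaled to every size up to max_size."""
    return sum(range(step, max_size + 1, step))


def rom_bytes(textures=16, columns_per_texture=32, max_size=64, step=1):
    """Total ROM for all pre-scaled columns (columns_per_texture is an assumption)."""
    return textures * columns_per_texture * bytes_per_column(max_size, step)


print(bytes_per_column())                    # 2080
print(rom_bytes(columns_per_texture=32))     # 1064960, about 1 Mbyte
print(rom_bytes(columns_per_texture=64))     # 2129920, about 2 Mbytes
print(rom_bytes(columns_per_texture=32, step=2))  # 540672, about 528 Kbytes
```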

What do you think?

By ARTRAG

Enlighted (6504)

27-04-2011, 23:39

@NYYRIKKI
I'll try your code in the actual maze 3D project as soon as possible and release the results, thanks!

By NYYRIKKI

Enlighted (5745)

28-04-2011, 00:13

OK, a few quick tips/thoughts to help testing:

- A actually means A+1 (zero counts as a pixel as well)
- The high and low nibbles of each texture byte should be the same
- Bit 0 of E should always be 0 (I don't know how you treat X)
- If it does not work, try adding some delays.
- You might need to use a slightly lower base value in the divide table in order to avoid texture overflow (0-64 -> 0-63).
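The last tip can be checked with a small model of the divide table and the CORE loop. This is a Python sketch, not the routine itself: it only mimics `ADD HL,DE` and `LD C,H`, and the alternative base value 16383 is my own assumption of what "a bit lower" could be.

```python
def divide_table(base=16384):
    """Per-pixel step in 8.8 fixed point; entry 0 is an unused placeholder."""
    return [0] + [base // n for n in range(1, 256)]


def split_table(table):
    """Low-byte and high-byte halves, as laid out by the two DB lines."""
    return [v & 0xFF for v in table], [v >> 8 for v in table]


def sample_indices(size, table):
    """Texture offsets read by the CORE loop for one column of the given size."""
    step = table[size]
    hl = 0
    indices = []
    for _ in range(size):
        hl = (hl + step) & 0xFFFF    # ADD HL,DE (16-bit wrap)
        indices.append(hl >> 8)      # LD C,H
    return indices


def max_index(base):
    """Largest texture offset reached over all sizes 1..255."""
    table = divide_table(base)
    return max(max(sample_indices(n, table)) for n in range(1, 256))


print(max_index(16384))   # 64: one byte past a 64-byte texture column
print(max_index(16383))   # 63: stays inside the column
```

With base 16384 the offset 64 is reached whenever the size divides 16384 exactly (1, 2, 4, ...), which is the overflow the tip warns about.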

By Vampier

Prophet (2384)

28-04-2011, 04:00

when can I play doom?

By ARTRAG

Enlighted (6504)

28-04-2011, 08:41

whenever you like, just not on msx...

By Metalion

Paragon (1360)

28-04-2011, 09:08

The wait cycle for the VDP during a macro command IS necessary; I have experienced this with other commands.
However, it can only be tested on real hardware, as emulators do not emulate the VDP timing.

By ARTRAG

Enlighted (6504)

28-04-2011, 12:58

Not always; generally it depends on the command and on the parameters you use.
I've never tested HMMC, but I guess that the CPU delay is sufficient for the VDP to set the new VRAM address.

By Metalion

Paragon (1360)

28-04-2011, 13:21

In fact, I did test HMMC, both on hardware and on BlueMSX, but I used the wait cycle.

If the wait cycle is indeed necessary, and given that the VRAM offset between a double pixel and the one below it is always the same (128 bytes), it might be interesting to check whether direct VRAM access (by setting the VRAM address) could be faster than HMMC ... I know for a fact that direct VRAM access works flawlessly at the speed of OTIR (which uses 21 T-states between each OUT).
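To make the 128-byte offset concrete, here is a small Python sketch of the addresses a direct-access column writer would have to set. It assumes the screen 5 layout (256 pixels per line, two pixels per byte); `vram_address` and `column_addresses` are hypothetical names for illustration.

```python
BYTES_PER_LINE = 128  # assumption: screen 5, 256 pixels per line, 2 pixels per byte


def vram_address(x, y, page_base=0):
    """Address of the byte holding pixel (x, y) and its horizontal neighbour."""
    return page_base + y * BYTES_PER_LINE + (x >> 1)


def column_addresses(x, y_start, height, page_base=0):
    """Addresses of one double-pixel column, top to bottom."""
    first = vram_address(x, y_start, page_base)
    return [first + row * BYTES_PER_LINE for row in range(height)]


print(column_addresses(10, 0, 3))   # [5, 133, 261]
```

Because consecutive bytes of a column are not adjacent in VRAM, each byte needs its own address set-up, and that cost is what would have to be weighed against the HMMC polling loop.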

About the wait cycle ... It says here that :

before a new command is sent to r#32-r#46 the program should first check the CE bit in s#2 (bit 0). This bit indicates whether the previously given command has finished or not. If it hasn’t, you should wait with giving the next command, or the previous one will be aborted

Now, when they're talking about a "command", are they talking about a whole macro command, or just the execution of bits of the actual command in progress ? In the first case, it means that if the VDP is in idle mode, you can throw it your data as fast as possible, whereas in the latter case, you have indeed to poll the s#2 register and use the wait cycle ...

EDIT: I just checked the VDP manual, and it is clear: with the HMMC command, the s#2 poll is not used to check the CE (VDP busy) bit but the TR (data transmit) bit, which implies that the poll is indeed necessary. However, as you said, nothing is said about the time needed to clear the TR flag. I guess a test program could be made to find out, though ...

EDIT 2: In the appendix (page 121), there is some info about the write cycle of the VDP. I am not sure how to interpret it, but if you add up all the times given (TASW, TAHW, TDSW, TDHW and TCSW), you get a total between 326 ns and 2140 ns (with a typical value of 840 ns). That's between 1 and 8 T-states ... That's quite low, although I am certainly no expert on the subject and I might be completely mistaken.
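The conversion from nanoseconds to T-states can be reproduced with a few lines of Python. This is just the arithmetic, assuming the standard 3.579545 MHz MSX Z80 clock and no extra wait states; it says nothing about how the VDP actually behaves.

```python
Z80_CLOCK_HZ = 3_579_545  # assumption: standard MSX Z80 clock, no turbo mode


def ns_to_tstates(ns, clock_hz=Z80_CLOCK_HZ):
    """Convert a duration in nanoseconds to Z80 T-states."""
    return ns * 1e-9 * clock_hz


def tstates_to_ns(tstates, clock_hz=Z80_CLOCK_HZ):
    """Convert a number of Z80 T-states to nanoseconds."""
    return tstates / clock_hz * 1e9


for label, ns in (("min", 326), ("typ", 840), ("max", 2140)):
    print(f"{label}: {ns} ns = {ns_to_tstates(ns):.2f} T-states")

print(f"OTIR spacing: 21 T-states = {tstates_to_ns(21):.0f} ns")
```

The three totals come out at roughly 1.2, 3.0 and 7.7 T-states, while the 21 T-states between OTIR writes correspond to nearly 5.9 microseconds.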
