Can't do 3D on C64? https://www.youtube.com/watch?v=zVPW40ygds4 :P
Look closely at those hw sprites... That's not wireframe or filled-poly gfx ;-) only optimized hw sprites.
Anyway, that doesn't mean you can't do it. It only means it's more difficult than on Z80 machines because of the more limited CPU horsepower.
I didn't say it couldn't do delta-compressed movie playback!
But yeah, it can do okay even when performing actual 3d computations if given enough storage space for lookup tables (that's a large cartridge game). It's just that decent 3d on that platform is much rarer than on the z80 machines.
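To make the "storage space for lookup tables" point concrete, here is a minimal sketch of one well-known way to trade memory for multiply speed on CPUs without a multiply instruction: quarter-square multiplication. This is only an illustration. The thread doesn't say which tables any particular game uses, and the names `QUARTER_SQUARES` and `table_multiply` and the 8-bit operand range are assumptions made up for the example.

```python
# Quarter-square multiplication: a*b = f(a+b) - f(|a-b|), with f(n) = n*n // 4.
# The identity is exact for integers because a+b and a-b always share parity.
# Table name, function name and the 8-bit operand range are illustrative
# assumptions, not taken from any specific C64/Atari/BBC game.

# f(n) for n = 0..510 covers every possible sum of two unsigned 8-bit operands.
QUARTER_SQUARES = [n * n // 4 for n in range(511)]


def table_multiply(a: int, b: int) -> int:
    """Multiply two unsigned 8-bit values using one add, two lookups, one subtract."""
    if not (0 <= a <= 255 and 0 <= b <= 255):
        raise ValueError("operands must fit in 8 bits")
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]


if __name__ == "__main__":
    print(table_multiply(13, 17))
```

On a real 8-bit machine the table would live in ROM or RAM as two byte-wide halves, which is where the storage cost comes from.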
Though it might just be the low clock rate on that specific 6502; the Atari does a much better Escape from Fractalus than the C64, and the BBC does a much better Revs.
The C64 probably compensates for the slow calculations with fast drawing...
Much like the consoles, I think; the hardware is well optimised for a big tranche of games, and for cost.
Aside: glancing at the 65816, the '16-bit' (ish) drop-in (sort of) 6502 replacement, I see they added block move instructions. So we're not the only ones to have considered the 6502 solutions to LDxR to be problematic.
(EDIT: though, that being said, LDxR aren't exactly brilliant on the Z80, since they spend 50% of their memory accesses on redundantly rereading their own opcodes.)
The Z80 is not a DMA controller or a device dedicated to massive memory moves, and it was one of the first 8-bit CPUs to have these instructions. Even if they aren't heavily optimized, they are faster than hand-written code on other processors, while keeping the same flexibility and general-purpose functionality.
Even if the 16-bit successor of the 6502 added them, we are talking about early 8-bit CPUs here. The 80386, for example, later added linear virtual addressing, but x86 CPUs did not originally have it.
By the way, x86 does block moves in the same not-so-efficient way: "REP MOVSB" is nothing more than a REP LDI, aka LDIR.
(Though I think x86 mitigates the inefficiency somewhat.)
Also, of course, I forgot the refresh cycle. There are five memory accesses total for each iteration of an LDxR: two to read the operation, a refresh cycle in between, then a read and a write to perform the memory mutation. So actually it's only 40% of memory accesses that are redundant, amounting to 5 of 21 cycles. Or, on an MSX, 6 of 22. I guess there might be some additional decode costed in after the second opcode byte fetch (decode of the first byte happens during the refresh cycle) but the dominant cost seems to be the 16-bit arithmetic.
EDIT:
The 65816 has the MVN/MVP instructions as an analog of LDIR. They run at 7 clocks per byte!
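For a rough feel of what the figures quoted in this thread mean in practice, here is a small back-of-the-envelope sketch. It only uses numbers from the posts above (21 and 22 cycles per LDxR iteration, 7 clocks per byte for MVN/MVP, 2 of 5 memory accesses being opcode reads). Running both CPUs at the standard MSX clock of 3.579545 MHz is an assumption made purely for a like-for-like comparison, and the function names are made up for the example.

```python
# Back-of-the-envelope block-copy throughput from the figures quoted in the thread.
# Assumption: both CPUs are clocked at the MSX rate, purely for comparison;
# a real 65816 system may well run at a different clock.
MSX_CLOCK_HZ = 3_579_545


def bytes_per_second(cycles_per_byte: int, clock_hz: int = MSX_CLOCK_HZ) -> float:
    """Bytes copied per second for a block move costing cycles_per_byte."""
    return clock_hz / cycles_per_byte


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage."""
    return 100.0 * part / total


if __name__ == "__main__":
    # Per-byte costs as stated in the posts above.
    for name, cycles in [("Z80 LDxR", 21), ("Z80 LDxR on MSX", 22), ("65816 MVN/MVP", 7)]:
        print(f"{name}: {bytes_per_second(cycles):,.0f} bytes/s")
    # Opcode re-reads as a share of memory accesses and of cycles, as quoted.
    print(f"accesses spent on opcode reads: {percent(2, 5):.0f}%")
    print(f"cycles spent on opcode reads: {percent(5, 21):.1f}%")
```

At equal clocks, the quoted per-byte costs put MVN/MVP at three times the throughput of LDxR, and the 65816 re-reads nothing while it does so.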
Apologies to -LeoN- for missing his post, then saying the same thing, with less detail! At least there's one regard in which the 65816 isn't an absolutely terrible thing.
If anyone's interested in these 'other CPUs', I'm starting some beginners' 6502 and 68000 assembly tutorials on my YouTube channel:
6502: https://youtu.be/lsvSZamCCBM
68000: https://youtu.be/ilSvChwlmtw
I'm going to be covering the SNES, but only in 6502 mode - as the assembler I use doesn't support 65816 - I'd be interested to try it eventually - so maybe one day!
(EDIT: though, that being said, LDxR aren't exactly brilliant on the Z80, since they spend 50% of their memory accesses on redundantly rereading their own opcodes.)
Yes, in the common case an 'internal repeat' that doesn't re-read opcodes would allow much faster block-copy. But whether you'd want that depends on what you want the documented behaviour to be:
I tend to regard e.g. LDIR as LDI + JR back to the opcode's start address (while BC<>0). Re-reading the opcode there may produce a different opcode than the previous read, for example if that LDI(R) overwrote itself, or if external hardware interferes with the opcode read somehow. Or say the opcode is produced by a memory-mapped I/O device that returns a different opcode once some other hardware condition occurs; a program may use this for code obfuscation. Or the last write changed the memory configuration (including at the opcode's address) through a memory-mapper-style mechanism.
Okay, such cases are unusual, but even then it's possible that this is the intended behaviour for LDIR & co.
If you only want the fastest possible block-copy, ignoring the possibility of cases like the above, then yes, such an 'internal repeat' would be preferred.
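The difference between the two behaviours can be sketched with a toy model. This is not a Z80 emulator: it ignores flags, timing and interrupts, and `run_ldir` with its `refetch` switch is a hypothetical helper made up for this example. It models LDIR as "LDI, then jump back to the opcode while BC<>0", re-reading the two opcode bytes before each iteration, and contrasts that with an imagined 'internal repeat' that never looks at the opcode again.

```python
# Toy model of LDIR as "LDI + jump back to the opcode while BC != 0".
# Assumptions: flat 64 KB memory, no flags, no timing, no interrupts.
# run_ldir and its refetch parameter are hypothetical, for illustration only.
LDIR_OPCODE = (0xED, 0xB0)


def run_ldir(memory: bytearray, pc: int, hl: int, de: int, bc: int, refetch: bool = True):
    """Run one LDIR located at pc; return (pc, hl, de, bc, iterations).

    With refetch=True the opcode bytes are re-read before every iteration,
    so the repeat ends early if the copy overwrites the instruction itself.
    With refetch=False the loop models an 'internal repeat' instead.
    """
    iterations = 0
    while True:
        fetched = (memory[pc], memory[(pc + 1) & 0xFFFF])
        if refetch and fetched != LDIR_OPCODE:
            break  # opcode changed: the CPU would now execute something else
        memory[de] = memory[hl]  # the LDI part: copy one byte
        hl = (hl + 1) & 0xFFFF
        de = (de + 1) & 0xFFFF
        bc = (bc - 1) & 0xFFFF
        iterations += 1
        if bc == 0:
            pc = (pc + 2) & 0xFFFF  # fall through to the next instruction
            break
    return pc, hl, de, bc, iterations


def fresh_memory() -> bytearray:
    """64 KB of zeros with an LDIR opcode placed at 0x8000."""
    memory = bytearray(0x10000)
    memory[0x8000], memory[0x8001] = LDIR_OPCODE
    return memory


if __name__ == "__main__":
    # Copy 16 zero bytes to 0x7FFE: the third write lands on the opcode itself.
    for mode in (True, False):
        result = run_ldir(fresh_memory(), pc=0x8000, hl=0x4000, de=0x7FFE, bc=16, refetch=mode)
        print(f"refetch={mode}: iterations={result[4]}, BC left={result[3]}")
```

In the re-fetching model the copy stops once it has overwritten its own opcode, while the 'internal repeat' model carries on to the end of the block. Which of the two is "correct" is exactly the question of what the documented behaviour should be.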
I think the explanation is simple: the instruction MUST be interruptible.