Presumably just averaging all the 0-bit pulse lengths and, separately, all the 1-bit pulse lengths would be a bit more reliable than — via the divide by three — making an implicit assumption that 50% of bits are 0s and 50% are 1s?
Also: despite my earlier comments about the terribleness of TSX, I've implemented it in my MSX emulator. It works exactly as well as CAS for ROM-format files, regardless of the internal encoding (so they load quickly, whether you've stored the data in block 4b or via any of the other thousand-and-one ways of saying the same thing in TSX), and non-ROM loaders will proceed at their original speed. I doubt it's going to make a lot of difference but I hope that helps in your attempt to improve the quality of preservation of original MSX tapes regardless of the quality of the container.
(EDIT: and the implementation, in case it helps to verify the breadth of support, or in case the emulator should ever be useful as a component of TSX testing)
It might be useful to try to document any notable special loaders in use on the MSX, as the Spectrum community has, in case it is possible for a sufficiently smart emulator to turbo load those as well, as an optional user convenience.
Presumably just averaging all the 0-bit pulse lengths and, separately, all the 1-bit pulse lengths would be a bit more reliable than — via the divide by three — making an implicit assumption that 50% of bits are 0s and 50% are 1s?
You are confusing the length of pulses with the number of pulses. The way you differentiate between a high and a low pulse is by the length: 0-bit pulses are twice as long as 1-bit pulses. If the average bit has a 0 pulse and a 1 pulse, then the length of the average bit is the 0-bit length plus the 1-bit length, which is why you divide by three.
EDIT: It is a terrible format, but it will work perfectly well in an emulator, as you are playing the TSX file at the speed it is encoded at.
What I am saying is that the speed it is encoded at is incorrect.
The fault is with how they are created, not with how they are played.
EDIT2: Ah, I see what you mean. No, because all the 0-bit pulse lengths are the same within each block, as are all the 1-bit pulse lengths.
Presumably just averaging all the 0-bit pulse lengths and, separately, all the 1-bit pulse lengths would be a bit more reliable than — via the divide by three — making an implicit assumption that 50% of bits are 0s and 50% are 1s?
You are confusing the length of pulses with the number of pulses. The way you differentiate between a high and a low pulse is by the length: 0-bit pulses are twice as long as 1-bit pulses. If the average bit has a 0 pulse and a 1 pulse, then the length of the average bit is the 0-bit length plus the 1-bit length, which is why you divide by three.
I don't think I am, but I'll explain my logic as it's definitely possible that I'm under a misapprehension.
In the absolute worst case, if I had a block that were, say, 256 instances of FFh, then each instance of FFh would be the bits 0 1111 1111 11 (one start bit, eight data bits, two stop bits).
There are 256 instances of that, so there are 256 0 bits and 2560 1 bits. That's 512 0-length pulses and 10240 1-length pulses.
Say the true length of a 0 pulse is n, and the true length of a 1 pulse is m.
If I just averaged the length of every individual pulse after classification, I would take 10240m/10240 as the length of a 1 pulse, and 512n/512 as the length of a 0 pulse. No assumptions about the exact relationship between the two, and no statement made about the quantity of pulses.
Conversely, going with divide-by-three logic:
The total length of every pulse added together is 512n + 10240m.
There are 10240 + 512 = 10752 pulses in total.
So the average length of a pulse is (512n + 10240m)/10752.
By divide-by-three logic, we are asserting that m = (512n + 10240m)/(10752 * 3).
So:
m = (512n + 10240m)/(10752 * 3) = (512n + 10240m)/32256
32256m = 512n + 10240m
22016m = 512n
43m = n
Which contradicts the assumption that 2m = n. Since 2m = n is [likely, if not quite exactly] true, I think that m = (512n + 10240m)/(10752 * 3) must be false — divide by three is false for this data set.
So I think divide-by-three works only when you have exactly the same number of 0 pulses as 1 pulses. And statistically you shouldn't quite expect that to be the case for BIOS data, because there is one start bit but two stop bits, which tilts things in 1's favour.
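To make that concrete, here's a quick sketch of both approaches on that synthetic block (the 1458/729 T-state pulse lengths are just nominal 1200-baud-ish values I've picked for illustration, and "divide by three" is taken in the sense I've used above, i.e. a third of the overall average pulse length):

```python
# Synthetic block: 256 bytes of FFh, each sent as one 0 start bit,
# eight 1 data bits and two 1 stop bits. A 0 bit is two long pulses
# of length n; a 1 bit is four short pulses of length m = n / 2.
n = 1458.0            # illustrative 0-pulse length (T-states)
m = n / 2.0           # illustrative 1-pulse length

pulses = []
for _ in range(256):
    pulses += [n, n]           # start bit (0): two long pulses
    pulses += [m] * 4 * 10     # eight data bits + two stop bits, all 1s

# Approach 1: classify the pulses, then average each class separately.
zeros = [p for p in pulses if p > 1.5 * m]
ones = [p for p in pulses if p <= 1.5 * m]
print(sum(zeros) / len(zeros), sum(ones) / len(ones))    # n and m exactly

# Approach 2: a third of the overall average pulse length.
average_pulse = sum(pulses) / len(pulses)
print(average_pulse / 3, "vs true m =", m)
```

The first pair of numbers comes back as n and m exactly; the divide-by-three figure lands well short of m, because the 1 pulses heavily outnumber the 0 pulses in this block.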
(EDIT: and as to the terrible file format, cool, it feels good to vent, but I agree that getting tapes preserved in some form is priority one)
CAS files do not allow for varying speeds in blocks. They play at one single speed; the default is 1200.
This means they should have a fixed 0-bit and 1-bit length that works out to 1200 bps.
At the moment, CAS files that are converted to TSX files have 0-bit and 1-bit pulse lengths that work out at 1600 bps, which means they must be encoded incorrectly. It also means that if they are incorrect, then so are all the other ID 4B blocks.
In any single block the 0-bit and 1-bit pulses are a fixed length, and a 0-bit pulse is twice as long as a 1-bit pulse. If you add the 0-bit length to the 1-bit length you get the average bit length, and if you divide 3,500,000 (the Z80 clock speed in Spectrums) by the average bit length you get the baud rate of that block.
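To put a number on that, here is the calculation as a rough sketch; the 1458/729 T-state figures are illustrative values with the 0 pulse twice the 1 pulse, roughly what the converted files mentioned above carry:

```python
Z80_HZ = 3_500_000   # the 3.5 MHz timebase used by TZX/TSX timing fields

def block_baud(zero_pulse_t, one_pulse_t):
    """Baud rate as described above: the 0-bit pulse length plus the
    1-bit pulse length is taken as the average bit length, and the
    Z80 clock is divided by it."""
    return Z80_HZ / (zero_pulse_t + one_pulse_t)

# Illustrative pulse lengths, with the 0 pulse twice the 1 pulse:
print(block_baud(1458, 729))   # ~1600 by this method of calculation
```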
Oh, I wasn't thinking about CAS at all. Just that if I had sampled a tape, decided to use block 4b rather than e.g. the CSW block, and hence made a decision about which are 0 pulses and which are 1 pulses, I think I'd be likely to average the lengths of all the 0 pulses and, separately, average the lengths of all the 1 pulses to get those fields.
I can make sense of dividing by three only if you have exactly the same number of 0s and 1s (I think I phrased that incorrectly before; that might be where I was confusing pulse length and quantity). Hopefully they're close to equal, because the data should be roughly randomly distributed, but it might not be.
Apologies for the noise and for distracting from the main thing you're saying: that existing TSXs are encoded with an incorrect baud rate.
Right. Ignore me completely.
I was calculating the baud rate the Spectrum way, where the average bit length is a 0-bit pulse plus a 1-bit pulse. I'd forgotten that in Kansas City a 0-bit is one long cycle and a 1-bit is two short cycles, so to get the average bit length it's 3,500,000/speed, and you then divide that by 4 to get the length of the short pulses.
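Putting numbers to that (a sketch assuming a nominal 1200 baud block and the 3,500,000 figure from before):

```python
Z80_HZ = 3_500_000            # TZX/TSX timing fields count 3.5 MHz T-states

baud = 1200
bit_t = Z80_HZ / baud         # ~2917 T-states per bit, whatever the bit value
one_pulse = bit_t / 4         # ~729 T: a 1 bit is two short cycles, four short pulses
zero_pulse = 2 * one_pulse    # ~1458 T: a 0 bit is one long cycle, two long pulses

# The Spectrum-style sum applied to the same pulse lengths is what gave
# the spurious 1600 figure earlier in the thread:
print(Z80_HZ / (zero_pulse + one_pulse))   # ~1600
print(Z80_HZ / (4 * one_pulse))            # 1200, the actual baud rate
```

Which is why the converted files looked like 1600 bps under my earlier calculation even though they're encoded for 1200.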
TSX files are right and I am wrong.
Still hate the TZX format though.
Interesting, given how new TSX is (just a year old), to already be getting these kinds of comments... One would think now would be the time to do something better that achieves the same goal, and ditch the TZX origins.
TZX is a sloppy file format even before you add TSX to it.
Too many superfluous ID blocks that have nothing to do with tape preservation and everything to do with emulator usage.
Also, whilst it's perfectly suited to the Spectrum, it doesn't take into account the different clock speeds of other machines that use it, e.g. the CPC.
A more logical step would've been to use UEF, which is already a Kansas City format and accurately duplicates the original cassette.
The only benefit of using TZX is that Laser Squad can be archived.
If that were on the table, I'd still advocate strongly for CSW. It's just a compressed list of the periods between each zero crossing plus a specification of the initial polarity.
EDIT: or UEF, of course. But I invented that so I'm biased.
You need to be able to decompress a zlib stream, so it likely adds zlib or equivalent as a dependency for authors, but the kicker is: one of the TZX chunks is just inline CSW data. So whatever complexity there is in implementing CSW is, by definition, a subset of that of implementing TZX.
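And to show how little there is to it, here's a minimal sketch of pulling the pulse list out of CSW-style data; it assumes the v2 Z-RLE convention (a zlib-compressed stream where each nonzero byte is a pulse length in samples and a zero byte escapes a 32-bit little-endian length), and it skips header parsing, the sample rate and the initial-polarity flag:

```python
import struct
import zlib

def csw_pulses(compressed: bytes):
    """Yield pulse lengths (in samples) from CSW v2 Z-RLE pulse data.

    Assumes the header has already been skipped: each nonzero byte is a
    pulse duration, and a zero byte means the real duration follows as a
    32-bit little-endian value.
    """
    data = zlib.decompress(compressed)
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 0:
            # Long pulse: the next four bytes hold the actual length.
            (length,) = struct.unpack_from("<I", data, i)
            i += 4
        yield length
```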
From my point of view, working with TZXDuino and the Arduino Nano, a move to UEF or CSW would mean we couldn't support those file types in a device due to the lack of onboard memory, but we could still keep supporting the CASDuino for the MSX.