Your proposals are interesting...
The PNG prediction approach is a very interesting way to improve the compression of pulse data.
I think file size is important, but it is not the main goal. I don't care whether a file takes 100 kB or 50 kB... we will mainly store them on SD cards.
Anyway, I think the first step is to reach a consensus about the main goals/features the format must have, and how to prioritize them. Once we all agree on the goals/priorities, we can start thinking about the best approach.
What are your goals for a possible new format?
In case it was missed, I’ll repeat my suggestion:
Since the zero-crossing times will be very similar, a format based on deltas and simple Huffman compression should already keep the file size down by a lot, and would be simple to implement, stream incrementally, emulate in real-time, and process digitally with threshold values.
Is there any reason to make it more complex than simply that?
The wait, reverse polarity, short/long, etc. things mentioned earlier seem to me just unnecessarily different ways to express the time deltas between zero crossings. As for a feature like “semantic data”, I don't know what its purpose would be other than to complicate the format. Metadata also seems unnecessary, but regardless, adding a metadata block to any format is fairly trivial.
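To make the idea a bit more concrete, here is a minimal sketch of the delta step (the names are only illustrative; a Huffman stage would then simply encode these delta values):

```cpp
#include <cstdint>
#include <vector>

// Turn absolute zero-crossing times (e.g. in ticks of some fixed clock)
// into deltas. Consecutive pulses of the same tone have nearly identical
// lengths, so the deltas cluster around a few values, which is exactly
// what a simple Huffman coder compresses well.
std::vector<uint32_t> toDeltas(const std::vector<uint32_t>& crossings)
{
    std::vector<uint32_t> deltas;
    deltas.reserve(crossings.size());
    uint32_t prev = 0;
    for (uint32_t t : crossings) {
        deltas.push_back(t - prev);
        prev = t;
    }
    return deltas;
}
```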
I agree with you Grauw, but there's some metadata that could be useful for some purposes. E.g. the file type of the loader, so that it is easy to derive how to load the software.
But if a “CAS2” generation tool can determine that, so can the emulator, by just processing the data and extracting the file headers, right? Then you don't need to complicate the file format, and you don't put a burden on the ripper / ripping tools (where an error or mistake isn't as easy to correct once the files are out there).
You already do this with WAV in openMSX I think, though due to the way this suggested format is encoded it would be a bit simpler. Some byte reader class with “CAS2” data as input and bytes as output shouldn’t be too hard to make. Ones and zeroes are easy to find by just comparing the zero crossing delta times against a threshold.
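As a rough illustration of such a reader's first stage (the threshold is a placeholder; a real reader would derive it from the measured pulse lengths, e.g. from the pilot tone):

```cpp
#include <cstdint>
#include <vector>

// Classify every zero-crossing delta as a short (high-tone) or long
// (low-tone) half-period. On MSX a "0" bit is one cycle of the low tone
// and a "1" bit two cycles of the high tone, so once pulses are
// classified, assembling bits and then bytes (start bit, 8 data bits,
// 2 stop bits) is straightforward.
std::vector<bool> classifyPulses(const std::vector<uint32_t>& deltas,
                                 uint32_t threshold)
{
    std::vector<bool> isShort;
    isShort.reserve(deltas.size());
    for (uint32_t d : deltas) {
        isShort.push_back(d < threshold);
    }
    return isShort;
}
```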
I would say, the format should only include extra data which requires manual intervention because it can’t be derived from the data stream itself. But I have doubts that there is that kind of data.
.CAS files don't have metadata, and openMSX knows what load commands to use, so it's not overly important. And even then, if you use a file naming scheme similar to TOSEC, you have all the information you need in the filename.
With CAS, no such metadata is necessary, because the format is at such a high level that inspecting the first few bytes directly reveals the MSX file format and thus the loading instruction.
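For illustration, a minimal check along those lines (the constants are the well-known .CAS sync sequence and MSX header type bytes; the function name is just an example):

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// The 8-byte synchronisation sequence that precedes every block in a .CAS file.
static const uint8_t CAS_SYNC[8] = {0x1F,0xA6,0xDE,0xBA,0xCC,0x13,0x7D,0x74};

// Look at the ten type bytes that follow the sync of the first block:
// 0xD0 = binary (BLOAD"CAS:",R), 0xD3 = tokenized BASIC (CLOAD),
// 0xEA = ASCII (RUN"CAS:"). Returns the type byte, or 0 if unrecognised.
uint8_t detectCasType(const uint8_t* data, size_t size)
{
    if (size < 18 || std::memcmp(data, CAS_SYNC, 8) != 0) return 0;
    uint8_t t = data[8];
    if (t != 0xD0 && t != 0xD3 && t != 0xEA) return 0;
    for (int i = 9; i < 18; ++i) {
        if (data[i] != t) return 0;  // all ten type bytes must match
    }
    return t;
}
```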
The wait, reverse-polarity and short/long instructions model the reality behind 99% of tape signals: that there's the encoding on the one hand, and the encoded data on the other.
By modelling reality, you get a minimal file size.
That's the benefit. As to the cost: the instruction set proposed (seven instructions, no side effects) is so trivial as to be implementable in one line per instruction.
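As a very rough sketch of what "one line per instruction" could mean in practice (the opcode names, the block header fields and the structure are all hypothetical, only the wait / reverse-polarity / short / long pulses mentioned above come from this thread, and the full seven-instruction set is not reproduced here):

```cpp
#include <cstdint>

// Purely illustrative: a dispatcher in which every instruction maps to a
// single action on the output signal. These opcodes are placeholders,
// not the actual proposed CAS2 instruction set.
enum class Op : uint8_t { Wait, ReversePolarity, ShortPulse, LongPulse };

struct BlockHeader {
    uint32_t shortLen;   // short pulse length, taken from the block header
    uint32_t longLen;    // long pulse length, taken from the block header
};

struct TapeOutput {
    bool level = false;  // current signal polarity
    uint64_t time = 0;   // elapsed time in ticks
    void hold(uint32_t ticks)  { time += ticks; }   // keep current level
    void toggle()              { level = !level; }  // flip polarity
    void pulse(uint32_t ticks) { toggle(); hold(ticks); }
};

void execute(Op op, uint32_t arg, const BlockHeader& hdr, TapeOutput& out)
{
    switch (op) {
    case Op::Wait:            out.hold(arg);           break;
    case Op::ReversePolarity: out.toggle();            break;
    case Op::ShortPulse:      out.pulse(hdr.shortLen); break;
    case Op::LongPulse:       out.pulse(hdr.longLen);  break;
    }
}
```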
I'd strongly reject any file format that could not in principle contain extended metadata. Shoving it into the filename is an exceedingly ugly hack; that it is present in many filenames proves that it's worth retaining, even without the more obvious argument: preserving a tape implies preserving whatever text was printed on the tape, which was usually the name and publisher, usually a copyright year, and often loading instructions.
I'm in the middle of moving house so time is an absent resource, but I would suggest that a good next step would be to produce some mock files and code to handle them. See where the pitfalls lie. Ability to preserve can be determined logically, but arguments about implementation burden will be easier to give weight to when, you know, there's actual code and files. Then we can discuss whether an X% reduction in file size is worth Y% extra complexity.
EDIT: because, to say it explicitly, the problem with TZX isn't fidelity or size, but the overwhelming complexity it foists upon implementors. Quite apart from the psychological discouragement, thorough testing is almost impossible. So the community gets a whole bunch of products that all claim to implement TZX but actually only implement various different subsets, whether through design or error. So I consider the relevant factors to be threefold: complexity, fidelity, size. It's a trade-off between the three. Any one of them would have to be severely askew to actually disqualify a format.
Are you all for a pulse-based format?
And against a byte-based format?
The seven-instruction format is byte-based where it can be. But I think the issue is that you cannot encode everything an MSX can read with a purely byte-based format, yet you also don't want too much complexity if you're going to try to do both things.
TZX has a lack of uniformity and manifold duplications of functionality, which make the code for handling it lengthy and hard to verify.
OK... byte-based blocks need a format template assigned so they can be encoded the MSX way.
In this case, something like "0" b0 b1 b2 b3 b4 b5 b6 b7 "11" at the byte level (one start bit, eight data bits, two stop bits).
But a bit-level template is needed too.
In each byte-based block we need to include:
- Bit level template (using pulses)
- Byte level template (using bits)
- Byte stream
By defining templates in a flexible way, we could reproduce every data block of every platform, not only MSX systems (remember that some European games use Spectrum data blocks as copy protection).
The pilot tone could be externalized from the byte-stream block (into another block) so as not to increase complexity.
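To make the template idea a bit more tangible, here is a rough sketch of how such a byte-based block could be structured (all type and field names are hypothetical, not a proposal):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical layout of a byte-based block: a bit-level template that
// maps '0' and '1' bits to pulse sequences, a byte-level template that
// frames the data bits (e.g. "0" b0..b7 "11" for MSX), and the raw bytes.
struct BitTemplate {
    std::vector<uint16_t> zeroPulses;  // pulse lengths used to encode a 0 bit
    std::vector<uint16_t> onePulses;   // pulse lengths used to encode a 1 bit
};

struct ByteTemplate {
    std::vector<uint8_t> startBits;    // e.g. {0}    for MSX
    std::vector<uint8_t> stopBits;     // e.g. {1, 1} for MSX
    bool lsbFirst = true;              // data bit order
};

struct ByteBlock {
    BitTemplate bitTemplate;           // bit-level template (using pulses)
    ByteTemplate byteTemplate;         // byte-level template (using bits)
    std::vector<uint8_t> data;         // the byte stream itself
};
```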