I checked the Brazilian video and interchange mappings and they seem correct. A curious fact, though: in the Expert 1.0 character set all letters with diaeresis are drawn with a tilde instead (so, for example, ä and ã look the same). I believe this was done intentionally since, at the time, it was common to use a diaeresis instead of a tilde on computers where vowels with tilde were not available. I'm not sure about this, though; it's just the only explanation I could think of.
Incidentally, the diaeresis is no longer used in the Portuguese language. Standard Brazilian keyboards still have it on one of the main keys (you get it with Shift + 6).
For your convenience, wide editions of the same table for Arabic.
Character set of Bawareth Perfect MSX1 and Yamaha AX500 (respectively):
Please review:
Video mapping
Interchange mapping
Character set of Al Alamiah AX-170:
Please review:
Video mapping
Interchange mapping
Wide:
Character set of Bawareth Perfect MSX1 and Yamaha AX500 (respectively):
Please review:
Video mapping
Interchange mapping
Character set of Al Alamiah AX-170:
Please review:
Video mapping
Interchange mapping
Thanks Manuel! You beat me to it (I'm on my way back home now). So the mapping is incomplete. This is due to the smart font design which allowed them to avoid about one third of the total number of glyphs needed to properly display the cursive Arabic script.
What this means is that some MSX glyphs are actually used to represent more than one Unicode code point. For example, many letters in their isolated form also work as the final form. However, with the current mapping, if you request the final form of those letters you get nothing. So I guess the question is: can I add two or three entries per MSX glyph in the table? If not, how would you like me to express this?
Example:
0xB0 0xFEB1 # ARABIC LETTER SEEN ISOLATED FORM
but also:
0xB0 0xFEB2 # ARABIC LETTER SEEN FINAL FORM
so 0xB0 is mapped to two Unicode code points at the same time.
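To make the one-to-many idea concrete, here is a minimal Python sketch of how a tool might read such a table. It assumes the `0xNN 0xNNNN # comment` line layout shown in the example above; the function names `parse_mapping` and `build_encoder` are hypothetical and not part of any existing tool.

```python
from collections import defaultdict

def parse_mapping(text):
    """Parse 'MSX-byte Unicode-code-point # comment' lines into a one-to-many table."""
    table = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue
        msx_hex, unicode_hex = line.split()
        table[int(msx_hex, 16)].append(int(unicode_hex, 16))
    return dict(table)

def build_encoder(table):
    """Invert the table: every listed code point encodes to the same MSX byte."""
    return {cp: msx for msx, cps in table.items() for cp in cps}

# The two example entries from above (assumed file layout).
SAMPLE = """\
0xB0 0xFEB1 # ARABIC LETTER SEEN ISOLATED FORM
0xB0 0xFEB2 # ARABIC LETTER SEEN FINAL FORM
"""

table = parse_mapping(SAMPLE)
encoder = build_encoder(table)
```

Encoding (Unicode to MSX) stays unambiguous with this layout, because each code point has exactly one MSX byte. Only decoding (MSX to Unicode) has to pick among several candidates.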
So, that character could be used to represent both of these Unicode code points? Interesting...! I'll check with Rebecca how we could handle this. But that shouldn't stop you from reviewing. Just listing both is fine for now.
Alright, I've reviewed all Arabic files. There were some inaccuracies that I fixed, and I've added all the possible mappings to Unicode. Please note that all possibilities are equally correct and important. You can't assume there is a "main" one and the others are duplicates. Here are the updated files:
MSX BAWARETH PERFECT MSX1 & YAMAHA AX500 Video Mapping
MSX BAWARETH PERFECT MSX1 & YAMAHA AX500 Interchange Mapping
MSX AX-170 Video Mapping
MSX AX-170 Interchange Mapping
So, depending on how those mappings will be used, further handling will probably be needed because of this "compression" of multiple forms into one glyph. For example, if you take MSX AX-170 text, convert it to Unicode code points and try to display those code points as-is on Windows, you will get something that looks almost right, but not quite. You will need to rerun the Arabic shaping algorithm on the resulting string to resolve the ambiguities. I have personally made a single-header C lib to do this shaping, based on the ICU library, which perfectly handles the simple cases that the MSX can generate. Let me know if anyone needs it for MSX support (I use it in my current WIP game).
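As a rough illustration of one way to sidestep the ambiguity (this is not the C library mentioned above, just a sketch of a different technique): Unicode defines compatibility decompositions from the Arabic presentation forms back to the base letters, so a converter could fold every presentation form to its base letter and let the platform's text renderer do the shaping again. The function name `to_base_letters` is hypothetical, and the range checks assume the Arabic Presentation Forms-A and Forms-B blocks.

```python
import unicodedata

def to_base_letters(text):
    """Fold Arabic presentation forms to base letters, leaving other text untouched."""
    out = []
    for ch in text:
        # Arabic Presentation Forms-A (U+FB50..U+FDFF) and Forms-B (U+FE70..U+FEFC)
        if "\uFB50" <= ch <= "\uFDFF" or "\uFE70" <= ch <= "\uFEFC":
            # NFKC applies the compatibility decomposition, e.g. <final> -> base letter
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)

# Both candidate code points for MSX glyph 0xB0 fold to the same base letter SEEN.
isolated = to_base_letters("\uFEB1")
final = to_base_letters("\uFEB2")
```

With this approach it no longer matters which of the candidate code points the decoder picked for a shared glyph, since they all collapse to the same base letter. Whether the re-shaped result matches what the MSX actually displayed would still need checking against real text.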
Thanks a lot!
So, any Korean or Japanese users who want to help with these respective mappings?
The AX-350II and AX-350IIF fonts are the same as the AX-170 font.
Tested on openMSX.
ax350ii_arabic.rom (sha: 2c9600c6e0025fee10d249e97448ecaa37e38c42)
ax350iif_arabic.rom (sha: 5077b9c86ce1dc0a22c71782dac7fb3ca2a467e0)
Brazilian
There are several variants, so they all have their own mapping.
I'm not sure it's worth having so many variants, as it can be confusing. The Hotbit 1.2 and Expert 1.1 have exactly the same charset, because it was standardised by ABNT (the Brazilian ANSI/DIN/JISC-like organisation) at that time and named "MSX-BR". The only difference is the font style. This version should be considered the real Brazilian standard charset. Maybe a mix of the best characters of the two sets could be used, since each of them has one or two characters that look weird.
The Hotbit 1.1 has just a single character that differs from the Hotbit 1.2 and Expert 1.1: the character at 9Eh, the useless Cruzado monetary symbol "Cz". So I would also recommend just discarding this charset.
The only real Frankenstein of the group is the Expert 1.0. Gradiente created a proprietary charset (just a mod of the international charset) without consulting anyone. I can't remember any software that was specific to this machine, and it showed garbage characters for any Portuguese text and software. As a result, Gradiente offered a "1.1 upgrade kit" consisting of a new ROM and replacement keys for the keyboard, so very few pure 1.0 machines still exist.
But, if I understood correctly, the intention of the Retro Computers Unicode project is to be able to represent the characters available on legacy computers, right? This would solve problems like the one this page has, where plenty of PETSCII characters can't be represented on the page.
AFAIK, all of the Expert 1.0 characters can be represented by characters from the MSX-BR charset or the MSX international charset. IMHO it's not necessary to create a specific charset for this machine.
Cruzado (Cz) != Cruzeiro (Cr) right?
According to Wikipedia
-1942 Real (Rs)
1942-1967 Cruzeiro (Cr$ / ₢)
1967-1970 Cruzeiro Novo (NCr$)
1970-1986 Cruzeiro (Cr$)
1986-1989 Cruzado (Cz$)
1989-1990 Cruzado Novo (NCz$)
1990-1993 Cruzeiro (Cr$)
1993-1994 Cruzeiro real (CR$) (note the all caps)
1994-present Real (R$)