MSX character set -> Unicode HELP NEEDED

Page 5/8

By gdx

Enlighted (5336)

08-10-2019, 01:12

What do you mean?
I quickly split it into two pages and added some tables, the texts may need to be corrected.

By Manuel

Ascended (18704)

09-10-2019, 12:11

There are several issues...

1. The "MSX Characters" article doesn't show all MSX characters. Instead, some control codes in the tables cover up the characters. Examples are the character at 0x7F and the characters of the Arabic character sets. This is a good example of a character set Wiki article: https://en.wikipedia.org/wiki/Code_page_437 I hope someone would like to make this for the MSX character set as well (the Wikipedia article is quite bad and also has these issues with control codes).
2. The "MSX font" article does show the full character sets, but only for a selection of machines. And it's a bit odd, because most of these fonts are the same, just displaying different character sets. The interesting part is what different shapes were used for the same character (that is the whole point of a font). For example, there are variations between Japanese machines (see the first page of this forum thread) and also between some Brazilian machines (really different font style). The style of the "usual" Western unaccented characters is the same everywhere, as far as I know.

By Manuel

Ascended (18704)

09-10-2019, 12:37

wbahnassi wrote:

Alright, I've reviewed all Arabic files. There were some inaccuracies that I fixed, and I've added all the possible mappings to Unicode. Please note that all possibilities are equally correct and important. You can't assume there is a "main" one and the others are duplicates.

Rebecca asked me to thank you for fixing the Arabic mappings for her.

As for the Korean and Japanese mappings: any help is much appreciated!

By gdx

Enlighted (5336)

09-10-2019, 13:33

Quote:

1. the "MSX Characters" article doesn't show all MSX characters.

"MSX Characters codes" is intended to show the printable characters and how, but also the differences between the MSXs.

Quote:

2. the "MSX font" article does show the full character sets, but only for a selection of machines.

"MSX fonts" is to show where the characters are in memory, but also the differences between the MSXs.

It takes time to do that, and I have less and less of it. So do not hesitate to add what is missing. Otherwise it will be incomplete for a long time.

By Manuel

Ascended (18704)

09-10-2019, 14:33

Well, I don't agree with this setup. "MSX Character Codes" should show the character sets (including differences between machines). Right now it doesn't; a lot of characters are missing.

The method to print these characters with the BIOS is a nice extra; it's what Rebecca calls the "transaction" mapping. But the fundamental mapping is between the character number and the symbol belonging to that number: the mapping as shown in the MSX Technical Databook, and the data that is also in the font part of the MSX ROM, indexed by that same number. That is what is defined as the 'character set'.
Obviously, a 'listing' of the character set should show all possible characters comprising that character set. If there are characters that can't be printed with the BIOS, that doesn't mean they do not exist. They're still part of the character set.
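
To make the "indexed by that same number" point concrete, here is a minimal sketch of looking up a glyph by its character number. It assumes the font data has been dumped to a flat block of 8 bytes per character (8 rows of 8 pixels); the names `glyph_rows` and `render` and the sample bitmap are hypothetical, made up for illustration only.

```python
GLYPH_BYTES = 8  # assumption: each character is 8 rows of 8 pixels


def glyph_rows(font: bytes, code: int) -> bytes:
    """Return the bitmap rows for one character number (0x00..0xFF)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("character number must be in 0x00..0xFF")
    start = code * GLYPH_BYTES
    rows = font[start:start + GLYPH_BYTES]
    if len(rows) != GLYPH_BYTES:
        raise ValueError("font dump is too short for this character number")
    return rows


def render(rows: bytes) -> str:
    """Draw a glyph as text, one line per row, '#' for set pixels."""
    return "\n".join(
        format(row, "08b").replace("0", ".").replace("1", "#") for row in rows
    )


# Synthetic font dump: all blank except a made-up shape at 0x7F.
font = bytearray(256 * GLYPH_BYTES)
font[0x7F * GLYPH_BYTES:(0x7F + 1) * GLYPH_BYTES] = bytes(
    [0x00, 0x10, 0x28, 0x44, 0x82, 0xFE, 0x00, 0x00]
)
print(render(glyph_rows(bytes(font), 0x7F)))
```

The lookup works for every number from 0x00 to 0xFF, whether or not the BIOS can print that character directly, which is the sense in which all of them belong to the character set.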

As I said before, a 'font' is the representation of the characters.

To illustrate the difference: it makes sense to make a mapping from MSX Character Sets to Unicode. Unicode is another way to assign a number to a glyph. The MSX Character Set is the MSX way to assign a number to a glyph.
But fonts are how a glyph is represented. You can have the same character set with different fonts. It means that you will get the same characters when putting a set of codes in VRAM, just rendered in a different style.
But if there are differences in character sets, you get a different character. Font isn't even relevant then, the whole character is different.
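
A small sketch of that last point: the same codes decoded under two different character sets give different characters, independent of any font. The two tables are tiny illustrative excerpts, and the specific entries for 0xB1 (ã in the international set, halfwidth katakana ｱ in the Japanese set) are my assumption here; check them against the actual mapping files. The names `INTERNATIONAL`, `JAPANESE` and `decode` are hypothetical.

```python
# Illustrative excerpts only, not complete character sets.
INTERNATIONAL = {0x41: "A", 0xB1: "\u00e3"}  # assumed: 0xB1 -> a with tilde
JAPANESE = {0x41: "A", 0xB1: "\uff71"}       # assumed: 0xB1 -> halfwidth katakana A


def decode(codes: bytes, charset: dict) -> str:
    """Map each MSX character code to Unicode; unknown codes become U+FFFD."""
    return "".join(charset.get(code, "\ufffd") for code in codes)


codes = bytes([0x41, 0xB1])  # the same bytes, as they would sit in VRAM
print(decode(codes, INTERNATIONAL))
print(decode(codes, JAPANESE))
```

A font would be a second, separate table from the same codes to bitmaps; swapping it changes the style of what you see, but not which characters the two `decode` calls return.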

There are different character sets used on the MSX and that's what this whole thread is about. As I went along, I noticed also some slightly different styles in font.

By gdx

Enlighted (5336)

09-10-2019, 17:29

Whether you agree or not, MSX works like this. And now no characters are missing (except for Arabic, which is not finished because I do not know all the codes). The mapping as shown in the MSX Technical Databook is in "MSX fonts". The two pages are complementary.

I wonder whether you really looked at the pages before saying all that.

By wbahnassi

Master (159)

09-10-2019, 19:53

Manuel wrote:

Rebecca asked me to thank you for fixing the Arabic mappings for her.

You are both welcome :) And thanks for your effort in all of this!

By Manuel

Ascended (18704)

20-10-2019, 00:13

Apart from the Wiki discussion, I'd still like people to review the Unicode mappings Rebecca made. See page 1 of the review. Korean- and Japanese-speaking people in particular can help. Highly appreciated!

And thanks again to everyone who already helped.

By Manuel

Ascended (18704)

20-01-2022, 00:39

Note that the mappings as they were have landed in the Unicode 13.0 standard. See official documents:
https://www.unicode.org/L2/L2021/21235-terminals-supplement.pdf
https://www.unicode.org/L2/L2021/21235-terminals-supplement-...

I hope they're correct, as not many people responded to the call to review... (But wbahnassi sure helped to get the Arabic one fixed, for instance!) If you find mistakes, please report them here, to me personally or to Rebecca Bettencourt of Kreative Korp. She can process this for the next Unicode update.

By Manuel

Ascended (18704)

24-01-2022, 00:25

sd_snatcher wrote:
Manuel wrote:

Brazilian

There are several variants, so they all have their own mapping.

I'm not sure it's worth having so many variants, as it can be confusing. The Hotbit 1.2 and Expert 1.1 have exactly the same charset, because it was standardised by ABNT (the Brazilian ANSI/DIN/JISC-like organisation) at that time and named "MSX-BR". The only difference is the font style. This version is to be considered the real Brazilian standard charset. Maybe a mix of the best characters of the two sets could be used, since each of them has one or two characters that look weird.

The Hotbit 1.1 has just a single character that's different from the Hotbit 1.2 and Expert 1.1: the 9Eh character, the useless Cruzado monetary symbol "Cz". So I would also recommend just discarding this charset.

The only real Frankenstein of the group is the Expert 1.0. Gradiente created a proprietary charset (just a mod of the international charset) without consulting anyone. I can't remember any software that was specific to this machine, and it showed garbage characters for any Portuguese texts and software. As a result, Gradiente offered a "1.1 upgrade kit" composed of a new ROM and replacement keys for the keyboard, so very few pure 1.0 machines still exist.

If I now look at the character sets in the Unicode 13.0 proposal, I see the following variants for Brazilian machines:
BG: Gradiente Expert XP-800 (Brazilian) (Manuel's note: this is "Expert 1.0", right?)
BH: Sharp Hotbit HB-8000 1.1 (Brazilian) (Manuel's note: same as BR below, but with Cz instead of Pt on 9Eh)
BR: Gradiente Expert DDPlus, Sharp Hotbit HB-8000 1.2 (Brazilian) (Manuel's note: this is then the ABNT charset)

So, this matches what you described.
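
As a rough sketch of that relationship, BH can be expressed as BR with a single override at 9Eh. The tables below are tiny placeholder excerpts, not the real mappings. I assume "Pt" is the peseta sign U+20A7, and I use the two letters "Cz" as a stand-in because how the Cruzado symbol is actually represented in Rebecca's mapping is not stated here. The names `BR`, `BH` and `differing_codes` are hypothetical.

```python
# Placeholder excerpt of the BR (ABNT "MSX-BR") mapping, for illustration only.
BR = {
    0x41: "A",
    0x9E: "\u20a7",  # assumed: "Pt" is the peseta sign
}

# BH (Hotbit 1.1) is BR with one character replaced.
BH = {**BR, 0x9E: "Cz"}  # "Cz" is a stand-in, not the actual Unicode mapping


def differing_codes(a: dict, b: dict) -> list:
    """List the character codes where two mappings disagree."""
    return sorted(code for code in a.keys() | b.keys() if a.get(code) != b.get(code))


print([hex(code) for code in differing_codes(BR, BH)])
```

Running the same comparison over the full BR and BH tables would be a quick way to confirm that 9Eh really is the only difference.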

openMSX has these variants of keyboards, i.e. descriptions of which keys to press to produce a certain Unicode character:
unicodemap.br_gradiente_1_0 used for the Gradiente XP-800, so matching the BG above
unicodemap.br_gradiente_1_1 used for Gradiente_Expert_DD_Plus, Gradiente_Expert_GPC-1, Gradiente_Expert_Plus, and CIEL Expert-Turbo
unicodemap.br_hotbit for both Hotbit 1.1 and 1.2

So the latter two do not seem to be consistent with the mappings above. Of course they do not necessarily have to be, as (so far) these files contain different information than Rebecca's mappings; the latter only tell which MSX character code matches which Unicode code point.
As I am in the process of merging Rebecca's information with the existing unicodemaps, it looks like we need more variants:

unicodemap.br_gradiente_1_0 with BG character mapping for the Expert 1.0/XP-800
unicodemap.br_gradiente_1_1 with BR character mapping for Gradiente_Expert_DD_Plus, Gradiente_Expert_GPC-1, Gradiente_Expert_Plus, and CIEL Expert-Turbo
unicodemap.br_hotbit with BH character mapping for the Hotbit 1.1 -> to be renamed to unicodemap.br_hotbit11
unicodemap.br_hotbit with BR character mapping for the Hotbit 1.2 -> to be renamed to unicodemap.br_hotbit12
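
For anyone who wants to check the proposal at a glance, here it is as a small lookup table. The unicodemap names and charset codes come from the list above; the machine keys for the XP-800, the CIEL and the two Hotbit versions are hypothetical identifiers I made up for this sketch, not necessarily the names openMSX uses.

```python
# machine -> (proposed unicodemap, character set variant from the proposal)
PROPOSED = {
    "Gradiente_Expert_XP-800": ("unicodemap.br_gradiente_1_0", "BG"),  # hypothetical key
    "Gradiente_Expert_DD_Plus": ("unicodemap.br_gradiente_1_1", "BR"),
    "Gradiente_Expert_GPC-1": ("unicodemap.br_gradiente_1_1", "BR"),
    "Gradiente_Expert_Plus": ("unicodemap.br_gradiente_1_1", "BR"),
    "CIEL_Expert-Turbo": ("unicodemap.br_gradiente_1_1", "BR"),        # hypothetical key
    "Sharp_Hotbit_1.1": ("unicodemap.br_hotbit11", "BH"),              # hypothetical key
    "Sharp_Hotbit_1.2": ("unicodemap.br_hotbit12", "BR"),              # hypothetical key
}


def machines_by_charset(table: dict) -> dict:
    """Group machine names by the character set variant they would use."""
    groups = {}
    for machine, (_unicodemap, charset) in table.items():
        groups.setdefault(charset, []).append(machine)
    return groups


for charset, machines in sorted(machines_by_charset(PROPOSED).items()):
    print(charset, ", ".join(machines))
```

Grouping by charset makes it visible that br_gradiente_1_1 and br_hotbit12 would share the BR character mapping and differ only in the keyboard part.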

Do you agree with this analysis, Brazilian MSX experts? :)
