Entering BASIC source listings with an OCR tool instead of typing

By st1mpy

Paladin (780)

23-10-2020, 12:57

Is there any way to do that? I mean scanning the source code listing pages from magazines and converting them to an MSX .bas file.

By theNestruo

Champion (297)

23-10-2020, 13:11

Probably not the answer you are looking for, but my first step would be to look for the game on this page: http://msxbasic.blogspot.com/ Maybe you are lucky and Ryback has already typed it in!

As for the OCR way, I guess the OCR gives you plain text. That text can be saved as an ASCII file (.ASC), loaded in an emulator, and then saved back as tokenized BASIC (.BAS). I don't know if there is a shorter path.
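To illustrate the "save as ASCII" step, here is a minimal Python sketch. It assumes the emulator's MSX-BASIC expects CR LF line endings and a trailing end-of-file byte (0x1A) in ASCII listings, which is how listings saved with SAVE "NAME",A usually look; the function name `to_msx_ascii` is made up for this example. It only handles plain ASCII, so MSX graphic characters would need extra work.

```python
def to_msx_ascii(text: str) -> bytes:
    """Convert OCR'd listing text into bytes for an MSX ASCII (.ASC) file.

    Assumptions (not taken from the thread): CR LF line endings and a
    trailing 0x1A end-of-file marker.
    """
    # Drop empty lines and trailing whitespace left over from OCR.
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    body = "".join(ln + "\r\n" for ln in lines)
    # Non-ASCII OCR debris becomes "?" so it is easy to spot and fix later.
    return body.encode("ascii", errors="replace") + b"\x1a"
```

The resulting bytes can be written to something like GAME.ASC on a disk image or directory the emulator can see, then LOAD "GAME.ASC" followed by SAVE "GAME.BAS" should give the tokenized file.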

By FiXato

Scribe (1703)

23-10-2020, 13:33

With openMSX you could paste it directly into BASIC, I guess.

You'd still rely on the quality of the OCR and on how the listing was formatted.
Have a look, for example, at some of the earlier listings in the first MCM Listingboek.
They have narrow columns, fonts that might not be easily recognised, listings that by the looks of it were originally printed and then scanned, a checksum column that would require column selection, and word-wrapping.
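The word-wrapping part at least can be undone automatically. Below is a rough Python sketch, assuming every real program line starts with a line number followed by a space and that line numbers only go up; the function name `join_wrapped_lines` is just for illustration. Anything that does not look like a new, higher-numbered line is glued onto the previous one. It joins without inserting a space, so a wrap that happened exactly at a space would still need a manual fix.

```python
import re

# A program line starts with a line number followed by whitespace.
LINE_START = re.compile(r"^(\d+)\s")


def join_wrapped_lines(ocr_text: str) -> list[str]:
    """Rejoin BASIC lines that were word-wrapped in a narrow printed column."""
    program: list[str] = []
    last_number = -1
    for raw in ocr_text.splitlines():
        piece = raw.strip()
        if not piece:
            continue  # skip blank lines produced by the OCR
        match = LINE_START.match(piece)
        if match and int(match.group(1)) > last_number:
            # Looks like a new program line with a higher line number.
            program.append(piece)
            last_number = int(match.group(1))
        elif program:
            # Continuation of a wrapped line (even if it starts with digits).
            program[-1] += piece
        else:
            # Text before the first numbered line: keep it for manual review.
            program.append(piece)
    return program
```

The "higher line number" check is there so that a wrapped fragment such as `10 THEN 20` after line 20 is not mistaken for a new line 10.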

The PDF actually already supports text selection, but it looks like its text layer already had issues detecting line and word boundaries.

By Briqunullus

Champion (360)

23-10-2020, 14:17

I did a few tests a while ago. First you'll need high-resolution scans; I think I used 300 dpi. Then you may need to convert the images to black and white and enhance the contrast, depending on what colors the magazine used. Those images can be processed by OCR, but the output will still contain errors. So the final step would be to verify the checksum for each line.
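The checksum step could look roughly like the Python sketch below. Note that the checksum formula here (sum of character codes modulo 256) is purely a hypothetical stand-in: each magazine used its own checksum program, so `example_checksum` would have to be replaced with the real algorithm from the magazine in question. The function names are invented for this example.

```python
from typing import Callable, Sequence


def example_checksum(line: str) -> int:
    """HYPOTHETICAL scheme: sum of character codes modulo 256.

    Replace this with the algorithm the magazine actually used.
    """
    return sum(line.encode("ascii", errors="replace")) % 256


def find_suspect_lines(
    lines: Sequence[str],
    printed_checksums: Sequence[int],
    checksum: Callable[[str], int] = example_checksum,
) -> list[str]:
    """Return the OCR'd lines whose checksum differs from the printed one."""
    suspects = []
    for line, expected in zip(lines, printed_checksums):
        if checksum(line) != expected:
            suspects.append(line)
    return suspects
```

Only the lines it returns need to be compared against the scan by eye. Keep in mind that the printed checksums come out of the same OCR pass, so a mismatch can also mean the checksum itself was misread.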

This is quite a task, but it still is a lot quicker than typing everything.