Basic to C Translator (b2c)

b2c developed from my earliest gaming interests back in the 1980s, when games were written and distributed in listing format in various flavors of the BASIC programming language.

Back then an old friend of mine, Dave, and I spent many hours laboriously typing in these programs from paper listings of the source code. OCR wasn’t a reality for us yet. Both Dave and I worked on mainframe computers so games were pretty rare. So after many hours of cramped hands we got to play our games. They were trivial but fun.

However, BASIC has since gone from being an interesting alternative to the languages I commonly used at the time, Fortran and Cobol (yuk!), to a nuisance, especially after getting used to modern structured languages.

The idea of translating these programs into C appealed to me, but the process didn’t, so I decided to let my university education kick in and built a Lex/Yacc-based translator to convert the source code from Basic to C. It seemed to work really well. The days and weeks it took to develop b2c were well offset by the 1/10 sec it took to translate these BASIC programs ;-).

If you are a programmer, you’ll understand that the more sophisticated compiler programming was a lot more rewarding than the original games, so I lost a little of my interest in the games themselves.

The b2c project got to the point where I wanted to convert the BASIC spaghetti code into structured C instead of doing a straight line-for-line translation. But I got bogged down trying to transform my parse tree in place and put the project down for a while.

When I tripped across a project called BNFC (Backus-Naur Form Converter), it reignited my interest in b2c because it promised to make the front end considerably easier to rewrite from scratch. At least, I thought it was going much better, but then I recognized that the extra burden of learning BNFC, and the limitations it imposed, outweighed the tedium of writing YACC grammars that it was meant to save. Oh, well!

Currently I’m busy writing a game for Android, and when I finish that I’ll go back and work on these translators some more.

Once I get this working again I’d like to continue doing my Fortran-to-C translator because I’ve got the Empire, Galaxy and Star Trek sources I’d like to translate and get working again too.

The file BasicGames.7z contains some translated sources for some familiar Basic games. The translations may not work, or even be completely translated, but they do show the direction I was headed. The original Basic program is included as part of the listings in the ‘cpp’ (really just C) file. The one external file ‘intrinsics.h’ adds some extra functionality that was part of the Basic language which isn’t available directly in ‘C’.

I was also working on a dump2asc.cpp file that translates from Tokenized Basic to human-readable Basic. However, this program isn’t well commented yet and is a bit of a mess, so it will be added here later. It translates from multiple versions of Tokenized Basic, but the number of test programs available to debug it is limited, so it isn’t perfect yet. These are the Tokenized Basic versions it knows about so far:

gwbp – GW-Basic Protected,
gwbu – GW-Basic Unprotected,
basx – Unknown XBasic with first byte 0x42, and
basy – Unknown YBasic with first byte 0x42.

This file (BasicGamesSource.7z) contains most of the Basic source files that I’ve gathered, as well as tokenized versions of some of these files. A tokenized version might have a slightly different name from the Basic file, and it is located in the tokenized subdirectory of the directory containing the Basic file.

My dump2asc (Dump-to-ASCII) program is functioning much better now. It translates a tokenized Basic file to a text version (in ASCII). Recently I added the ability to convert files in a variety of tokenizations (GW-Basic isn’t the only version), defined mostly by the Tokenized Basic pages at the FileFormats archive. I also added a function that checks a tokenized source file against all the known tokenizations in a single invocation and reports the number of errors for each. This gives the user the ability to figure out the tokenization of a file without laboriously checking against each available tokenization.
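The "check against every tokenization and report an error count" idea can be sketched roughly as follows. This is only a toy illustration and not the dump2asc code: the Tokenization struct, the count_errors and score_all functions, and the rule that any byte of 0x80 or above must be a known token are all assumptions made for the example. A real checker would also have to handle line headers, string literals and embedded numeric constants.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one tokenization: a short name plus the set
// of byte values that this dialect accepts as keyword tokens.
struct Tokenization {
    std::string name;
    std::set<std::uint8_t> known_tokens;
};

// Count the bytes that look like tokens (high bit set, an assumption of
// this sketch) but are not in the dialect's table.
std::size_t count_errors(const std::vector<std::uint8_t>& body,
                         const Tokenization& t) {
    std::size_t errors = 0;
    for (std::uint8_t byte : body) {
        if (byte >= 0x80 && t.known_tokens.count(byte) == 0) {
            ++errors;
        }
    }
    return errors;
}

// Score the same file body against every table in one pass over the list,
// returning (name, error count) pairs in the order the tables were given.
std::vector<std::pair<std::string, std::size_t>> score_all(
        const std::vector<std::uint8_t>& body,
        const std::vector<Tokenization>& tables) {
    std::vector<std::pair<std::string, std::size_t>> scores;
    for (const Tokenization& t : tables) {
        scores.emplace_back(t.name, count_errors(body, t));
    }
    return scores;
}
```

The table with the lowest error count is then the most plausible candidate for the file's tokenization, although a low count under such a simple rule is only a hint and not proof.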

I just tripped across a tokenized file (programs/launch.bas in the source archive above) that had real numbers tokenized with 0x1D (float), and I was able to write and test the code to translate these to their correct values in the output source. I’ve also implemented (but not tested) code for the 0x1F (double) conversion, but I don’t have any tokenized files that include those tokens. If anybody has tokenized files that I could use to test my de-tokenization, please pass them on to me at NonAligned.Games@gmail.com and I’ll send back the translated copy. So far I’ve been able to test with:

TRS-80 Level II BASIC tokenized file, and
GW-BASIC tokenized file (or BASICA)
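For readers curious about the 0x1D (float) and 0x1F (double) tokens mentioned above, here is a minimal sketch of the decoding step. It assumes the GW-BASIC layout, where the token byte is followed by a 4-byte or 8-byte value in Microsoft Binary Format (MBF): the exponent in the last byte with a bias of 129, the sign in the top bit of the byte before it, and an implied leading 1 on the mantissa. The function names are made up for this example and are not taken from dump2asc; other dialects may store their constants differently.

```cpp
#include <cmath>
#include <cstdint>

// Decode the 4 bytes that follow a 0x1D token, given in file order.
// Assumes MBF single precision as used by GW-BASIC.
double mbf_single_to_double(const std::uint8_t b[4]) {
    const int exponent = b[3];
    if (exponent == 0) return 0.0;               // exponent 0 means zero
    const bool negative = (b[2] & 0x80) != 0;    // sign bit
    // 23 stored mantissa bits; the leading 1 is implied.
    const std::uint32_t mantissa = (std::uint32_t(b[2] & 0x7F) << 16) |
                                   (std::uint32_t(b[1]) << 8) |
                                   std::uint32_t(b[0]);
    const double significand = 1.0 + mantissa / 8388608.0;  // divide by 2^23
    const double value = std::ldexp(significand, exponent - 129);
    return negative ? -value : value;
}

// Decode the 8 bytes that follow a 0x1F token, given in file order.
// Assumes MBF double precision: same scheme with 55 mantissa bits, so the
// lowest bits may be rounded when converted to an IEEE double.
double mbf_double_to_double(const std::uint8_t b[8]) {
    const int exponent = b[7];
    if (exponent == 0) return 0.0;
    const bool negative = (b[6] & 0x80) != 0;
    std::uint64_t mantissa = b[6] & 0x7F;
    for (int i = 5; i >= 0; --i) {
        mantissa = (mantissa << 8) | b[i];
    }
    const double significand = 1.0 + std::ldexp(double(mantissa), -55);
    const double value = std::ldexp(significand, exponent - 129);
    return negative ? -value : value;
}
```

Under these assumptions the bytes 00 00 00 81 decode to 1.0 and 00 00 20 84 decode to 10.0, which makes for a quick sanity check against a known program line.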

The list of tokenizations that my program should handle (pending testing) is:

amos – [F] AMOS BASIC tokenized file
apfb – [F] APF Imagination Machine BASIC tokenized file
apfi – APF Imagination Machine BASIC tokenized file
apib – Apple Integer BASIC tokenized file
apsb – Applesoft BASIC tokenized file
atar – [F] Atari BASIC tokenized file
basx – Unknown XBasic with first byte 0x42
basy – Unknown YBasic with first byte 0x42
bbcb – [F] BBC BASIC tokenized file
cceb – [F] CCE MC-1000 BASIC tokenized file
cole – Coleco ADAM SmartBASIC tokenized file
comm – Commodore BASIC tokenized file
comp – Compucolor BASIC tokenized file
exid – Exidy Sorcerer BASIC tokenized file
gwba – GW-BASIC tokenized file (or BASICA)
gwbp – GW-BASIC tokenized file (or BASICA) protected
gwbu – [F] GW-BASIC tokenized file (or BASICA) unprotected
matt – [F] Mattel Aquarius BASIC tokenized file
mbas – MBASIC tokenized file (Microsoft BASIC for CP/M)
nasc – NASCOM BASIC tokenized file
ohio – Ohio Scientific BASIC tokenized file
sinc – [F] Sinclair BASIC tokenized file (for ZX80, ZX81 and Spectrum)
solb – Sol BASIC tokenized file
sole – Sol BASIC tokenized file
tiba – TI BASIC tokenized file (TI 99/4A)
tibe – TI BASIC tokenized file (TI 99/4A)
tiny – [F] Tiny BASIC tokenized file (ran on KIM-1 and some other early machines)
trs2 – TRS-80 Level II BASIC tokenized file
trsc – TRS-80 Color BASIC tokenized file

The [F] indicates a tokenization to be implemented in the future. The gwbp, gwbu, basx, and basy tokenizations are ones I developed before I became aware of the FileFormats archive.

On another note, I’ve returned to working on my b2c (translates Basic, all flavours, to C) program. I’m a little horrified by some of my design decisions way back then so I’ll be revisiting those. I have been able to get some translated C programs running. The larger programs like Super Star Trek are still causing problems but that will just be a process of debugging all the translation nuances. I’ll start posting some of the translations as they become available.

Some useful links to {Super} Star Trek:

Star Trek (1971 video game) (Wikipedia)
List of Star Trek games (Wikipedia)
Tom Almy’s version
BSD Trek (Github. This appears to be a subset of Tom Almy’s version)

May 03, 2022

Managed to get b2c to translate strek2.bas fairly successfully. The C-file compiles and the resulting program runs but has bugs. The compile command is:

g++ -std=c++11 -g -Wno-write-strings strek2.c -o strek2

The 7zip archive strek2.7z has five files:

intrinsics.h – The header file that includes all the support routines to implement things like MID$(<str>,<start>,<length>) and other Basic functions.
strek2.bas – The original Basic file.
strek2.c – The translated C file.
strek2.log – The log file from the translation of the Basic program.
strek2.vars – The translation from the mangled variable/function/label names to user-chosen names.
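To give an idea of the kind of support routine a header like intrinsics.h has to provide, here is a minimal sketch of a MID$-style helper. The name basic_mid and its behaviour for out-of-range arguments are assumptions made for this example; the routine actually shipped in intrinsics.h may look quite different.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for MID$(<str>,<start>,<length>).
// BASIC string positions are 1-based, so start is shifted down by one.
// Assumption: an out-of-range start yields an empty string here, whereas
// a real BASIC interpreter may raise an error for start = 0.
std::string basic_mid(const std::string& s, std::size_t start,
                      std::size_t length) {
    if (start == 0 || start > s.size()) {
        return std::string();
    }
    // substr clamps the length when it would run past the end of s.
    return s.substr(start - 1, length);
}
```

With this sketch, basic_mid("HELLO WORLD", 7, 5) gives "WORLD", and asking for more characters than remain simply returns the rest of the string.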