Basic to C Translator (b2c)
b2c developed from my earliest gaming interests back in the 1980s, when games were written and distributed in listing format in various flavors of the BASIC programming language.
Back then an old friend of mine, Dave, and I spent many hours laboriously typing in these programs from paper listings of the source code. OCR wasn’t a reality for us yet. Both Dave and I worked on mainframe computers so games were pretty rare. So after many hours of cramped hands we got to play our games. They were trivial but fun.
However, BASIC has gone from being an interesting alternative to the languages I commonly used at the time, Fortran and Cobol (yuk!), to a nuisance, especially after getting used to modern structured languages.
The idea of translating these programs into C appealed to me but the process didn’t, so I decided to let my university education kick in and built a Lex/Yacc-based translator to convert the source code from Basic to C. It seemed to work really well. The days and weeks it took to develop b2c were well offset by the 1/10 sec it took to translate these BASIC programs ;-).
If you are a programmer you’ll understand that the more sophisticated compiler programming was a lot more rewarding than the original games so I lost a little bit of my interest in the games themselves.
The b2c project got to the point where I wanted to convert the BASIC spaghetti code into a structured C instead of doing a straight line-for-line translation. But I got bogged down in trying to translate my parse tree in place and put the project down for a while.
When I tripped across a project called BNFC (Backus-Naur Form Converter) it reignited my interest in b2c, as it promised to make the front end considerably easier to re-write from scratch. At least I thought it was going much better, but then I recognized that the extra burden of learning BNFC, and the limitations it imposed, didn’t offset the tedium of writing YACC grammars. Oh, well!
Currently I’m busy writing a game for Android and when I finish that I’ll go back and work on these translators some more.
Once I get this working again I’d like to continue doing my Fortran-to-C translator because I’ve got the Empire, Galaxy and Star Trek sources I’d like to translate and get working again too.
The file BasicGames.7z contains some translated sources for some familiar Basic games. The translations may not work, or even be completely translated, but they do show the direction I was headed. The original Basic program is included as part of the listings in the ‘cpp’ (really just C) file. The one external file ‘intrinsics.h’ adds some extra functionality that was part of the Basic language which isn’t available directly in ‘C’.
I was also working on a dump2asc.cpp file that translates from Tokenized Basic to human-readable Basic. However, this program isn’t well commented yet and is a bit of a mess, so it will be added here later. It translates from multiple versions of Tokenized Basic, but the number of test programs available to debug it is limited so it isn’t perfect yet. These are the Tokenized Basic versions it knows about so far:
- gwbp – GW-Basic Protected,
- gwbu – GW-Basic Unprotected,
- basx – Unknown XBasic with first byte 0x42, and
- basy – Unknown YBasic with first byte 0x42.
This file (BasicGamesSource.7z) contains most of the Basic source files that I’ve gathered as well as tokenized versions of some of these files. The tokenized versions might have a slightly different name from the Basic file and are located in the tokenized subdirectory of the directory containing the Basic file.
My dump2asc (Dump-to-ASCII) program is functioning much better now. It translates a tokenized Basic file to a text version (in ASCII). Recently I added the ability to convert files using a variety of tokenizations (GW-Basic isn’t the only version), defined mostly by the Tokenized Basic pages at the FileFormats archive. I also added a function that checks a tokenized source file against all the known tokenizations in a single invocation and reports the number of errors for each. This gives the user the ability to figure out the tokenization of a file without laboriously checking against each available tokenization.
I just tripped across a tokenized file (programs/launch.bas in the source archive above) that had 0x1D (float) tokenized real numbers, and I was able to create and test the code to translate these to their correct values in the output source. I’ve also implemented (but not tested) code for the 0x1F (double) conversion, but I don’t have any tokenized files that include those tokens. If anybody has tokenized files that I can use to test my de-tokenization, please pass them on to me at NonAligned.Games@gmail.com and I’ll send back the translated copy. So far I’ve been able to test with:
- TRS-80 Level II BASIC tokenized file, and
- GW-BASIC tokenized file (or BASICA)
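The 0x1D (float) conversion mentioned above can be sketched as follows. This is only an illustration, not the dump2asc code: it assumes the token is followed by four bytes in Microsoft Binary Format (MBF), the layout GW-BASIC uses for single-precision constants, and the function name mbf_single_to_double is made up for this example. Other tokenizations may store numbers differently.

```c
#include <stdint.h>

/* Convert a 4-byte Microsoft Binary Format (MBF) single to a C double.
 * Assumed byte order: b[0]..b[2] = mantissa (low to high), b[3] = exponent.
 * Bit 7 of b[2] is the sign; the leading mantissa bit is implied.
 * (Hypothetical helper, not taken from dump2asc.) */
double mbf_single_to_double(const uint8_t b[4])
{
    if (b[3] == 0)
        return 0.0;                      /* exponent 0 means the value is zero */

    int negative = (b[2] & 0x80) != 0;

    /* Restore the implied leading 1 to get the full 24-bit mantissa. */
    uint32_t mantissa = ((uint32_t)(b[2] | 0x80) << 16)
                      | ((uint32_t)b[1] << 8)
                      |  (uint32_t)b[0];

    /* Value = mantissa * 2^(exponent - 128 - 24). Scaling by powers of
     * two is exact in a double, so a simple loop is enough here. */
    double value = (double)mantissa;
    int shift = (int)b[3] - 128 - 24;
    while (shift > 0) { value *= 2.0; shift--; }
    while (shift < 0) { value /= 2.0; shift++; }

    return negative ? -value : value;
}
```

Under these assumptions the bytes 00 00 00 81 decode to 1.0 and 00 00 20 84 decode to 10.0. The 0x1F (double) case would follow the same pattern with a longer mantissa.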
The tokenizations that my program should handle (pending testing) are:
- amos – [F] AMOS BASIC tokenized file
- apfb – [F] APF Imagination Machine BASIC tokenized file
- apfi – APF Imagination Machine BASIC tokenized file
- apib – Apple Integer BASIC tokenized file
- apsb – Applesoft BASIC tokenized file
- atar – [F] Atari BASIC tokenized file
- basx – Unknown XBasic with first byte 0x42
- basy – Unknown YBasic with first byte 0x42
- bbcb – [F] BBC BASIC tokenized file
- cceb – [F] CCE MC-1000 BASIC tokenized file
- cole – Coleco ADAM SmartBASIC tokenized file
- comm – Commodore BASIC tokenized file
- comp – Compucolor BASIC tokenized file
- exid – Exidy Sorcerer BASIC tokenized file
- gwba – GW-BASIC tokenized file (or BASICA)
- gwbp – GW-BASIC tokenized file (or BASICA) protected
- gwbu – [F] GW-BASIC tokenized file (or BASICA) unprotected
- matt – [F] Mattel Aquarius BASIC tokenized file
- mbas – MBASIC tokenized file (Microsoft BASIC for CP/M)
- nasc – NASCOM BASIC tokenized file
- ohio – Ohio Scientific BASIC tokenized file
- sinc – [F] Sinclair BASIC tokenized file (for ZX80, ZX81 and Spectrum)
- solb – Sol BASIC tokenized file
- sole – Sol BASIC tokenized file
- tiba – TI BASIC tokenized file (TI 99/4A)
- tibe – TI BASIC tokenized file (TI 99/4A)
- tiny – [F] Tiny BASIC tokenized file (ran on KIM-1 and some other early machines)
- trs2 – TRS-80 Level II BASIC tokenized file
- trsc – TRS-80 Color BASIC tokenized file
The [F] indicates a tokenization to be implemented in the future. The gwbp, gwbu, basx, and basy were tokenizations I developed before I became aware of the FileFormats archive.
On another note, I’ve returned to working on my b2c (translates Basic, all flavours, to C) program. I’m a little horrified by some of my design decisions way back then so I’ll be revisiting those. I have been able to get some translated C programs running. The larger programs like Super Star Trek are still causing problems but that will just be a process of debugging all the translation nuances. I’ll start posting some of the translations as they become available.
Some useful links to {Super} Star Trek:
- Star Trek (1971 video game) (Wikipedia)
- List of Star Trek games (Wikipedia)
- Tom Almy’s version
- BSD Trek (Github. This appears to be a subset of Tom Almy’s version)
May 03, 2022
Managed to get b2c to translate strek2.bas fairly successfully. The C-file compiles and the resulting program runs but has bugs. The compile command is:
g++ -std=c++11 -g -Wno-write-strings strek2.c -o strek2
The 7zip-archive strek2.z7 has five files:
- intrinsics.h – The header file that includes all the support routines to implement things like MID$(<str>,<start>,<length>) and other Basic functions.
- strek2.bas – The original Basic file.
- strek2.c – The translated C file.
- strek2.log – The log file from the translation of the Basic program.
- strek2.vars – The translation from the mangled variable/function/label names to user-chosen names.
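To give a feel for the kind of support routine a header like intrinsics.h might provide, here is a minimal sketch of a MID$ stand-in. The name basic_mid, its signature, and its buffer handling are assumptions made for this illustration and are not taken from the actual header. Basic treats a start position of 0 as an error; this sketch simply returns an empty string in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Basic's MID$(str, start, length).
 * 'start' is 1-based, as in Basic. The result is copied into 'out'
 * (capacity 'out_size', including the terminating NUL) and returned. */
char *basic_mid(char *out, size_t out_size,
                const char *str, size_t start, size_t length)
{
    assert(out != NULL && out_size > 0 && str != NULL);

    size_t len = strlen(str);
    out[0] = '\0';
    if (start < 1 || start > len)
        return out;                      /* start outside the string: empty */

    size_t avail = len - (start - 1);    /* characters from start to the end */
    if (length > avail)
        length = avail;                  /* clamp to what the string holds */
    if (length > out_size - 1)
        length = out_size - 1;           /* clamp to the output buffer */

    memcpy(out, str + (start - 1), length);
    out[length] = '\0';
    return out;
}
```

With a 16-byte buffer, basic_mid(buf, sizeof buf, "ENTERPRISE", 6, 5) yields "PRISE". Passing the output buffer explicitly is one way to sidestep the question of who owns the returned string.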
Hi,
I need to translate some old unstructured BASIC code into C or any other currently used language.
Can you share your B2CTrans project?
Thanks in advance
Hi Basilio,
Sorry to take so long to respond…
Unfortunately, I put the b2ctrans project in limbo for a while so I could work on other projects. I actually started building the translator because I had a number of Basic programs I wanted to convert, including some from an old book “Basic Computer Games” http://www.vintage-basic.net/games.html
As I remember, it was doing quite well translating them statement-to-statement (roughly one Basic statement to one C statement). However, I wanted to convert Basic programs into structured C, which is where I got bogged down.
I’m not sure about the state of the project right now so can’t offer you much.
What type of programs and what flavour of Basic were you wanting to translate? B2Ctrans handles some varieties of micro Basic.
Take care,
/Alan
Hello,
I was wondering how you were able to convert vintage Basic to C, specifically with GOTO and GOSUB statements? I am currently trying to translate the Star Trek game from the “BASIC Computer Games.”
Thanks!
Hi Jay,
The C language has statements that are almost identical to the Basic statements. For instance, the Basic “GOTO NNNN \n…\n NNNN” has an equivalent in C, “goto Lbl_NNNN; \n…\nLbl_NNNN: ”, so you just have to prefix your Basic line numbers (the ones that are used as GOTO or GOSUB targets) with “Lbl_” or something similar. You can see this in some of the images of listings above.
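As a small worked example of the label approach, here is a hypothetical four-line Basic loop translated line for line. The Basic program and the function name sum_to_n are invented for illustration and are not b2c output.

```c
/* Basic original (hypothetical example):
 *   10 I = 1 : S = 0
 *   20 S = S + I
 *   30 I = I + 1
 *   40 IF I <= N THEN GOTO 20
 *   50 PRINT S
 */
int sum_to_n(int n)
{
    int i, s;

    i = 1; s = 0;               /* 10 */
Lbl_20:
    s = s + i;                  /* 20: line number is a GOTO target */
    i = i + 1;                  /* 30 */
    if (i <= n) goto Lbl_20;    /* 40 */
    return s;                   /* 50: returned here instead of printed */
}
```

Only line 20 gets a label because it is the only GOTO target. As in the Basic original, the loop body always runs at least once.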
A Basic GOSUB is just a call to a subroutine. So the Basic “GOSUB NNNN \n…\n NNNN \n…\n RETURN” can be replaced in C with “routine_NNNN(); \n…\n void routine_NNNN(){ \n … \n }”, where NNNN is a line number and “\n” represents a newline. But beware: Basic subroutines can have multiple entry points, for instance “NNNN … MMMM … RETURN”, where both NNNN and MMMM are GOSUB targets. My first attempt at this problem just copies the MMMM chunk of code to a completely separate routine. This duplicates a number of lines of code but it was easy to implement. The hard part in this case is determining where overlapping subroutines occur.
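The duplication idea can be sketched like this, assuming a hypothetical Basic fragment in which both line 500 and line 520 are GOSUB targets. The routine names follow the routine_NNNN pattern above; everything else is invented for illustration.

```c
/* Basic original (hypothetical example):
 *   100 GOSUB 500
 *   110 GOSUB 520
 *   ...
 *   500 X = X + 1
 *   510 X = X * 2
 *   520 X = X + 10
 *   530 RETURN
 */
int x = 0;                      /* Basic variables are global */

/* Second entry point: only lines 520..530. */
void routine_520(void)
{
    x = x + 10;                 /* 520 */
}

/* First entry point: lines 500..530, with the 520 chunk duplicated. */
void routine_500(void)
{
    x = x + 1;                  /* 500 */
    x = x * 2;                  /* 510 */
    x = x + 10;                 /* 520 (copy of routine_520's body) */
}

/* Equivalent of lines 100..110, starting from X = 1. */
int run_demo(void)
{
    x = 1;
    routine_500();              /* 100 GOSUB 500 */
    routine_520();              /* 110 GOSUB 520 */
    return x;
}
```

An alternative to copying would be to have routine_500() end by calling routine_520(), but that only works cleanly when the first routine falls straight through into the second.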
This was one advantage of b2c because it hunts down all the “active” line numbers (ones used by GOTOs, GOSUBs etc) and traces the flow of control in cases where you have overlapping subroutines (which seems to happen quite often).
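A very rough sketch of hunting down “active” line numbers is shown below. It scans the program text directly for GOTO and GOSUB targets, which is a simplification: per the description here, b2c works from a Flex/Bison parse rather than a text scan. The function name collect_targets is hypothetical, and the sketch ignores THEN nnn targets, lower-case keywords, and keywords inside strings or REM statements.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Add 'value' to 'targets' unless it is already there or the array is full. */
static void add_unique(int *targets, size_t *count, size_t max, int value)
{
    for (size_t i = 0; i < *count; i++)
        if (targets[i] == value)
            return;
    if (*count < max)
        targets[(*count)++] = value;
}

/* Collect the distinct line numbers that follow GOTO or GOSUB in 'src'.
 * Stores up to 'max' of them in 'targets' and returns how many were stored. */
size_t collect_targets(const char *src, int *targets, size_t max)
{
    static const char *const keywords[] = { "GOTO", "GOSUB" };
    size_t count = 0;

    for (size_t k = 0; k < 2; k++) {
        const char *p = src;
        while ((p = strstr(p, keywords[k])) != NULL) {
            p += strlen(keywords[k]);
            /* Accept a comma-separated list, as in ON X GOTO 100,200. */
            for (;;) {
                while (*p == ' ') p++;
                if (!isdigit((unsigned char)*p)) break;
                char *end;
                long n = strtol(p, &end, 10);
                add_unique(targets, &count, max, (int)n);
                p = end;
                while (*p == ' ') p++;
                if (*p != ',') break;
                p++;
            }
        }
    }
    return count;
}
```

Any line number that never shows up in the resulting list is not a jump target, so it needs no label in the generated C.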
The b2c program is written in C++ and uses Flex/Bison to parse the old Basic program which produces an AST (Abstract Syntax Tree) in memory and then runs through the AST to output the equivalent C statements.
BTW, there are a bunch of StarTrek games that seem to have been spawned from, or in parallel with, this initial version.
Take care,
/Alan
Thank you for relating your background and experience with this problem.
I hope that this comment isn’t too late (other comments are around 18 months to 2 years ago).
I had visions (long long ago) of doing similar, but was wanting to convert to C to run (easily/sensibly) on Linux.
(I note from your screen shot that you are a Linux user as well)
The input being, programs written in the GWbasic dialect for DOS, mostly games BUT……
The “Spaghetti code” problem was partly solved when MS brought out QuickBasic which had a utility which omitted unused line numbers.
However converting Basic statements like “pos” or “csrlin” and “line (x y) – (x y)” … etc,
requires the likes of ncurses and sdl (or other graphic screen).
I tried a “basic to C converter” many years ago, but was aghast when it turned a 2k program into over 20k of obtuse code.
I wanted C code which was easy to read and could be compared against its “basic code” counterpart.
My approach has been to convert in steps (mainly for debugging)
– clean out the unused line numbers; get a list of all variables; get a list of all basic statements used
– decide on variable/s data type; substitute hand-written C functions for complicated Statements;
– automatically write a header file, include needed C libraries and global variables
– determine unused and/or unset variables (I couldn’t get a simple 25 liner to run because “loops” wasn’t init’ed to 0 )
– refactor to get/separate out functions (still just a dream)
Like you, I keep coming back to my translator…
esp. When I find an old Basic game that would be nice to have working in C to run on Linux
and … to maybe also have a coding language to allow quick prototyping
Hi John,
I actually started working on this problem as an interesting diversion a long time ago (it appears one of the listings has a Windows file structure). I abandoned Windows eons ago when I found out all the filters I had to write on Windows were already in Linux as standard (and much better than my programs).
It probably would have been easier to re-write the basic code by hand but I wanted to try using Flex/Bison as my lexical-analyser and parser generators. Those seemed to work fairly well.
But while playing with that code I noticed that some of the code had ‘subroutines’ with multiple entry&exit points so I first unraveled that code.
After that I was working on trying to induce some structure into the code (for/while loops etc) so I didn’t have to use GOTOs and labels. For some reason I tried to do translation in-place instead of coding from one statement* vector to another (kinda a crazy decision with a little hindsight).
That’s probably when some shiny piece of code caught my eye and I got distracted. I do intend to get back to it sometime … I’m busy working on my Royals game right now so it will be a while.
If you look at the picture of the listings I put up, the program intersperses Basic and C lines so it is easy to see what C code is being generated for each Basic line. There is some ancillary code, like the MyFpf() function, which is just a shell around vfprintf(). My approach is very much like yours.
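For readers unfamiliar with the pattern, a shell around vfprintf() generally looks like the sketch below. The name my_fpf and its signature are assumptions for illustration; the real MyFpf() may differ.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical PRINT helper: forwards a printf-style format and its
 * arguments to vfprintf() and returns the number of characters written
 * (or a negative value on error). */
int my_fpf(FILE *stream, const char *format, ...)
{
    va_list args;

    va_start(args, format);
    int written = vfprintf(stream, format, args);
    va_end(args);
    return written;
}
```

A call such as my_fpf(stdout, "%d KLINGONS LEFT\n", 3) behaves like fprintf(). The point of routing every translated PRINT through one function is that any extra Basic-specific behavior can later be added in a single place.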
Keep in touch … I’m curious how your project is going. I’ll try and put up some of the listings I’ve generated so you can see my translations in more detail.
Take care,
/Alan
I went from DOS to Linux about the time Windows 3.1/XP came out
(probably hung on to DOS for too long, but I had a CAD and other that used DOS)
That is when I wanted to migrate most of my DOS/gwBasic programs to Linux
Initially I did a C program that emulated the QBasic utility that omitted unused line numbers.
(my reasoning was to not rely on anything MS and/or was a locked black box)
I looked at flex/Bison but it seemed an unnecessary/complicating step which still eventually needed a hand written function to substitute for the Basic statement (and at that stage I was still undecided on which graphic window)
I decided to suffer GOTO’s (as C had these) to get a working C program. And eventually re-structure (if necessary) esp. if/when a good refactoring program for C came along. Also with my approach one gets a terminal window AND a graphic window. Having BOTH these windows allows more than re-structuring, but re-jigging/evolving the program.
My converter/translator also echoes the original Basic Program Line (before its C line/s) as a comment
This is switchable on/off (via option/s on the Command Line).
I am also looking at echoing the original to the right after the C line (like bcx does )
Because I did it in steps, the total conversion is done via a shell script executing each step in turn
Would you like an example, ie. the generated C code for a simple gwBasic program? How do I get it to you?
Yep, same here. Except I hung on till the end of XP’s run. I had gotten a lot of software running on Windows and didn’t relish the conversion over to a new OS. In hindsight I wished I had known how easy Linux/Ubuntu made it.
I had collected a number of Basic games during that period … a friend, Dave L., and I had painfully typed in listings we found in a Creative Computing book which I brought over to a project in Linux. It would have been nice to have an OCR program at that time but that was just before they became commonplace.
There were a few aborted attempts at hand-converting these Basic games to ‘C’ but finally I built a program to auto-translate them using the Borland C++Builder platform. When I moved over to Linux that became Code::Blocks.
In the process I had tripped across DECUS tapes (Digital Equipment Corp. User group) and found much more sophisticated versions of some of these games in Fortran so, without finishing the b2c program, I started the f2c program! Both of these are in coding limbo for the moment. The Fortran versions of StarTrek are quite impressive and may serve as a good starting point for future games. There is also a Fortran Empire game but I’ve already started my own version and I’ve got enough working on it to not want to start over.
I had done a little lex/yacc work in school and thought this might save some time. It seemed to help quite a bit but, of course, there’s a learning curve. Because it was tedious to build grammars I started learning BNFC which made the generation of a lexer/parser a little faster but it also required a lot of kludges because old Fortran had some language weirdness like position dependent fields. That’s the point where I dropped that project … or rather decided to go back to regular BNF (someday).
I also decided my first step was with GOTOs in ‘C’ but had planned to convert over to structured code as soon as I had ironed out the tangled code that Basic programmers seemed to enjoy.
Since I had only come across text-based games there was no need for a graphics interface. When I did trip across a graphics-based program I figured that I could add it to an interface library (intrinsics.h, which I currently have for other stuff).
I did get some of the simpler games working but my excitement evaporated somewhat when I found how trivial the games were — I still have hope for the bigger games. You can find some of my listings in the file BasicGames.7z on the BCTrans (this) page.
If you wouldn’t mind, I’d be interested in how you’re doing on your project. You can send stuff to me at nonABCaligned.games@gmail.com (remove ABC).
/Alan
When I changed over to Linux, the biggest WOW was having multiple workspaces. Something most Windows users don’t have/understand (let alone the CHOICE at login of using another desktop. I’ve pretty well settled on Xfce). Though these days MS users can have “Power Windows” or such. I can’t believe MS don’t include it as standard.
I hung onto a Windows PC box for a time (until it died) to play SPWAW and do some OCRing. Even now (though admittedly it’s been a while since investigating) OCRing on Linux isn’t as good as MS. That aside, if you still have the book you could try OCRing on Linux.
I haven’t bothered with C++ as it seems to be targeted at writing very large programs. Besides a C++ compiler will compile C (so I’m safe for a long time). I still think one could/should hand code a Basic to C translator. Basic (ie. gwBasic and qBasic) has only about 150 statements and functions, most of which are seldom used. There is no discernable design/pattern as to how these Statements and Functions have been created/invented, hence my preference to hand coded parsing/translating. Further I have a list of these S & Fs and the translator issues a warning which is also included in the C program as a comment that ‘whatever’ needs attention/writing. I do expect a grammatically and syntactically correct program before attempting translation! However I have looked at the possibility of using regular expressions to assist parsing esp. where Basic uses a reserved word in multiple S & Fs.
After I saw your mention of DECUS, I did some internet digging. Apparently Super Star Trek is the pinnacle of this game, though there are many variations and it would be nice to incorporate ALL of the enhancements. Found a SST zip version which I hope to get into later.
There are a number of basic programs in technical books that demonstrate some concept graphically. This is why I am going down the terminal (stretchable to Curses) and SDL window approach. But I only create/program the SDL window if there is a basic Statement or Function that requires it.
Yes these old games are simple/trivial but newbies can see the correlation between Basic (easily understandable) and its C equivalent. Perhaps a C learning tool? and/or a quick prototyping tool?
My next hurdle (biggest?) is to run the basic program under DosBox or WINE or ?? next to that translated to C (and compiled) for surety of the translation
I think the thing that triggered my disgust with DOS was a batch file I was creating at the time. I was having a problem getting a variable to persist over the running of the file. I would set the variable, test it a few statements later, and find that it still had the old value (not the value I had just set). Then on the next run of the batch file I tried to set it to something new and it had the *OLD* value I had set in the first run of the BAT program. Ugh! That may have been the day I gave up on Windows/DOS.
When I switched over to linux I found OCRFeeder did a pretty accurate job of OCRing my text, although, admittedly, it did have a less refined user interface.
I’m not sure the generics/OOP/etc. that distinguish C++ from C would be of much use in translating an “Old Style” Basic program (which is what I was aiming at) that doesn’t have any OOP requirements. Besides, just adding structured code to a Basic program translation is a pretty significant improvement. C is a good language to work in. I still code stuff in C when there is no advantage to C++. C++ is complex but lots of fun to play with, especially generics, OOP and so on. There is one case that I tripped across that could use C++, and that is the INPUT statement returning a string that has no ownership. It would be nice to have smart pointers for this.
Have you tried some bigger Basic programs with your setup? (I’ve added some in an archive on this page) I got my version working on some of the smaller programs but then I started to trip over more run-time errors as the programs got larger.
I do have one program that requires graphics. I like your approach using SDL and might try that myself.
The Super Star Trek in the DECUS library is probably all VAX Fortran. I think I have a copy … let me know if you want me to make it available. There are multiple copies of Trek in my DECUS directories including one called MTrek. I’m not sure what the difference is. There is also another Trek-like game called Galaxy which could be played on multiple terminals at once (back in the VT100/mainframe days). It was lots of fun playing this game after hours and hearing from the other side of the building the “agony of defeat” as your torpedoes slammed into your opponent. It could probably be made to work on Linux as a network game with some work. Maybe I should put it up on my Github account so it could be updated as an open source project. I’m a little hesitant to put it up as Fortran as I’m working on f2c (Fortran to C translator) and that would be a lot faster as long as I don’t procrastinate too much.
I’ve tried python-pcbasic which is a GW-Basic clone that runs on linux. It seemed to choke on the programs that I gave it but that might be because they are from a different interpreter.
My starting problem with DOS was the Basic’s 64K programming limit. It was so frustrating having a 486 with a Meg of RAM that can’t be used as 1 chunk. I broke thru that limit by going to PowerC (but, if I recall correctly, one still had to deal in 64k segments). But C was a bit of a learning curve compared with Basic. That’s where/why I wonder about a rapid prototyping language (perhaps Basic) which can be translated into C to then improve upon.
I used fineReader on a DOS/Windows PC for quite a number of years. It handled 2 column ..etc print.
Got an article off the net many years ago where someone (presumably a very competent programmer) claimed that C already had the C++ features of OOP. It’s just that programmers couldn’t think/abstract enough to exploit C’s power. I don’t know about that (and haven’t had a reason to find out). I do know that at times C++ templates may be helpful. And C++’s ability to differentiate between functions with the same name but different arguments could also be helpful.
Did you get my eMail and the very simple example? And did it compile and run OK (or did I forget something)?
It also illustrates why I’m OK with GOTOs. C has them (easy to translate) and they work, they’re not pretty but they work.
re :- INPUT Also if you look in the .h file, I include a “char dummy[80]” to cater for these types of statement’s requirement.
Perhaps you’re overthinking it. In early versions of Basic, all variables were Globals, the program owned them!?
I did copy (tediously by hand) a Basic program that modeled airflow over an airfoil from a book about “Subsonic Aerodynamics”. It was in gwBasic and rendered both text and graphics (hence my translation preference for text/terminal and graphics). My translator was less capable then, but it negated a lot of grunt work. (“Then” was back when I also had a DOS/Windows machine for output comparison/debugging and was converting to Linux). I still needed to modify the result but more in terms of unraveling some of the spaghetti code and became engrossed in making improvements that were now possible because of a faster machine & 2 larger windows (text & graphic) to work with.
The file sst.zip turned out to be Super Star Trek already converted to C. A little disappointing, but it compiled easily and I reacquainted myself with it. My first encounter with Star Trek was a gwBasic version and at startup it played the theme music (a WOW moment) on the small speaker of PCs of that era. This is another reason for my desire to include SDL, for SDL’s ability to play tunes. (done nothing yet but…)
Next chance I get I’ll send you the gwBasic airfoil program.
Thanks. Got your a.bas.c program working. Looks nice. Tried my b2c on the a.bas file but the “SCREEN 9” syntax I hadn’t seen before. b2c understands an ad-hoc version of Basic that was developed by testing with new programs until it didn’t choke on any of them. Which version of Basic is your program written in? Sorry, I was intending to comment on your program but it apparently leaked out of my brain between starting and ending my previous reply.
Have you tried any of the longer Basic games that I put up? These might be a challenge to translate.
Have you considered Python for prototyping? It seems ideally suited and it doesn’t seem to have the overhead of C++ and works with a lot of C libraries.
I think the C programmer who claimed that C had OOP in it was probably thinking in the same way that an Assembler programmer can program OOP. In other words, it is such a low-level language that you can program any programming paradigm.
My planned solution to the GOTO problem was to use a little graph theory with the flow-of-control that b2c discovers in the Basic program and then match the loops/etc with some common structured programming graphs. I also noticed some programming practices in Basic that didn’t sit well when translated to C. There is a program, Bandit.bas, that instead of using indexes into a string array (for the tokens it displays: Cherries, Oranges, Banana etc) uses the strings themselves and does string comparisons instead of integer comparisons. Hand conversion to C would produce a trivial program whereas the machine translation is overly complex. I don’t have a good solution for that currently.
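The difference between the two styles can be illustrated with a small sketch. It is not taken from Bandit.bas or from b2c output; the function names and the three-reel win test are assumptions made for the example.

```c
#include <string.h>

/* Symbol names as they might appear on the reels (hypothetical table). */
static const char *const symbols[] = { "CHERRIES", "ORANGES", "BANANA" };

/* Look up the display text for a reel index. */
const char *symbol_name(int index)
{
    return symbols[index];
}

/* Line-for-line style: each reel variable holds a string, as in the
 * Basic original, so testing for a win needs string comparisons. */
int all_match_strings(const char *a, const char *b, const char *c)
{
    return strcmp(a, b) == 0 && strcmp(b, c) == 0;
}

/* Hand-converted style: each reel holds an index into symbols[], so the
 * same test is a pair of integer comparisons and the strings are only
 * needed when printing. */
int all_match_indexes(int a, int b, int c)
{
    return a == b && b == c;
}
```

A translator would have to prove that a string variable only ever takes values from a fixed set before it could make this substitution automatically, which is part of why the machine translation ends up more complex than a hand conversion.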
I went looking for SST and found it on Tom Almy’s site and DLed/compiled it. It seems to work fine. This sounds like a considerably more complex version than I used to play. I might poke through it and see if it is worth converting to ncurses.
https://almy.us/sst.html
I would like to test out your converter (b2c) with some Commodore BASIC 2.0 programs. Any chance of getting either the src (so I can build it on macos) or a macos binary I can test with please?
Hi d3bug, sorry about not getting back to you right away. The site got inundated with spam and I got distracted by other shiny things. Right now the project isn’t in a state that is usable generally (lots of dependencies on my system). I’ve been meaning to create a web page where people can translate their source, but again the time thing got in the way.
I can run your source through b2c and send back the results if that is OK.
Let me know,
/Alan