mirror of
https://github.com/pkali/feudal-economy.git
synced 2026-05-20 22:33:54 +02:00
492 lines
17 KiB
Markdown
492 lines
17 KiB
Markdown
---
|
|
title: "Turbo-Basic XL and Atari BASIC parser tool"
|
|
author: https://github.com/dmsc/tbxl-parser
|
|
book: true
|
|
classoption: [oneside]
|
|
titlepage: true,
|
|
titlepage-text-color: "FFFFFF"
|
|
titlepage-rule-color: "FFFFFF"
|
|
titlepage-rule-height: 0
|
|
titlepage-background: "expr.pdf"
|
|
...
|
|
|
|
# Turbo-Basic XL and Atari BASIC parser tool
|
|
|
|
This program parses and tokenizes a _Turbo-Basic XL_ or _Atari BASIC_ listing in
|
|
a flexible format and produces any of three outputs:
|
|
|
|
- A tokenized binary file, directly loadable in the original _Turbo-Basic XL_
|
|
(or _Atari BASIC_ if the `-A` option is given) interpreter. This mode also
|
|
replaces variables with single letters by default, but with the `-f` option
|
|
writes the full variable names and with the `-x` option writes empty variable
|
|
names, making the program unable to be listed or edited.
|
|
|
|
This is the default operating mode, and also can be forced with the `-b`
|
|
command line switch.
|
|
|
|
- A minimized listing, replacing variable names with single letters, using
|
|
abbreviations, removing spaces and using Atari end of lines.
|
|
|
|
This mode is selected with the `-s` command line switch. Adding the `-f`
|
|
option keeps the names of variables with 2 or less characters.
|
|
|
|
- A pretty printed expanded listing, with one statement per line and
|
|
indentation, and standard ASCII line endings.
|
|
|
|
Note that this format can be read back again, but some statements are
|
|
transformed in the process, this can lead to problems in non-standard
|
|
`IF`/`THEN` constructs.
|
|
|
|
Currently, `IF`/`THEN` with statements after the `THEN` are converted to
|
|
multi-line `IF`/`ENDIF` statements.
|
|
|
|
This mode is selected with the `-l` command line switch.
|
|
|
|
|
|
## Example Programs
|
|
|
|
The following is an example of a simple program in free form:
|
|
|
|
```purebasic
|
|
' Example program
|
|
|
|
' One statement per line:
|
|
print "Hello All"
|
|
print "---------"
|
|
print "This is a heart: \00"
|
|
|
|
' Also, multiple statements per line:
|
|
for counter = 0 to 10 : ? "Iter: "; counter : next counter
|
|
|
|
' Line numbers
|
|
30
|
|
' And abbreviations:
|
|
g. 30
|
|
|
|
```
|
|
|
|
To generate a tokenized BAS file, loadable by _Turbo-Basic XL_, simply type:
|
|
|
|
basicParser samples/sample-1.txt
|
|
|
|
This will generate a `sample-1.bas` file in the same folder.
|
|
|
|
If on the other hand you want a minimized listing file in ATASCII format (suitable
|
|
for `ENTER` into _Atari BASIC_, type:
|
|
|
|
basicParser -l -A samples/sample-1.txt
|
|
|
|
This will generate a `sample-1.lst` file in the same folder.
|
|
|
|
There are more sample programs, located in the `samples` folder that illustrate
|
|
the free-form input format.
|
|
|
|
|
|
## Input listing format
|
|
|
|
The parser accepts standard listings for _Atari BASIC_ or _Trubo-Basic XL_
|
|
programs, with Atari or ASCII end of lines.
|
|
|
|
All the standard abbreviations available in the original interpreters are also
|
|
accepted.
|
|
|
|
As with _Turbo-Basic XL_, the input is case insensitive (uppercase, lowercase
|
|
and mixed case is supported).
|
|
|
|
|
|
### Line numbers
|
|
|
|
You can omit line numbers, only lines that are target to `GOTO` / `GOSUB` /
|
|
`THEN` needs them. If you use only labels, no line numbers are needed.
|
|
|
|
Also, line numbers can appear alone in a line, for better readability.
|
|
|
|
|
|
### Comments
|
|
|
|
Comments can be started by `'` in addition to the _Turbo-Basic XL_ `.`, `--`
|
|
or `rem`. In short listing an tokenized output formats all comments are
|
|
removed unless the `-k` option is given.
|
|
|
|
All comment types are supported in _Atari BASIC_ mode.
|
|
|
|
|
|
### Special characters inside string constants
|
|
|
|
Inside strings, special characters can be specified by using a backslash
|
|
followed by an hexadecimal number in upper-case, (i.e., `"\00\A0"` produces a
|
|
string with a "heart" and an inverse space "♥█"), this allows editing special
|
|
characters on any standard editor.
|
|
|
|
Note that to force a backslash before a valid hex number, you can use two
|
|
backslashes (i.e., ``"123\\456"`` produces ``123\456``).
|
|
|
|
|
|
### Extended string constants
|
|
|
|
There is support for extended strings, with embedded character names.
|
|
|
|
Extended strings start with with `["` and ends with `"]`, and can contain:
|
|
|
|
- Special characters with `{name}` or `{count*name}`, with count a decimal
|
|
number and name from the list:
|
|
`heart`, `rbranch`, `rline`, `tlcorner`, `lbranch`, `blcorner`, `udiag`,
|
|
`ddiag`, `rtriangle`, `brblock`, `ltriangle`, `trblock`, `tlblock`,
|
|
`tline`, `bline`, `blblock`, `clubs`, `brcorner`, `hline`, `cross`, `ball`,
|
|
`bbar`, `lline`, `bbranch`, `tbranch`, `lbar`, `trcorner`, `esc`, `up`,
|
|
`down`, `left`, `right`, `diamond`, `spade`, `vline`, `clr`,
|
|
`del`, `ins`, `tbar`, `rbar`, `eol`, `bell`.
|
|
|
|
- Inverse video characters surrounded by `~`.
|
|
|
|
- Multiple lines, you can terminate the string in a different line than the
|
|
start. Note that this will embed end-of-line characters in the string, so it
|
|
will only work in tokenized output, not short-listing output.
|
|
|
|
|
|
### Parameters and local variables for `PROC`
|
|
|
|
Arguments follow the `PROC` label after a comma, and local variables follow
|
|
after a semicolon:
|
|
```purebasic
|
|
D = 3
|
|
EXEC Testing, D+5, "Hello"
|
|
PRINT D
|
|
PROC Testing, A, B$(10); D
|
|
D = A + 1
|
|
PRINT D; " and "; B$
|
|
ENDPROC
|
|
```
|
|
|
|
As the example shows, string variables must include the dimensioned length,
|
|
as the parser adds a `DIM` at the start of the program to initialize. The
|
|
dimensioned length must be an integer, a `$define` or a `%` number.
|
|
|
|
Also, setting the value of variable "D" inside the procedure does not alter
|
|
the value of the variable "D" outside the procedure.
|
|
|
|
The parser transform this construct by creating new variables that hold the
|
|
parameters and local variables, so the resulting procedures don't support
|
|
recursion.
|
|
|
|
|
|
### Syntax from _Turbo-Basic XL_ in _Atari BASIC_
|
|
|
|
Some of the extra statements from _Turbo-Basic XL_ are supported even in _Atari
|
|
BASIC_ output mode, those are converted to equivalent forms:
|
|
|
|
- Multi-line `IF`/`ENDIF` statements are converted to `IF`/`THEN`.
|
|
|
|
- The `%0` to `%3` tokens are converted to the numbers 0 to 3.
|
|
|
|
- `PUT` without I/O channel is converted to `PUT #16`. This relies on a bug
|
|
in _Atari BASIC_ that makes I/O channel 16 equal to 0.
|
|
|
|
- String constants are converted to decimal constants.
|
|
|
|
|
|
### Parsing directives
|
|
|
|
There are parsing *directives* added, that consist on lines starting with a
|
|
dollar sign `$`. A list of available directives is documented bellow.
|
|
|
|
|
|
## Program Usage
|
|
|
|
basicParser [options] [-o output] filenames
|
|
|
|
Options:
|
|
|
|
- `-n nun` Sets the maximum line length before splitting lines to `num`.
|
|
Note that if a single statement is longer than this, the line
|
|
is output anyway.
|
|
The default is 120 characters (the standard Atari Editor limit)
|
|
|
|
- `-l` Output long (readable) listing, suitable for editing, with standard
|
|
end of lines and lowercase statements.
|
|
|
|
- `-s` Output a short, minimized listing, with ATASCII end of lines. The
|
|
default output file name is the same as input with `.lst` extension
|
|
added.
|
|
|
|
- `-b` Output a binary tokenized file instead of a listing. The default
|
|
output file name is the same as input with `.bas` extension added. Note
|
|
that this is the default behaviour.
|
|
|
|
- `-A` Accept (and produce) standard _Atari BASIC_ language, without the
|
|
extended statements and syntax. Note that some of the optimizations are
|
|
specific to _Turbo-Basic XL_ and won't run in this mode.
|
|
|
|
- `-x` In binary output mode, writes null variable names, making the program
|
|
unlistable. This options does nothing on listing output.
|
|
|
|
- `-f` In binary output mode, writes the full variable names, this eases
|
|
debugging the program. In short listing mode, keeps the names of
|
|
variables with less than two characters, renaming all longer or invalid
|
|
names.
|
|
|
|
- `-k` In binary output mode, keeps comments in the output. Note that only
|
|
standard comments are included, not new style (`'`) comments.
|
|
|
|
- `-a` In long output, replace Atari characters in comments with
|
|
approximating characters.
|
|
|
|
- `-v` Shows more parsing information, like name of renamed variables.
|
|
(verbose mode)
|
|
|
|
- `-q` Don't show any parsing output, only errors. (quiet mode)
|
|
|
|
- `-o` Sets the output file name. By default, the output is the name of the
|
|
input with `.lst` (listing) or `.bas` (tokenized) extension. If the
|
|
given name starts with a dot, use as output file name extension.
|
|
|
|
- `-c` Output to standard output instead of a file.
|
|
|
|
- `-O` Enables parser optimizations to produce smaller or faster code. Without
|
|
and argument enables all optimizations, an argument can be given
|
|
similar to the `optimize` directive in the code, see bellow for the
|
|
possible options. The option can be specified multiple times, an
|
|
example for producing short listings is `-O -O -convert_percent -O
|
|
-const_replace`
|
|
|
|
- `-h` Shows help and exit.
|
|
|
|
|
|
## Parser directives
|
|
|
|
Directives add extra features to the parser, much like C and C++. Directives
|
|
start with a dollar as the first non blank character on a line, and continue
|
|
up to the end of the line.
|
|
|
|
Bellow is a description of available directives.
|
|
|
|
### `$options` directive.
|
|
|
|
The options directive alter the way the parsing is done, accepting a list of
|
|
comma separated options, valid for the current file. Valid options:
|
|
|
|
- `mode=compatible`: Disable features to be more compatible with the
|
|
_Turbo-Basic XL_ parser.
|
|
- `mode=extended`: Makes the parser to accept more extended features.
|
|
- `mode=default`: Returns the parser to the default mode.
|
|
- `optimize` or `+optimize`: Allows the parser to optimize the output to
|
|
produce smaller or faster code.
|
|
- `-optimize`: Disable the optimizations.
|
|
- `optimize=+`*suboption*: Enable the particular optimization option.
|
|
- `optimize=-`*suboption*: Disable the particular optimization option.
|
|
|
|
The optimization sub-options are:
|
|
|
|
- `const_folding`: Replace operations on constants with the result.
|
|
- `convert_percent`: Replace small integers with the `%*` equivalent, this is
|
|
only available in _Turbo-Basic XL_ mode.
|
|
- `commute`: Swap arguments to binary operations to minimize runtime.
|
|
- `line_numbers`: Remove all BASIC line numbers that are unused.
|
|
- `const_replace`: Replace repeated constant values (numeric or string) with
|
|
a variable initialized to the value. The initialization code is added
|
|
before any statement in the program, and tries to use the minimum number
|
|
of bytes posible.
|
|
- `fixed_vars`: This is the complement of the `const_replace` option, tries to
|
|
identify variables with a fixed value in the whole program and removes the
|
|
variable. Use this optimization when converting original basic listings, as
|
|
reversing the constant replacing gives a simpler listing and allows to apply
|
|
further optimizations. Note that currently this option can produce bad
|
|
results, as it does not follows the program flow and can't detect if a
|
|
variable is used before the first assignment, so it is not enabled by
|
|
default. You need to check each removed variable, as printed in the output
|
|
and in the comments in the resulting program.
|
|
- `then_goto`: Searches `IF` statements with `THEN GOTO` and removes the `GOTO`
|
|
statement, replacing with the line number alone.
|
|
Note: If the line number is not a constant, the resulting program will be
|
|
executed and listed correctly by both _Atari BASIC_ and _Turbo-Basic XL_, but
|
|
can't be entered because of an original parser limitation. Therefore, this
|
|
conversion is only done for constant values when the output is a short listing.
|
|
|
|
Example: `IF X THEN GOTO 100` becomes `IF X THEN 100`
|
|
- `if_goto`: Performs the same optimization as `then_goto`, but also replaces
|
|
instances of multi-line `IF` statements containing a `GOTO` with `THEN` and
|
|
the target line number.
|
|
|
|
This optimization is not enabled by default because it can produce larger
|
|
code by forcing a newline in the file.
|
|
|
|
Example:
|
|
```
|
|
IF X
|
|
GOTO 100
|
|
ENDIF
|
|
```
|
|
becomes
|
|
```
|
|
IF X THEN 100
|
|
```
|
|
|
|
Note that options can be changed at any place in the file, this is an example
|
|
of changing the parser mode in the middle of the file:
|
|
|
|
```purebasic
|
|
' Example program using directives
|
|
$ options optimize, mode=default
|
|
error1 = 2
|
|
? error1 : ' This is parsed like Turbo-Basic XL, as ? ERR OR 1
|
|
|
|
$options mode = extended
|
|
? error1 : ' This is parsed as ? error1
|
|
Printa : ' This is a parsing error.
|
|
```
|
|
|
|
A good optimization mode for producing short listings is:
|
|
|
|
```
|
|
$options +optimize, optimize=-convert_percent-const_replace
|
|
```
|
|
|
|
The above line instructs the parser to avoid converting numbers to `%` values
|
|
and the replacement of constants, producing a smaller listing. Note that
|
|
replacement of constants can be beneficial, so try enabling the optimization
|
|
and running with "-v" option to see what variables are good candidates for
|
|
replacement.
|
|
|
|
### `$define` directive.
|
|
|
|
This directive defines new symbols that are replaced at parsing time with the
|
|
values, like C macros.
|
|
|
|
Replacement names are prefixed by `@` to differentiate from variables, and
|
|
as variables, string defines end in `$`, the syntax of the directive is:
|
|
|
|
`$define` *defineName* `=` *value*
|
|
|
|
Keep in mind that as the value is replaced each time the variable is used, it
|
|
is probably best to assign them to a variable instead if the value will be
|
|
used multiple times, and you should enable optimizations so that the usage is
|
|
simplified at parsing time.
|
|
|
|
This is an example usage of the `$define` directive:
|
|
|
|
```purebasic
|
|
' Example usage of defines
|
|
$options +optimize
|
|
$define Message$ = "Hello world!"
|
|
$define PCOLR0 = $2C0
|
|
|
|
print @Message$ : ' Replaced by: ? "Hello world!"
|
|
print len(@Message$) : ' Replaced by: ? 12
|
|
poke @PCOLR0+2, $1F : ' Replaced by: POKE 706,31
|
|
```
|
|
|
|
### `$incbin` directive.
|
|
|
|
This directive allows including data from a binary file to a new string
|
|
definition. The content of the file is read at parsing time and the full
|
|
content is stored in the define. The syntax of the directive is:
|
|
|
|
`$incbin` *defineName$* `, "`*fileName*`"` [ , *offset* [, *length* ] ]
|
|
|
|
The optional *offset* parameter specifies a starting offset in bytes for
|
|
the included data, and the optional *length* parameter specifies the number
|
|
of bytes to read. If *length* is not given, the file read completely.
|
|
|
|
This is an example usage of the `$incbin` directive:
|
|
|
|
```purebasic
|
|
$options +optimize
|
|
$incbin asmBin$, "myasm.bin"
|
|
|
|
asmRut = adr( @asmBin$ ) : ' Store address in variable to use multiple times.
|
|
? usr(asmRut, 1, 2) : ' Call routine. Should be relocatable and less than 242 bytes.
|
|
```
|
|
|
|
### `$incdata` directive.
|
|
|
|
This directive allows including data from a binary file to a `DATA` BASIC
|
|
statement. The content of the file is read at parsing time and the full content
|
|
is stored as is. The syntax of the directive is:
|
|
|
|
`$incdata` `"`*fileName*`"` [ , *offset* [, *length* ] ]
|
|
|
|
The optional *offset* parameter specifies a starting offset in bytes for the
|
|
included data, and the optional *length* parameter specifies the number of
|
|
bytes to read. If *length* is not given, the file read completely.
|
|
|
|
Note that you can use this directive to store arbitrary bytes inside the
|
|
statement, but BASIC parses the actual data at `READ` time.
|
|
|
|
|
|
## Limitations and Incompatibilities
|
|
|
|
There are some incompatibilities in the way the source is interpreted with the
|
|
standard _Turbo-Basic XL_ and _Atari BASIC_ parsers:
|
|
|
|
- The ASCII LF character (hexadecimal $10) is interpreted as end of line in
|
|
addition to the ATASCI EOL (hexadecimal $9B). This means that in `DATA`
|
|
statements and comments the LF character is not accepted.
|
|
|
|
- The parsing of special characters inside strings means that a valid hexadecimal
|
|
sequence (`\**`, with `*` an hexadecimal number in uppercase) or two backslashes
|
|
are interpreted differently.
|
|
|
|
- Extra statements after an `IF`/`THEN`/`LineNumber` are converted to a comment,
|
|
with the exception of `DATA` statements.
|
|
In the original, those statements are never executed, so this is not a problem
|
|
with proper code.
|
|
|
|
- Any string is accepted as a variable name, even if it is already an statement,
|
|
function name or operator.
|
|
|
|
The following code is valid:
|
|
|
|
```purebasic
|
|
PRINTED = 0 : ' Invalid in Atari BASIC, as starts with "PRINT"
|
|
DONE = 3 : ' Invalid in Turbo-Basic XL, as starts with "DO"
|
|
```
|
|
|
|
This relaxed handling of variable naming creates an incompatibility, as the
|
|
first example above is parsed differently as the standard _Atari BASIC_,
|
|
where it means "`PRINT (ED = 0)`" instead of "`LET PRINTED = 0`".
|
|
|
|
Note that currently, even full statements are accepted as variable names,
|
|
but avoid using them as they could produce hard to understand errors.
|
|
|
|
- In long format listing output, `IF`/`THEN` are converted to `IF`/`ENDIF`
|
|
statements. This introduces an incompatibility with the following code:
|
|
|
|
```purebasic
|
|
FOR A = 0 TO 2
|
|
? "A="; A; " - ";
|
|
IF A <> 0
|
|
? "1";
|
|
IF A = 1 THEN ELSE
|
|
? "2";
|
|
ENDIF
|
|
? " -"
|
|
NEXT A
|
|
```
|
|
|
|
This code should produce the following at output:
|
|
|
|
```
|
|
A=0 - 2 -
|
|
A=1 - 1 -
|
|
A=2 - 12 -
|
|
```
|
|
|
|
After conversion, the `ELSE` is associated with the second `IF` instead
|
|
of the first, giving the wrong result.
|
|
|
|
- Parsing of `TIME$=` statement allows a space between `TIME$` and the equals
|
|
sign, but in _Turbo-Basic XL_ this gives an error.
|
|
|
|
|
|
## Compilation
|
|
|
|
To compile from source, you need `gawk` and `peg`, both are available in any
|
|
recent Debian or Ubuntu Linux distro, install with:
|
|
|
|
apt-get install gawk peg
|
|
|
|
To compile, simply type `make` in the sources folder, a folder `build` will be
|
|
created with the executable program inside.
|
|
|
|
|