Re: BASIC for the CBM-II/8088

From: Michał Pleban <lists_at_michau.name>
Date: Thu, 05 Jul 2018 14:59:49 +0200
Message-ID: <5B3E1645.4010603@michau.name>

Baltissen, GJPAA (Ruud) wrote:

> So far I wrote what you can call the editor: the part that handles
> cursor movements, prints the typed chars on the screen and reads the
> line where the cursor is when "Enter" is pressed. 

I understand that you are writing this for an 8088 CPU? If so, how do you
access the screen data? Or do you just allow entering one line at a time
(like MS-DOS command line)?

> I know the original Commodore BASIC saves the lines in memory using
> tokens to replace the original text. I have been thinking about skipping
> this step. The disadvantages are that:
> - I will need more memory to save the lines, but not that much more IMHO
> - I will lose speed because the program line needs to be checked first
> at run time

The time needed to tokenize one line as the user is entering it is
negligible - you may spend 100 or 200 ms on it, but the user will not even
notice this delay as he is happily typing the next line. Whereas you
will save lots of time on execution, and that's what the user will notice.
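
To illustrate the idea of tokenizing at entry time, here is a minimal
sketch in Python (not 8088 assembler, just to show the logic). The
keyword table and its token values are placeholders picked for
illustration, and the function name tokenize() is hypothetical. It
assumes upper-case ASCII input, leaves text inside quotes alone, and
leaves variable names verbatim.

```python
# Hypothetical keyword table; the token values are placeholders for illustration.
KEYWORDS = {
    "PRINT": 0x99,
    "GOTO": 0x89,
    "IF": 0x8B,
    "THEN": 0xA7,
    "FOR": 0x81,
    "NEXT": 0x82,
}

def tokenize(line: str) -> bytes:
    """Replace each keyword outside string literals with a one-byte token."""
    out = bytearray()
    i = 0
    in_string = False
    # Try longer keywords first so a short keyword cannot shadow a longer one.
    words = sorted(KEYWORDS, key=len, reverse=True)
    while i < len(line):
        ch = line[i]
        if ch == '"':
            in_string = not in_string
        if not in_string:
            for word in words:
                if line.startswith(word, i):
                    out.append(KEYWORDS[word])
                    i += len(word)
                    break
            else:
                # No keyword starts here: copy the character unchanged.
                out.append(ord(ch))
                i += 1
        else:
            # Inside a string literal nothing is tokenized.
            out.append(ord(ch))
            i += 1
    return bytes(out)

if __name__ == "__main__":
    packed = tokenize('IF A THEN GOTO 10')
    print(len('IF A THEN GOTO 10'), "->", len(packed), "bytes")
```

Note that this sketch does not check word boundaries, so a keyword
embedded in a longer name would be tokenized as well. At run time the
interpreter can then dispatch on a single byte instead of comparing
keyword text again on every execution of the line.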

> Another option is to compile the program completely and to run the
> resulting executable. The main disadvantage: I'll need memory to store
> it. Storing it on and running it from disk is an option but when this
> disk is a floppy disk then I have my doubts: speed. Again comment on
> this and other ideas are welcome.

Writing a BASIC compiler is quite a complicated affair. If you want to
have fun coding it in assembler, you may do it, but it will be lots of work.

> Storing the variables. Again I don't know exactly how BASIC saves its
> variables. What I do know is that BASIC shortens a given variable to
> only two characters. It simply means that Commodore BASIC sees the
> variables BYTE1 and BYTE2 as one and the same variable. I only found out
> myself yeeeears ago after a lot of frustration. But what is a
> reasonable length then? What ever length I will choose, I'll give every
> variable its own code. I'm not sure at this moment how it will look
> like. But this code is going to be stored in the tokenised instead of
> the original name as Commodore BASIC does. This will make the program
> shorter.

The first solution that came to my mind is to store some kind of hash of
the variable name in place of the name itself. The hash will obviously
have a fixed length. However, you need to choose the hash length wisely
so that variable name collisions are unlikely enough.
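
To get a feeling for "unlikely enough", here is a rough Python sketch.
The choice of FNV-1a as the hash is only my assumption for the example
(any decent hash would do), and the function names are hypothetical.
The second function uses the usual birthday approximation to estimate
the chance that at least two of n names share a hash of the given
width.

```python
import math

def fnv1a(name: str, bits: int = 16) -> int:
    """32-bit FNV-1a hash of an ASCII name, truncated to the low `bits` bits."""
    h = 2166136261                       # FNV-1a 32-bit offset basis
    for byte in name.encode("ascii"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV 32-bit prime, kept to 32 bits
    return h & ((1 << bits) - 1)

def collision_probability(n_names: int, bits: int) -> float:
    """Birthday approximation: chance that any two of n_names hashes collide."""
    return 1.0 - math.exp(-n_names * (n_names - 1) / (2.0 * 2 ** bits))

if __name__ == "__main__":
    print(hex(fnv1a("BYTE1")), hex(fnv1a("BYTE2")))
    for bits in (16, 24, 32):
        print(bits, "bits:", collision_probability(100, bits))
```

With 100 variables a 16-bit hash gives roughly a 7% chance of at least
one collision, which is probably too high for comfort, while a 32-bit
hash makes it negligible. Either way the interpreter would still have
to decide what happens when two names do collide.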

Also, Commodore BASIC does _not_ tokenize variable names in the program.
They are always stored verbatim, and you should do the same.

> And how much memory should be reserved for a variable? The length of a
> byte, integer or real is known. The length of a string can vary.
> Choosing a fixed length has the advantage that we don't have a need for
> garbage collection (I think). But the disadvantage is that we probably
> will need more memory. And again comment on this and other ideas are
> welcome.

I would definitely advise against allocating a fixed length for strings.

> Where to store everything? First we need to know whether we are dealing
> with a 128 KB or better. When the machine is started, BASIC and another
> needed file is stored in the first 64 KB. My idea is to store the
> variables on a 128 KB machine in the rest of the memory of the first
> segment, about 32 KB. Or maybe the other way around? (but I don't think so)

If you are coding for the 8088, you really don't need to limit yourself
rigidly to the two or four 64 kB segments. The 8088 has quite flexible
segment registers, so you can divide the memory more easily. What I
would do is to store the program text growing upwards from the bottom of
memory, and the variables growing downwards from the top of memory.
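
The layout I mean can be sketched like this in Python. The class and
method names are hypothetical, and the memory is modelled as plain
offsets rather than real segment:offset addresses. Program text is
allocated upwards from offset 0, variables downwards from the top, and
memory is full when the two regions would meet.

```python
class Arena:
    """One block of memory shared by program text and variables."""

    def __init__(self, size: int):
        self.size = size
        self.prog_top = 0        # first free byte above the program text
        self.var_bottom = size   # lowest byte currently used by variables

    def free(self) -> int:
        """Bytes left between the program text and the variables."""
        return self.var_bottom - self.prog_top

    def alloc_program(self, n: int) -> int:
        """Reserve n bytes for program text, growing upwards."""
        if n > self.free():
            raise MemoryError("out of memory")
        addr = self.prog_top
        self.prog_top += n
        return addr

    def alloc_variable(self, n: int) -> int:
        """Reserve n bytes for a variable, growing downwards."""
        if n > self.free():
            raise MemoryError("out of memory")
        self.var_bottom -= n
        return self.var_bottom

if __name__ == "__main__":
    mem = Arena(128 * 1024)
    print(mem.alloc_program(30), mem.alloc_variable(20), mem.free())
```

The advantage is that neither region needs a fixed size in advance: a
long program with few variables and a short program with big arrays
both fit, as long as the total does not exceed the available memory.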

Regards,
Michau.