BASIC Interpreter, Part IV

This is a continuation of Part III. This time we will add some mathematical operators, the ability to save and load programs, and support numerical variables.

To enable mathematical operations, we will create a subclass of DoubleExpression that takes two DoubleExpressions as inputs, as well as a character code signifying which operation is to be performed. Because we want to be able to support variable expressions a little later, we can’t evaluate the operation when it is entered. Instead, we will keep pointers to the child expressions, and evaluate them when the operation’s value is required. To do this, we will need the ability to get the numerical value of the child expressions. We already have a member function called value on the Expression class and its inheritors, but that returns a string, not a double. In retrospect, perhaps our naming convention was not the best, so let’s go back and change it. We will rename the print function to list, since it is to support the LIST statement, and rename the value function to print, since its purpose is to support the PRINT statement. Here is the new expression.h header file:

#ifndef _EXPRESSION_H_
#define _EXPRESSION_H_

#include <string>

/*
Base class used for storing and evaluating data items
*/
class Expression {
public:
	virtual const std::string print() const;		// return a string version for PRINT
	virtual const std::string list() const;			// printable version suitable for LIST
};

#endif

Change the function names in the source file, expression.cpp, also:

#include "expression.h"

// return the text value
const std::string Expression::print() const{
	return std::string("BASE EXPRESSION");
}

// return a string for printing
const std::string Expression::list() const{
	return std::string("BASE EXPRESSION");
}

Make the same substitutions in the StringExpression header and source files, and in the Print::execute and Print::list functions.

For DoubleExpression, make the same substitutions, add a new value function, and make them all virtual. Here are the new function declarations in the header file, doubleexpression.h:

	virtual const std::string print() const;	// return the stored value
	virtual const std::string list() const;		// printable version
	virtual double value() const;			// numerical evaluation

Here are the updated member functions in doubleexpression.cpp:

// return the text value
const std::string DoubleExpression::print() const{
	return std::to_string(d);
}

// return a string for printing
const std::string DoubleExpression::list() const{
	return print();
}

// return the value
double DoubleExpression::value() const{
	return d;
}

Now we’re ready to add our new class, OperatorExpression. As described earlier, here is the header file, operatorexpression.h:

#ifndef _OPERATOREXPRESSION_H_
#define _OPERATOREXPRESSION_H_

#include "doubleexpression.h"

/*
Class for performing mathematical operations on DoubleExpressions.
*/
class OperatorExpression : public DoubleExpression {
public:
	OperatorExpression(DoubleExpression *a, DoubleExpression *b, char op);
	~OperatorExpression();				// delete the sub-expressions
	
	const std::string print() const;		// return the stored value
	const std::string list() const;			// printable version
	double value() const;				// value of performed operation

private:
	DoubleExpression *a, *b;			// expressions on which to operate
	char op;					// operation to perform
};

#endif

The implementation is actually quite simple. We have a delete operator function to clean up child expressions. The print function will return a string version of the numerical evaluation. The list function will print out the sub-expressions, separated by the operator character. The bulk of the logic is in the value function, where we switch on the op member variable and perform the appropriate mathematical operation. Here is the source code, operatorexpression.cpp:

#include <cmath>

#include "operatorexpression.h"

OperatorExpression::OperatorExpression(DoubleExpression *a, DoubleExpression *b, char op) : DoubleExpression(0) {
	this->a = a;
	this->b = b;
	this->op = op;
}

OperatorExpression::~OperatorExpression(){
	delete a;
	delete b;
}

const std::string OperatorExpression::print() const{
	return std::to_string(value());
}

const std::string OperatorExpression::list() const{
	return a->list() + op + b->list();
}

double OperatorExpression::value() const{
	switch( op ){
		case '+':
			return a->value() + b->value();
		case '-':
			return a->value() - b->value();
		case '*':
			return a->value() * b->value();
		case '/':
			return a->value() / b->value();
		case '^':
			return exp(log(a->value()) * b->value());
	}
}

Now that we have added support in our C++ code, we need to add it in our Bison and flex files. Starting with the Bison input file, basic.y, add the new header file to the includes part of the C declarations:

#include "operatorexpression.h"

Add a new token type for DoubleExpressions:

// token type definition
%union {
	int iVal;
	double dVal;
	char *sVal;
	Program *progVal;
	Expression *eVal;
	DoubleExpression *dxVal;
	std::vector<Expression*> *eList;
}

Add some new token types to support our operators:

%token PLUS
%token MINUS
%token MULT
%token DIV
%token EXP

And add token types for the new non-terminal symbols we’ll be creating. We’ll create a hierarchy of expression types to enforce operator precedence (note: Bison also supports defining operator precedence directly):

%type <dxVal> doubleExpr
%type <dxVal> addExpr
%type <dxVal> mulExpr
%type <dxVal> expExpr
%type <dxVal> term

The philosophy for creating the operator precedence by rule hierarchy is to start with the lowest precedence operator, and let it be either an instance of the next-higher rule, or two instance of the next-higher rule separated by the current level operator. E.g., an add is a mul, or a mul + mul, or a mul – mul. This ensures that higher-precedence operators will be evaluated first.

Our old definition for the expr rule had it as either a STRING or a DOUBLE terminal symbol. We are going to replace the DOUBLE symbol with a non-terminal symbol, the doubleExpr rule, which will expand into the operations we need to support. Here are the updated rules:

expr:
	STRING			{
					$$ = new StringExpression($1);
					free($1);	// malloced in basic.l
				}
	| doubleExpr
	
doubleExpr:
	addExpr
;

addExpr:
	mulExpr
	| mulExpr PLUS mulExpr	{ $$ = new OperatorExpression($1, $3, '+'); }
	| mulExpr MINUS mulExpr	{ $$ = new OperatorExpression($1, $3, '-'); }
;

mulExpr:
	expExpr
	| expExpr MULT expExpr	{ $$ = new OperatorExpression($1, $3, '*'); }
	| expExpr DIV expExpr	{ $$ = new OperatorExpression($1, $3, '/'); }
;

expExpr:
	term
	| term EXP term		{ $$ = new OperatorExpression($1, $3, '^'); }
;

term:
	DOUBLE			{ $$ = new DoubleExpression($1); }

With our new tokens defined in the Bison input file, we need to update the flex input file. Add a declaration for the new DoubleExpression in the C declarations section:

class DoubleExpression;

And add rules for the new tokens. These are escaped because some of these characters have special meaning in the regular expression pattern matching, but we just want the literal character:

\+				{ return PLUS; }
-				{ return MINUS; }
\*				{ return MULT; }
\/				{ return DIV; }
\^				{ return EXP; }

Lastly, update the makefile with the new OperatorExpression source files:

all:	basic.tab.c lex.yy.c \
		basic.h basic.cpp \
		program.h program.cpp \
		print.h print.cpp \
		expression.h expression.cpp \
		stringexpression.h stringexpression.cpp \
		doubleexpression.h doubleexpression.cpp \
		operatorexpression.h operatorexpression.cpp
	g++ basic.tab.c lex.yy.c program.cpp basic.cpp print.cpp expression.cpp \
	stringexpression.cpp doubleexpression.cpp operatorexpression.cpp \
	-o basic

Now you can print out mathematical expressions using operators! Here is a sample session:

Welcome to BASIC!
>10 print 2+2*3^7
>20 print "whoa!"
>list
10 PRINT 2.000000+2.000000*3.000000^7.000000
20 PRINT "whoa!"
>run
4376.000000
whoa!

The next step is to add support for variables. Support in the parser is pretty simple; it is added into the term rule, as an alternative to a literal double; and a new statement: LET. In the lexical analyzer, the rule is also pretty simple: a variable is an alpha character optionally followed by a single digit.

To perform variable storage and access operations, we will add a map to our Basic singleton instance that maps a string to a double, as well as function to assign and resolve variables.

In the header file, basic.h, include the standard header for the string class:

#include <string>

Add two new public member variables:

	void assign(std::string var, double value);	// assign a value to a variable
	double resolve(std::string var);		// return variable value

And add a private member variable to store the variables:

	std::map<std::string, double> vars;	// store variables

Here are the two new function implementations in basic.cpp:

// assign a value to a variable
void Basic::assign(string var, double value){
	// delete value if var already exists
	map<string, double>::iterator it = vars.find(var);
	if( it != vars.end() ){
		vars.erase(it);
	}
	
	vars.insert(pair<string, double>(var, value));
}

// return variable value
double Basic::resolve(string var){
	map<string, double>::iterator it = vars.find(var);
	if( it != vars.end() ){
		return it->second;
	}
}

I also made a tweak to the Basic::remove function: instead of calling lines.erase(index);, I changed it to lines.erase(it);, since the former version requires searching through the map again, whereas the latter version makes use of the iterator we already found, making it slightly faster.

Now we need to add a new class to represent our variables, a subclass of DoubleExpression called, naturally, VariableExpression. This class will implement the virtual functions defined in its parent class, using a string member variable to look up the value in the Basic singleton instance. Here is the header file, variableexpression.h:

#ifndef _VARIABLEXPRESSION_H_
#define _VARIABLEXPRESSION_H_

#include <string>

#include "doubleexpression.h"
#include "basic.h"

/*
This class is used to access variable inside an expression
*/
class VariableExpression : public DoubleExpression {
public:
	VariableExpression(char *name);

	const std::string print() const;		// return the stored value
	const std::string list() const;			// printable version
	double value() const;					// numerical evaluation

private:
	std::string var;
};

#endif

The implementation is pretty simple. The constructor takes a string to use as a lookup index, print() returns a string-version of the stored value, list() returns the variable name, and value() looks up and returns the stored value from the Basic singleton instance. Here is variableexpression.cpp:

#include "variableexpression.h"

VariableExpression::VariableExpression(char *name) : DoubleExpression(0) {
	var = name;
}

// return the stored value
const std::string VariableExpression::print() const{
	return std::to_string(value());
}

// printable version
const std::string VariableExpression::list() const{
	return var;
}

// numerical evaluation
double VariableExpression::value() const{
	return Basic::instance()->resolve(var);
}

In order to assign a value to a variable, we will use the LET statement, which looks like this:

LET X = 1
LET Y1 = 2
LET Z = X+Y1

To do this, we need to create a new subclass of Program. This class will implement the virtual functions list() and execute(), and provide storage for the variable name and expression whose value will be assigned. We need to keep a pointer to the expression and not just evaluate it immediately because it could contain references to other variables whose value we will not know until we run the program. Here is the header file, let.h:

#ifndef _LET_H_
#define _LET_H_

#include <string>

#include "program.h"
#include "doubleexpression.h"
#include "basic.h"

/*
The Let class provides variable assignment capability.
*/
class Let : public Program {
public:
	Let(char *var, const DoubleExpression *expression);	// create a new LET assignment
	~Let();												// clean up

	void execute() const;				// run this line of the program
	void list(std::ostream& os) const;	// list this line
	
private:
	std::string var;
	const DoubleExpression *expression;
};

#endif

The implementation of this class is also pretty simple. The constructor stores the variable name and expression; the destructor deletes the expression (since the variable name is not stored in a pointer, we don’t need to delete it); the function execute(), which is called at program runtime, evaluates the expression and calls the Basic singleton instance to store it, and list() prints out the statement line. Here is the source of let.cpp:

#include "let.h"

Let::Let(char *var, const DoubleExpression *expression){
	this->var = var;
	this->expression = expression;
}

// clean up the pointer
Let::~Let(){
	delete expression;
}

// store the value of the Expression in the Basic vars map
void Let::execute() const{
	Basic::instance()->assign(var, expression->value());
}

// list this LET statement
void Let::list(std::ostream& os) const{
	os << "LET " << var << " = " << expression->list();
}

OK, time to update Bison and flex. In your Bison input file, include the header files for the new VariableExpression and Let classes:

#include "variablexpression.h"
#include "let.h"

Add new constant tokens to support the LET statement:

%token LET
%token EQUAL

and a new terminal symbol to represent variables:

%token <sVal> VAR

Since we have added a new subclass of Program, we need to update its rule:

program:
	PRINT exprList			{ $$ = new Print($2); }
	| LET VAR EQUAL doubleExpr	{
						$$ = new Let($2, $4);
						free($2);	// malloced in basic.l
					}
;

We also need to update the term rule, which can now be either a literal double, or a variable name:

term:
	DOUBLE			{ $$ = new DoubleExpression($1); }
	| VAR			{
					$$ = new VariableExpression($1);
					free($1);
				}
;

In the flex input file, we need to add scanners for the new terminal token types we have defined. Per the BASIC documentation we are following, a variable name consists of an alphabetic character optionally followed by a single digit. Let’s add an ALPHA rule by our existing DIGIT rule:

ALPHA    [A-Z]	
DIGIT    [0-9]

Here are the new scanners:

{ALPHA}{DIGIT}?	{ yylval.sVal = strdup(yytext); return VAR; }
LET		{ return LET; }
\=		{ return EQUAL; }

The VAR scanner needs to return the actual text value it read, as well as the token type; the other two just need to return the token type.

Now, add the let.* and variableexpression.* dependencies to the make file, and run it:

Welcome to BASIC!
>10 let a = 1
>20 let b = a*2
>30 print a, b
>list
10 LET a = 1.000000
20 LET b = a*2.000000
30 PRINT a, b
>run
1.000000 2.000000

The last thing we want to add right now is the ability to save and load programs. To do this, we will implement the BASIC commands outline in the guide: SAVE, UNSAVE, NEW, and OLD. To save a program, we will repurpose our existing list functionality but directed into a file output stream instead of standard output. Likewise, to read a saved file, we will change the yyin file pointer to point to a file that we open, instead of standard input. There will be some minor changes to the Bison and flex input files to look for these commands, but the heavy lifting will take place in the Basic class.

In the header file, basic.h, we need to include <iostream>, which we will use in a modification to list(). We also add four new public functions corresponding to the new commands we are implementing, as well as a private function to erase all lines of the current program, and a private variable containing the name of the current program.

#ifndef _BASIC_H_
#define _BASIC_H_

#include <map>
#include <string>
#include <iostream>

#include "program.h"
#include "doubleexpression.h"

/*
This is a singleton class which contains all the program lines.
*/
class Basic {
public:
	void add(int index, const Program *program);		// add a new line to the program
	void list(std::ostream &out);				// list out all the existing lines
	void execute();						// run the program
	void remove(int index);					// remove program line
	void saveProgram();					// save active program to disk
	void unsaveProgram();					// delete saved program from disk
	void newProgram();					// start a new program
	void oldProgram();					// load program from disk
	
	static Basic *instance();				// access the singleton instance

	void assign(std::string var, double value);		// assign a value to a variable
	double resolve(std::string var);			// return variable value
	
private:
	void erase();						// clear stored program lines
	
	std::map<int, const Program*> lines;			// store the lines in a map
	std::map<std::string, double> vars;			// store variables
	std::string name;					// name of active program
	
	static Basic *b;					// singleton instance
};

#endif

Now, for the implementation. At the top of basic.cpp, include <fstream> and <cstdio>. We will use these for reading and writing files. Also, make a declaration of the external variable yyin (the input stream used by flex and Bison), since we will be re-directing it for file operations:

extern "C" FILE *yyin;

Since we changed the method signature of list to take an ostream as an argument, we need to update the implementation to use that instead of std::cout:

// print out the program lines
void Basic::list(std::ostream &out){
	for( map<int, const Program *>::iterator it = lines.begin(); it!= lines.end(); ++it ){
		out << it->first << " "; it->second->list(out);
		out << std::endl;
	}
}

Our first new function, saveProgram(), opens a file output stream with the member variable name and calls the new list() function:

// save active program to disk
void Basic::saveProgram(){
	std::ofstream out;
	out.open(name + ".bas");
	list(out);
}

The function unsaveProgram() deletes the currently-named program from disk:

// delete saved program from disk
void Basic::unsaveProgram(){
	::remove((name + ".bas").c_str());
}

The newProgram() function erases whatever program lines are currently in memory, prompts the user for a program name, and then eats the newline character to prevent flex from printing out a prompt character when it reads it:

// start a new program
void Basic::newProgram(){
	erase();
	std::cout << "Enter a program name: "; std::cin >> name;
	std::cin.ignore();	// consume the newline character
}

The final command function, oldProgram(), also starts by erasing the currently stored program and prompting the user for a program name. Then it attempts to open a file with that name, and points the yyin file pointer to it for flex/Bison. If it fails to open a file with that name, it restores yyin to the standard input and reports the error:

// load program from disk
void Basic::oldProgram(){
	erase();
	std::cout << "Enter program to read: "; std::cin >> name;
	yyin = fopen((name + ".bas").c_str(), "r");
	if( !yyin ){
		std::cout << "ERROR: could not read file: " << name << ".bas\n";
		yyin = stdin;
	}
}

Lastly, we have our utility function, erase(). This iterates through the stored program and deletes the pointers, then empties the map:

// clear stored program lines
void Basic::erase(){
	for( map<int, const Program *>::iterator it = lines.begin(); it!= lines.end(); ++it ){
		delete it->second;
	}
	lines.clear();
}

That’s all the C++ class code, now we just need to update the Bison and flex input files to use the new functionality. In basic.y, start by declaring another C function, yywrap. This is used in Bison to handle when we switch input files:

extern "C" int yywrap();

Next, add four constant tokens for our new commands:

%token SAVE
%token UNSAVE
%token NEW
%token OLD

We update the stmt rule to use these, as well as our modified list() function:

stmt:
	LINE			{ Basic::instance()->remove($1); }
	| LINE program		{ Basic::instance()->add($1, $2); }
	| RUN			{ Basic::instance()->execute(); }
	| LIST			{ Basic::instance()->list(std::cout); }
	| NEW			{ Basic::instance()->newProgram(); }
	| OLD			{ Basic::instance()->oldProgram(); }
	| SAVE			{ Basic::instance()->saveProgram(); }
	| UNSAVE		{ Basic::instance()->unsaveProgram(); }
;

Since we now have the concept of named programs, we will call newProgram() when our interpreter starts up. Modify the main() function to do this:

int main(int argc, char **argv){
	std::cout << "Welcome to BASIC!\n";
	Basic::instance()->newProgram();
	std::cout << '>';
	do {
		yyparse();
	} while( !feof(yyin) );
	
	return 0;
}

Lastly, we need to add the yywrap function we declared at the beginning of the file. This function gets called by flex when it reaches the end of the current input it’s reading. If yywrap returns 1, flex ends parsing; if it returns 0, flex expects yyin to be pointed to a new input file, and continues parsing. If yywrap is called while yyin is the standard input, that means the user has sent the end-of-file signal, so we should stop parsing. If it points to any other input file, that means we have finished reading an input file being loaded using the oldProgram() function, so we should close that input file and point yyin back to standard input to continue processing user commands:

int yywrap(){
	if( yyin == stdin )
		return 1;
	fclose(yyin);
	yyin = stdin;
	return 0;
}

Almost done! There are just a couple small changes to make to the flex input file, and we will be done with this part of the tutorial. First, since we are now defining a yywrap() function, we need to remove the noyywrap option given at the top of file:

%option case-insensitive

We could now compile and run our interpreter, but if we read in an old program, we would get a prompt character printed out every time the scanner read a new line! So we need to modify our scanner so it only prints prompt characters when it’s reading from the standard input:

\n		{
			if( yyin == stdin )
				std::cout << ">";
			return ENDL;
		}

Excellent! Now you can save and load programs!

Welcome to BASIC!
Enter a program name: first
>10 print "first!"
>20 let a = 1
>30 print a
>save
>new
Enter a program name: second
>10 print "second!"
>20 let b = 2
>30 print b
>save
>old
Enter program to read: first
>run
first!
1.000000

The complete code for this part can be found here: https://github.com/VisceralLogic/basic/tree/part-4

Continue to Part V.

2 thoughts on “BASIC Interpreter, Part IV

  1. Pingback: BASIC Interpeter, Part III | Visceral Logic Programming

  2. Pingback: BASIC Interpreter, Part V | Visceral Logic Programming

Leave a Reply

Your email address will not be published. Required fields are marked *