BASIC Interpeter, Part III

This is a continuation of Part II. In this part, we will add support for numerical as well as string expressions, support multiple expression in a PRINT statement, and add functionality for deleting program lines.

Let’s start with the Basic class header file, basic.h. All we need to do here is add a member function for deleting lines from the stored program:

	void remove(int index);	// remove program line

For the class file, basic.cpp, since we already have code in in the add function to remove a line if it already exists, we will move that into the new remove function, and then call remove from add. Here are the changes:

// add a program line at index, overwriting if it exists
void Basic::add(int index, const Program *program){
	// see if index already exists, if so delete it
	remove(index);
	
	lines.insert(std::pair<int, const Program *>(index, program));
}

// remove an existing line from the program
void Basic::remove(int index){
	map<int, const Program*>::iterator it = lines.find(index);
	if( it != lines.end() ){
		const Program *old = it->second;
		delete old;
		lines.erase(index);
	}	
}

The Program base class doesn’t change at all. The derived Print class has a minor change: we will be sub-classing Expression to support numerical and string expressions, so we need to make its functions virtual and use pointers in the Print class so the derived class member functions will be called. In the header file, print.h, change the constructor signature to use Expression *:

	Print(const std::vector<Expression*> *exprList);	// create with a vector of expressions to print

Also change the member variable declaration exprList to match:

	const std::vector<Expression*> *exprList;	// store the expressions here

In the class file, print.cpp, we have to update the method signatures to use pointers, and make a small change to how we access the elements in the vector. Because the elements are now pointers instead of actual objects, instead of using the array subscript method, we have to use the at() function. Here is the updated class file:

#include 

#include "print.h"

using std::endl;
using std::cout;

// constructor for Print class
Print::Print(const std::vector<Expression*> *exprList){
	this->exprList = exprList;
}

// prints out each expression to std::cout
void Print::execute() const{
	for( int i = 0; i < exprList->size()-1; i++ ){
		cout << exprList->at(i)->value() << ' ';
	}
	cout << exprList->at(exprList->size()-1)->value() << endl;
}

// lists the expressions, as they were originally given
void Print::list(ostream& os) const{
	os << "PRINT ";
	for( int i = 0; i < exprList->size()-1; i++ ){
		os << exprList->at(i)->print() << ", ";
	}
	os << exprList->at(exprList->size()-1)->print();
}

The Expression class has a few changes. Since we will be creating subclasses of it, its member functions need to be virtual. Also, since we will only be using subclasses, we no longer need the member variable text. Here is the header file, expression.h:

#ifndef _EXPRESSION_H_
#define _EXPRESSION_H_

#include <string>

/*
Base class used for storing and evaluating data items
*/
class Expression {
public:
	virtual const std::string value() const;			// return the stored value
	virtual const std::string print() const;			// printable version
};

#endif

The class file, expression.cpp, likewise becomes simplified:

#include "expression.h"

// return the text value
const std::string Expression::value() const{
	return std::string("BASE EXPRESSION");
}

// return a string for printing
const std::string Expression::print() const{
	return std::string("BASE EXPRESSION");
}

Our first subclass of Expression is StringExpression, which is basically the same as our original implementation of Expression. Here is the header file, stringexpression.h:

#ifndef _STRINGEXPRESSION_H_
#define _STRINGEXPRESSION_H_

#include "expression.h"

/*
Class used for storing a text value
*/
class StringExpression : public Expression {
public:
	StringExpression(const char *text);		// take a string as input
	
	const std::string value() const;		// return the stored value
	const std::string print() const;		// printable version
	
private:
	std::string text;				// data storage
};

#endif

Likewise, the implementation in stringexpression.cpp duplicates the functionality of the old Expression class:

#include "stringexpression.h"

// create a new StringExpression, storing its text
StringExpression::StringExpression(const char *text){
	this->text = std::string(text);
}

// return the text value
const std::string StringExpression::value() const{
	return text;
}

// return a string for printing
const std::string StringExpression::print() const{
	return '"' + text + '"';
}

The class DoubleExpression is very similar to StringExpression, with the only difference being it stores a numerical value instead of a string value. Here is the header file, doubleexpression.h:

#ifndef _DOUBLEEXPRESSION_H_
#define _DOUBLEEXPRESSION_H_

#include "expression.h"

/*
Class used for storing a numerical value
*/
class DoubleExpression : public Expression {
public:
	DoubleExpression(double d);			// take a double as input
	
	const std::string value() const;		// return the stored value
	const std::string print() const;		// printable version
	
private:
	double d;					// data storage
};

#endif

The implementation file, doublexpression.cpp, is naturally also very similar. Note that since printable version for calling LIST on a numerical expression is the same as its text value, the print function simply makes a call to value:

#include "doubleexpression.h"

// create a new DoubleExpression, storing its value
DoubleExpression::DoubleExpression(double d){
	this->d = d;
}

// return the text value
const std::string DoubleExpression::value() const{
	return std::to_string(d);
}

// return a string for printing
const std::string DoubleExpression::print() const{
	return value();
}

OK, now that we’ve got our classes set up, let’s update our Bison input file. The first change is to add our two new header files in the C declarations section:

#include "stringexpression.h"
#include "doubleexpression.h"

Next, in the Bison declarations, we need to add some new token types to the %union. We need to support reading doubles, Expression *s, and std::vector<Expression*> *s:

%union {
	int iVal;
	double dVal;
	char *sVal;
	Program *progVal;
	Expression *eVal;
	std::vector<Expression*> *eList;
}

Since we will now be able to enter a comma-separated list of expressions for the PRINT statement, we need to add a new constant token:

%token COMMA

We also have a new terminal symbol token type for reading doubles:

%token <dVal> DOUBLE

And we will round out the Bison declarations with two new non-terminal symbols to use our single expression and expression list types:

%type <eList> exprList
%type <eVal> expr

Now that we’ve got the symbol types defined, we need to update the grammar rules. The input and line rules remain the same, but we will change the program rule and add two new rules for reading expressions: exprList and expr. Here are the new rules:

stmt:
	LINE			{ Basic::instance()->remove($1); }
	| LINE program		{ Basic::instance()->add($1, $2); }
	| RUN			{ Basic::instance()->execute(); }
	| LIST			{ Basic::instance()->list(); }
;

program:
	PRINT exprList		{ $$ = new Print($2); }
;

exprList:
	expr			{ $$ = new std::vector<Expression*>(1, $1); }
	| exprList COMMA expr	{
					$1->push_back($3);
					$$ = $1;
				}
;

expr:
	STRING			{
					$$ = new StringExpression($1);
					free($1);	// malloced in basic.l
				}
	| DOUBLE		{ $$ = new DoubleExpression($1); }
;

We have added one new option to the stmt rule: a single LINE terminal symbol can be input to delete that line from the stored program. The program rule is simpler now: it reads a PRINT symbol followed by a new non-terminal symbol exprList, which we defined in our Bison declarations section to be of type std::vector<Expression*> *. We can pass this symbol to our updated Print constructor.

Now we have our new rules. The exprList rule is satisfied by either a single expr, or by recursion a sequence of exprs separated by commas (the terminal symbol COMMA we added). When Bison encounters the first matching expr, it creates a new std::vector<Expression *>, which matches the eList symbol type we defined, and populates it with the single Expression it read. As additional expression separated by commas are matched, they are pushed onto the end of the vector (the exprList symbol in the first matching position), and then that vector is set as the result of the rule.

The expr rule matches either a STRING or a DOUBLE terminal symbol, and builds the appropriate Expression derived class, as appropriate. For the case of the string, we again need to free the terminal symbol value, since we explicitly allocated it in the flex scanner.

Something that would be nice is if we had a prompt character at the beginning of each input line when running our BASIC interpreter, to make the input and output lines a little more distinguishable. So, in the main function, before the while loop, print out a welcome statement (we will change the flex scanner so that it will print out another prompt character every time it reads a new line):

	std::cout << "Welcome to BASIC!\n>";

OK, on to the flex scanner! We need to add support for the new terminal symbols we added in Bison, as well as a couple other minor changes. First, it gets annoying having to enter the BASIC commands using capital letters. Fortunately for us, flex provides an option to make the input matching case-insensitive, so we don’t need to redefine our rules; simply add the case-insensitive option to the first line:

%option noyywrap case-insensitive

We’re going to be doing some printing directly from our flex scanner now, so be sure to include iostream. We also have defined a token type that uses std::vector, so we need to include that header:

#include <iostream>
#include <vector>

The next change is to add a declaration of the Expression class, since it is used in the symbol type definitions that Bison exports. The flex code never uses this class, so instead of including its header file, we just declare it:

class Program;
class Expression;

Since we now need to be able to read in a double-type numerical value, we will be using the pattern [0-9] many times, so we will create a flex declaration to use DIGIT in its place. While this isn’t necessary, it can make the patterns a little more clear. Put this line after the end of the C declarations (%}) and before the start of the pattern matching (%%):

%}

DIGIT    [0-9]

%%

Now we can use the pattern {DIGIT} wherever we would use [0-9]. We will make use of this for both the LINE symbol and the DOUBLE symbol. Here are the updated rules:

[ \t]+	;	// skip white space
^{DIGIT}+	{ yylval.iVal = atoi(yytext); return LINE; }
\"[^\"\n]*\"	{	// remove the quote marks from either end
			yylval.sVal = (char *)malloc(sizeof(char)*strlen(yytext)-1);
			strncpy(yylval.sVal, yytext+1, strlen(yytext)-2);
			yylval.sVal[strlen(yytext)-2] = '\0';	// terminate the string!
			return STRING;
		}
(({DIGIT}+"."?)|("."{DIGIT}+)){DIGIT}*(E-?{DIGIT}+)?	{ yylval.dVal = atof(yytext); return DOUBLE; }
PRINT		{ return PRINT; }
RUN		{ return RUN; }
LIST		{ return LIST; }
,		{ return COMMA; }
\n		{ std::cout << ">"; return ENDL; }
.	;	// skip any characters that aren't part of a recognized pattern

The updated rule that matches a LINE symbol now requires the sequence of digits to be at the beginning of a line. This ensure that flex won’t match a numerical expression and return it as a LINE symbol.

The new rule for matching a DOUBLE is a little bit tricky. There are two ways to start this pattern: either one or more digits followed by an optional dot, or a starting dot followed by one or more digits. After the start, there may be zero or more digits, optionally followed by a literal E with an optional minus sign followed by one or more digits. Essentially, this rule requires that there be one or more digits, on either side of an optional
dot, and support for exponential notation followed. Here are some examples of inputs that this will match:

123
123.
.123
123.123
123E123
123E-123
123.123E-123
.123E-123
etc.

Since we have a new COMMA terminal symbol defined in Bison to support comma-separated expression lists, we need to match that here (line 32). Since we want to have a prompt character show up at the beginning of each new input line, we updated the ENDL match to print one out every time we read a new line character. Lastly, we have a new rule, .+ ;, which silently reads and discards any characters that it can’t match to previous rules; this prevents some unwanted input for reaching Bison, where it could cause a parsing error.

And of course, we need to update our makefile to include our new header and source files. I’ve been compiling on macOS, but on Linux we also need to tell g++ to use the C++11 standard, by adding the -std=c++11 argument. Here is the updated all rule:

all:	basic.tab.c lex.yy.c \
		basic.h basic.cpp \
		program.h program.cpp \
		print.h print.cpp \
		expression.h expression.cpp \
		stringexpression.h stringexpression.cpp \
		doubleexpression.h doubleexpression.cpp
	g++ -std=c++11 basic.tab.c lex.yy.c program.cpp basic.cpp print.cpp expression.cpp \
	stringexpression.cpp doubleexpression.cpp \
	-o basic

Here is an example session made from this Part III code:

Welcome to BASIC!
>10 print "Hello"
>20 print 123, .123, 123.123E123
>30 print "Big Number!"
>list
10 PRINT "Hello"
20 PRINT 123.000000, 0.123000, 123122999999999995047514223697997329420208162078904722947470938484432926767618625996126391524302018196895898346062018145943552.000000
30 PRINT "Big Number!"
>run
Hello
123.000000 0.123000 123122999999999995047514223697997329420208162078904722947470938484432926767618625996126391524302018196895898346062018145943552.000000
Big Number!
>20
>list
10 PRINT "Hello"
30 PRINT "Big Number!"
>run
Hello
Big Number!
>

As you can see, we’ll want to clean up the numerical printing a bit, but otherwise, things are looking good! The complete source files from this part are available here: https://github.com/VisceralLogic/basic/tree/part-3

Continue to Part IV.

2 thoughts on “BASIC Interpeter, Part III

  1. Pingback: BASIC Interpreter, Part II | Visceral Logic Programming

  2. Pingback: BASIC Interpreter, Part IV | Visceral Logic Programming

Leave a Reply

Your email address will not be published. Required fields are marked *