This is a continuation of Part III. This time we will add some mathematical operators, the ability to save and load programs, and support numerical variables.
To enable mathematical operations, we will create a subclass of DoubleExpression
that takes two DoubleExpression
s as inputs, as well as a character code signifying which operation is to be performed. Because we want to be able to support variable expressions a little later, we can’t evaluate the operation when it is entered. Instead, we will keep pointers to the child expressions, and evaluate them when the operation’s value is required. To do this, we will need the ability to get the numerical value of the child expressions. We already have a member function called value
on the Expression
class and its inheritors, but that returns a string
, not a double
. In retrospect, perhaps our naming convention was not the best, so let’s go back and change it. We will rename the print
function to list
, since it is to support the LIST
statement, and rename the value
function to print
, since its purpose is to support the PRINT
statement. Here is the new expression.h header file:
#ifndef _EXPRESSION_H_ #define _EXPRESSION_H_ #include <string> /* Base class used for storing and evaluating data items */ class Expression { public: virtual const std::string print() const; // return a string version for PRINT virtual const std::string list() const; // printable version suitable for LIST }; #endif
Change the function names in the source file, expression.cpp, also:
#include "expression.h" // return the text value const std::string Expression::print() const{ return std::string("BASE EXPRESSION"); } // return a string for printing const std::string Expression::list() const{ return std::string("BASE EXPRESSION"); }
Make the same substitutions in the StringExpression
header and source files, and in the Print::execute
and Print::list
functions.
For DoubleExpression
, make the same substitutions, add a new value
function, and make them all virtual. Here are the new function declarations in the header file, doubleexpression.h:
virtual const std::string print() const; // return the stored value virtual const std::string list() const; // printable version virtual double value() const; // numerical evaluation
Here are the updated member functions in doubleexpression.cpp:
// return the text value const std::string DoubleExpression::print() const{ return std::to_string(d); } // return a string for printing const std::string DoubleExpression::list() const{ return print(); } // return the value double DoubleExpression::value() const{ return d; }
Now we’re ready to add our new class, OperatorExpression
. As described earlier, here is the header file, operatorexpression.h:
#ifndef _OPERATOREXPRESSION_H_ #define _OPERATOREXPRESSION_H_ #include "doubleexpression.h" /* Class for performing mathematical operations on DoubleExpressions. */ class OperatorExpression : public DoubleExpression { public: OperatorExpression(DoubleExpression *a, DoubleExpression *b, char op); ~OperatorExpression(); // delete the sub-expressions const std::string print() const; // return the stored value const std::string list() const; // printable version double value() const; // value of performed operation private: DoubleExpression *a, *b; // expressions on which to operate char op; // operation to perform }; #endif
The implementation is actually quite simple. We have a delete
operator function to clean up child expressions. The print
function will return a string
version of the numerical evaluation. The list
function will print out the sub-expressions, separated by the operator character. The bulk of the logic is in the value
function, where we switch on the op
member variable and perform the appropriate mathematical operation. Here is the source code, operatorexpression.cpp:
#include <cmath> #include "operatorexpression.h" OperatorExpression::OperatorExpression(DoubleExpression *a, DoubleExpression *b, char op) : DoubleExpression(0) { this->a = a; this->b = b; this->op = op; } OperatorExpression::~OperatorExpression(){ delete a; delete b; } const std::string OperatorExpression::print() const{ return std::to_string(value()); } const std::string OperatorExpression::list() const{ return a->list() + op + b->list(); } double OperatorExpression::value() const{ switch( op ){ case '+': return a->value() + b->value(); case '-': return a->value() - b->value(); case '*': return a->value() * b->value(); case '/': return a->value() / b->value(); case '^': return exp(log(a->value()) * b->value()); } }
Now that we have added support in our C++ code, we need to add it in our Bison and flex files. Starting with the Bison input file, basic.y, add the new header file to the includes part of the C declarations:
#include "operatorexpression.h"
Add a new token type for DoubleExpression
s:
// token type definition %union { int iVal; double dVal; char *sVal; Program *progVal; Expression *eVal; DoubleExpression *dxVal; std::vector<Expression*> *eList; }
Add some new token types to support our operators:
%token PLUS %token MINUS %token MULT %token DIV %token EXP
And add token types for the new non-terminal symbols we’ll be creating. We’ll create a hierarchy of expression types to enforce operator precedence (note: Bison also supports defining operator precedence directly):
%type <dxVal> doubleExpr %type <dxVal> addExpr %type <dxVal> mulExpr %type <dxVal> expExpr %type <dxVal> term
The philosophy for creating the operator precedence by rule hierarchy is to start with the lowest precedence operator, and let it be either an instance of the next-higher rule, or two instance of the next-higher rule separated by the current level operator. E.g., an add is a mul, or a mul + mul, or a mul – mul. This ensures that higher-precedence operators will be evaluated first.
Our old definition for the expr
rule had it as either a STRING
or a DOUBLE
terminal symbol. We are going to replace the DOUBLE
symbol with a non-terminal symbol, the doubleExpr
rule, which will expand into the operations we need to support. Here are the updated rules:
expr: STRING { $$ = new StringExpression($1); free($1); // malloced in basic.l } | doubleExpr doubleExpr: addExpr ; addExpr: mulExpr | mulExpr PLUS mulExpr { $$ = new OperatorExpression($1, $3, '+'); } | mulExpr MINUS mulExpr { $$ = new OperatorExpression($1, $3, '-'); } ; mulExpr: expExpr | expExpr MULT expExpr { $$ = new OperatorExpression($1, $3, '*'); } | expExpr DIV expExpr { $$ = new OperatorExpression($1, $3, '/'); } ; expExpr: term | term EXP term { $$ = new OperatorExpression($1, $3, '^'); } ; term: DOUBLE { $$ = new DoubleExpression($1); }
With our new tokens defined in the Bison input file, we need to update the flex input file. Add a declaration for the new DoubleExpression
in the C declarations section:
class DoubleExpression;
And add rules for the new tokens. These are escaped because some of these characters have special meaning in the regular expression pattern matching, but we just want the literal character:
\+ { return PLUS; } - { return MINUS; } \* { return MULT; } \/ { return DIV; } \^ { return EXP; }
Lastly, update the makefile with the new OperatorExpression
source files:
all: basic.tab.c lex.yy.c \ basic.h basic.cpp \ program.h program.cpp \ print.h print.cpp \ expression.h expression.cpp \ stringexpression.h stringexpression.cpp \ doubleexpression.h doubleexpression.cpp \ operatorexpression.h operatorexpression.cpp g++ basic.tab.c lex.yy.c program.cpp basic.cpp print.cpp expression.cpp \ stringexpression.cpp doubleexpression.cpp operatorexpression.cpp \ -o basic
Now you can print out mathematical expressions using operators! Here is a sample session:
Welcome to BASIC! >10 print 2+2*3^7 >20 print "whoa!" >list 10 PRINT 2.000000+2.000000*3.000000^7.000000 20 PRINT "whoa!" >run 4376.000000 whoa!
The next step is to add support for variables. Support in the parser is pretty simple; it is added into the term
rule, as an alternative to a literal double; and a new statement: LET
. In the lexical analyzer, the rule is also pretty simple: a variable is an alpha character optionally followed by a single digit.
To perform variable storage and access operations, we will add a map
to our Basic
singleton instance that maps a string
to a double
, as well as function to assign and resolve variables.
In the header file, basic.h, include the standard header for the string
class:
#include <string>
Add two new public member variables:
void assign(std::string var, double value); // assign a value to a variable double resolve(std::string var); // return variable value
And add a private member variable to store the variables:
std::map<std::string, double> vars; // store variables
Here are the two new function implementations in basic.cpp:
// assign a value to a variable void Basic::assign(string var, double value){ // delete value if var already exists map<string, double>::iterator it = vars.find(var); if( it != vars.end() ){ vars.erase(it); } vars.insert(pair<string, double>(var, value)); } // return variable value double Basic::resolve(string var){ map<string, double>::iterator it = vars.find(var); if( it != vars.end() ){ return it->second; } }
I also made a tweak to the Basic::remove
function: instead of calling lines.erase(index);
, I changed it to lines.erase(it);
, since the former version requires searching through the map again, whereas the latter version makes use of the iterator we already found, making it slightly faster.
Now we need to add a new class to represent our variables, a subclass of DoubleExpression
called, naturally, VariableExpression
. This class will implement the virtual functions defined in its parent class, using a string
member variable to look up the value in the Basic
singleton instance. Here is the header file, variableexpression.h:
#ifndef _VARIABLEXPRESSION_H_ #define _VARIABLEXPRESSION_H_ #include <string> #include "doubleexpression.h" #include "basic.h" /* This class is used to access variable inside an expression */ class VariableExpression : public DoubleExpression { public: VariableExpression(char *name); const std::string print() const; // return the stored value const std::string list() const; // printable version double value() const; // numerical evaluation private: std::string var; }; #endif
The implementation is pretty simple. The constructor takes a string
to use as a lookup index, print()
returns a string-version of the stored value, list()
returns the variable name, and value()
looks up and returns the stored value from the Basic
singleton instance. Here is variableexpression.cpp:
#include "variableexpression.h" VariableExpression::VariableExpression(char *name) : DoubleExpression(0) { var = name; } // return the stored value const std::string VariableExpression::print() const{ return std::to_string(value()); } // printable version const std::string VariableExpression::list() const{ return var; } // numerical evaluation double VariableExpression::value() const{ return Basic::instance()->resolve(var); }
In order to assign a value to a variable, we will use the LET
statement, which looks like this:
LET X = 1 LET Y1 = 2 LET Z = X+Y1
To do this, we need to create a new subclass of Program
. This class will implement the virtual functions list()
and execute()
, and provide storage for the variable name and expression whose value will be assigned. We need to keep a pointer to the expression and not just evaluate it immediately because it could contain references to other variables whose value we will not know until we run the program. Here is the header file, let.h:
#ifndef _LET_H_ #define _LET_H_ #include <string> #include "program.h" #include "doubleexpression.h" #include "basic.h" /* The Let class provides variable assignment capability. */ class Let : public Program { public: Let(char *var, const DoubleExpression *expression); // create a new LET assignment ~Let(); // clean up void execute() const; // run this line of the program void list(std::ostream& os) const; // list this line private: std::string var; const DoubleExpression *expression; }; #endif
The implementation of this class is also pretty simple. The constructor stores the variable name and expression; the destructor deletes the expression (since the variable name is not stored in a pointer, we don’t need to delete it); the function execute()
, which is called at program runtime, evaluates the expression and calls the Basic
singleton instance to store it, and list()
prints out the statement line. Here is the source of let.cpp:
#include "let.h" Let::Let(char *var, const DoubleExpression *expression){ this->var = var; this->expression = expression; } // clean up the pointer Let::~Let(){ delete expression; } // store the value of the Expression in the Basic vars map void Let::execute() const{ Basic::instance()->assign(var, expression->value()); } // list this LET statement void Let::list(std::ostream& os) const{ os << "LET " << var << " = " << expression->list(); }
OK, time to update Bison and flex. In your Bison input file, include the header files for the new VariableExpression
and Let
classes:
#include "variablexpression.h" #include "let.h"
Add new constant tokens to support the LET
statement:
%token LET %token EQUAL
and a new terminal symbol to represent variables:
%token <sVal> VAR
Since we have added a new subclass of Program
, we need to update its rule:
program: PRINT exprList { $$ = new Print($2); } | LET VAR EQUAL doubleExpr { $$ = new Let($2, $4); free($2); // malloced in basic.l } ;
We also need to update the term
rule, which can now be either a literal double, or a variable name:
term: DOUBLE { $$ = new DoubleExpression($1); } | VAR { $$ = new VariableExpression($1); free($1); } ;
In the flex input file, we need to add scanners for the new terminal token types we have defined. Per the BASIC documentation we are following, a variable name consists of an alphabetic character optionally followed by a single digit. Let’s add an ALPHA
rule by our existing DIGIT
rule:
ALPHA [A-Z] DIGIT [0-9]
Here are the new scanners:
{ALPHA}{DIGIT}? { yylval.sVal = strdup(yytext); return VAR; } LET { return LET; } \= { return EQUAL; }
The VAR
scanner needs to return the actual text value it read, as well as the token type; the other two just need to return the token type.
Now, add the let.* and variableexpression.* dependencies to the make file, and run it:
Welcome to BASIC! >10 let a = 1 >20 let b = a*2 >30 print a, b >list 10 LET a = 1.000000 20 LET b = a*2.000000 30 PRINT a, b >run 1.000000 2.000000
The last thing we want to add right now is the ability to save and load programs. To do this, we will implement the BASIC commands outline in the guide: SAVE
, UNSAVE
, NEW
, and OLD
. To save a program, we will repurpose our existing list functionality but directed into a file output stream instead of standard output. Likewise, to read a saved file, we will change the yyin
file pointer to point to a file that we open, instead of standard input. There will be some minor changes to the Bison and flex input files to look for these commands, but the heavy lifting will take place in the Basic
class.
In the header file, basic.h, we need to include <iostream>
, which we will use in a modification to list()
. We also add four new public functions corresponding to the new commands we are implementing, as well as a private function to erase all lines of the current program, and a private variable containing the name of the current program.
#ifndef _BASIC_H_ #define _BASIC_H_ #include <map> #include <string> #include <iostream> #include "program.h" #include "doubleexpression.h" /* This is a singleton class which contains all the program lines. */ class Basic { public: void add(int index, const Program *program); // add a new line to the program void list(std::ostream &out); // list out all the existing lines void execute(); // run the program void remove(int index); // remove program line void saveProgram(); // save active program to disk void unsaveProgram(); // delete saved program from disk void newProgram(); // start a new program void oldProgram(); // load program from disk static Basic *instance(); // access the singleton instance void assign(std::string var, double value); // assign a value to a variable double resolve(std::string var); // return variable value private: void erase(); // clear stored program lines std::map<int, const Program*> lines; // store the lines in a map std::map<std::string, double> vars; // store variables std::string name; // name of active program static Basic *b; // singleton instance }; #endif
Now, for the implementation. At the top of basic.cpp, include <fstream>
and <cstdio>
. We will use these for reading and writing files. Also, make a declaration of the external variable yyin
(the input stream used by flex and Bison), since we will be re-directing it for file operations:
extern "C" FILE *yyin;
Since we changed the method signature of list to take an ostream
as an argument, we need to update the implementation to use that instead of std::cout
:
// print out the program lines void Basic::list(std::ostream &out){ for( map<int, const Program *>::iterator it = lines.begin(); it!= lines.end(); ++it ){ out << it->first << " "; it->second->list(out); out << std::endl; } }
Our first new function, saveProgram()
, opens a file output stream with the member variable name
and calls the new list()
function:
// save active program to disk void Basic::saveProgram(){ std::ofstream out; out.open(name + ".bas"); list(out); }
The function unsaveProgram()
deletes the currently-named program from disk:
// delete saved program from disk void Basic::unsaveProgram(){ ::remove((name + ".bas").c_str()); }
The newProgram()
function erases whatever program lines are currently in memory, prompts the user for a program name, and then eats the newline character to prevent flex from printing out a prompt character when it reads it:
// start a new program void Basic::newProgram(){ erase(); std::cout << "Enter a program name: "; std::cin >> name; std::cin.ignore(); // consume the newline character }
The final command function, oldProgram()
, also starts by erasing the currently stored program and prompting the user for a program name. Then it attempts to open a file with that name, and points the yyin
file pointer to it for flex/Bison. If it fails to open a file with that name, it restores yyin
to the standard input and reports the error:
// load program from disk void Basic::oldProgram(){ erase(); std::cout << "Enter program to read: "; std::cin >> name; yyin = fopen((name + ".bas").c_str(), "r"); if( !yyin ){ std::cout << "ERROR: could not read file: " << name << ".bas\n"; yyin = stdin; } }
Lastly, we have our utility function, erase()
. This iterates through the stored program and deletes the pointers, then empties the map:
// clear stored program lines void Basic::erase(){ for( map<int, const Program *>::iterator it = lines.begin(); it!= lines.end(); ++it ){ delete it->second; } lines.clear(); }
That’s all the C++ class code, now we just need to update the Bison and flex input files to use the new functionality. In basic.y, start by declaring another C function, yywrap
. This is used in Bison to handle when we switch input files:
extern "C" int yywrap();
Next, add four constant tokens for our new commands:
%token SAVE %token UNSAVE %token NEW %token OLD
We update the stmt
rule to use these, as well as our modified list()
function:
stmt: LINE { Basic::instance()->remove($1); } | LINE program { Basic::instance()->add($1, $2); } | RUN { Basic::instance()->execute(); } | LIST { Basic::instance()->list(std::cout); } | NEW { Basic::instance()->newProgram(); } | OLD { Basic::instance()->oldProgram(); } | SAVE { Basic::instance()->saveProgram(); } | UNSAVE { Basic::instance()->unsaveProgram(); } ;
Since we now have the concept of named programs, we will call newProgram()
when our interpreter starts up. Modify the main()
function to do this:
int main(int argc, char **argv){ std::cout << "Welcome to BASIC!\n"; Basic::instance()->newProgram(); std::cout << '>'; do { yyparse(); } while( !feof(yyin) ); return 0; }
Lastly, we need to add the yywrap
function we declared at the beginning of the file. This function gets called by flex when it reaches the end of the current input it’s reading. If yywrap
returns 1, flex ends parsing; if it returns 0, flex expects yyin
to be pointed to a new input file, and continues parsing. If yywrap
is called while yyin
is the standard input, that means the user has sent the end-of-file signal, so we should stop parsing. If it points to any other input file, that means we have finished reading an input file being loaded using the oldProgram()
function, so we should close that input file and point yyin
back to standard input to continue processing user commands:
int yywrap(){ if( yyin == stdin ) return 1; fclose(yyin); yyin = stdin; return 0; }
Almost done! There are just a couple small changes to make to the flex input file, and we will be done with this part of the tutorial. First, since we are now defining a yywrap()
function, we need to remove the noyywrap
option given at the top of file:
%option case-insensitive
We could now compile and run our interpreter, but if we read in an old program, we would get a prompt character printed out every time the scanner read a new line! So we need to modify our scanner so it only prints prompt characters when it’s reading from the standard input:
\n { if( yyin == stdin ) std::cout << ">"; return ENDL; }
Excellent! Now you can save and load programs!
Welcome to BASIC! Enter a program name: first >10 print "first!" >20 let a = 1 >30 print a >save >new Enter a program name: second >10 print "second!" >20 let b = 2 >30 print b >save >old Enter program to read: first >run first! 1.000000
The complete code for this part can be found here: https://github.com/VisceralLogic/basic/tree/part-4
Continue to Part V.
Pingback: BASIC Interpeter, Part III | Visceral Logic Programming
Pingback: BASIC Interpreter, Part V | Visceral Logic Programming