{"id":39,"date":"2018-09-07T20:47:48","date_gmt":"2018-09-08T04:47:48","guid":{"rendered":"http:\/\/www.viscerallogic.com\/programming\/blog\/?p=39"},"modified":"2018-09-07T20:50:04","modified_gmt":"2018-09-08T04:50:04","slug":"basic-interpreter-part-i","status":"publish","type":"post","link":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-i\/","title":{"rendered":"BASIC Interpreter, Part I"},"content":{"rendered":"<p>This is the first of a multi-post tutorial on using flex and Bison to write a programming language interpreter, in this case, BASIC. We&#8217;ll be using the <a href=\"http:\/\/www.bitsavers.org\/pdf\/dartmouth\/BASIC_Oct64.pdf\" target=\"_blank\" rel=\"noopener\">1964 BASIC manual<\/a> from Dartmouth as the starting point language reference. All code is available at this GitHub repository: <a href=\"https:\/\/github.com\/VisceralLogic\/basic\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/VisceralLogic\/basic<\/a>. The interpreter will be written in C++.<!--more--><\/p>\n<p>Why use flex\/Bison instead of writing everything from scratch?\u00a0 If you&#8217;ve written a parser\/grammar interpreter before (e.g.,\u00a0<a href=\"http:\/\/www.viscerallogic.com\/programming\/blog\/?p=3\">Numerical Expression Solver in Java<\/a>), you know that even a simple language can get involved.\u00a0 Leveraging flex and Bison allows you to focus on the core language implementation, and not have to re-invent the wheel for parsing.\u00a0Bison is a GNU utility for generating programs to parse a programmer-defined language grammar.\u00a0 A manual with tutorials is here:\u00a0<a href=\"http:\/\/dinosaur.compilertools.net\/bison\/index.html\" target=\"_blank\" rel=\"noopener\">http:\/\/dinosaur.compilertools.net\/bison\/index.html<\/a>.\u00a0Flex is a utility for reading input files and converting them into lexical tokens, useful for consumption by Bison.\u00a0 The same site has the manual and tutorial for flex: <a href=\"http:\/\/dinosaur.compilertools.net\/flex\/index.html\" target=\"_blank\" rel=\"noopener\">http:\/\/dinosaur.compilertools.net\/flex\/index.html<\/a><\/p>\n<p>Here is a good introduction to both flex and Bison, from which we will start our program: <a href=\"http:\/\/aquamentus.com\/flex_bison.html\" target=\"_blank\" rel=\"noopener\">http:\/\/aquamentus.com\/flex_bison.html<\/a><\/p>\n<p>To start, we need to determine what the language features we need to implement are. The &#8217;64 manual describes a non-interactive language, so that is where we will start. Programs will be entered one line at a time, with a line number at the beginning of each line, and then the program will be run. There is no interactive user input in this language.<\/p>\n<p>Example input:<\/p>\n<pre>10 FOR X = 1 TO 100\r\n20 PRINT X, SQR(X)\r\n30 NEXT X\r\n40 END<\/pre>\n<p>Here are the 15 commands we will need to be able to interpret to offer a complete implementation:<\/p>\n<pre>LET\r\nREAD\r\nDATA\r\nPRINT\r\nGOTO\r\nIF-THEN\r\nFOR\r\nNEXT\r\nEND\r\nSTOP\r\nDEF\r\nGOSUB\r\nRETURN\r\nDIM\r\nREM<\/pre>\n<p>We will start by creating a program that reads just two lines: either a line number followed by a PRINT command, or the RUN statement to execute the stored program. We will start by creating the flex and Bison input files to make sure we have the basics working.<\/p>\n<p>The flex input file will need to recognize a few different types of tokens: integers for a line number, strings delimited by double quotes, the literals PRINT and RUN, and new line characters. It will ignore other white spaces, and pass anything it doesn&#8217;t recognize on to the Bison parser. Here is the initial flex input file:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"c\">%option noyywrap\r\n\r\n%{\r\n#include &lt;cstdio&gt;\r\nusing namespace std;\r\n\r\n#define YY_DECL extern \"C\" int yylex()\r\n\r\n#include \"basic.tab.h\"\r\n%}\r\n\r\n%%\r\n\r\n[ \\t]+\t;\t\t\/\/ skip white space\r\n[0-9]+\t\t\t{ yylval.iVal = atoi(yytext); return LINE; }\r\n\\\"[^\\\"\\n]*\\\"\t\t{ yylval.sVal = strdup(yytext); return STRING; }\r\nPRINT\t\t\t{ return PRINT; }\r\nRUN\t\t\t{ return RUN; }\r\n\\n\t\t\treturn ENDL;\r\n\r\n%%\r\n<\/pre>\n<p>The first section of the file is for <em>definitions<\/em>, and comes before the first <em>%%<\/em>. We denote C declarations with <em>%{<\/em> and <em>%}<\/em>, and everything inside those braces is ignored by flex and passed on to the C program it generates. The second section, coming after the first <em>%%<\/em>, provides the flex <em>rules<\/em>. Flex will follow whichever rule matches the most text, or if multiple match the same amount, the first of those rules.<\/p>\n<p>The first rule, <em>[ \\t]+ ;<\/em>, matches one or more spaces or tabs, and silently discards them. The second rule, <em>[0-9]+<\/em>, is designed for reading line numbers, and matches one ore more digits. It converts the text it has read into an integer using the <em>atoi<\/em> function, and returns the <em>LINE<\/em> token (which will be defined in the Bison input file. The next rule, <em>\\&#8221;[^\\&#8221;\\n]*\\&#8221;<\/em>, matches strings; it looks for an initial double quote, and then reads all subsequent characters except a new line or double quote, until it reaches a closing double quote. This text is returned as-is, as a <em>STRING<\/em> token. The rules <em>PRINT<\/em> and <em>RUN<\/em> match that text exactly, and return the corresponding token. The final rule, <em>\\n<\/em>, reads a new line character and returns the <em>ENDL<\/em> token.<\/p>\n<p>The Bison input file is a bit more complicated.\u00a0 It starts similarly in the first section with C declarations, and then token definitions.\u00a0 The second section contains the grammar rules.\u00a0 The final section contains additional C code, which in our initial iteration includes the &lt;em&gt;main&lt;\/em&gt; function definition:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"c\">\/\/ A BASIC grammar interpreter\r\n\r\n%{\r\n#include &lt;cstdio&gt;\r\n#include &lt;iostream&gt;\r\nusing namespace std;\r\n\r\nextern \"C\" int yylex();\r\nextern \"C\" int yyparse();\r\nextern \"C\" FILE *yyin;\r\n\r\nvoid yyerror(const char *s);\r\n%}\r\n\r\n\/\/ token type definition\r\n%union {\r\n\tint iVal;\r\n\tchar *sVal;\r\n}\r\n\r\n\/\/ constant tokens\r\n%token PRINT\r\n%token RUN\r\n%token ENDL\r\n\r\n\/\/ terminal symbols\r\n%token &lt;iVal&gt; LINE\r\n%token &lt;sVal&gt; STRING\r\n\r\n%% \/* Grammar rules and actions follow *\/\r\n\r\ninput:\r\n\t\/* empty *\/\r\n\t| input line\r\n;\r\n\r\nline:\r\n\tENDL\r\n\t| stmt ENDL\r\n;\r\n\r\nstmt:\r\n\tLINE program\t\t{ cout &lt;&lt; \"&gt; Programming line \" &lt;&lt; $1 &lt;&lt; \" ^^^\" &lt;&lt; endl; }\r\n\t| RUN\t\t\t{ cout &lt;&lt; \"&gt; running...\" &lt;&lt; endl; }\r\n;\r\n\r\nprogram:\r\n\tPRINT STRING\t\t{ cout &lt;&lt; \"&gt;\\tPRINT \" &lt;&lt; $2 &lt;&lt; endl; }\r\n;\r\n\r\n%%\r\n\r\nint main(int argc, char **argv){\r\n\tdo {\r\n\t\tyyparse();\r\n\t} while( !feof(yyin) );\r\n\t\r\n\treturn 0;\r\n}\r\n\r\nvoid yyerror(const char *s){\r\n\tcout &lt;&lt; \"Error: \" &lt;&lt; s &lt;&lt; endl;\r\n}<\/pre>\n<p>Since we will be parsing tokens of multiple types, we use the <em>%union<\/em> token definition. To start with, the only types whose value we need are integers (<em>int iVal<\/em>) for line numbers, and strings (<em>char *sVal<\/em>). The constant tokens <em>PRINT<\/em>, <em>RUN<\/em>, and <em>ENDL<\/em>, do not need values, they represent themselves. The terminal symbols <em>LINE<\/em> and <em>STRING<\/em> use the token type values defined previously. These five tokens will be exported to the file <em>basic.tab.h<\/em>, which is imported by the flex input file to maintain consistency in token definitions.<\/p>\n<p>Next is the grammar rules section, where the heart of the Bison file lies. The first rule defined is what Bison will attempt to match. It can be defined in terms of sub-rules. For our BASIC grammar, the primary rule is:<\/p>\n<pre>input:\r\n\t\/* empty *\/\r\n\t| input line\r\n;<\/pre>\n<p>This means the non-terminal symbol <em>input<\/em> matches either an empty input, or the non-terminal symbol <em>input<\/em> followed by the non-terminal symbol <em>line<\/em>. This is a recursive rule, which allows multiple lines to be read. For example, a <em>line<\/em> non-terminal symbol matches this rule by being <em>empty<\/em> (which is a valid <em>input<\/em>) followed by a <em>line<\/em>, so this matches the second case. This single <em>line<\/em> symbol is therefore a valid <em>input<\/em> symbol, and can be followed by another <em>line<\/em> to again reduce to the second case.<\/p>\n<p>Our second rule defines the <em>line<\/em> non-terminal symbol as either a blank line (represented by the terminal symbol <em>ENDL<\/em> as read by our flex analyzer), or a <em>stmt<\/em> non-terminal followed by the <em>ENDL<\/em> terminal symbol. This ensures that we will be reading in one line of input at a time:<\/p>\n<pre>line:\r\n\tENDL\r\n\t| stmt ENDL\r\n;<\/pre>\n<p>Things start to get a little more interesting with our third rule, for the <em>stmt<\/em> symbol. We define this as either a line number (the terminal symbol <em>LINE<\/em>) followed by a <em>program<\/em> symbol, or the terminal symbol <em>RUN<\/em>. In either case, we take an action to provide output as to what is being matched, so that we can verify what our program is doing:<\/p>\n<pre>stmt:\r\n\tLINE program\t\t{ cout &lt;&lt; \"&gt; Programming line \" &lt;&lt; $1 &lt;&lt; \" ^^^\" &lt;&lt; endl; }\r\n\t| RUN\t\t\t{ cout &lt;&lt; \"&gt; running...\" &lt;&lt; endl; }\r\n;<\/pre>\n<p>Our final rule defines the single acceptable pattern to make a <em>program<\/em> symbol: the terminal symbol <em>PRINT<\/em> followed by a <em>STRING<\/em>. If this is matched, we will provide feedback to that effect:<\/p>\n<pre>program:\r\n\tPRINT STRING\t\t{ cout &lt;&lt; \"&gt;\\tPRINT \" &lt;&lt; $2 &lt;&lt; endl; }\r\n;<\/pre>\n<p>The <em>$N<\/em> entries in the actions of the last two rules represent the <em>Nth<\/em> token being matched. As long as the token type has a value associated with it from the tokens declaration section, its value can be accessed.<\/p>\n<p>The final section of our Bison input file provides a simple <em>main<\/em> function that calls the Bison-generated function <em>yyparse()<\/em> until the input file (which defaults to <em>stdin<\/em>) reaches the end, and an error function that is called when Bison is unable to successfully match a rule:<\/p>\n<pre>int main(int argc, char **argv){\r\n\tdo {\r\n\t\tyyparse();\r\n\t} while( !feof(yyin) );\r\n\t\r\n\treturn 0;\r\n}\r\n\r\nvoid yyerror(const char *s){\r\n\tcout &lt;&lt; \"Error: \" &lt;&lt; s &lt;&lt; endl;\r\n}<\/pre>\n<p>The program is now ready to be built and run. Since the flex input file depends on <em>basic.tab.h<\/em>, we must first run Bison to generate this file. Call bison with the <em>-d<\/em> option to generate the header file:<\/p>\n<pre>&gt; bison -d basic.y<\/pre>\n<p>This generates the two files <em>basic.tab.c<\/em>, which contains the grammar parsing code, and <em>basic.tab.h<\/em>, which defines the tokens for consumption by flex. Next run flex to generate the output file <em>lex.yy.c<\/em>:<\/p>\n<pre>&gt; flex basic.l<\/pre>\n<p>Compile these with the <em>g++<\/em> program:<\/p>\n<pre>&gt; g++ basic.tab.c lex.yy.c -o basic<\/pre>\n<p>Running these commands over and over can get tedious, and it is easy to forget to run one, especially as we add more files. So it will make your life easier to use a <em>make<\/em> file, such as this one:<\/p>\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"no-highlight\">.PHONY: all clean\r\n\r\nall: basic.tab.c lex.yy.c\r\n\tg++ basic.tab.c lex.yy.c -o basic\r\n\r\nbasic.tab.c: basic.y\r\n\tbison -d basic.y\r\n\t\r\nbasic.tab.h: basic.y\r\n\tbison -d basic.y\r\n\r\nlex.yy.c: basic.tab.h basic.l\r\n\tflex basic.l\r\n\r\nclean:\r\n\trm basic.tab.c basic.tab.h lex.yy.c basic<\/pre>\n<p>Save this as <em>Makefile<\/em> in the same directory, and you can run it with the command <em>make<\/em>. Call <em>make clean<\/em> to remove all the generated files.<\/p>\n<p>Run your BASIC program by calling <em>.\/basic<\/em> at the command line, type ctrl-D to generate an end-of-file signal to finish. You should be able to enter lines such as the following, with anything else printing a syntax error message:<\/p>\n<pre>10 PRINT \"hello\"\r\n20 PRINT \"world\"\r\n00000 PRINT \"!\"\r\nRUN<\/pre>\n<p>If you enter the above lines, you should see the following output:<\/p>\n<pre>10 PRINT \"hello\"\r\n&gt;\tPRINT \"hello\"\r\n&gt; Programming line 10 ^^^\r\n20 PRINT \"world\"\r\n&gt;\tPRINT \"world\"\r\n&gt; Programming line 20 ^^^\r\n00000 PRINT \"!\"\r\n&gt;\tPRINT \"!\"\r\n&gt; Programming line 0 ^^^\r\nRUN\r\n&gt; running...\r\n<\/pre>\n<p>As you can see from the output, the <em>program<\/em> non-terminal symbol is being matched and evaluated prior to the <em>stmt<\/em> non-terminal symbol, resulting in the <em>&gt; PRINT &#8220;hello&#8221;<\/em> being printed before the <em>&gt; Programming line 10 ^^^<\/em>. You can also see that the input <em>00000<\/em> is being converted to the integer <em>0<\/em> by flex, and passed to Bison in that way.<\/p>\n<p>All the files used in this example are available for download here: <a href=\"https:\/\/github.com\/VisceralLogic\/basic\/tree\/part-1\" target=\"_blank\" rel=\"noopener\">https:\/\/github.com\/VisceralLogic\/basic\/tree\/part-1<\/a><\/p>\n<p>In the <a href=\"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-ii\/\">Part II<\/a>, we will begin expanding this program.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the first of a multi-post tutorial on using flex and Bison to write a programming language interpreter, in this case, BASIC. We&#8217;ll be using the 1964 BASIC manual from Dartmouth as the starting point language reference. All code is available at this GitHub repository: https:\/\/github.com\/VisceralLogic\/basic. The interpreter will be written in C++.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[12,16,13,15],"tags":[],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p9npkn-D","jetpack_likes_enabled":true,"jetpack-related-posts":[{"id":46,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-ii\/","url_meta":{"origin":39,"position":0},"title":"BASIC Interpreter, Part II","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part I. In Part II, we will add functionality to store and run BASIC programs, although still limited to the PRINT statement. We will also be adding a LIST statement to print out the currently stored program. We will be adding a number of new\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":90,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-iv\/","url_meta":{"origin":39,"position":1},"title":"BASIC Interpreter, Part IV","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part III. This time we will add some mathematical operators, the ability to save and load programs, and support numerical variables. To enable mathematical operations, we will create a subclass of DoubleExpression that takes two DoubleExpressions as inputs, as well as a character code signifying\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":72,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpeter-part-iii\/","url_meta":{"origin":39,"position":2},"title":"BASIC Interpeter, Part III","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part II. In this part, we will add support for numerical as well as string expressions, support multiple expression in a PRINT statement, and add functionality for deleting program lines. Let's start with the Basic class header file, basic.h. All we need to do here\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":111,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-v\/","url_meta":{"origin":39,"position":3},"title":"BASIC Interpreter, Part V","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part IV. In Part V we will add support for parentheses in mathematical expressions, add the GOTO and END statements, and implement the remaining program storage statements: CATALOG, SCRATCH, and RENAME. Now that we have a framework for supporting numerical operations, adding parentheses is not\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":131,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-vi\/","url_meta":{"origin":39,"position":4},"title":"BASIC Interpreter, Part VI","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part V. In the previous section, we added support for the GOTO statement. Since we don't yet have any control logic, that is not very useful, but it does lay the infrastructure for one of the features we will add this time: IF-THEN. We will\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":149,"url":"https:\/\/www.viscerallogic.com\/programming\/blog\/basic-interpreter-part-vii\/","url_meta":{"origin":39,"position":5},"title":"Basic Interpreter, Part VII","date":"September 7, 2018","format":false,"excerpt":"This is a continuation of Part VI. This time, we will be adding the FOR-NEXT loop and unary negation. The FOR-NEXT loop takes a couple forms: FOR <VAR> = <START> TO <STOP> ... NEXT <VAR> or FOR <VAR> = <START> TO <STOP> STEP <STEP> ... NEXT <VAR> In the former\u2026","rel":"","context":"In &quot;BASIC&quot;","img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/posts\/39"}],"collection":[{"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/comments?post=39"}],"version-history":[{"count":8,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/posts\/39\/revisions"}],"predecessor-version":[{"id":161,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/posts\/39\/revisions\/161"}],"wp:attachment":[{"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/media?parent=39"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/categories?post=39"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.viscerallogic.com\/programming\/blog\/wp-json\/wp\/v2\/tags?post=39"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}