4. Example: Implementing a simple calculator

This section shows how to implement a simple calculator language using the Typped package. It is an evaluating calculator with a simple read-evaluate-print loop (REPL). It implements basic math operations. It does not have many predefined functions, but it would be simple to add them according to the pattern of the current ones. The juxtaposition operator is defined for multiplication, so any two terms next to each other (by default separated by a space character) are multiplied together. Simple variables can also be set and used.

This is a top-down description of the code, even though the actual file is in the reverse order. The working code can be found in the file example_calculator.py. Just run that file to start up the calculator. A few the lines of that file have been left out of the code listed below for readability.

The Python code has been broken up into various high-level functions for readability and to simplify the display and discussion of the code. These high-level functions will be called “procedures” in the discussion, to avoid confusion with the functions in the calculator language or with lower-level Python functions inside the parser.

4.1. The main procedure

The main procedure is called define_and_run_basic_calculator. It is called as the last line of the calculator module.

import math
import operator
import typped as pp

def define_and_run_basic_calculator():
    """Get a parser, define the calculator language, and start the REP loop."""
    parser = pp.PrattParser()
    define_basic_calculator(parser)
    read_eval_print_loop(parser)

The code first defines a PrattParser instance. It then passes that instance to a procedure which sets up the parser with the grammar and evaluation functions for the calculator language.

Finally, the read-evaluate-print loop is called, passed the parser instance as an argument. The three procedures above will be described shortly. First, to give a preview of what the calculator does, here is a simple dialog of the program running:

Enter ^C to exit, and 'toggle' to toggle tree display.
> 5
5.0
> 5 + 5
10.0
> 5 * (4 - 3)
5.0
> # A comment line.
> 4 4 # Juxtaposition with space is multiplication.
16.0
> 5 (log(3) - 9)/44.90
-0.8798872729768252
> pi # Pi and e are predefined constants.
3.141592653589793
> tau = 2 pi # Simple variables can be assigned.
6.283185307179586
> sin(tau)
-2.4492935982947064e-16
> toggle
> 2 pi cos(tau) - 1E3 # Scientific notation.

<k_minus,'-'>
    <k_jop,None>
        <k_jop,None>
            <k_float,'2'>
            <k_identifier,'pi'>
        <k_cos,'cos'>
            <k_identifier,'tau'>
    <k_float,'1E3'>

-993.7168146928204
> 25 - 5.0^2

<k_minus,'-'>
    <k_float,'25'>
    <k_double_ast,'^'>
        <k_float,'5.0'>
        <k_float,'2'>

0.0
> 25 - 5.0**2

<k_minus,'-'>
    <k_float,'25'>
    <k_double_ast,'**'>
        <k_float,'5.0'>
        <k_float,'2'>

0.0
>
Bye.

Notice that when the toggle command is given the program prints out the expression tree for the expressions. Each line is a token in the tree. The representation <k_minus,'-'> is for the token that was assigned the string label k_minus. The convention of starting token labels with k_ is generally used in the code. The string '-' is the value of the token, i.e., the actual symbol in the parsed expression text that the lexer matched as being a k_minus token.

The top line of the tree representation is the root of the tree. Indented lines below are for children. Each level of indentation is another level of children in the tree.

4.2. The read-evaluate-print loop

Continuing with the top-down presentation, the REP loop is shown next. The code is basic Python, and can be skimmed by people familiar with the language. The code shows how a Typped parser is used at the higher level.

The cmd module in the standard Python library can also be used to write the REP loop. The example file also has an alternative version of the procedure which is implemented using that library.

def read_eval_print_loop(parser):
    """Implement the REP loop."""
    import readline

    try:
        read_input = raw_input # Python 2.
    except NameError:
        read_input = input # Python 3.

    print("Enter ^C to exit, and 'toggle' to toggle tree display.")

    show_tree = False # Toggled in the loop below.
    while True:
        try:
            line = read_input("> ")
        except (KeyboardInterrupt, EOFError):
            print("\nBye.")
            break
        if not line:
            continue
        if line == "toggle":
            show_tree = not show_tree
        elif line.strip().startswith("#"): # Tries to parse empty line.
            continue

        try:
            parse_tree = parser.parse(line)
            eval_value = parse_tree.eval_subtree()
        except (ValueError, ZeroDivisionError,
                pp.ParserException, pp.LexerException) as e:
            print(e)
            continue

        if show_tree:
            print("\n", parse_tree.tree_repr(), sep="")
        print(eval_value)

The code starts by importing readline. Just importing that module provides nice features for the Python input command, such as command history with the up and down arrows. The code then prints a prompt and waits for the user to enter a line, which should contain an expression in the calculator language.

Notice that ^C can be used to exit the program. If the user types in the command toggle it will toggle the printing of expression trees for the user-entered expressions.

The passed-in parser argument is used inside a try loop in order to catch errors and continue running. As with all Typped parsing operations, the full expression tree for the expression that was input by the user created by this line:

parse_tree = parser.parse(line)

Here line is the user’s input. The value returned from parse is a token instance, which the parse function has converted into the root node of an expression tree of tokens. These are the expression trees that were displayed in the above dialog after the toggle command was issued.

After the expression tree is returned it is evaluated with the line parse_tree.eval_subtree(), which is a recursive evaluation function started at the root of the expression tree. Evaluation functions are provided when the grammar for the language is defined, in the next section.

Finally, the values are printed out and the loop continues.

4.3. Defining the grammar

The only high-level procedure left to describe is the define_basic_calculator procedure. This is the procedure that really shows how to set up and use the PrattParser class (at least the basic usage). To keep the procedure from being too long it has been broken up into several sub-procedures doing particular tasks. This is the top-level procedure:

def define_basic_calculator(parser):
    """Define the calculator language in the parser instance."""
    define_general_tokens_and_literals(parser)
    define_functions_and_operators(parser)
    define_juxtaposition_operators(parser)
    define_assignment_operator(parser)
    define_comments(parser)
    define_semicolon_separator(parser)

Each procedure does what the name implies. The code for each sub-procedure, in sequence, will be shown and discussed next. The first procedure defines some general tokens and literals in the calculator language:

def define_general_tokens_and_literals(parser):
    """Define some general tokens and literals in the calculator language.
    Other tokens such as for functions in the language will be defined
    later."""
    #
    # Tokens.
    #

    parser.def_default_whitespace()

    tok = parser.def_token
    tok("k_plus", r"\+")
    tok("k_minus", r"\-")
    tok("k_fslash", r"/")
    tok("k_ast", r"\*")
    tok("k_lpar", r"\(")
    tok("k_rpar", r"\)")
    tok("k_lbrac", r"\[")
    tok("k_rbrac", r"\]")
    tok("k_comma", r",")
    tok("k_bang", r"!")
    tok("k_equals", r"=")
    tok("k_double_ast", r"(?:\*\*|\^)") # Note ^ is a synonym for **.

    # This token definition for a float is based on the regex from
    # https://docs.python.org/2/library/re.html#simulating-scanf
    #   r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?"
    # But if we used the Python doc form exactly then 4 -4 would be interpreted
    # as a multiplication jop for rather than correctly, as subtraction.  So
    # the [+-] part is left off and is implemented as a prefix operator instead.
    tok("k_float", r"(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?")

    #
    # Literals.
    #

    parser.def_literal("k_float", eval_fun=lambda t: float(t.value))

We now have tokens for operators and other basic symbols in the language. Notice that ^ and ** are both defined to produce the token labeled double_ast. An alternate way to do this would be to define two separate tokens and give them the same function definition.

Floating point literals are defined and provided with an evaluation function. This evaluation function just takes a token with label k_float as its argument t and converts the string value (returned by the lexer) into a Python float. The floating point value is returned.

The next procedure contains a group of definitions for the calculator language which will define almost all of the functions in the language. This includes standard functions like sin and operators like * and !. The definitions are made using built-in methods of the PrattParser class. Note in particular the precedence values assigned to the operators.

Every function is also provided with an evaluation function, which, at evaluation time, runs the Python version of the function on the arguments. The evaluation functions are passed a node t, which is a node in the final expression tree. The children of the node are then accessed as t[0], t[1], etc., depending on how many arguments there are. The evaluations are top-down recursive, and call the method eval_subtree on the child or children (i.e., to evaluate the function arguments).

def define_functions_and_operators(parser):
    """Define the all the functions and operators for the calculator.
    Evaluation functions are also supplied for each one.  Parentheses and
    brackets are also defined here, since they have a precedence in the order
    of evaluations."""

    #
    # Parens and brackets, highest precedence (since they have a head function).
    #

    parser.def_bracket_pair("k_lpar", "k_rpar",
                            eval_fun=lambda t: t[0].eval_subtree())
    parser.def_bracket_pair("k_lbrac", "k_rbrac",
                            eval_fun=lambda t: t[0].eval_subtree())

    #
    # Standard functions.
    #

    parser.def_token("k_sin", r"sin")
    parser.def_stdfun("k_sin", "k_lpar", "k_rpar", "k_comma", num_args=1,
                      eval_fun=lambda t: math.sin(t[0].eval_subtree()))
    parser.def_token("k_cos", r"cos")
    parser.def_stdfun("k_cos", "k_lpar", "k_rpar", "k_comma", num_args=1,
                      eval_fun=lambda t: math.cos(t[0].eval_subtree()))
    parser.def_token("k_sqrt", r"sqrt")
    parser.def_stdfun("k_sqrt", "k_lpar", "k_rpar", "k_comma", num_args=1,
                      eval_fun=lambda t: math.sqrt(t[0].eval_subtree()))

    # Note that log is overloaded because different numbers of arguments are
    # specified.  The two versions have different eval funs.
    parser.def_token("k_log", r"log")
    parser.def_stdfun("k_log", "k_lpar", "k_rpar", "k_comma", num_args=1,
                      eval_fun=lambda t: math.log(t[0].eval_subtree()))
    parser.def_stdfun("k_log", "k_lpar", "k_rpar", "k_comma", num_args=2,
               eval_fun=lambda t: math.log(t[0].eval_subtree(), t[1].eval_subtree()))

    #
    # Basic operators, from highest to lowest precedence.
    #

    parser.def_prefix_op("k_plus", 50,
                         eval_fun=lambda t: operator.pos(t[0].eval_subtree()))
    parser.def_prefix_op("k_minus", 50,
                         eval_fun=lambda t: operator.neg(t[0].eval_subtree()))

    parser.def_postfix_op("k_bang", 40, allow_ignored_before=False,
                          eval_fun=lambda t: math.factorial(t[0].eval_subtree()))

    parser.def_infix_op("k_double_ast", 30, "right",
            eval_fun=lambda t: operator.pow(t[0].eval_subtree(), t[1].eval_subtree()))

    parser.def_infix_op("k_ast", 20, "left",
            eval_fun=lambda t: operator.mul(t[0].eval_subtree(), t[1].eval_subtree()))
    parser.def_infix_op("k_fslash", 20, "left",
            eval_fun=lambda t: operator.truediv(t[0].eval_subtree(), t[1].eval_subtree()))

    parser.def_infix_op("k_plus", 10, "left",
            eval_fun=lambda t: operator.add(t[0].eval_subtree(), t[1].eval_subtree()))
    parser.def_infix_op("k_minus", 10, "left",
            eval_fun=lambda t: operator.sub(t[0].eval_subtree(), t[1].eval_subtree()))

The definitions above actually define the log function twice, with a different number of arguments each time. This results in function overloading. Each overload can have a different evaluation function. In this case the two-place version takes an extra argument giving the base, like in the Python math library (which uses a default parameter value of math.e for the single-argument form).

At this point we have a working calculator. The code up to this point can already be run to do basic operations. The next three procedures just add extra features to the calculator.

The previous procedure defined all the usual arithmetic functions, but it did not define the juxtaposition operator. This procedure defines the juxtaposition operator as a synonym for multiplication.

def define_juxtaposition_operators(parser):
    """Define the juxtaposition operator (jop) as synonym for multiplication."""

    jop_required_token = "k_space" # Can be set to None to not require any whitespace.
    parser.def_jop_token("k_jop", jop_required_token)
    parser.def_jop(20, "left", # Same precedence and assoc. as ordinary multiplication.
            eval_fun=lambda t: operator.mul(t[0].eval_subtree(), t[1].eval_subtree()))

The jop_required_token argument to the method def_jop_token is a token which is required to be present in order for a juxtaposition operator to be inferred. The setting above requires a space between two tokens in order for a jop to possibly be inferred. After these definitions strings like 2 sin(3.3) can be evaluated with implicit multiplication.

Next, the grammar for and implementation of simple assignment statements is defined for the calculator language. Two symbols, for pi and e are predefined to the associated math constants.

def define_assignment_operator(parser):
    """Define assignment and reading of simple variables."""

    parser.calculator_symbol_dict = {} # Store symbol dict as a new parser attribute.
    symbol_dict = parser.calculator_symbol_dict

    symbol_dict["pi"] = math.pi # Predefine pi.
    symbol_dict["e"] = math.e # Predefine e.

    # Note that on_ties for identifiers is set to -1, so that when string
    # lengths are equal defined function names will take precedence over
    # identifiers (which are only defined as a group regex).
    parser.def_token("k_identifier", r"[a-zA-Z_](?:\w*)", on_ties=-1)
    parser.def_literal("k_identifier",
            eval_fun=lambda t: symbol_dict.get(t.value, 0.0))

    def eval_assign(t):
        """Evaluate the identifier token `t` and save the value in `symbol_dict`."""
        rhs = t[1].eval_subtree()
        symbol_dict[t[0].value] = rhs
        return rhs

    parser.def_infix_op("k_equals", 5, "right",
                precond_fun=lambda tok, lex: lex.peek(-1).token_label == "k_identifier",
                precond_label="lhs must be identifier",
                eval_fun=eval_assign)

Notice that the def_infix_op method is used to define the = operator for assignment. A preconditions function, along with a unique label for it, is used to restrict the definition to the case where the left hand side of the assignment is an identifier.

Now expressions like sin(2 pi), x = 5, and x^2 can be used in the calculator language. Uninitialized variables default to zero, and the variables pi and e are predefined. Assignment returns the assigned value as its operator value, so, while it is probably not a good idea, you can have expressions like sin(tau = 2 pi).

The next feature which will be added to the calculator language is comments. Comments are just like comments in Python. They are defined by defining a token with a regex that recognizes comments, and telling the lexer to ignore all such tokens.

def define_comments(parser):
    """Define comments in the calculator.  Everything from '#' to EOL is a
    comment.  Defined using an ignored token pattern."""

    parser.def_ignored_token("k_comment_to_EOL", r"\#[^\r\n]*$", on_ties=10)

The language has now been defined and the calculator can be run as shown above in the interactive dialog. As one final feature, we will define the semicolon operator as a statement separator. This can be done simply by using an infix operator with a very low precedence value.

def define_semicolon_separator(parser):
    """Define semicolon to separate expressions, returning the value of the last one."""

    def eval_semicolon(t):
        t[0].eval_subtree()
        return t[1].eval_subtree()

    parser.def_token("k_semicolon", r";")
    parser.def_literal("k_semicolon")
    parser.def_infix_op("k_semicolon", 1, "right",
                        eval_fun=eval_semicolon)

Now expressions like w = sqrt(4); w^2 can be used.