5. Implementing typed languages

The Typped package provides a fairly general type-definition and type-checking mechanism. It is optional, however, and can be ignored for untyped languages. Types can be used for detecting syntax errors (either dynamically or statically), for overloading operators, for validating data, etc.

When types are defined and set up properly (using either the built-in methods of the PrattParser class or custom methods) the resulting expression trees will be automatically type-checked at parse-time. To disable type-checking altogether the argument skip_type_checking=True can be passed to the initializer of the PrattParser instance.

Only simple types are currently implemented (though the mechanism can easily be made more general). As of now type are essentially just string labels which are equivalent iff the labels are the same. At some point type heirarchies, type conversions, etc., may be added.

Each construct registered with a parser instance can have type information associated with it. Generally this is specified by giving the type of the root node in the expression subtree for the parsed construct as well as the types of the children of that node. This type signature is then checked against the actual types in the parse subtree returned by the handler function for the construct.

The type system allows for operator overloading, including optional overloading on return types. The default is to allow overloading only on argument/child types. Overloading currently has the mild restriction that all overloads defined for an infix operator must share the same precedence value. To disable all overloading, set overload_on_arg_types=False in the PrattParser initializer. To allow overloading on return types as well as argument types, set overload_on_ret_types=True in the initializer. The default is to allow overloading on argument types but not on return types.

Operator overloading is used for constructs which parse the same (i.e., via the same handler function and with the same triggering conditions) but which can have multiple possible type signatures. Type resolution is then based on the actual arguments found at parse-time. Overloading of a construct with a new type signature (and possibly with a corresponding evaluation function and/or an AST data item) is done by explicitly calling the overload method of a construct instance.

The nodes in the final parsed expression tree have an attribute actual_sig giving the actual type signature, an attribute original_formal_sig giving the matching original signature, and an attribute expanded_formal_sig giving the intermediate, expanded form of the original signature. The evaluation functions and AST data for a construct are stored keyed by the original signature which was passed in at the time when that overload was defined.

Type information for a construct (e.g., a function evaluation subexpression) should generally be defined relative to the token which ends up as the subtree root in the expression/parse tree for the construct. Type-resolution and checking is done as soon as possible on the nodes of the parse tree as it is constructed, bottom-up. Checking makes sure that the types of the children of a node (if any) match the declared argument types, and that the types of the nodes themselves match their declared value type. Note that if overloading on return types is allowed then a second downward pass can also be required to resolve the signatures.

5.1. Declaring types

Types themselves are instances of the TypeObject class. They are declared by calling the def_type method of a PrattParser instance:

my_int = parser.def_type("t_int")
my_float = parser.def_type("t_float")

As of now these types are essentially just string labels, and type equivalence is based on string equivalence. At some point the code may be extended to allow more complex type objects (see the module pratt_types.)

Type objects can be passed to functions which take val_type and arg_types arguments. These include the builtin parsing methods as well as def_construct calls for defining custom constructs. They also include overload calls, which add a new type signature to an existing construct.

floor_construct = parser.def_stdfun("k_floor", "k_lpar", "k_rpar", "k_comma"
                                    val_type=my_int, arg_types=[my_float])
floor_construct.overload(val_type=my_int, arg_types=[my_int])

5.2. Type signatures

A type signature is an instance of the TypeSig class. It is basically a container class for a return or value type, called the val_type, and a list or tuple of argument types, called the arg_types.

The built-in parsing methods of the PrattParser class take arguments which correspond to the val_type and arg_types specifications, which are then used to create a TypeSig object. Many users will not need to explicitly use the TypeSig class. The basic conventions for wildcards, etc., apply to the val_types and arg_types arguments since they are passed to the initializer for a TypeSig object.

The type specifications for a nodes (token instances) in parsed expression trees are stored as instances of the TypeSig class. This data structure stores the value type of the node as an attribute val_type, and stores the types of the children nodes (i.e., any function arguments) as a tuple arg_types.

Using the type t_int defined earlier, we might have:

sig = TypeSig(t_int, [t_int, t_int])

This specifies a function (in the abstract sense) which takes two integer arguments and returns an integer. In terms of the corresponding parse-tree node, it specifies that the node itself has a value of type t_int, and that it has two child nodes which also both have values of type t_int. This might be used, for example, for a function of two variables such as gcd which takes two integers and returns another integer. Type signatures are associated with the handler function of the token which ends up as the root of the parse-subtree representing the expression.

If the arg_types list (or iterable) is instead an individual type (i.e., a TypeObject instance) rather than a list then any number of arguments/children are allowed, and they must all have that type. For example, this would be the type specification for a function taking any number if t_int arguments and returning a t_int:

sig = TypeSig(t_int, t_int)

For more control over variable numbers of arguments, the last group of types in an arg_types list can be wrapped in a Varargs instance. Then those particular arguments will be repeated any number of times to match the number of actual arguments. For example:

sig = TypeSig(t_int, [t_int, Varargs(t_int, t_float)])

The Varargs initializer can also take a keyword argument exact_repeat which, if set false, truncates any repeated arguments if necessary to match the number of actual arguments. The default is to raise an exception.

Wildcards, which match any type, are also allowed in type specifications. They can be defined by passing the value None as the type object instead of a predefined type. As a val_type this wildcard matches any type at the node, and in place of the arg_types list it declares that no checking is to be done on the children. The following three forms are equivalent, and specify that no type checking will be done (i.e., everything matches):

TypeSig(None, None) == TypeSig(None) == TypeSig()

Generally, if one argument is passed to initialize a TypeSig it is assumed to be the val_type:

TypeSig("t_int", None) == TypeSig("t_int")

The None wildcards can also be used inside the arg_types list to specify arguments which are not type-checked. This allows the number of arguments to be checked, and possibly some but not all arguments. For example, this signature specifies a function which takes exactly one argument but is otherwise unchecked:

TypeSig(None, [None])

This one takes exactly three arguments, all unchecked:

TypeSig(None, [None, None, None]) == TypeSig(None, [None]*3)

Finally, this type specification is for a function that takes two arguments, with the first a t_int and the second unchecked, returning a value that is unchecked:

TypeSig(None, [t_int, None])

Note that while paramaters like val_types and arg_types which are passed to a TypeSig interpret None as matching anything, passing None to a function parameter that expects an explicit TypeSig instance is interpreted as an undefined signature.

5.3. Overview of type-checking

Type checking is automatically performed just after the head or tail handler function of a construct has been called and has returned a subtree. The subtree is checked for types before it is returned to the recursive_parse routine. The val_types of the root node’s children are compared to the arg_types defined for the construct. (Overloading on return types also requires a second pass down the tree later.)

Setting the val_type of a construct defines the type of the root node of the subtree returned by the construct’s handler function. So, for example, setting the val_type for a token-literal construct defines the type of that token (in that preconditions context). Setting the val_type of an infix operator construct defines the type which is returned by the operator.

Setting the arg_types of a construct specifies what the val_types of the children of the returned root node should be. These are automatically checked, resolving overloading if possible.

In some cases the handler might need to influence the type-checking or node processing. The attribute process_and_check_kwargs of the root node can be set by a handler function before the node is returned. It should be passed a dict containing keyword arguments and values of the process_and_check_node function.

See the built-in methods of the PrattParser class for examples of how to define general constructs which check types.

5.4. Implementation details

See this page for low-level implementation details: