Implementing typed languages ============================ The Typped package provides a fairly general type-definition and type-checking mechanism. It is optional, however, and can be ignored for untyped languages. Types can be used for detecting syntax errors (either dynamically or statically), for overloading operators, for validating data, etc. When types are defined and set up properly (using either the built-in methods of the ``PrattParser`` class or custom methods) the resulting expression trees will be automatically type-checked at parse-time. To disable type-checking altogether the argument ``skip_type_checking=True`` can be passed to the initializer of the ``PrattParser`` instance. Only simple types are currently implemented (though the mechanism can easily be made more general). As of now type are essentially just string labels which are equivalent iff the labels are the same. At some point type heirarchies, type conversions, etc., may be added. Each construct registered with a parser instance can have type information associated with it. Generally this is specified by giving the type of the root node in the expression subtree for the parsed construct as well as the types of the children of that node. This type signature is then checked against the actual types in the parse subtree returned by the handler function for the construct. The type system allows for operator overloading, including optional overloading on return types. The default is to allow overloading only on argument/child types. Overloading currently has the mild restriction that all overloads defined for an infix operator must share the same precedence value. To disable all overloading, set ``overload_on_arg_types=False`` in the ``PrattParser`` initializer. To allow overloading on return types as well as argument types, set ``overload_on_ret_types=True`` in the initializer. The default is to allow overloading on argument types but not on return types. Operator overloading is used for constructs which parse the same (i.e., via the same handler function and with the same triggering conditions) but which can have multiple possible type signatures. Type resolution is then based on the actual arguments found at parse-time. Overloading of a construct with a new type signature (and possibly with a corresponding evaluation function and/or an AST data item) is done by explicitly calling the ``overload`` method of a construct instance. The nodes in the final parsed expression tree have an attribute ``actual_sig`` giving the actual type signature, an attribute ``original_formal_sig`` giving the matching original signature, and an attribute ``expanded_formal_sig`` giving the intermediate, expanded form of the original signature. The evaluation functions and AST data for a construct are stored keyed by the original signature which was passed in at the time when that overload was defined. Type information for a construct (e.g., a function evaluation subexpression) should generally be defined relative to the token which ends up as the subtree root in the expression/parse tree for the construct. Type-resolution and checking is done as soon as possible on the nodes of the parse tree as it is constructed, bottom-up. Checking makes sure that the types of the children of a node (if any) match the declared argument types, and that the types of the nodes themselves match their declared value type. Note that if overloading on return types is allowed then a second downward pass can also be required to resolve the signatures. Declaring types --------------- Types themselves are instances of the ``TypeObject`` class. They are declared by calling the ``def_type`` method of a ``PrattParser`` instance:: my_int = parser.def_type("t_int") my_float = parser.def_type("t_float") As of now these types are essentially just string labels, and type equivalence is based on string equivalence. At some point the code may be extended to allow more complex type objects (see the module ``pratt_types``.) Type objects can be passed to functions which take ``val_type`` and ``arg_types`` arguments. These include the builtin parsing methods as well as ``def_construct`` calls for defining custom constructs. They also include ``overload`` calls, which add a new type signature to an existing construct. :: floor_construct = parser.def_stdfun("k_floor", "k_lpar", "k_rpar", "k_comma" val_type=my_int, arg_types=[my_float]) floor_construct.overload(val_type=my_int, arg_types=[my_int]) Type signatures --------------- A type signature is an instance of the ``TypeSig`` class. It is basically a container class for a return or value type, called the ``val_type``, and a list or tuple of argument types, called the ``arg_types``. The built-in parsing methods of the ``PrattParser`` class take arguments which correspond to the ``val_type`` and ``arg_types`` specifications, which are then used to create a ``TypeSig`` object. Many users will not need to explicitly use the ``TypeSig`` class. The basic conventions for wildcards, etc., apply to the ``val_types`` and ``arg_types`` arguments since they are passed to the initializer for a ``TypeSig`` object. The type specifications for a nodes (token instances) in parsed expression trees are stored as instances of the ``TypeSig`` class. This data structure stores the value type of the node as an attribute ``val_type``, and stores the types of the children nodes (i.e., any function arguments) as a tuple ``arg_types``. Using the type ``t_int`` defined earlier, we might have: .. code-block:: python sig = TypeSig(t_int, [t_int, t_int]) This specifies a function (in the abstract sense) which takes two integer arguments and returns an integer. In terms of the corresponding parse-tree node, it specifies that the node itself has a value of type ``t_int``, and that it has two child nodes which also both have values of type ``t_int``. This might be used, for example, for a function of two variables such as ``gcd`` which takes two integers and returns another integer. Type signatures are associated with the handler function of the token which ends up as the root of the parse-subtree representing the expression. If the ``arg_types`` list (or iterable) is instead an individual type (i.e., a ``TypeObject`` instance) rather than a list then any number of arguments/children are allowed, and they must all have that type. For example, this would be the type specification for a function taking any number if ``t_int`` arguments and returning a ``t_int``: .. code-block:: python sig = TypeSig(t_int, t_int) For more control over variable numbers of arguments, the last group of types in an ``arg_types`` list can be wrapped in a ``Varargs`` instance. Then those particular arguments will be repeated any number of times to match the number of actual arguments. For example: .. code-block:: python sig = TypeSig(t_int, [t_int, Varargs(t_int, t_float)]) The ``Varargs`` initializer can also take a keyword argument ``exact_repeat`` which, if set false, truncates any repeated arguments if necessary to match the number of actual arguments. The default is to raise an exception. Wildcards, which match any type, are also allowed in type specifications. They can be defined by passing the value ``None`` as the type object instead of a predefined type. As a ``val_type`` this wildcard matches any type at the node, and in place of the ``arg_types`` list it declares that no checking is to be done on the children. The following three forms are equivalent, and specify that no type checking will be done (i.e., everything matches): .. code-block:: python TypeSig(None, None) == TypeSig(None) == TypeSig() Generally, if one argument is passed to initialize a ``TypeSig`` it is assumed to be the ``val_type``: .. code-block:: python TypeSig("t_int", None) == TypeSig("t_int") The ``None`` wildcards can also be used inside the ``arg_types`` list to specify arguments which are not type-checked. This allows the number of arguments to be checked, and possibly some but not all arguments. For example, this signature specifies a function which takes exactly one argument but is otherwise unchecked: .. code-block:: python TypeSig(None, [None]) This one takes exactly three arguments, all unchecked: .. code-block:: python TypeSig(None, [None, None, None]) == TypeSig(None, [None]*3) Finally, this type specification is for a function that takes two arguments, with the first a ``t_int`` and the second unchecked, returning a value that is unchecked: .. code-block:: python TypeSig(None, [t_int, None]) Note that while paramaters like ``val_types`` and ``arg_types`` which are passed to a ``TypeSig`` interpret ``None`` as matching anything, passing ``None`` to a function parameter that expects an explicit ``TypeSig`` instance is interpreted as an undefined signature. Overview of type-checking ------------------------- Type checking is automatically performed just after the head or tail handler function of a construct has been called and has returned a subtree. The subtree is checked for types before it is returned to the ``recursive_parse`` routine. The ``val_types`` of the root node's children are compared to the ``arg_types`` defined for the construct. (Overloading on return types also requires a second pass down the tree later.) Setting the ``val_type`` of a construct *defines* the type of the root node of the subtree returned by the construct's handler function. So, for example, setting the ``val_type`` for a token-literal construct defines the type of that token (in that preconditions context). Setting the ``val_type`` of an infix operator construct defines the type which is returned by the operator. Setting the ``arg_types`` of a construct specifies what the ``val_types`` of the children of the returned root node should be. These are automatically checked, resolving overloading if possible. In some cases the handler might need to influence the type-checking or node processing. The attribute ``process_and_check_kwargs`` of the root node can be set by a handler function before the node is returned. It should be passed a dict containing keyword arguments and values of the ``process_and_check_node`` function. See the built-in methods of the ``PrattParser`` class for examples of how to define general constructs which check types. Implementation details ---------------------- See this page for low-level implementation details: :doc:`type_checking_implementation_details`