9. Lookbehind and other extra data

Another minor generalization to Pratt parsing is the use of “lookbehind” information. A Pratt parser can use lookahead information from the lexer in defining preconditions, etc. In some cases lookbehind information, looking at the previous processed_left values for the current subexpression, could be useful.

This is a simple modification, and is currently implemented. In the recursive_parse function, whenever the processed_left variable is assigned a new value, that value is also appended to a list called lookbehind. This list is temporarily set as an attribute of the triggering token, and so can be accessed as tok.extra_data.lookbehind in both the handler functions and in the preconditions functions.

Since the lookbehind tokens have already been processed they allow the preconditions to make use of information such as resolved type information (not just token label information). Of course you could already look at the left variable in a tail handler and see the type of the subexpression for, say, the type of the left operand of an operator.

Note that the lookbehind list contains references, not copies, and so the previous values will generally be modified versions of what they were when they were first appended to the list. The main thing that the lookbehind list tells you is how many subexpressions precede the current one at its same level in the recursion. In theory, the whole head versus tail distinction could be eliminated and replaced with preconditions on whether or not the lookbehind list is empty. The distinction between head and tail handlers is useful in practice, however, and so has been kept.

Lookbehind information is not a feature which will be commonly used, but it may have some use cases.

In addition to lookbehind information, a namedtuple containing other information that users might want to access during parsing is temporarily set as the attribute extra_data of a triggering token. The current subexpression precedence is available as subexp_prec. Also available is a list constructs which contains all the constructs for all the previous sub-subexpressions of the subexpression. They are appended just after they are dispatched, and so the current construct is available in head or tail handler functions.