10.8. typped.register_grammar_with_parser

This module contains the functions which handle the recursive descent parsing of the rules of a Grammar object instance. Grammar objects are defined in the typped.ebnf_classes_and_operators module.

In this module, only the function register_rule_handlers_with_parser is formally exposed for external calls (and it is only called from the ebnf_classes_and_operators module).

The first argument to register_rule_handlers_with_parser is a parser instance. The function modifies the parser to parse the grammar rule which is also passed in.

Note that because the processing uses the parser instance variables pstate_stack and disable_pstate_processing, this parsing is not thread-safe for multiple simultaneous parses on the same parser instance (though the parser in general is probably not thread-safe, either).

10.8.1. Rules and guidelines

  • No left recursion. Some token must be consumed before any recursive call. The example below is NOT OK because the left recursion never terminates:

    arglist = Rule("arglist") + Tok("k_comma") + Rule("arg") | Rule("arg") # NOT OK
    

    Right recursion is OK:

    arglist = Rule("arg") + Tok("k_comma") + Rule("arglist") | Rule("arg") # OK
    
  • Cases are evaluated in sequential order, so when cases share a common prefix the longer-prefix cases should be listed first. This modified version of the OK rule above will never select the second case; it will always parse a single “arg” rule and be satisfied:

    arglist = Rule("arg") | Rule("arg") + Tok("k_comma") + Rule("arglist") # NOT OK
    
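The termination rule above can be illustrated with a tiny self-contained recursive-descent sketch (plain Python over a flat token list; the function parse_arglist is hypothetical and is not the typped API). Right recursion terminates because a token is consumed before each recursive call, so every call sees a strictly shorter remaining input; a left-recursive version would call itself with the input unchanged.

```python
def parse_arglist(tokens, pos=0):
    """Parse arglist = arg "," arglist | arg over a flat token list.
    Returns the position just past the parsed arglist."""
    if pos >= len(tokens) or tokens[pos] != "arg":
        raise SyntaxError("expected 'arg' at position {}".format(pos))
    pos += 1  # consume a token BEFORE any recursive call
    if pos < len(tokens) and tokens[pos] == ",":
        # Right recursion: the recursive call starts past the consumed tokens.
        return parse_arglist(tokens, pos + 1)
    return pos

print(parse_arglist(["arg", ",", "arg", ",", "arg"]))  # → 5
```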

10.8.2. Precedences

Precedences not yet implemented.

Assume for now that precedences only apply to token literals. Here are two possible ways they might be used:

num_expr      = k_number + Rule("operator") + num_expr | k_number
operator      = Tok("k_ast")[10] | Tok("k_plus")[20]

num_expr         = k_number + Rule("op_and_right_arg") | k_number
op_and_right_arg = Tok("k_ast")[10] + num_expr | Tok("k_plus")[20] + num_expr

Note that in the second case the parsing for num_expr must recognize that the rule has a precedence which is defined in a different expression. But, there is not a single, fixed precedence value that can be propagated backward. The parsing needs to handle this in some way or else rule out this kind of case.

Now consider parse("5 + 3 * 2").

When the handler for a nonterminal runs it processes all tokens simply by reading them (effectively as heads) and processes nonterminals by pushing that label on the pstate stack and calling recursive_parse. Only the precedences of tokens matter in this sequence, since they are the only ones a Pratt parser would ever see. When recursive_parse(subexp_prec) is called for a nonterminal it is passed the subexp_prec of the current subexpression. So the information is essentially just forwarded by nonterminal handlers.

Inside each nonterminal handler, then, we need to mimic the basic form of recursive_parse, starting when a token with nonzero precedence is read. That precedence is pushed as the new subexp_prec. We keep track of the recursive_parse loop condition and fail if we encounter a “loop break” before the end of the rule case being handled. Nonzero-precedence tokens also take the current processed_left as their left child and everything in the rest of the case as their right child.
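The loop described above can be sketched as a minimal, self-contained precedence-climbing parser. All names here (PREC, recursive_parse) are illustrative, not the typped API, and the sketch assumes the conventional Pratt convention that a higher precedence value binds tighter (so "*" outranks "+"); the grammar-precedence values in the examples above may use a different convention.

```python
PREC = {"+": 10, "*": 20}  # assumed convention: higher value binds tighter

def recursive_parse(tokens, pos, subexp_prec):
    left = tokens[pos]  # head: a number literal
    pos += 1
    # Loop while the peeked operator binds tighter than subexp_prec; an
    # operator that does not is the "loop break" mentioned above.
    while pos < len(tokens) and PREC.get(tokens[pos], 0) > subexp_prec:
        op = tokens[pos]
        # The operator takes the current processed left as its left child
        # and the recursively parsed rest as its right child; the operator's
        # own precedence is pushed as the new subexp_prec.
        right, pos = recursive_parse(tokens, pos + 1, PREC[op])
        left = (op, left, right)
    return left, pos

tree, _ = recursive_parse("5 + 3 * 2".split(), 0, 0)
print(tree)  # → ('+', '5', ('*', '3', '2'))
```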

TODO: Consider using some kind of peekahead precedence values for nonterminals: Its precedence is the same as that of its peek token, i.e., the next non-virtual token that is actually in the lexer. This is easy to do for ordinary precedence, since prec() is a function (or can be a property). This is to allow a single nonterminal for a whole group of operators (in different cases) with different precedences. (Note that these would only process the right part of the operation, but should then make the previous result into the left operand.) BUT: Grammar precedences are not the same as the ordinary Pratt precedences (as of now, anyway) and for future purposes we need to consider how this interacts with the lookahead peek supposing that precedences become construct attributes as planned.

10.8.3. Code

typped.register_grammar_with_parser.register_rule_handlers_with_parser(parser, nonterm_label, grammar)[source]

Register production rules for all the cases of the nonterminal nonterm_label, as defined in the Grammar object grammar.

This is run to initially process a caselist. It partitions the caselist for the nonterm_label nonterminal/pstate into separate caselists for cases which start with the same thing and may require backtracking. It does the following:

  1. Look up the caselist for nonterm_label in the grammar object.

  2. Loop over that caselist, making sub-caselists for cases which start with the same thing (i.e., which have the same first set). Start with the first case and proceed until the initial caselist is fully processed and converted to a collection of sub-caselists. (All production-rule starts are currently considered the same, but when the first-set is computed they will be grouped with others having the same first-set.)

    These sub-caselists which start with a token are stored in a dict token_literal_start_cases, keyed by the token label of the beginning token. Those which start with a nonterminal are saved in a list nonterm_start_cases.

  3. For each sub-caselist created above, call def_handlers_for_first_case_of_nonterminal with that caselist.
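The partitioning in step 2 can be sketched with plain Python structures. Here cases are mocked as lists whose first item is a ("token", label) or ("rule", label) pair, and partition_cases is a hypothetical stand-in; the real code operates on Case/ItemList objects.

```python
from collections import defaultdict

def partition_cases(cases):
    """Split cases by what they start with, as described in step 2."""
    token_literal_start_cases = defaultdict(list)  # keyed by token label
    nonterm_start_cases = []  # all rule-starting cases, grouped together
    for case in cases:
        kind, label = case[0]
        if kind == "token":
            token_literal_start_cases[label].append(case)
        else:  # case starts with a production rule (nonterminal)
            nonterm_start_cases.append(case)
    return token_literal_start_cases, nonterm_start_cases
```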

typped.register_grammar_with_parser.def_null_string_handler_for_first_item_of_caselist(parser, nonterm_label, null_token, caselist, peek_token_label=None)[source]

Define the head and tail handlers for the null-string token that is called to parse a production rule (i.e., it is called when the precondition that the state label nonterm_label is on the top of the pstate_stack is satisfied).

These handlers handle all the production rule cases of the caselist for the nonterminal nonterm_label, backtracking on failure and trying the next case, etc. They act very much like the usual recursive descent function for parsing a production rule, except that to make a recursive call to handle a sub-production they push that production label onto the pstate_stack and then call recursive_parse. Then the handler for that production will be called, doing a search over its cases, etc., returning the value to the calling level.

The peek_token_label is used to set a precondition on cases which start with a particular kind of token. This can also be used to implement first-sets.

The label of the starting nonterminal is assumed to have initially been pushed on the pstate_stack in order to do grammar-based processing.

typped.register_grammar_with_parser.nonterminal_handler_factory(nonterm_label, caselist)[source]

Return a generic head handler with nonterm_label and caselist bound in the closure.

typped.register_grammar_with_parser.generic_nonterminal_handler(nonterm_label, caselist, tok, lex)[source]

The head handler assigned to the first token of the first case for a caselist. It tries all the other cases if it fails.

typped.register_grammar_with_parser.next_token_literal_with_type_assignment(lex)[source]

Read a token and set its types.

typped.register_grammar_with_parser.parse_case(case, lex, pstate_stack, tok, nonterm_label)[source]

Try to parse the particular case represented by the ItemList passed as the case parameter. Raise an exception on failure.
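The backtracking search over cases can be sketched as follows. The token-matching parse_case, try_cases, and the local BranchFail class here are all hypothetical stand-ins over a flat token list; the real handlers also restore lexer and pstate_stack state before trying the next case.

```python
class BranchFail(Exception):
    """Raised when a case of a rule fails to parse (stand-in)."""

def parse_case(case, tokens, pos):
    """Match one case's items against the tokens; raise on failure."""
    for expected in case:
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
        else:
            raise BranchFail("case failed at position {}".format(pos))
    return pos

def try_cases(caselist, tokens, pos):
    for case in caselist:
        try:
            return parse_case(case, tokens, pos)  # first success wins
        except BranchFail:
            continue  # pos was passed by value, so just try the next case
    raise BranchFail("all cases failed")

# Longer-prefix case listed first, per the ordering rule in 10.8.1:
caselist = [["arg", ",", "arg"], ["arg"]]
print(try_cases(caselist, ["arg", ",", "arg"], 0))  # → 3
```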

typped.register_grammar_with_parser.partition_caselist(caselist)[source]

A utility routine to partition a caselist into sub-caselists. Case lists here are ordinary Python lists of Case instances, not full CaseList objects (which are really only needed for the overloaded-operator grammar processing).

Return a defaultdict of caselists holding the ones which start with token literal, keyed by the token labels. Also return a list of caselists for those that start with a nonterminal.

typped.register_grammar_with_parser.always_raise_branch_fail_head_handler(tok, lex)[source]

This is a lower-priority head handler that is triggered by null-string tokens when all their actual precondition functions fail. It signals branch failure in the search tree.

typped.register_grammar_with_parser.lower_priority_fail_if_pstate_stack_nonempty_precond(tok, lex)[source]

A precondition to fail if no other stack-based precondition function matches when nonterm_label is at the top of the pstate_stack.

typped.register_grammar_with_parser.get_precond_funs(nonterm_label, peek_token_label)[source]

Get the precondition functions for registering null-string tokens. The nonterm_label and peek_token_label values are saved in the function closure.

exception typped.register_grammar_with_parser.BranchFail[source]

Bases: typped.shared_settings_and_exceptions.ParserException

Used in backtracking when parsing production rules. Raised when a case of a rule fails.