10.8. typped.register_grammar_with_parser
This module contains the functions which handle the recursive descent parsing of the rules of a Grammar object instance. Grammar objects are defined in the typped.ebnf_classes_and_operators module.
In this module only the function register_rule_handlers_with_parser is formally exposed to be called externally (and it is only called from the ebnf_classes_and_operators module).
The first argument to register_rule_handlers_with_parser is a parser instance. The function modifies the parser to parse the grammar rule which is also passed in.
Note that because the processing uses the parser instance variables pstate_stack and disable_pstate_processing, this parsing is not thread-safe for multiple parses on the same parser (though the parser in general is probably not thread-safe, either).
10.8.1. Rules and guidelines
No left recursion. Some token must be consumed before any recursive call. The example below is NOT OK because the left recursion will never terminate:
arglist = Rule("arglist") + Tok("k_comma") + Rule("arg") | Rule("arg") # NOT OK
Right recursion is OK:
arglist = Rule("arg") + Tok("k_comma") + Rule("arglist") | Rule("arg") # OK
Cases are evaluated in sequential order, so if cases have the same prefix the longer-prefix cases should be put first. This modified version of the OK rule above will never select the second case. It will always parse a single “arg” rule and be satisfied:
arglist = Rule("arg") | Rule("arg") + Tok("k_comma") + Rule("arglist") # NOT OK
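To see concretely why the right-recursive form terminates, here is a minimal standalone recursive-descent sketch of the OK arglist rule above (plain Python, independent of typped; the toy lexer and the token tuples are invented for illustration). Each call consumes an arg token before any recursive call, so the recursion depth is bounded by the input length:

```python
def tokenize(s):
    # Toy lexer: "a,b,c" -> [("k_arg", "a"), ("k_comma", ","), ("k_arg", "b"), ...]
    toks = []
    for part in s.split(","):
        toks.append(("k_arg", part))
        toks.append(("k_comma", ","))
    return toks[:-1]  # drop the trailing comma token

def parse_arglist(toks, i=0):
    # arglist = Rule("arg") + Tok("k_comma") + Rule("arglist") | Rule("arg")
    label, value = toks[i]          # consume one arg token BEFORE any recursion
    assert label == "k_arg"
    if i + 1 < len(toks) and toks[i + 1][0] == "k_comma":
        rest, j = parse_arglist(toks, i + 2)   # right recursion on the tail
        return [value] + rest, j
    return [value], i + 1           # second case: a single arg

args, _ = parse_arglist(tokenize("a,b,c"))
# args == ["a", "b", "c"]
```

A left-recursive version would call parse_arglist again at the same position before consuming anything, recursing forever.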
10.8.2. Precedences
Precedences are not yet implemented.
Assume for now that precedences apply only to token literals. Here are two possible ways they might be used:
num_expr = k_number + Rule("operator") + num_expr | k_number
operator = Tok("k_ast")[10] | Tok("k_plus")[20]
num_expr = k_number + Rule("op_and_right_arg") | k_number
op_and_right_arg = Tok("k_ast")[10] + num_expr | Tok("k_plus")[20] + num_expr
Note that in the second case the parsing for num_expr must recognize that the rule has a precedence which is defined in a different expression. But there is not a single, fixed precedence value that can be propagated backward. The parsing needs to handle this in some way or else rule out this kind of case.
Now consider parse("5 + 3 * 2").
When the handler for a nonterminal runs it processes all tokens simply by reading them (effectively as heads) and processes nonterminals by pushing that label on the pstate stack and calling recursive_parse. Only the precedences of tokens matter in this sequence, since they are the only ones a Pratt parser would ever see. When recursive_parse(subexp_prec) is called for a nonterminal it is passed the subexp_prec of the current subexpression, so the information is essentially just forwarded by nonterminal handlers.
Inside each nonterminal handler, then, we need to mimic the basic form of recursive_parse, starting when a token with nonzero precedence is read. That precedence is pushed as the new subexp_prec. We keep track of the recursive_parse loop condition and fail if we encounter a "loop break" before the end of the rule case being handled. Nonzero-precedence tokens also take the current processed_left as their left child and everything in the rest of the case as their right child.
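The loop being mimicked here is essentially standard precedence climbing. The following standalone sketch is not typped code: the tokenizer and precedence table are inventions for illustration, and the values follow the usual convention that a higher number binds tighter. It shows how passing subexp_prec down gates the tail loop for parse("5 + 3 * 2"):

```python
def tokenize(s):
    return s.split() + ["$end"]

PREC = {"+": 1, "*": 2}  # illustrative values: higher binds tighter

def recursive_parse(toks, pos, subexp_prec):
    # Head: read one literal token.
    left = int(toks[pos]); pos += 1
    # Tail loop: continue only while the peeked operator binds
    # tighter than the current subexpression's precedence.
    while toks[pos] in PREC and PREC[toks[pos]] > subexp_prec:
        op = toks[pos]; pos += 1
        # The operator's precedence becomes the new subexp_prec.
        right, pos = recursive_parse(toks, pos, PREC[op])
        left = (op, left, right)    # op takes processed_left as its left child
    return left, pos

tree, _ = recursive_parse(tokenize("5 + 3 * 2"), 0, 0)
# tree == ("+", 5, ("*", 3, 2))
```

The inner call for "*" is made with subexp_prec=1 (the precedence of "+"), so "*" binds 3 and 2 together before "+" takes the result as its right child.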
TODO: Consider using some kind of peekahead precedence values for nonterminals: a nonterminal's precedence would be the same as that of its peek token, i.e., the next non-virtual token that is actually in the lexer. This is easy to do for ordinary precedence, since prec() is a function (or can be a property). The goal is to allow a single nonterminal for a whole group of operators (in different cases) with different precedences. (Note that these would only process the right part of the operation, but should then make the previous result into the left operand.) BUT: grammar precedences are not the same as the ordinary Pratt precedences (as of now, anyway), and for future purposes we need to consider how this interacts with the lookahead peek, supposing that precedences become construct attributes as planned.
10.8.3. Code
- typped.register_grammar_with_parser.register_rule_handlers_with_parser(parser, nonterm_label, grammar)

  Register production rules for all the cases of the nonterminal nonterm_label, as defined in the Grammar object grammar.

  This is run to initially process a caselist. It partitions the caselist for the nonterm_label nonterminal/pstate into separate caselists for cases which start with the same thing and may require backtracking. It does the following:

  1. Look up the caselist for nonterm_label in the grammar object.
  2. Loop over that caselist, making sub-caselists for cases which start with the same thing (i.e., which have the same first set). Start with the first case, and proceed until the initial caselist is fully processed and converted to a collection of sub-caselists. (All production-rule starts are currently considered the same, but when the first-set is computed they will be grouped with others having the same first-set.) The sub-caselists which start with a token are stored in a dict token_literal_start_cases, keyed by the token label of the beginning token. Those which start with a nonterminal are saved in a list nonterm_start_cases.
  3. For each sub-caselist created above, call def_handlers_for_first_case_of_nonterminal with that caselist.
- typped.register_grammar_with_parser.def_null_string_handler_for_first_item_of_caselist(parser, nonterm_label, null_token, caselist, peek_token_label=None)

  Define the head and tail handlers for the null-string token that is called to parse a production rule (i.e., it is called when the precondition that the state label nonterm_label is on the top of the pstate_stack is satisfied).

  These handlers handle all the production rule cases of the caselist for the nonterminal nonterm_label, backtracking on failure and trying the next case, etc. They act very much like the usual recursive descent function for parsing a production rule, except that to make a recursive call to handle a sub-production they push that production label onto the pstate_stack and then call recursive_parse. The handler for that production is then called, doing a search over its cases, etc., and returning the value to the calling level.

  The peek_token_label is used to set a precondition on cases which start with a particular kind of token. This can also be used to implement first-sets.

  The label of the starting nonterminal is assumed to have initially been pushed on the pstate_stack in order to do grammar-based processing.
- typped.register_grammar_with_parser.nonterminal_handler_factory(nonterm_label, caselist)

  Return a generic head handler with nonterm_label and caselist bound in the closure.
- typped.register_grammar_with_parser.generic_nonterminal_handler(nonterm_label, caselist, tok, lex)

  The head handler assigned to the first token of the first case for a caselist. It tries all the other cases if it fails.
- typped.register_grammar_with_parser.next_token_literal_with_type_assignment(lex)

  Read a token and set its types.
- typped.register_grammar_with_parser.parse_case(case, lex, pstate_stack, tok, nonterm_label)

  Try to parse the particular case represented by the ItemList passed as the case parameter. Raise an exception on failure.
- typped.register_grammar_with_parser.partition_caselist(caselist)

  A utility routine to partition a caselist into sub-caselists. Case lists here are ordinary Python lists of Case instances, not full CaseList objects (which are really only needed for the overloaded-operator grammar processing).

  Return a defaultdict of caselists holding those which start with a token literal, keyed by the token labels. Also return a list of caselists for those that start with a nonterminal.
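A minimal sketch of this partitioning, assuming a simplified case representation invented for illustration (each case is a plain list of ('token', label) or ('rule', label) pairs, not the actual Case or ItemList classes):

```python
from collections import defaultdict

def partition_cases(cases):
    """Split cases by their first item: token-literal starts are keyed
    by token label; nonterminal starts are collected in a list."""
    token_literal_start_cases = defaultdict(list)
    nonterm_start_cases = []
    for case in cases:
        kind, label = case[0]
        if kind == "token":
            token_literal_start_cases[label].append(case)
        else:
            nonterm_start_cases.append(case)
    return token_literal_start_cases, nonterm_start_cases

cases = [[("token", "k_comma"), ("rule", "arg")],
         [("rule", "arg")],
         [("token", "k_comma")]]
by_token, by_rule = partition_cases(cases)
# by_token["k_comma"] holds two cases; by_rule holds one
```

Keying token-literal starts by label is what lets a peek-token precondition dispatch directly to the right sub-caselist.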
- typped.register_grammar_with_parser.always_raise_branch_fail_head_handler(tok, lex)

  This is a lower-priority head handler that is triggered by null-string tokens when all their actual precondition functions fail. It signals branch failure in the search tree.
- typped.register_grammar_with_parser.lower_priority_fail_if_pstate_stack_nonempty_precond(tok, lex)

  A precondition to fail if no other stack-based precondition function matches when nonterm_label is at the top of the pstate_stack.
- typped.register_grammar_with_parser.get_precond_funs(nonterm_label, peek_token_label)

  Get the precondition functions for registering null-string tokens. The nonterm_label and peek_token_label values are saved in the function closure.
- exception typped.register_grammar_with_parser.BranchFail

  Bases: typped.shared_settings_and_exceptions.ParserException

  Used in backtracking when parsing production rules. Raised when a case of a rule fails.
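The backtracking search that BranchFail supports can be sketched in isolation (a toy stand-in, not the actual typped classes): each case is tried in sequential order, a failed case raises the exception, and the caller falls through to the next case:

```python
class BranchFail(Exception):
    """Toy stand-in: raised when a case of a production rule fails."""

def parse_case(case, toks, pos):
    # Match a case (here just a list of expected token labels) in order.
    matched = []
    for expected in case:
        if pos >= len(toks) or toks[pos] != expected:
            raise BranchFail("case failed at position %d" % pos)
        matched.append(toks[pos])
        pos += 1
    return matched, pos

def parse_nonterminal(caselist, toks, pos):
    # Try each case in order; backtrack (retry from the original pos)
    # when a case raises BranchFail, and re-raise if every case fails.
    for case in caselist:
        try:
            return parse_case(case, toks, pos)
        except BranchFail:
            continue
    raise BranchFail("all cases failed for this nonterminal")

# Longer-prefix case first, per the guidelines above.
caselist = [["k_lpar", "k_number", "k_rpar"], ["k_number"]]
result, end = parse_nonterminal(caselist, ["k_number"], 0)
# result == ["k_number"]  (first case fails, second succeeds)
```

Since every case restarts from the same position, raising and catching the exception is all the state restoration this toy version needs; the real handlers must also restore lexer and pstate_stack state.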