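I have already written a generator that does the trick, but I'd like to know the best possible way to implement the off-side rule.
In short: the off-side rule means, in this context, that indentation is recognized as a syntactic element.
Here is the off-side rule in pseudocode, for building tokenizers that capture indentation in a usable form. I don't want to restrict answers to any particular language: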
```
token NEWLINE
    matches r"\n\ *"
    increase line count
    pick up and store the indentation level
    also record the current parenthesis nesting depth

procedure layout tokens
    level = stack of indentation levels
    push 0 to level
    last_newline = none
    for each token
        if it is NEWLINE, put it in last_newline and get the next token
        if last_newline contains something
            extract new_level and parenthesis_count from last_newline
            - if the newline was inside parentheses, drop it
            - if new_level > level.top
                push new_level to level
                emit last_newline as INDENT token
            - if new_level == level.top
                emit last_newline as NEWLINE token
            - otherwise
                while new_level < level.top
                    pop from level
                    if new_level > level.top
                        freak out, indentation is broken
                    emit a DEDENT token
            clear last_newline
        emit token
    while level.top != 0
        emit a DEDENT token
        pop from level

comments are stripped before they reach the layouter
the layouter sits between the lexer and the parser
```
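The procedure can be sketched in Python roughly as follows. This is a minimal sketch, not a reference implementation. The token kinds NAME, NUMBER, OPEN, CLOSE and OP, and the function names `lex` and `layout`, are hypothetical choices for illustration. Indentation is assumed to be spaces only (a tab at the start of a line is skipped as ordinary whitespace and does not count toward the level), and comments are dropped in the lexer so they never reach the layouter.

```python
import re
from typing import Iterator, List, Optional, Tuple

# A token is (kind, value). Kinds other than NEWLINE/INDENT/DEDENT are a
# hypothetical toy set chosen for this sketch.
Token = Tuple[str, object]

TOKEN_SPEC = [
    ("NEWLINE", r"\n[ ]*"),       # same idea as r"\n\ *": newline plus indentation
    ("SKIP", r"[ \t]+|#[^\n]*"),  # inline whitespace and comments are dropped here
    ("NAME", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("OPEN", r"[(\[{]"),
    ("CLOSE", r"[)\]}]"),
    ("OP", r"[^\s\w()\[\]{}#]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(source: str) -> Iterator[Token]:
    depth = 0  # current parenthesis nesting depth
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "OPEN":
            depth += 1
        elif kind == "CLOSE":
            depth -= 1
        if kind == "NEWLINE":
            # Store the indentation width and the paren depth at this newline.
            yield ("NEWLINE", (len(text) - 1, depth))
        else:
            yield (kind, text)


def layout(tokens: Iterator[Token]) -> Iterator[Token]:
    levels: List[int] = [0]  # stack of indentation levels
    last_newline: Optional[Token] = None
    for tok in tokens:
        if tok[0] == "NEWLINE":
            last_newline = tok  # runs of blank lines collapse to the last one
            continue
        if last_newline is not None:
            new_level, depth = last_newline[1]
            if depth > 0:
                pass  # newline inside parentheses: drop it
            elif new_level > levels[-1]:
                levels.append(new_level)
                yield ("INDENT", None)
            elif new_level == levels[-1]:
                yield ("NEWLINE", None)
            else:
                while new_level < levels[-1]:
                    levels.pop()
                    if new_level > levels[-1]:
                        raise IndentationError("dedent does not match any outer level")
                    yield ("DEDENT", None)
            last_newline = None
        yield tok
    while levels[-1] != 0:  # close blocks still open at end of input
        levels.pop()
        yield ("DEDENT", None)


if __name__ == "__main__":
    src = "if x:\n    y = (1 +\n         2)\n    z\nw\n"
    print([kind for kind, _ in layout(lex(src))])
    # ['NAME', 'NAME', 'OP', 'INDENT', 'NAME', 'OP', 'OPEN', 'NUMBER', 'OP',
    #  'NUMBER', 'CLOSE', 'NEWLINE', 'NAME', 'DEDENT', 'NAME']
```

Keeping the paren depth on the NEWLINE token itself means the layouter never has to track brackets; it only inspects the pending newline.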
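This layouter never emits more than one NEWLINE in a row, and it emits no NEWLINE when the indentation changes: an INDENT or DEDENT takes its place. As a result, the parsing rules stay quite simple. I think this works well, but please let me know if there is a better way to do it.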
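To illustrate what the parser side can look like with this layout, here is a hypothetical recursive-descent parser for a toy grammar of NAME-only statements, fed a hand-written token stream of the kind the layouter produces. The grammar, the `Parser` class and `parse` are assumptions made for this sketch. Because no NEWLINE follows a DEDENT, the grammar treats the DEDENT that closes a nested block as the end of the enclosing statement.

```python
from typing import List, Tuple

Token = Tuple[str, str]
# A parsed statement: its atoms plus the statements of its nested block (if any).
Stmt = Tuple[List[str], list]


class Parser:
    """Recursive-descent parser over layouter output (toy grammar)."""

    def __init__(self, tokens: List[Token]) -> None:
        self.toks = tokens + [("EOF", "")]
        self.i = 0

    def peek(self) -> str:
        return self.toks[self.i][0]

    def expect(self, kind: str) -> str:
        got, text = self.toks[self.i]
        if got != kind:
            raise SyntaxError(f"expected {kind}, got {got} at token {self.i}")
        self.i += 1
        return text

    def block(self) -> List[Stmt]:
        # block := stmt*  (stops at the DEDENT that closes it, or at EOF)
        stmts: List[Stmt] = []
        while self.peek() not in ("DEDENT", "EOF"):
            stmts.append(self.stmt())
        return stmts

    def stmt(self) -> Stmt:
        # stmt := NAME+ ( INDENT block DEDENT | NEWLINE | <next is DEDENT or EOF> )
        atoms = [self.expect("NAME")]
        while self.peek() == "NAME":
            atoms.append(self.expect("NAME"))
        body: List[Stmt] = []
        if self.peek() == "INDENT":
            self.expect("INDENT")
            body = self.block()
            self.expect("DEDENT")  # no NEWLINE follows; the DEDENT ends the statement
        elif self.peek() == "NEWLINE":
            self.expect("NEWLINE")
        return (atoms, body)


def parse(tokens: List[Token]) -> List[Stmt]:
    p = Parser(tokens)
    tree = p.block()
    p.expect("EOF")
    return tree


if __name__ == "__main__":
    toks = [("NAME", "if"), ("NAME", "x"), ("INDENT", ""), ("NAME", "y"),
            ("DEDENT", ""), ("NAME", "z")]
    print(parse(toks))  # [(['if', 'x'], [(['y'], [])]), (['z'], [])]
```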
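After using this for a while, I noticed that it pays to emit a NEWLINE after the DEDENTs anyway. That way you can separate expressions with NEWLINE while keeping INDENT ... DEDENT as a trailer of the expression.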
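A sketch of that variant, assuming the same NEWLINE token shape as before, i.e. `("NEWLINE", (indent_width, paren_depth))` (this tuple layout and the name `layout_with_trailing_newline` are illustrative assumptions). The only change is that the equal-level case and the dedent case share one path: after zero or more DEDENTs, a single NEWLINE is emitted.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# NEWLINE tokens carry (indent_width, paren_depth); other tokens are passed through.
Token = Tuple[str, object]


def layout_with_trailing_newline(tokens: Iterable[Token]) -> Iterator[Token]:
    levels: List[int] = [0]  # stack of indentation levels
    last_newline: Optional[Token] = None
    for tok in tokens:
        if tok[0] == "NEWLINE":
            last_newline = tok  # keep only the last of several newlines
            continue
        if last_newline is not None:
            new_level, depth = last_newline[1]
            if depth == 0:  # newlines inside parentheses are simply dropped
                if new_level > levels[-1]:
                    levels.append(new_level)
                    yield ("INDENT", None)
                else:
                    while new_level < levels[-1]:
                        levels.pop()
                        if new_level > levels[-1]:
                            raise IndentationError("dedent does not match any outer level")
                        yield ("DEDENT", None)
                    # Same level, or after one or more DEDENTs: one NEWLINE separates statements.
                    yield ("NEWLINE", None)
            last_newline = None
        yield tok
    while levels[-1] != 0:  # no NEWLINE after the final DEDENTs: nothing follows them
        levels.pop()
        yield ("DEDENT", None)


if __name__ == "__main__":
    toks = [("NAME", "a"), ("NEWLINE", (2, 0)), ("NAME", "b"), ("NEWLINE", (4, 0)),
            ("NAME", "c"), ("NEWLINE", (0, 0)), ("NAME", "d")]
    print([k for k, _ in layout_with_trailing_newline(toks)])
    # ['NAME', 'INDENT', 'NAME', 'INDENT', 'NAME', 'DEDENT', 'DEDENT', 'NEWLINE', 'NAME']
```

With this stream, sibling statements are always separated by exactly one NEWLINE, so a grammar along the lines of `block := stmt (NEWLINE stmt)*` and `stmt := atom+ [INDENT block DEDENT]` is enough, with the INDENT ... DEDENT part acting as the statement's trailer.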