Lexical Analysis
Informal Sketch
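The goal of lexical analysis is to split an input string into tokens, that is, to recognize them. Tokens are further classified by role, for example identifier, integer, keyword, or whitespace, so the final output of lexical analysis is a stream of tokens.
Designing a lexical analyzer takes two steps: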
- Define a finite set of tokens
  - Tokens describe all items of interest
  - Choice of tokens depends on language & design of parser
- Describe which strings belong to each token
In the end, the lexer returns token-lexeme pairs and discards the pairs the parser has no use for, such as whitespace.
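The two design steps and the resulting token-lexeme pairs can be sketched in a few lines of Python. This is only a minimal illustration built on the standard `re` module; the token classes, their patterns, and the function name `tokenize` are assumptions chosen for a toy language, not something the notes prescribe.

```python
import re

# Step 1: a finite set of tokens. Step 2: which strings belong to each.
# Names and patterns are hypothetical, chosen for a toy language.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else|while)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INTEGER",    r"[0-9]+"),
    ("OPERATOR",   r"==|=|\+|-"),
    ("WHITESPACE", r"\s+"),
]

# One alternation with a named group per token class; earlier entries win.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token, lexeme) pairs, dropping whitespace."""
    pairs = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        # Whitespace is recognized like any other token, then discarded.
        if m.lastgroup != "WHITESPACE":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


print(tokenize("if x1 == 42"))
```

Listing `KEYWORD` before `IDENTIFIER` is a design choice of this sketch: it lets `if` win over the identifier pattern, while the `\b` anchors keep a name such as `iffy` from being split into a keyword plus a leftover.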
Abstract
- The goal of lexical analysis is to
  - Partition the input string into lexemes
  - Identify the token of each lexeme
- Left-to-right scan => lookahead sometimes needed
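To see why a left-to-right scan sometimes needs lookahead, consider a scanner that has just read `=`: it cannot tell whether the lexeme is `=` or `==` until it peeks at the next character. The sketch below assumes hypothetical token names `ASSIGN` and `EQ` and a hypothetical helper `scan_equals`; it illustrates the idea and is not a full lexer.

```python
def scan_equals(source, pos):
    """Scan the operator starting at source[pos], which must be '='.

    Returns (token, lexeme, next_pos). Token names are hypothetical.
    """
    if source[pos] != "=":
        raise ValueError("scan_equals must start on '='")
    # Peek one character ahead without consuming it.
    nxt = source[pos + 1] if pos + 1 < len(source) else ""
    if nxt == "=":
        return ("EQ", "==", pos + 2)
    return ("ASSIGN", "=", pos + 1)


print(scan_equals("a == b", 2))
print(scan_equals("a = b", 2))
```

One character of lookahead is enough here because the two candidate lexemes differ at the second character.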
Regular Languages
Of all the formalisms for recognizing lexemes, regular languages are the most popular.
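As a rough illustration, each token class can be viewed as a set of strings described by a regular expression built from union, concatenation, and iteration. The patterns and the helper `belongs_to` below are assumptions made for this sketch. Python's `re` also offers features that go beyond regular languages, such as backreferences, but the patterns here use only the regular core.

```python
import re

# Hypothetical building blocks for a toy language.
DIGIT = r"[0-9]"
LETTER = r"[A-Za-z_]"

# Iteration: one or more digits.
INTEGER = f"{DIGIT}+"
# Concatenation, union, and iteration: a letter, then letters or digits.
IDENTIFIER = f"{LETTER}(?:{LETTER}|{DIGIT})*"


def belongs_to(pattern, s):
    """True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None


print(belongs_to(IDENTIFIER, "x1"))
print(belongs_to(IDENTIFIER, "1x"))
```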
Syntax vs. Semantics
(An intuitive understanding is enough here.)