mirror of
https://github.com/vczh-libraries/Release.git
synced 2026-05-21 13:18:28 +08:00
Update TODO_ParserGen.md
+132
-109
@@ -1,4 +1,6 @@
# Goal
# ParserGen 2.0

## Goal

* Parsing
* Explicitly declare the boundary of ambiguity resolution (e.g. on EXPR or on STAT)
@@ -15,7 +17,7 @@
* Low overhead AST with reflection
* Optionally create the AST from a pool

# AST Definition (compatible with Workflow)
## AST Definition (compatible with Workflow)

```
class CLASS_NAME [: BASE_CLASS]
@@ -25,128 +27,149 @@ class CLASS_NAME [: BASE_CLASS]
}
```

## Configurations
### Configurations

- Include files in the generated C++ header
- AST definition files that this definition depends on
- Visitors selected to generate
- Optional reflection support
- All AST constructors are protected
- Generated factory class
- If the AST object pool is enabled
- reflection is disabled
- `Ptr<T>` for all AST types is generated with an enumerated `Cast` function.
- Use generated RTTI constructions (e.g. enum class tag for type)
* Include files in the generated C++ header
* AST definition files that this definition depends on
* Visitors selected to generate
* Optional reflection support
* All AST constructors are protected
* Generated factory class
* If the AST object pool is enabled
* reflection is disabled
* `Ptr<T>` for all AST types is generated with an enumerated `Cast` function.
* Use generated RTTI constructions (e.g. enum class tag for type)
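
The configuration items above (protected constructors, a generated factory, and an enum class tag replacing reflection) could look roughly like the sketch below. Every name in it (`AstTag`, `Expr`, `NumberExpr`, `BinaryExpr`, `AstFactory`, `Cast`) is hypothetical and not taken from ParserGen, and a vector of owning pointers stands in for a real object pool.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical generated tag: one enumerator per concrete AST class.
enum class AstTag { NumberExpr, BinaryExpr };

class Expr {
public:
    virtual ~Expr() = default;
    AstTag tag() const { return tag_; }
protected:
    // Protected: only derived classes (and through them the factory) construct nodes.
    explicit Expr(AstTag tag) : tag_(tag) {}
private:
    AstTag tag_;
};

class NumberExpr : public Expr {
    friend class AstFactory;
public:
    static constexpr AstTag Tag = AstTag::NumberExpr;
    int value = 0;
protected:
    NumberExpr() : Expr(Tag) {}
};

class BinaryExpr : public Expr {
    friend class AstFactory;
public:
    static constexpr AstTag Tag = AstTag::BinaryExpr;
    Expr* left = nullptr;
    Expr* right = nullptr;
protected:
    BinaryExpr() : Expr(Tag) {}
};

// Hypothetical factory that owns every node it creates.
// Assumption: a real pool would allocate from larger blocks instead.
class AstFactory {
public:
    template <typename T>
    T* Create() {
        std::unique_ptr<T> node(new T());
        T* raw = node.get();
        nodes_.push_back(std::move(node));
        return raw;
    }
    std::size_t Size() const { return nodes_.size(); }
private:
    std::vector<std::unique_ptr<Expr>> nodes_;
};

// Tag-based cast: compares the stored tag instead of using dynamic_cast.
// This only handles leaf classes; a class hierarchy would need a range check.
template <typename T>
T* Cast(Expr* node) {
    return (node != nullptr && node->tag() == T::Tag) ? static_cast<T*>(node) : nullptr;
}
```

With this shape, `factory.Create<NumberExpr>()` is the only way client code obtains a node, and `Cast<BinaryExpr>(node)` returns `nullptr` when the tag does not match.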

## Types
### Types

- `Token`: In the previous version `Token` was a value type; now it is a reference type.
- `CLASS-NAME`: Another class.
- `TYPE[]`: Array, whose element is not allowed to be another array.
* `Token`: In the previous version `Token` was a value type; now it is a reference type.
* `CLASS-NAME`: Another class.
* `TYPE[]`: Array, whose element is not allowed to be another array.

## MISC
### MISC

- Define a `ToString` algorithm with customizable configurations.
* Define a `ToString` algorithm with customizable configurations.

# Lexical Analyzer
## Lexical Analyzer

- Pair names with regular expressions.
- Extendable tokens.
- For example, recognize `R"[^\s(]\(` and invoke a callback function to determine the end of the string
- Pair a name with each token subset, and give a default name to the full token set
* Pair names with regular expressions.
* Extendable tokens.
* For example, recognize `R"[^\s(]\(` and invoke a callback function to determine the end of the string
* Pair a name with each token subset, and give a default name to the full token set
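
The extendable-token example above (match a raw string prefix, then let a callback find the end) can be sketched as follows. The function names are hypothetical, the prefix is scanned by hand rather than by a regular expression engine, and the sketch assumes a delimiter of zero or more characters, which is looser than the regular expression as written in the list.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the position just past the `R"delim(` prefix starting at `begin`,
// or npos if there is no such prefix. Requires begin <= input.size().
std::size_t MatchRawStringPrefix(const std::string& input, std::size_t begin) {
    if (input.compare(begin, 2, "R\"") != 0) return std::string::npos;
    std::size_t i = begin + 2;
    while (i < input.size() && input[i] != '(' &&
           !std::isspace(static_cast<unsigned char>(input[i]))) {
        ++i;
    }
    return (i < input.size() && input[i] == '(') ? i + 1 : std::string::npos;
}

// The "callback": given the matched prefix [prefixBegin, prefixEnd), find the
// closing `)delim"` and return the position just past it, or npos.
std::size_t FindRawStringEnd(const std::string& input,
                             std::size_t prefixBegin, std::size_t prefixEnd) {
    // The delimiter sits between the opening quote and the parenthesis.
    const std::string delim =
        input.substr(prefixBegin + 2, prefixEnd - prefixBegin - 3);
    const std::string closing = ")" + delim + "\"";
    const std::size_t pos = input.find(closing, prefixEnd);
    return pos == std::string::npos ? std::string::npos : pos + closing.size();
}

// Combines both steps: returns the end of the whole token, or npos.
std::size_t ScanRawString(const std::string& input, std::size_t begin) {
    const std::size_t prefixEnd = MatchRawStringPrefix(input, begin);
    if (prefixEnd == std::string::npos) return std::string::npos;
    return FindRawStringEnd(input, begin, prefixEnd);
}
```

The split mirrors the idea in the list: the regular part of the token is recognized first, and only the part that a regular expression cannot express (a closing delimiter that must equal the opening one) is delegated to code.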

# Error Messages
## Error Messages

- Generate error messages in C++ code
* Generate error messages in C++ code

# Syntax Analyzer
## Syntax Analyzer

- Priority of loops:
- `+[ RULE ]` means if `RULE` succeeds, skipping `RULE` is not considered even if the rest doesn't parse.
- `-[ RULE ]` means the result of having `RULE` is kept only if skipping `RULE` makes the clause unable to parse.
- `[ RULE ]` means keep both results
- `+{ RULE }`, `-{ RULE }`, `{ RULE }` are similar, but `{ RULE }` may generate more than two results, while the others generate only one result.
- Being able to change the token subset during parsing.
- Being able to specify an error message when a certain action fails.
- Generate a SAX-like parser, with a default handler to create the AST.
* Priority of loops:
* `+[ RULE ]` means if `RULE` succeeds, skipping `RULE` is not considered even if the rest doesn't parse.
* `-[ RULE ]` means the result of having `RULE` is kept only if skipping `RULE` makes the clause unable to parse.
* `[ RULE ]` means keep both results
* `+{ RULE }`, `-{ RULE }`, `{ RULE }` are similar, but `{ RULE }` may generate more than two results, while the others generate only one result.
* Being able to change the token subset during parsing.
* Being able to specify an error message when a certain action fails.
* Generate a SAX-like parser, with a default handler to create the AST.
* Generate a **POOLED** tuple struct type for each of:
* Loop body. A delimited list is treated as `[ITEM {DELIMITER ITEM}]`
* During calculation, a loop records a pointer to the last item of a reversed linked list
* As the result, a loop records an array of items
* Alternative as `Union<Ts...>` storing `{TYPE-FLAG, ITEM*}` (value type)
* Optional as `Optional<T>` storing `{ITEM*}` (value type)
* Sequential as `{A, B, C ...}` with generated field names (value type)
* The type is the rule or rule fragment, not the resulting AST type
* If there are multiple fields of the same type, their names are appended with the index of their position in the rule (optionals, alternatives and loops are each packed as one)
* If a tuple is created directly from a rule, there will be a static field indicating which rule it comes from
* Rule reference as `Reference<Ts...>` storing `{RULE, FRAGMENT, TYPE-FLAG, ITEM*}`
* `Reference<Ts...>` types are aliased
* Consider forward declarations
* All types have an un-templated partner so that the core SAX-like instruction execution doesn't need to know the concrete types
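
The two loop representations mentioned above (a reversed linked list while parsing, an array as the result) might be sketched like this. `LoopCell`, `LoopPool` and `ToArray` are hypothetical names, and a `std::deque` stands in for the pool because it keeps element addresses stable. One plausible benefit, assumed here rather than stated in the notes, is that ambiguous continuations can share a common prefix of items.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical pooled cell: each item points back to the item parsed before it.
template <typename T>
struct LoopCell {
    T item;
    const LoopCell* previous;  // nullptr for the first item of the loop
};

// Stand-in for the pool; std::deque never moves existing elements on push_back.
template <typename T>
class LoopPool {
public:
    // Appends `item` after `last` and returns the new last cell.
    const LoopCell<T>* Append(const LoopCell<T>* last, T item) {
        cells_.push_back(LoopCell<T>{std::move(item), last});
        return &cells_.back();
    }
private:
    std::deque<LoopCell<T>> cells_;
};

// Converts the reversed list ending at `last` into an array in parse order.
template <typename T>
std::vector<T> ToArray(const LoopCell<T>* last) {
    std::vector<T> result;
    for (const LoopCell<T>* cell = last; cell != nullptr; cell = cell->previous) {
        result.push_back(cell->item);
    }
    std::reverse(result.begin(), result.end());
    return result;
}
```

Appending two different items after the same cell yields two lists that share every earlier cell, so discarding one branch later costs nothing for the other.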

## Supported EBNF
### Supported EBNF

- TOKEN [`:` PROPERTY-NAME]
- RULE [`:` PROPERTY-NAME]
- Optional:
- `+[` EBNF `]`
- `-[` EBNF `]`
- `[` EBNF `]`
- Loop:
- `+{` EBNF `}`
- `-{` EBNF `}`
- `{` EBNF `}`
- `with{` PROPERTY-ASSIGNMENT ... `}`

## EBNF Program
* TOKEN [`:` PROPERTY-NAME]
* RULE [`:` PROPERTY-NAME]
* Optional:
* `+[` EBNF `]`
* `-[` EBNF `]`
* `[` EBNF `]`
* Loop:
* `+{` EBNF `}`
* `-{` EBNF `}`
* `{` EBNF `}`
* `with{` PROPERTY-ASSIGNMENT ... `}`
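
One way to read the three optional forms `+[ ]`, `-[ ]` and `[ ]` is as different policies for combining the "take" and "skip" results. The sketch below models a parser as a function returning every possible end position. All names are hypothetical, "the clause" is approximated by a single `rest` parser that follows the optional part, and this is an interpretation of the priority rules, not ParserGen's actual algorithm.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Positions = std::vector<std::size_t>;
using Parser = std::function<Positions(const std::string&, std::size_t)>;

// Hypothetical names for +[ RULE ], -[ RULE ] and [ RULE ].
enum class Priority { PreferTake, PreferSkip, KeepBoth };

// Matches a fixed text at `pos`; yields one end position or none.
Parser Literal(std::string text) {
    return [text](const std::string& in, std::size_t pos) -> Positions {
        if (in.compare(pos, text.size(), text) == 0) return {pos + text.size()};
        return {};
    };
}

// Parses "optional RULE, then REST" and returns all surviving end positions.
Positions ParseOptionalThen(Priority priority, const Parser& rule, const Parser& rest,
                            const std::string& in, std::size_t pos) {
    const Positions afterRule = rule(in, pos);
    Positions taken;
    for (std::size_t mid : afterRule) {
        for (std::size_t end : rest(in, mid)) taken.push_back(end);
    }
    const Positions skipped = rest(in, pos);

    switch (priority) {
    case Priority::PreferTake:
        // Once RULE succeeds, skipping is not considered even if REST then fails.
        return afterRule.empty() ? skipped : taken;
    case Priority::PreferSkip:
        // Results that include RULE survive only when skipping cannot parse.
        return skipped.empty() ? taken : skipped;
    case Priority::KeepBoth:
    default: {
        Positions both = taken;
        both.insert(both.end(), skipped.begin(), skipped.end());
        return both;
    }
    }
}
```

With `rule` and `rest` both matching `"a"`, the input `"a"` shows the difference: `PreferTake` yields no result because the rule consumes the only `a`, while `PreferSkip` and `KeepBoth` each yield the single result that skips the rule.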

- RULE {`::=` CLAUSE `as` CLASS-NAME} `;`
- Consider a syntax here to switch the token set
### EBNF Program

## ToString Algorithm Requirements
- Every clause should create an AST node. `EXP ::= '(' !EXP ')'` is not allowed, except when the clause has only one node.
- Every rule-name node should be assigned to a property. Token nodes are optional but those properties will be auto-generated.
- Loops cannot be embedded in another loop. This doesn't limit the syntax, but it limits the shape of the AST.
* RULE {`::=` CLAUSE `as` CLASS-NAME} `;`
* Consider a syntax here to switch the token set

## Development Project Structure
- Original ParserGen code will be separated from Vlpp.
- **Development Steps**
### ToString Algorithm Requirements

* Every clause should create an AST node. `EXP ::= '(' !EXP ')'` is not allowed, except when the clause has only one node.
* Every rule-name node should be assigned to a property. Token nodes are optional but those properties will be auto-generated.
* Loops cannot be embedded in another loop. This doesn't limit the syntax, but it limits the shape of the AST.

### Development Project Structure

* Original ParserGen code will be separated from Vlpp.
* **Development Steps**
1. Symbols for ParserGen AST
2. Manually: symbols -> `ParserGen AST described in C++`
3. Manually: ParserGen Syntax described in `ParserGen AST described in C++` -> `ParserGen Parser described in C++`
2. Manually: symbols -> `ParserGen AST declaration in C++`
3. Manually: ParserGen Syntax described in `ParserGen AST declaration in C++` -> `ParserGen Parser declaration in C++`
4. Integrate
- **AstGen**:
- **Goal**: given symbols, generate C++ code for the AST
- **Produce** (from unit test):
- Generated source code (AST part): declaration, visitors, builder, reflection for ParserGen input
- AST symbols and C++ code generation.
- Generate visitors.
- Generate easy builder.
- Generate reflection.
- **Execution**:
- **Goal**: given instructions, parse input text with SAX-like callbacks
- Serialization of parser-generated instructions.
- Execute instructions as a SAX-like parser, with notification on ambiguous nodes, error message generation and error recovery.
- **Compiler** -> **AstGen**, **Execution**
- **Goal**: take input described using `Generated source code (AST part)` and generate instructions (text parser)
- **Produce** (from unit test)
- Generated C++ source code (parser part) for ParserGen input
- Take the `ParserGen AST declaration` and generate instructions.
- Generate the default handler to create the AST for the SAX-like parser.
- ToString algorithm.
- Bidirectional binding between the AST and the text.
- **ParserGen** -> **Compiler**
- **Goal**: CLI Tool
- Integrate `Generated source code (AST part)`
- Integrate `Generated source code (parser part)`
- Handle command line arguments
- **UnitTestAst**:
- Unit tests of the **AstGen** building blocks, pool allocation, etc.
- **Produces** steps
- Hand-written `AST for ParserGen` symbols.
- Codegen symbols and get `Generated source code (AST part)` for ParserGen input
- **UnitTestExecution**:
- Unit test of **Execution**.
- Assert directly on the SAX-like parser.
- **UnitTestCompiler**:
- Unit test of **Compiler**; the inputs are all **UnitTestExecution** test cases rewritten using the generated easy builder for the `AST for ParserGen` AST.
- Assert on the ToString-ed AST. (shared)
- **Produces** steps
- Implement the ParserGen AST input syntax using the generated easy builder from `Generated source code (AST part)`
- Serialize instructions and get `C++ source code (parser part)` for ParserGen input
- **UnitTestParserGen**:
- Unit test of **ParserGen**; the inputs are all **UnitTestExecution** test cases rewritten in text format.
- Assert on the ToString-ed AST. (shared)
- Generate C++ code from all parsers in text format
- **UnitTest**:
- Link all cpp files in all other unit test projects so that all test cases can be run in one F5.
- Test all generated parsers in **UnitTestParserGen**.
- Assert on the ToString-ed AST. (shared)
- Since parsers are written in different ways for different unit test projects, they are stored separately from the unit test projects to share necessary files.
- Do not actually write a file if the generated content hasn't changed.
* **TODO**: Reorganize the unit test projects into pure unit tests and code generation steps
* The code generation steps are also split across multiple projects
* Because some projects and their partner unit tests rely on generated code from the projects they depend on
* **AstGen**:
* **Goal**: given symbols, generate C++ code for the AST
* **Produce** (from unit test):
* Generated source code (AST part): declaration, visitors, builder, reflection for ParserGen input
* AST symbols and C++ code generation.
* Generate visitors.
* Generate easy builder.
* Generate reflection.
* **Execution**:
* **Goal**: given instructions, parse input text with SAX-like callbacks
* An instruction could generate multiple continuations
* Serialization of parser-generated instructions.
* Execute instructions as a SAX-like parser, with notification on ambiguous nodes, error message generation and error recovery.
* If there is ambiguity, different callbacks could be called at the same position, and results could be discarded later in the execution.
* **Compiler** -> **AstGen**, **Execution**
* **Goal**: take input described using `Generated source code (AST part)` and generate instructions (text parser)
* **Produce** (from unit test)
* Generated C++ source code (parser part) for ParserGen input
* Take the `ParserGen AST declaration` and generate instructions.
* Generate the default handler to create the AST for the SAX-like parser.
* ToString algorithm.
* Bidirectional binding between the AST and the text.
* **ParserGen** -> **Compiler**
* **Goal**: CLI Tool
* Integrate `Generated source code (AST part)`
* Integrate `Generated source code (parser part)`
* Handle command line arguments
* **UnitTestAst**:
* Unit tests of the **AstGen** building blocks, pool allocation, etc.
* **Produces** steps
* Hand-written `AST for ParserGen` symbols.
* Codegen symbols and get `Generated source code (AST part)` for ParserGen input
* **UnitTestExecution**:
* Unit test of **Execution**.
* Assert directly on the SAX-like parser.
* **UnitTestCompiler**:
* Unit test of **Compiler**; the inputs are all **UnitTestExecution** test cases rewritten using the generated easy builder for the `AST for ParserGen` AST.
* Assert on the ToString-ed AST. (shared)
* **Produces** steps
* Implement the ParserGen AST input syntax using the generated easy builder from `Generated source code (AST part)`
* Serialize instructions and get `C++ source code (parser part)` for ParserGen input
* **UnitTestParserGen**:
* Unit test of **ParserGen**; the inputs are all **UnitTestExecution** test cases rewritten in text format.
* Assert on the ToString-ed AST. (shared)
* Generate C++ code from all parsers in text format
* **UnitTest**:
* Link all cpp files in all other unit test projects so that all test cases can be run in one F5.
* Test all generated parsers in **UnitTestParserGen**.
* Assert on the ToString-ed AST. (shared)
* Since parsers are written in different ways for different unit test projects, they are stored separately from the unit test projects to share necessary files.
* Do not actually write a file if the generated content hasn't changed.
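
The last item, skipping the write when the generated content is unchanged, keeps file timestamps stable so that build systems do not rebuild untouched outputs. A minimal sketch with standard streams follows; `WriteFileIfChanged` is a hypothetical helper name, not an existing ParserGen or Vlpp function.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Writes `content` to `path` only when the file is missing or differs.
// Returns true if the file was written, false if it was left untouched.
bool WriteFileIfChanged(const std::string& path, const std::string& content) {
    {
        std::ifstream existing(path, std::ios::binary);
        if (existing) {
            const std::string old((std::istreambuf_iterator<char>(existing)),
                                  std::istreambuf_iterator<char>());
            if (old == content) return false;  // identical: keep the timestamp
        }
    }  // the input stream is closed here, before the file is reopened for writing

    std::ofstream output(path, std::ios::binary | std::ios::trunc);
    output.write(content.data(), static_cast<std::streamsize>(content.size()));
    output.flush();
    if (!output) throw std::runtime_error("cannot write " + path);
    return true;
}
```

Binary mode is used on both sides so that the comparison is byte for byte and is not affected by newline translation.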