mirror of
https://github.com/vczh-libraries/Release.git
synced 2026-06-02 15:46:39 +08:00
Update TODO_ParserGen.md
+131
-108
@@ -1,4 +1,6 @@
|
|||||||
# Goal
|
# ParserGen 2.0
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
* Parsing
|
* Parsing
|
||||||
* Explicitly declare the boundary of ambiguity resolution (e.g. on EXPR or on STAT)
|
* Explicitly declare the boundary of ambiguity resolution (e.g. on EXPR or on STAT)
|
||||||
@@ -15,7 +17,7 @@
|
|||||||
* Low overhead AST with reflection
|
* Low overhead AST with reflection
|
||||||
* Optionally create AST from a pool
|
* Optionally create AST from a pool
|
||||||
|
|
||||||
# AST Definition (compatible with Workflow)
|
## AST Definition (compatible with Workflow)
|
||||||
|
|
||||||
```
|
```
|
||||||
class CLASS_NAME [: BASE_CLASS]
|
class CLASS_NAME [: BASE_CLASS]
|
||||||
@@ -25,128 +27,149 @@ class CLASS_NAME [: BASE_CLASS]
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## Configurations
|
### Configurations
|
||||||
|
|
||||||
- Include files in generated C++ header
|
* Include files in generated C++ header
|
||||||
- AST definition files that this one depends on
|
* AST definition files that this one depends on
|
||||||
- Visitors selected to generate
|
* Visitors selected to generate
|
||||||
- Optional reflection support
|
* Optional reflection support
|
||||||
- All AST constructors are protected
|
* All AST constructors are protected
|
||||||
- Generated factory class
|
* Generated factory class
|
||||||
- If AST object pool is enabled
|
* If AST object pool is enabled
|
||||||
- reflection is disabled
|
* reflection is disabled
|
||||||
- `Ptr<T>` for every AST type is generated with an enumerated `Cast` function.
|
* `Ptr<T>` for every AST type is generated with an enumerated `Cast` function.
|
||||||
- Use generated RTTI constructions (e.g. enum class tag for type)
|
* Use generated RTTI constructions (e.g. enum class tag for type)
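
A minimal sketch of how the pooled, reflection-free configuration above might look: protected constructors, a generated factory, an `enum class` tag, and a tag-checked `Cast`. All names here (`AstTag`, `AstFactory`, `NumberExpr`, `NameExpr`) are hypothetical, `std::shared_ptr` stands in for `Ptr<T>`, and `Cast` only matches the exact type, so it is not the generated code itself.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tag enum: a generator would emit one entry per AST class.
enum class AstTag { NumberExpr, NameExpr };

class AstFactory;

struct Expr {
    const AstTag tag;                       // generated RTTI replacement
    virtual ~Expr() = default;
protected:
    explicit Expr(AstTag t) : tag(t) {}     // all AST constructors are protected
};

struct NumberExpr : Expr {
    static constexpr AstTag Tag = AstTag::NumberExpr;
    int value;
protected:
    friend class AstFactory;
    explicit NumberExpr(int v) : Expr(Tag), value(v) {}
};

struct NameExpr : Expr {
    static constexpr AstTag Tag = AstTag::NameExpr;
    std::string name;
protected:
    friend class AstFactory;
    explicit NameExpr(std::string n) : Expr(Tag), name(std::move(n)) {}
};

// Stand-in for the generated factory class; a pool could be plugged in here.
class AstFactory {
public:
    std::shared_ptr<NumberExpr> Number(int v) {
        return std::shared_ptr<NumberExpr>(new NumberExpr(v));
    }
    std::shared_ptr<NameExpr> Name(std::string n) {
        return std::shared_ptr<NameExpr>(new NameExpr(std::move(n)));
    }
};

// Tag-based cast: compares the enum tag instead of using dynamic_cast.
// This sketch only handles exact type matches, not base-class matches.
template <typename T>
std::shared_ptr<T> Cast(const std::shared_ptr<Expr>& node) {
    if (node && node->tag == T::Tag) return std::static_pointer_cast<T>(node);
    return nullptr;
}
```

For example, `Cast<NumberExpr>(factory.Number(42))` yields a non-null pointer, while `Cast<NameExpr>` on the same node yields `nullptr`.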
|
||||||
|
|
||||||
## Types
|
### Types
|
||||||
|
|
||||||
- `Token`: In the previous version, `Token` was a value type; now it is a reference type.
|
* `Token`: In the previous version, `Token` was a value type; now it is a reference type.
|
||||||
- `CLASS-NAME`: Another class.
|
* `CLASS-NAME`: Another class.
|
||||||
- `TYPE[]`: Array, whose element is not allowed to be another array.
|
* `TYPE[]`: Array, whose element is not allowed to be another array.
|
||||||
|
|
||||||
## MISC
|
### MISC
|
||||||
|
|
||||||
- Define a `ToString` algorithm with customizable configurations.
|
* Define a `ToString` algorithm with customizable configurations.
|
||||||
|
|
||||||
# Lexical Analyzer
|
## Lexical Analyzer
|
||||||
|
|
||||||
- Pair name with regular expressions.
|
* Pair name with regular expressions.
|
||||||
- Extendable tokens.
|
* Extendable tokens.
|
||||||
- For example, recognize `R"[^\s(]\(` and invoke a callback function to determine the end of the string
|
* For example, recognize `R"[^\s(]\(` and invoke a callback function to determine the end of the string
|
||||||
- Pair a name with each token subset, and give a default name to the full token set
|
* Pair a name with each token subset, and give a default name to the full token set
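
A rough sketch of the callback idea for extendable tokens: once the lexer has recognized the start of a C++ raw string, a callback scans for the matching end. `FindRawStringEnd` is a hypothetical name and signature, and this version accepts a delimiter of any length without validating it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Given the position of the leading 'R' of a raw string literal,
// returns the index one past its closing quote,
// or std::string::npos if the literal is malformed or unterminated.
std::size_t FindRawStringEnd(const std::string& input, std::size_t start) {
    if (start > input.size()) return std::string::npos;
    if (input.compare(start, 2, "R\"") != 0) return std::string::npos;

    // Everything between R" and the first '(' is the delimiter.
    const std::size_t open = input.find('(', start + 2);
    if (open == std::string::npos) return std::string::npos;
    const std::string delimiter = input.substr(start + 2, open - (start + 2));

    // The literal ends at the first occurrence of )delimiter"
    const std::string closing = ")" + delimiter + "\"";
    const std::size_t close = input.find(closing, open + 1);
    if (close == std::string::npos) return std::string::npos;
    return close + closing.size();
}
```

The lexer would treat the range from `start` to the returned index as one token and resume regular-expression matching from there.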
|
||||||
|
|
||||||
# Error Messages
|
## Error Messages
|
||||||
|
|
||||||
- Generate error messages in C++ code
|
* Generate error messages in C++ code
|
||||||
|
|
||||||
# Syntax Analyzer
|
## Syntax Analyzer
|
||||||
|
|
||||||
- Priority of loops:
|
* Priority of loops:
|
||||||
- `+[ RULE ]` means if `RULE` succeeds, skipping `RULE` is not considered even if the rest doesn't parse.
|
* `+[ RULE ]` means if `RULE` succeeds, skipping `RULE` is not considered even if the rest doesn't parse.
|
||||||
- `-[ RULE ]` means the result of having `RULE` is kept only if skipping `RULE` makes the clause unable to parse.
|
* `-[ RULE ]` means the result of having `RULE` is kept only if skipping `RULE` makes the clause unable to parse.
|
||||||
- `[ RULE ]` means keep both results
|
* `[ RULE ]` means keep both results
|
||||||
- `+{ RULE }`, `-{ RULE }`, `{ RULE }` are similar, but `{ RULE }` may generate more than two results, while the others generate only one result.
|
* `+{ RULE }`, `-{ RULE }`, `{ RULE }` are similar, but `{ RULE }` may generate more than two results, while the others generate only one result.
|
||||||
- Being able to change token subset during parsing.
|
* Being able to change token subset during parsing.
|
||||||
- Being able to specify an error message when a certain action fails.
|
* Being able to specify an error message when a certain action fails.
|
||||||
- Generate SAX-like parser, with a default handler to create AST.
|
* Generate SAX-like parser, with a default handler to create AST.
|
||||||
|
* Generate each **POOLED** tuple struct type for
|
||||||
|
* Loop body. A delimited list is treated as `[ITEM {DELIMITER ITEM}]`
|
||||||
|
* Loop records a pointer to the reversed linked list of the last item during calculation
|
||||||
|
* Loop records an array of items as the result
|
||||||
|
* Alternative as `Union<Ts...>` storing `{TYPE-FLAG, ITEM*}` (value type)
|
||||||
|
* Optional as `Optional<T>` storing `{ITEM*}` (value type)
|
||||||
|
* Sequential as `{A, B, C ...}` with generated field names (value type)
|
||||||
|
* Type is rule or rule fragment, not the result AST type
|
||||||
|
* If there are multiple fields of the same type, each field name is appended with the index of its position in the rule (optionals, alternatives and loops are packed as one)
|
||||||
|
* If a tuple is created directly from a rule, there will be a static field to indicate which rule it comes from
|
||||||
|
* Rule reference as `Reference<Ts...>` storing `{RULE, FRAGMENT, TYPE-FLAG, ITEM*}`
|
||||||
|
* Each `Reference<Ts...>` is aliased
|
||||||
|
* Consider forward declarations
|
||||||
|
* All types have an un-templated partner so that the core SAX-like instruction execution doesn't need to know concrete types
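
A minimal sketch of the value-type wrappers described above: `Union<Ts...>` storing `{TYPE-FLAG, ITEM*}` on top of an un-templated partner, and `Optional<T>` storing `{ITEM*}`. `UnionBase`, `IndexOf`, `From`, `TryGet`, `TokenItem` and `RuleItem` are hypothetical names chosen for illustration; pooling and the generated field names are left out.

```cpp
#include <cassert>
#include <cstddef>

// Un-templated partner: the instruction executor could work with this
// without knowing the concrete alternative types.
struct UnionBase {
    std::size_t typeFlag = 0;   // index of the stored alternative in Ts...
    void* item = nullptr;       // pointer to the stored item
};

// Compile-time index of T inside the pack Ts...
template <typename T, typename... Ts>
struct IndexOf;

template <typename T, typename... Ts>
struct IndexOf<T, T, Ts...> {
    static constexpr std::size_t value = 0;
};

template <typename T, typename U, typename... Ts>
struct IndexOf<T, U, Ts...> {
    static constexpr std::size_t value = 1 + IndexOf<T, Ts...>::value;
};

// Alternative: a value type holding {TYPE-FLAG, ITEM*}.
template <typename... Ts>
struct Union : UnionBase {
    template <typename T>
    static Union From(T* p) {
        Union u;
        u.typeFlag = IndexOf<T, Ts...>::value;
        u.item = p;
        return u;
    }

    // Returns the item if it has type T, otherwise nullptr.
    template <typename T>
    T* TryGet() const {
        return typeFlag == IndexOf<T, Ts...>::value ? static_cast<T*>(item) : nullptr;
    }
};

// Optional: a value type holding {ITEM*}; a null pointer means "absent".
template <typename T>
struct Optional {
    T* item = nullptr;
    explicit operator bool() const { return item != nullptr; }
};

// Hypothetical item types standing in for rule fragments.
struct TokenItem { int id; };
struct RuleItem { int id; };
```

Because the wrappers only hold a flag and a pointer, they can be copied freely while the items themselves stay in the pool.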
|
||||||
|
|
||||||
## Supported EBNF
|
### Supported EBNF
|
||||||
|
|
||||||
- TOKEN [`:` PROPERTY-NAME]
|
* TOKEN [`:` PROPERTY-NAME]
|
||||||
- RULE [`:` PROPERTY-NAME]
|
* RULE [`:` PROPERTY-NAME]
|
||||||
- Optional:
|
* Optional:
|
||||||
- `+[` EBNF `]`
|
* `+[` EBNF `]`
|
||||||
- `-[` EBNF `]`
|
* `-[` EBNF `]`
|
||||||
- `[` EBNF `]`
|
* `[` EBNF `]`
|
||||||
- Loop:
|
* Loop:
|
||||||
- `+{` EBNF `}`
|
* `+{` EBNF `}`
|
||||||
- `-{` EBNF `}`
|
* `-{` EBNF `}`
|
||||||
- `{` EBNF `}`
|
* `{` EBNF `}`
|
||||||
- `with{` PROPERTY-ASSIGNMENT ... `}`
|
* `with{` PROPERTY-ASSIGNMENT ... `}`
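
One possible reading of the three optional forms `+[ ]`, `-[ ]` and `[ ]` listed above, written as a small decision function. This is an interpretation of the stated priorities, not the generator's algorithm; `Priority`, `Choice` and `ResolveOptional` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

enum class Priority { PreferRule, PreferSkip, KeepBoth };   // +[ ], -[ ], [ ]
enum class Choice { WithRule, WithoutRule };

// Decides which parse results survive for an optional RULE.
//   ruleMatches:           RULE itself parses at this position
//   restParsesWithRule:    the rest of the clause parses after taking RULE
//   restParsesWithoutRule: the rest of the clause parses after skipping RULE
std::vector<Choice> ResolveOptional(Priority priority,
                                    bool ruleMatches,
                                    bool restParsesWithRule,
                                    bool restParsesWithoutRule) {
    const bool withRuleOk = ruleMatches && restParsesWithRule;
    std::vector<Choice> kept;
    switch (priority) {
    case Priority::PreferRule:
        // Once RULE succeeds, skipping is never considered,
        // even if the rest fails to parse afterwards.
        if (ruleMatches) {
            if (withRuleOk) kept.push_back(Choice::WithRule);
        } else if (restParsesWithoutRule) {
            kept.push_back(Choice::WithoutRule);
        }
        break;
    case Priority::PreferSkip:
        // The result with RULE survives only when skipping cannot parse.
        if (restParsesWithoutRule) {
            kept.push_back(Choice::WithoutRule);
        } else if (withRuleOk) {
            kept.push_back(Choice::WithRule);
        }
        break;
    case Priority::KeepBoth:
        // Every successful result is kept, which may be ambiguous.
        if (withRuleOk) kept.push_back(Choice::WithRule);
        if (restParsesWithoutRule) kept.push_back(Choice::WithoutRule);
        break;
    }
    return kept;
}
```

Under this reading, `+[ RULE ]` can reject input that `[ RULE ]` would accept, since it never falls back to skipping once `RULE` has matched.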
|
||||||
|
|
||||||
## EBNF Program
|
### EBNF Program
|
||||||
|
|
||||||
- RULE {`::=` CLAUSE `as` CLASS-NAME} `;`
|
* RULE {`::=` CLAUSE `as` CLASS-NAME} `;`
|
||||||
- Consider a syntax here to switch token set
|
* Consider a syntax here to switch token set
|
||||||
|
|
||||||
## ToString Algorithm Requirements
|
### ToString Algorithm Requirements
|
||||||
- Every clause should create an AST node. `EXP ::= '(' !EXP ')'` is not allowed, except that this clause has only one node.
|
|
||||||
- Every rule-name node should be assigned to a property. Token nodes are optional but those properties will be auto-generated.
|
|
||||||
- Loops cannot be embedded in another loop. This doesn't limit the syntax, but it limits the shape of the AST.
|
|
||||||
|
|
||||||
## Development Project Structure
|
* Every clause should create an AST node. `EXP ::= '(' !EXP ')'` is not allowed, except that this clause has only one node.
|
||||||
- Original ParserGen code will be separated from Vlpp.
|
* Every rule-name node should be assigned to a property. Token nodes are optional but those properties will be auto-generated.
|
||||||
- **Development Steps**
|
* Loops cannot be embedded in another loop. This doesn't limit the syntax, but it limits the shape of the AST.
|
||||||
|
|
||||||
|
### Development Project Structure
|
||||||
|
|
||||||
|
* Original ParserGen code will be separated from Vlpp.
|
||||||
|
* **Development Steps**
|
||||||
1. Symbols for ParserGen AST
|
1. Symbols for ParserGen AST
|
||||||
2. Manually: symbols -> `ParserGen AST described in C++`
|
2. Manually: symbols -> `ParserGen AST declaration in C++`
|
||||||
3. Manually: ParserGen Syntax described in `ParserGen AST described in C++` -> `ParserGen Parser described in C++`
|
3. Manually: ParserGen Syntax described in `ParserGen AST declaration in C++` -> `ParserGen Parser declaration in C++`
|
||||||
4. Integrate
|
4. Integrate
|
||||||
- **AstGen**:
|
* **TODO**: Reorganize unit test projects to pure unit test and code generation steps
|
||||||
- **Goal**: given symbols and generated C++ code for AST
|
* Code generation steps are also split into multiple projects
|
||||||
- **Produce** (from unit test):
|
* Because some projects and their partner unit tests rely on generated code from the projects they depend on
|
||||||
- Generated source code (AST part): declaration, visitors, builder, reflection for ParserGen input
|
* **AstGen**:
|
||||||
- AST symbols and C++ code generation.
|
* **Goal**: given symbols and generated C++ code for AST
|
||||||
- Generate visitors.
|
* **Produce** (from unit test):
|
||||||
- Generate easy builder.
|
* Generated source code (AST part): declaration, visitors, builder, reflection for ParserGen input
|
||||||
- Generate reflection.
|
* AST symbols and C++ code generation.
|
||||||
- **Execution**:
|
* Generate visitors.
|
||||||
- **Goal**: given instructions and parse input text with SAX-like callback
|
* Generate easy builder.
|
||||||
- Parser-generated instructions serialization.
|
* Generate reflection.
|
||||||
- Execute instructions as a SAX-like parser, with notification on ambiguous nodes, error message generation and error recovery.
|
* **Execution**:
|
||||||
- **Compiler** -> **AstGen**, **Execution**
|
* **Goal**: given instructions and parse input text with SAX-like callback
|
||||||
- **Goal**: input described using `Generated source code (AST part)` and generate instructions (text parser)
|
* An instruction could generate multiple continuations
|
||||||
- **Produce** (from unit test)
|
* Parser-generated instructions serialization.
|
||||||
- Generated C++ source code (parser part) for ParserGen input
|
* Execute instructions as a SAX-like parser, with notification on ambiguous nodes, error message generation and error recovery.
|
||||||
- Take the `ParserGen AST declaration` and generate instructions.
|
* If there is ambiguity, different callbacks could be called on the same position, and results could be discarded in the future execution.
|
||||||
- Generate the default handler to create AST for the SAX-like parser.
|
* **Compiler** -> **AstGen**, **Execution**
|
||||||
- ToString algorithm.
|
* **Goal**: input described using `Generated source code (AST part)` and generate instructions (text parser)
|
||||||
- Bidirectional binding between the AST and the text.
|
* **Produce** (from unit test)
|
||||||
- **ParserGen** -> **Compiler**
|
* Generated C++ source code (parser part) for ParserGen input
|
||||||
- **Goal**: CLI Tool
|
* Take the `ParserGen AST declaration` and generate instructions.
|
||||||
- Integrate `Generated source code (AST part)`
|
* Generate the default handler to create AST for the SAX-like parser.
|
||||||
- Integrate `Generated source code (parser part)`
|
* ToString algorithm.
|
||||||
- Handle command line arguments
|
* Bidirectional binding between the AST and the text.
|
||||||
- **UnitTestAst**:
|
* **ParserGen** -> **Compiler**
|
||||||
- Unit test of **AstGen** building block and pool allocation etc.
|
* **Goal**: CLI Tool
|
||||||
- **Produces** steps
|
* Integrate `Generated source code (AST part)`
|
||||||
- Hand-written `AST for ParserGen` symbols.
|
* Integrate `Generated source code (parser part)`
|
||||||
- Codegen symbols and get `Generated source code (AST part)` for ParserGen input
|
* Handle command line arguments
|
||||||
- **UnitTestExecution**:
|
* **UnitTestAst**:
|
||||||
- Unit test of **Execution**.
|
* Unit test of **AstGen** building block and pool allocation etc.
|
||||||
- Assert directly on SAX-like parser.
|
* **Produces** steps
|
||||||
- **UnitTestCompiler**:
|
* Hand-written `AST for ParserGen` symbols.
|
||||||
- Unit test of **Compiler**; the inputs are all **UnitTestExecution** test cases rewritten using the generated easy-builder for `AST for ParserGen` AST.
|
* Codegen symbols and get `Generated source code (AST part)` for ParserGen input
|
||||||
- Assert on the ToString-ed AST. (shared)
|
* **UnitTestExecution**:
|
||||||
- **Produces** steps
|
* Unit test of **Execution**.
|
||||||
- Implement the ParserGen AST input syntax using the generated easy builder from `Generated source code (AST part)`
|
* Assert directly on SAX-like parser.
|
||||||
- Serialize instructions and get `C++ source code (parser part)` for ParserGen input
|
* **UnitTestCompiler**:
|
||||||
- **UnitTestParserGen**:
|
* Unit test of **Compiler**; the inputs are all **UnitTestExecution** test cases rewritten using the generated easy-builder for `AST for ParserGen` AST.
|
||||||
- Unit test of **ParserGen**; the inputs are all **UnitTestExecution** test cases rewritten in text format.
|
* Assert on the ToString-ed AST. (shared)
|
||||||
- Assert on the ToString-ed AST. (shared)
|
* **Produces** steps
|
||||||
- Generate C++ code for all parsers written in text format
|
* Implement the ParserGen AST input syntax using the generated easy builder from `Generated source code (AST part)`
|
||||||
- **UnitTest**:
|
* Serialize instructions and get `C++ source code (parser part)` for ParserGen input
|
||||||
- Link all cpp files in all other unit test projects so that all test cases can be run in one F5.
|
* **UnitTestParserGen**:
|
||||||
- Test all generated parsers in **UnitTestParserGen**.
|
* Unit test of **ParserGen**; the inputs are all **UnitTestExecution** test cases rewritten in text format.
|
||||||
- Assert on the ToString-ed AST. (shared)
|
* Assert on the ToString-ed AST. (shared)
|
||||||
- Since parsers are written in different ways for different unit test projects, they are stored separately from unit test projects to share necessary files.
|
* Generate C++ code for all parsers written in text format
|
||||||
- Do not really write a file if the generated content doesn't change.
|
* **UnitTest**:
|
||||||
|
* Link all cpp files in all other unit test projects so that all test cases can be run in one F5.
|
||||||
|
* Test all generated parsers in **UnitTestParserGen**.
|
||||||
|
* Assert on the ToString-ed AST. (shared)
|
||||||
|
* Since parsers are written in different ways for different unit test projects, they are stored separately from unit test projects to share necessary files.
|
||||||
|
* Do not really write a file if the generated content doesn't change.
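
The last point (skip the write when generated content is unchanged) could be sketched as below, so that file timestamps stay stable and dependent projects are not rebuilt needlessly. `WriteFileIfChanged` is a hypothetical helper, not an existing function of the project, and it assumes the generated file is small enough to read into memory.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>

// Writes `content` to `path` only if the file is missing or differs.
// Returns true if the file was (re)written, false if it was left untouched.
bool WriteFileIfChanged(const std::string& path, const std::string& content) {
    {
        std::ifstream in(path, std::ios::binary);
        if (in) {
            // Read the whole existing file and compare byte by byte.
            const std::string existing((std::istreambuf_iterator<char>(in)),
                                       std::istreambuf_iterator<char>());
            if (existing == content) return false;
        }
    }   // the input stream is closed here, before the file is reopened for writing

    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
    return true;
}
```

Calling it twice with the same content writes the file once; the second call returns `false` and leaves the file as it is.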
|
||||||
|
|||||||