7.10.1. PEG-01 — Hello Parser
This tutorial introduces the peg module — daslang’s built-in PEG
(Parsing Expression Grammar) parser generator. You will learn:
- The parse macro and grammar structure
- Literal string matching
- The EOF terminal
- Text extraction with "{rule}" as name
- Handling parse results and errors
- Character sets with set()
7.10.1.1. Setup
Import the PEG module:
require peg/peg
All parsers are defined inside a parse(input) { ... } macro block.
The macro generates a packrat parser at compile time — no runtime code
generation or external tools needed.
7.10.1.2. Your First Parser
A grammar is a set of rules. The first var declaration is the
entry rule — the parser starts matching there. Each rule(...)
defines an alternative tried in order:
def parse_hello(input : string;
                blk : block<(val : bool; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : bool
        rule("Hello", EOF) {
            return true
        }
    }
}
"Hello" matches the literal string. EOF requires end-of-input.
The block after rule runs when the alternative succeeds — its
return value becomes the parse result.
Call the parser with a callback that receives the value and an error array:
parse_hello("Hello") $(val; err) {
    print("matched: {val}\n") // matched: true
}
7.10.1.3. Multiple Alternatives (Ordered Choice)
PEG parsers try alternatives in order — the first match wins. This eliminates ambiguity:
def parse_greeting(input : string;
                   blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello", EOF) { return "hello" }
        rule("Hi", EOF) { return "hi" }
        rule("Hey", EOF) { return "hey" }
    }
}
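As a minimal sketch of how ordered choice plays out, the calls below feed two inputs to parse_greeting, using the same callback form as the parse_hello call earlier. The wrapper name demo_greetings and the input strings are hypothetical, chosen only for illustration.

```daslang
require peg/peg

// Assumes parse_greeting from the listing above is defined in this file.
// demo_greetings is a hypothetical helper name, not part of the peg module.
def demo_greetings() {
    // Alternatives are tried top to bottom: "Hi" fails the "Hello" alternative,
    // then matches the second one.
    parse_greeting("Hi") $(val; err) {
        print("result: {val}\n")
    }
    // None of the three alternatives matches "Howdy", so the error array
    // is expected to be non-empty here.
    parse_greeting("Howdy") $(val; err) {
        if (!empty(err)) {
            print("no alternative matched\n")
        }
    }
}
```

Because every alternative here ends with EOF, the three literals cannot shadow one another. In grammars where one alternative is a prefix of another and nothing follows it, listing the longer alternative first is the usual way to keep the shorter one from winning.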
7.10.1.4. Text Extraction
Wrap a rule reference in "{...}" and bind it with as to extract
the matched text. +rule means “one or more” repetitions:
def parse_name(input : string;
               blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello, ", "{+letter}" as name, "!", EOF) {
            return name
        }
        var letter : void?
        rule(set('a'..'z', 'A'..'Z')) {
            return null
        }
    }
}
Here letter is a helper rule using set() to match a character
range. Its return type void? marks it as a pattern-only rule (no
value produced). The "{+letter}" as name captures all matched text
into the name variable.
7.10.1.5. Character Sets
set() matches a single character from one or more ranges or
individual characters:
set('a'..'z', 'A'..'Z') // letters
set('0'..'9') // digits
set('a'..'z', '0'..'9', '_') // identifier chars
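To show a character set in a complete grammar, here is a small sketch that reuses the pattern of parse_name with the digit range instead of letters. The names parse_digits, digits, digit, and matched are hypothetical; only set(), +rule, the "{...}" as capture, and EOF come from this tutorial.

```daslang
require peg/peg

// Hypothetical parser: accepts one or more decimal digits and returns them as text.
def parse_digits(input : string;
                 blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        // Entry rule: capture the text matched by one or more digit repetitions.
        var digits : string
        rule("{+digit}" as matched, EOF) {
            return matched
        }

        // Pattern-only helper: matches a single character in '0'..'9'.
        var digit : void?
        rule(set('0'..'9')) {
            return null
        }
    }
}
```

The result is still a string; converting it to a number would be a separate step that this sketch does not attempt.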
7.10.1.6. Multiple Rules and WS
Grammars can have any number of rules. WS is a built-in terminal
that matches zero or more whitespace characters:
parse(input) {
    var entry : string
    rule("{+alpha}" as key, WS, "=", WS, string_ as value, EOF) {
        return "{key}: {value}"
    }
    var alpha : void?
    rule(set('a'..'z', 'A'..'Z', '_')) {
        return null
    }
}
string_ is a built-in terminal that matches a double-quoted string
and returns its content (without quotes).
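The grammar above is shown without its enclosing function. One way to wrap and call it, following the shape of the earlier parsers, might look like the sketch below. The names parse_entry and demo_entry and the sample inputs are assumptions made for illustration.

```daslang
require peg/peg

// Hypothetical wrapper around the key/value grammar shown above.
def parse_entry(input : string;
                blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var entry : string
        rule("{+alpha}" as key, WS, "=", WS, string_ as value, EOF) {
            return "{key}: {value}"
        }
        var alpha : void?
        rule(set('a'..'z', 'A'..'Z', '_')) {
            return null
        }
    }
}

// Hypothetical driver: WS matches zero or more whitespace characters,
// so the input is tried with and without spaces around "=".
def demo_entry() {
    parse_entry("name = \"daslang\"") $(val; err) {
        print("{val}\n")
    }
    parse_entry("name=\"daslang\"") $(val; err) {
        print("{val}\n")
    }
}
```

Going by the descriptions of WS and string_ above, both calls should yield the same key and the same unquoted value.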
7.10.1.7. Error Handling
The parse callback receives both the result and an array<ParsingError>.
On failure, the result is the default value for its type:
parse_name("Hello, 123!") $(val; err) {
    if (!empty(err)) {
        for (e in err) {
            print("{e.text} (at position {e.index})\n")
        }
    }
}
Each ParsingError has a text description and an index
position in the input string.
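Since a failed parse still hands the callback a default value, it is safer to check the error array before using the result. The sketch below separates the two cases; report_name is a hypothetical helper, and it relies only on empty(), e.text, and e.index as used above.

```daslang
require peg/peg

// Assumes parse_name from the Text Extraction section is defined in this file.
// report_name is a hypothetical helper name.
def report_name(input : string) {
    parse_name(input) $(val; err) {
        if (empty(err)) {
            // No errors reported: val holds the captured name.
            print("parsed name: {val}\n")
        } else {
            // Errors reported: val is only the default value, so do not use it.
            for (e in err) {
                print("error: {e.text} (at position {e.index})\n")
            }
        }
    }
}
```

Calling report_name("Hello, World!") should take the first branch, while report_name("Hello, 123!") should take the second, because digits are not in the letter set.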
7.10.1.8. Quick Reference
| Element | Description |
|---|---|
| parse(input) { ... } | Define a grammar and run it on input |
| var name : Type | Declare a rule with the given return type |
| rule(...) { ... } | One alternative for the enclosing rule |
| "text" | Match exact text |
| EOF | Match end of input |
| WS | Match zero or more whitespace characters |
| set('a'..'z') | Match one character from ranges |
| "{rule}" as name | Extract matched text into name |
| +rule | One or more repetitions |
| string_ | Match a double-quoted string literal |
| | Match a decimal integer |
See also
Full source: tutorials/dasPEG/01_hello_parser.das
Next tutorial: PEG-02 — Arithmetic Calculator