7.10.1. PEG-01 — Hello Parser

This tutorial introduces the peg module — daslang’s built-in PEG (Parsing Expression Grammar) parser generator. You will learn:

  • The parse macro and grammar structure

  • Literal string matching

  • The EOF terminal

  • Text extraction with "{rule}" as name

  • Character sets with set()

  • Handling parse results and errors

7.10.1.1. Setup

Import the PEG module:

require peg/peg

All parsers are defined inside a parse(input) { ... } macro block. The macro generates a packrat parser at compile time — no runtime code generation or external tools needed.

7.10.1.2. Your First Parser

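A grammar is a set of rules. Each var declaration names a rule and gives its return type; the first one is the entry rule, where the parser starts matching. Each rule(...) that follows a declaration defines one alternative for that rule, and alternatives are tried in order: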

def parse_hello(input : string;
                blk : block<(val : bool; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : bool
        rule("Hello", EOF) {
            return true
        }
    }
}

"Hello" matches the literal string. EOF requires end-of-input. The block after rule runs when the alternative succeeds — its return value becomes the parse result.

Call the parser with a callback that receives the value and an error array:

parse_hello("Hello") $(val; err) {
    print("matched: {val}\n")   // matched: true
}
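It is worth trying a few inputs that should not match as well. The sketch below repeats parse_hello from above and calls it through report, a hypothetical helper introduced here for illustration. It assumes the same brace syntax as the other snippets in this tutorial, and that err is empty when the parse succeeds.

```daslang
require peg/peg

// Same parser as above: accepts exactly "Hello" followed by end of input.
def parse_hello(input : string;
                blk : block<(val : bool; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : bool
        rule("Hello", EOF) {
            return true
        }
    }
}

// Hypothetical helper: prints the result and the number of reported errors.
def report(text : string) {
    parse_hello(text) $(val; err) {
        print("{text}: val={val}, errors={length(err)}\n")
    }
}

[export]
def main() {
    report("Hello")     // exact match
    report("Hello!")    // trailing character, so EOF should not match
    report("hello")     // literals are compared exactly, so case matters
}
```

Only the first call should report a successful match; the other two exercise the failure path.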

7.10.1.3. Multiple Alternatives (Ordered Choice)

PEG parsers try alternatives in order, and the first one that matches wins. Because later alternatives are never considered once one succeeds, the grammar cannot be ambiguous:

def parse_greeting(input : string;
                   blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello", EOF) { return "hello" }
        rule("Hi", EOF)    { return "hi" }
        rule("Hey", EOF)   { return "hey" }
    }
}
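As a rough illustration of ordered choice, the sketch below calls parse_greeting with inputs that hit different alternatives and one that hits none. The show helper is a hypothetical name, and the code assumes err is empty on success.

```daslang
require peg/peg

// Same parser as above: three alternatives, tried top to bottom.
def parse_greeting(input : string;
                   blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello", EOF) { return "hello" }
        rule("Hi", EOF)    { return "hi" }
        rule("Hey", EOF)   { return "hey" }
    }
}

// Hypothetical helper: prints which normalized greeting came back, if any.
def show(text : string) {
    parse_greeting(text) $(val; err) {
        if (empty(err)) {
            print("{text} -> {val}\n")
        } else {
            print("{text} -> no alternative matched\n")
        }
    }
}

[export]
def main() {
    show("Hi")      // first alternative fails, second one matches
    show("Hey")     // first two fail, third one matches
    show("Howdy")   // none of the three alternatives match
}
```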

7.10.1.4. Text Extraction

Wrap a rule reference in "{...}" and bind it with as to extract the matched text. +rule means “one or more” repetitions:

def parse_name(input : string;
               blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello, ", "{+letter}" as name, "!", EOF) {
            return name
        }

        var letter : void?
        rule(set('a'..'z', 'A'..'Z')) {
            return null
        }
    }
}

Here letter is a helper rule that uses set() to match a single character in the ranges a-z or A-Z. Its return type void? marks it as a pattern-only rule (no value produced). +letter repeats that rule one or more times, and "{+letter}" as name captures the text it matched into the name variable.

7.10.1.5. Character Sets

set() matches a single character from one or more ranges or individual characters:

set('a'..'z', 'A'..'Z')         // letters
set('0'..'9')                    // digits
set('a'..'z', '0'..'9', '_')    // identifier chars
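A set() call is normally wrapped in a pattern-only helper rule and then repeated from another rule. The following is a minimal sketch that reuses the identifier set from the list above; parse_ident and ident_char are hypothetical names chosen for this example, and it only combines constructs already shown in this tutorial.

```daslang
require peg/peg

// Hypothetical parser: accepts input made entirely of identifier characters.
def parse_ident(input : string;
                blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var ident : string
        rule("{+ident_char}" as name, EOF) {
            return name
        }

        // Pattern-only helper: one lowercase letter, digit, or underscore.
        var ident_char : void?
        rule(set('a'..'z', '0'..'9', '_')) {
            return null
        }
    }
}

[export]
def main() {
    parse_ident("max_count2") $(val; err) {
        print("ident: {val}, errors: {length(err)}\n")
    }
    // An uppercase letter is outside the set, so this input should be rejected.
    parse_ident("MaxCount") $(val; err) {
        print("ident: {val}, errors: {length(err)}\n")
    }
}
```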

7.10.1.6. Multiple Rules and WS

Grammars can have any number of rules. WS is a built-in terminal that matches zero or more whitespace characters:

parse(input) {
    var entry : string
    rule("{+alpha}" as key, WS, "=", WS, string_ as value, EOF) {
        return "{key}: {value}"
    }

    var alpha : void?
    rule(set('a'..'z', 'A'..'Z', '_')) {
        return null
    }
}

string_ is a built-in terminal that matches a double-quoted string and returns its content (without quotes).
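To run the grammar above, it can be wrapped in a function in the same way as the earlier parsers. This is a sketch under that assumption; parse_entry is a hypothetical name, and the input uses an escaped double-quoted value so that string_ has something to match.

```daslang
require peg/peg

// Hypothetical wrapper around the key = "value" grammar shown above.
def parse_entry(input : string;
                blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var entry : string
        rule("{+alpha}" as key, WS, "=", WS, string_ as value, EOF) {
            return "{key}: {value}"
        }

        var alpha : void?
        rule(set('a'..'z', 'A'..'Z', '_')) {
            return null
        }
    }
}

[export]
def main() {
    // WS matches zero or more whitespace characters, so both spacings should be accepted.
    parse_entry("name = \"daslang\"") $(val; err) {
        print("{val}\n")
    }
    parse_entry("name=\"daslang\"") $(val; err) {
        print("{val}\n")
    }
}
```

Given the description of string_ above, the value should reach the action block without its surrounding quotes.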

7.10.1.7. Error Handling

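The parse callback receives both the result and an array<ParsingError>. On failure, the result is the default value for its type (false for a bool result, an empty string for a string result). The input below fails because 123 contains no letters: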

parse_name("Hello, 123!") $(val; err) {
    if (!empty(err)) {
        for (e in err) {
            print("{e.text} (at position {e.index})\n")
        }
    }
}

Each ParsingError has a text description and an index position in the input string.
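Callers often want a plain success flag rather than a callback. One possible way to get that is sketched below, assuming that err is empty on success and that the block may assign to a local variable of the enclosing function; try_name is a hypothetical helper, not part of the peg module.

```daslang
require peg/peg

// Same parser as in the Text Extraction section.
def parse_name(input : string;
               blk : block<(val : string; err : array<ParsingError>) : void>) {
    parse(input) {
        var greeting : string
        rule("Hello, ", "{+letter}" as name, "!", EOF) {
            return name
        }

        var letter : void?
        rule(set('a'..'z', 'A'..'Z')) {
            return null
        }
    }
}

// Hypothetical helper: prints any errors and reports whether the parse succeeded.
def try_name(text : string) : bool {
    var ok = false
    parse_name(text) $(val; err) {
        ok = empty(err)     // assumption: no errors means the input matched
        for (e in err) {
            print("{e.text} (at position {e.index})\n")
        }
    }
    return ok
}

[export]
def main() {
    print("{try_name("Hello, World!")}\n")
    print("{try_name("Hello, 123!")}\n")
}
```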

7.10.1.8. Quick Reference

Element                 Description
----------------------  ----------------------------------------
parse(input) { ... }    Define a grammar and run it on input
var name : Type         Declare a rule with the given return type
rule(...) { ... }       One alternative for the enclosing rule
"literal"               Match exact text
EOF                     Match end of input
WS                      Match zero or more whitespace characters
set(ranges...)          Match one character from ranges
"{+rule}" as name       Extract matched text into name
+rule                   One or more repetitions
string_                 Match a double-quoted string literal
number                  Match a decimal integer