7.7. PEG parser generator

The PEG module is a parser generator based on Parsing Expression Grammars. Define grammars directly in daslang using the parse macro — the compiler generates a packrat parser at compile time. No external tools, no runtime code generation.

See PEG-01 — Hello Parser for a hands-on introduction.

7.7.1. Grammar basics

A grammar lives inside a parse(input) { ... } block. The first var declaration is the entry rule. Each rule(...) adds an ordered alternative:

def my_parser(input : string; blk : block<(val : int; err : array<ParsingError>) : void>) {
    parse(input) {
        var expr : int
        rule(number as n, WS, "+", WS, number as m, EOF) {
            return n + m
        }
    }
}

Rules try alternatives top to bottom. The first match wins (ordered choice — no ambiguity).

7.7.2. Built-in terminals

Terminal

Description

"literal"

Exact text match

EOF

End of input

EOL

End of line (\n or \r\n)

WS

Zero or more whitespace (including newlines)

TS

Zero or more tabs/spaces (no newlines)

any

Any single character

number

Decimal integer (returns int)

string_

Double-quoted string literal (returns content)

double_

Floating-point number (returns double)

set(ranges...)

Character from ranges: set('a'..'z', '0'..'9')

not_set(chars...)

Any character NOT in the set

7.7.3. Operators and combinators

Syntax

Description

rule as name

Bind the rule result to name

"{rule}" as name

Extract matched text as string

+rule

One or more repetitions (array)

*rule

Zero or more repetitions (array)

MB(rule)

Optional (zero or one)

!rule

Negative lookahead (no input consumed)

PEEK(rule)

Positive lookahead (no input consumed)

commit

Cut — no backtracking past this point

log("msg")

Print debug message during parsing

7.7.4. Options

Add inside the parse block:

  • option(tracing) — print rule-by-rule execution trace

  • option(color) — ANSI colors in trace output

  • option(print_generated) — show generated parser code at compile time

7.7.5. Return types

Rules can return any daslang type: scalars, strings, structs, variants, tuples, tables, and arrays. Use void? for pattern-only rules that match characters without producing a value.

7.7.6. Error handling

parse calls a callback $(val; err) where err is an array<ParsingError>. Place commit after unambiguous prefixes to get meaningful error messages with position information.

7.7.7. Left recursion

dasPEG supports left-recursive rules via packrat memoization. This is the standard technique for encoding left-associative operators (e.g. a + b + c).

7.7.8. Performance

  • Packrat memoization caches results per rule per position — O(n) parsing

  • Place common alternatives first (PEG tries in order)

  • Use PEEK to quickly reject impossible alternatives

  • Use commit after unambiguous prefixes for speed and error quality

  • Set options stack = 1000000 — PEG parsers need more stack than default

All functions and symbols are in “peg” module, use require to get access to it.

require peg/peg

Example:

require peg/peg

    def parse_greeting(input : string;
                       blk : block<(val : string; err : array<ParsingError>) : void>) {
        parse(input) {
            var greeting : string
            rule("Hello, ", "{+letter}" as name, "!", EOF) {
                return name
            }
            var letter : void?
            rule(set('a'..'z', 'A'..'Z')) {
                return null
            }
        }
    }

    [export]
    def main() {
        parse_greeting("Hello, World!") $(val; err) {
            print("name = {val}\n")
        }
    }
    // output:
    // name = World

7.7.8.1. Structures

ParsingError

struct ParsingError

7.7.8.2. Matching primitives

get_current_char(parser: auto): int

def get_current_char (var parser: auto) : int

Arguments:
  • parser : auto

matches(parser: auto; tmpl: string; strlen: int): bool

def matches (var parser: auto; tmpl: string; strlen: int) : bool

Arguments:
  • parser : auto

  • tmpl : string

  • strlen : int

move(parser: auto; offset: auto): auto

def move (var parser: auto; offset: auto) : auto

Arguments:
  • parser : auto

  • offset : auto

reached_EOF(parser: auto): bool

def reached_EOF (var parser: auto) : bool

Arguments:
  • parser : auto

reached_EOL(parser: auto): bool

def reached_EOL (var parser: auto) : bool

Arguments:
  • parser : auto

7.7.8.3. Whitespace

skip_taborspace(parser: auto): auto

def skip_taborspace (var parser: auto) : auto

Arguments:
  • parser : auto

skip_whitespace(parser: auto): auto

def skip_whitespace (var parser: auto) : auto

Arguments:
  • parser : auto

7.7.8.4. Literal matching

match_decimal_literal(parser: auto): tuple<success:bool;value:int;endpos:int>

Simple lexing of decimal integers, doesn’t check for overflow

Arguments:
  • parser : auto

match_double_literal(parser: auto): tuple<success:bool;double>

Matches doubles in the form of [-+]? [0-9]* .? [0-9]+ ([eE] [-+]? [0-9]+)? The number is not checked to be representable as defined in IEEE-754

Arguments:
  • parser : auto

match_string_literal(parser: auto): tuple<success:bool;string>

Tries to match everything inside “”

Arguments:
  • parser : auto

7.7.8.5. Tracing and logging

log_fail(parser: auto; message: auto): auto

def log_fail (var parser: auto; var message: auto) : auto

Arguments:
  • parser : auto

  • message : auto

log_info(parser: auto; message: auto): auto

def log_info (var parser: auto; var message: auto) : auto

Arguments:
  • parser : auto

  • message : auto

log_plain(parser: auto; message: auto): auto

def log_plain (var parser: auto; var message: auto) : auto

Arguments:
  • parser : auto

  • message : auto

log_success(parser: auto; message: auto): auto

def log_success (var parser: auto; var message: auto) : auto

Arguments:
  • parser : auto

  • message : auto

tabulate(parser: auto): auto

def tabulate (var parser: auto) : auto

Arguments:
  • parser : auto