22. Regular expression library

The experimental REGEX module implements a regular expression parser and pattern matching functionality.

It is currently at a very early stage and implements only a few basic regex operations.

All functions and symbols are in the “regex” module; use require to get access to it.

require daslib/regex

22.1. Type aliases

CharSet = uint[8]

Character set for regex.

ReGenRandom = iterator<uint>

Regex generator random iterator.

variant MaybeReNode

Regex node or nothing.

Variants
  • value : ReNode ? - Node.

  • nothing : void? - Nothing.

22.2. Enumerations

ReOp

Type of regular expression operation.

Values
  • Char = 0 - Matching a character

  • Set = 1 - Matching a character set

  • Any = 2 - Matches any character

  • Eos = 3 - Matches end of string

  • Group = 4 - Matching a group

  • Plus = 5 - Repetition: one or more

  • Star = 6 - Repetition: zero or more

  • Question = 7 - Repetition: zero or one

  • Concat = 8 - First followed by second

  • Union = 9 - Either first or second

22.3. Structures

ReNode

Regular expression node.

Fields
  • op : ReOp - Regex operation

  • id : int - Unique node identifier

  • fun2 : function<(regex: Regex; node: ReNode?; str: uint8?): uint8?> - Matching function

  • gen2 : function<(node: ReNode ?;rnd: ReGenRandom ;str: StringBuilderWriter ):void> - Generator function

  • at : range - Source range

  • text : string - Text fragment

  • textLen : int - Length of text fragment

  • all : array< ReNode ?> - All child nodes

  • left : ReNode ? - Left child node

  • right : ReNode ? - Right child node

  • subexpr : ReNode ? - Subexpression node

  • next : ReNode ? - Next node in the list

  • cset : CharSet - Character set for character class matching

  • index : int - Index for character class matching

  • tail : uint8? - Tail of the string

Regex

Regular expression structure.

Fields
  • root : ReNode ? - Root node of the regex.

  • match : uint8? - Original source text.

  • groups : array<tuple<range;string>> - Captured groups.

  • earlyOut : CharSet - Character set for early out optimization.

  • canEarlyOut : bool - Whether early out optimization is enabled.

22.4. Compilation and validation

visit_top_down(node: ReNode?; blk: block<(var n:ReNode?):void>)

Visits regex nodes in a top-down manner, invoking blk on each node.

Arguments
  • node : ReNode?

  • blk : block<(var n:ReNode?):void>

is_valid(re: Regex) : bool()

Whether the regex is valid.

Arguments
  • re : Regex

regex_compile(re: Regex; expr: string) : bool()

Precompile a regular expression.

Arguments
  • re : Regex

  • expr : string

regex_compile(expr: string) : Regex()

Precompiles a regular expression and returns the resulting Regex.

Arguments
  • expr : string

regex_compile(re: Regex) : Regex()

Precompile a regular expression.

Arguments
  • re : Regex

regex_debug(regex: Regex)

Debugs regular expression by printing its structure.

Arguments
  • regex : Regex

debug_set(cset: CharSet)

Debugs character set by printing all characters it contains.

Arguments
  • cset : CharSet
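
The compilation and validation functions above can be combined as in the following minimal sketch. It is written in the classic indentation-based daslang syntax (adapt it if your project uses a different syntax mode), and the pattern and the function name main are illustrative assumptions rather than part of the module.

```das
require daslib/regex

[export]
def main
    // Compile a pattern; the resulting Regex is moved into `re`.
    var re <- regex_compile("[a-z]+")
    // Check the result before using it for matching.
    if !is_valid(re)
        print("pattern failed to compile\n")
        return
    // Print the structure of the compiled regex for inspection.
    regex_debug(re)
```

Checking is_valid right after regex_compile keeps a malformed pattern from reaching the matching functions.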

22.5. Access

regex_group(regex: Regex; index: int; match: string) : string()

Returns the substring matched by the specified regex group.

Arguments
  • regex : Regex

  • index : int

  • match : string

regex_foreach(regex: Regex; str: string; blk: block<(at:range):bool>)

Iterate over all matches of a regex in a string.

Arguments
  • regex : Regex

  • str : string

  • blk : block<(at:range):bool>
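
A hedged sketch of the access functions follows. It assumes that returning true from the block asks regex_foreach to continue with the next match, that regex_match (described in the match section) returns a negative value when there is no match, and that capture groups are numbered from 1 with group 0 covering the whole match. None of these assumptions is confirmed by this reference, so verify them against the module source before relying on them. The patterns and input strings are illustrative.

```das
require daslib/regex

[export]
def main
    // Iterate over every run of digits in the input.
    var digits <- regex_compile("[0-9]+")
    let text = "a1 bb22 ccc333"
    regex_foreach(digits, text) <| $(at : range) : bool
        // `at` is the range of the current match within `text`.
        print("match range: {at}\n")
        // Assumption: true means "keep iterating".
        return true
    // Read captured groups after a match.
    var pair <- regex_compile("([a-z]+)=([a-z]+)")
    let line = "key=value"
    // Assumption: a negative result means no match.
    if regex_match(pair, line) >= 0
        // Assumption: index 1 is the first parenthesized group.
        print("first group: {regex_group(pair, 1, line)}\n")
```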

22.6. Match & replace

regex_match(regex: Regex; str: string; offset: int = 0) : int()

Matches a regular expression against a string and returns the position of the match.

Arguments
  • regex : Regex

  • str : string

  • offset : int

regex_replace(regex: Regex; str: string; blk: block<(at:string):string>) : string()

Replaces substrings matched by the regex with the result of the provided block.

Arguments
  • regex : Regex

  • str : string

  • blk : block<(at:string):string>
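
The two functions above might be used as in the following minimal sketch. It assumes that regex_match tries the pattern at the given offset rather than searching forward through the string; this reference does not state the exact semantics of the returned position, so the values are printed without asserting what they are. The pattern and inputs are illustrative.

```das
require daslib/regex

[export]
def main
    var re <- regex_compile("[0-9]+")
    // Try the pattern at the start of the string, then at offset 4.
    let at_start = regex_match(re, "abc 123")
    let at_digits = regex_match(re, "abc 123", 4)
    print("at start: {at_start}, at offset 4: {at_digits}\n")
    // Wrap every matched run of digits in angle brackets.
    let wrapped = regex_replace(re, "abc 123 de 45") <| $(at : string) : string
        // `at` is the matched substring; the returned string replaces it.
        return "<{at}>"
    print("{wrapped}\n")
```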

22.7. Generation

re_gen_get_rep_limit() : uint()

Returns the limit of repetitions applied to regex quantifiers during generation.

re_gen(re: Regex; rnd: ReGenRandom) : string()

Generate a random string matching the regex.

Arguments
  • re : Regex

  • rnd : ReGenRandom
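
A minimal sketch of generation follows. ReGenRandom is an iterator<uint>, so any source of uint values should work; this example assumes that each_random_uint from daslib/random provides such an iterator, which is not stated in this reference. The pattern and seed are illustrative.

```das
require daslib/regex
require daslib/random

[export]
def main
    var re <- regex_compile("[a-c]+[0-9]")
    // Assumption: each_random_uint(seed) yields an iterator<uint>.
    var rnd <- each_random_uint(13)
    // Produce one random string that should match the pattern.
    let sample = re_gen(re, rnd)
    print("sample: {sample}\n")
    // Report the limit of repetitions for quantifiers.
    print("repetition limit: {re_gen_get_rep_limit()}\n")
```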