7.3. XML parser (pugixml)

The PUGIXML module provides XML parsing, navigation, manipulation, and XPath query support built on top of the pugixml C++ library. It exposes document loading/saving, DOM-style node and attribute access, text content helpers, and a full XPath 1.0 evaluation engine.

Use PUGIXML_boost for high-level helpers such as RAII document handling, iterator-based traversal, a builder EDSL, and struct ↔ XML serialization. The low-level C++ bindings live in this module.

All functions and symbols are in the “pugixml” module; use require to get access to it.

require pugixml

7.3.1. Constants

parse_minimal = 0x0

Minimal parsing mode: only elements and PCDATA are parsed.

parse_pi = 0x1

Parse processing instructions (<?…?>).

parse_comments = 0x2

Parse comments (<!--…-->).

parse_cdata = 0x4

Parse CDATA sections (<![CDATA[…]]>).

parse_ws_pcdata = 0x8

Parse whitespace-only PCDATA nodes.

parse_escapes = 0x10

Parse character and entity references (&amp;, &#123;, etc.).

parse_eol = 0x20

Normalize line endings to \n.

parse_wconv_attribute = 0x40

Normalize whitespace in attribute values (convert tabs/newlines to spaces).

parse_wnorm_attribute = 0x80

Normalize and collapse whitespace in attribute values.

parse_declaration = 0x100

Parse XML declarations (<?xml …?>).

parse_doctype = 0x200

Parse DOCTYPE declarations.

parse_ws_pcdata_single = 0x400

Parse whitespace-only PCDATA as a single node.

parse_trim_pcdata = 0x800

Trim leading and trailing whitespace from PCDATA.

parse_fragment = 0x1000

Parse as document fragment (allows multiple root elements).

parse_embed_pcdata = 0x2000

Embed PCDATA value in the element node instead of creating a child.

parse_merge_pcdata = 0x4000

Merge adjacent PCDATA nodes into one.

parse_default = 0x74

Default parsing flags: parse_cdata | parse_escapes | parse_wconv_attribute | parse_eol.

parse_full = 0x377

Full parsing flags: parse_default | parse_pi | parse_comments | parse_declaration | parse_doctype, so that nodes of every type are kept in the tree.
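Because the parse options are plain bit flags, they can be combined with bitwise OR and tested with bitwise AND. The following standalone C++ sketch mirrors some of the constant values listed above as local constexpr values (it does not include or call the module) and checks how parse_default and parse_full decompose. The helper has_flag, the function check_parse_flags, and the constant default_with_comments are hypothetical names introduced only for illustration.

```cpp
#include <cassert>

// Local mirrors of the documented constants (values copied from the list above).
constexpr unsigned parse_pi              = 0x1;
constexpr unsigned parse_comments        = 0x2;
constexpr unsigned parse_cdata           = 0x4;
constexpr unsigned parse_escapes         = 0x10;
constexpr unsigned parse_eol             = 0x20;
constexpr unsigned parse_wconv_attribute = 0x40;
constexpr unsigned parse_declaration     = 0x100;
constexpr unsigned parse_doctype         = 0x200;
constexpr unsigned parse_default         = 0x74;
constexpr unsigned parse_full            = 0x377;

// Hypothetical helper: true when every bit of `flag` is set in `options`.
constexpr bool has_flag(unsigned options, unsigned flag) {
    return (options & flag) == flag;
}

// The documented composite values decompose into individual flags.
static_assert(parse_default ==
                  (parse_cdata | parse_escapes | parse_wconv_attribute | parse_eol),
              "parse_default decomposition");
static_assert(parse_full ==
                  (parse_default | parse_pi | parse_comments |
                   parse_declaration | parse_doctype),
              "parse_full decomposition");

// Hypothetical custom mask: default parsing that additionally keeps comments.
constexpr unsigned default_with_comments = parse_default | parse_comments;
static_assert(default_with_comments == 0x76, "0x74 | 0x2");

// Runtime variant of the same kind of check, callable from any function.
inline void check_parse_flags() {
    assert(has_flag(parse_full, parse_default));      // full includes default
    assert(!has_flag(parse_default, parse_comments)); // default drops comments
}
```

A mask built this way is what the options argument of load_string expects; which flags a given application needs remains a per-document decision.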

format_indent = 0x1

Indent the output nodes according to tree depth.

format_write_bom = 0x2

Write an encoding byte-order mark (BOM) at the start.

format_raw = 0x4

Raw output: no indentation or newlines.

format_no_declaration = 0x8

Omit the XML declaration (<?xml …?>) from output.

format_no_escapes = 0x10

Do not escape special characters in output.

format_save_file_text = 0x20

Use platform-native line endings when saving to file.

format_indent_attributes = 0x40

Indent attributes on separate lines.

format_no_empty_element_tags = 0x80

Always use <tag></tag> instead of <tag/> for empty elements.

format_skip_control_chars = 0x100

Skip control characters during serialization.

format_attribute_single_quote = 0x200

Use single quotes for attribute values.

format_default = 0x1

Default formatting flags: format_indent.
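When debugging serialization settings it can help to turn a numeric flag mask back into the names listed above. The following standalone C++ sketch does that for the format_* constants; the table is a local copy of the documented values, and FlagName, kFormatFlags, and describe_format_flags are hypothetical names, not part of the module.

```cpp
#include <cassert>
#include <string>

struct FlagName {
    unsigned bit;
    const char* name;
};

// Local mirror of the documented format_* constants (values copied from the list above).
constexpr FlagName kFormatFlags[] = {
    {0x1,   "format_indent"},
    {0x2,   "format_write_bom"},
    {0x4,   "format_raw"},
    {0x8,   "format_no_declaration"},
    {0x10,  "format_no_escapes"},
    {0x20,  "format_save_file_text"},
    {0x40,  "format_indent_attributes"},
    {0x80,  "format_no_empty_element_tags"},
    {0x100, "format_skip_control_chars"},
    {0x200, "format_attribute_single_quote"},
};

// Hypothetical helper: renders a flag mask as "name | name | ...",
// or "0" when no documented bit is set.
inline std::string describe_format_flags(unsigned flags) {
    std::string out;
    for (const FlagName& f : kFormatFlags) {
        if (flags & f.bit) {
            if (!out.empty()) out += " | ";
            out += f.name;
        }
    }
    if (out.empty()) return "0";
    return out;
}
```

For example, the mask 0x4 | 0x8 is rendered as "format_raw | format_no_declaration", and format_default (0x1) as "format_indent".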

7.3.2. Enumerations

xml_encoding

Character encoding used for XML input/output operations.

Values:
  • encoding_auto = 0 - Auto-detect encoding from BOM or content.

  • encoding_utf8 = 1 - UTF-8 encoding.

  • encoding_utf16_le = 2 - UTF-16 little-endian encoding.

  • encoding_utf16_be = 3 - UTF-16 big-endian encoding.

  • encoding_utf16 = 4 - UTF-16 with native endianness.

  • encoding_utf32_le = 5 - UTF-32 little-endian encoding.

  • encoding_utf32_be = 6 - UTF-32 big-endian encoding.

  • encoding_utf32 = 7 - UTF-32 with native endianness.

  • encoding_wchar = 8 - System wchar_t encoding.

  • encoding_latin1 = 9 - Latin-1 (ISO 8859-1) encoding.

xml_node_type

DOM node type identifying the kind of XML node.

Values:
  • node_null = 0 - Empty (null) node handle.

  • node_document = 1 - Document tree root (not an XML element).

  • node_element = 2 - Element tag (e.g. <node>).

  • node_pcdata = 3 - Plain character data (text content).

  • node_cdata = 4 - Character data section (<![CDATA[…]]>).

  • node_comment = 5 - Comment node (<!-- … -->).

  • node_pi = 6 - Processing instruction (<?name …?>).

  • node_declaration = 7 - XML declaration (<?xml …?>).

  • node_doctype = 8 - Document type declaration (<!DOCTYPE …>).

xml_parse_status

Parsing result status codes indicating success or the kind of error encountered.

Values:
  • status_ok = 0 - No error, parsing succeeded.

  • status_file_not_found = 1 - File was not found during load.

  • status_io_error = 2 - I/O error during read.

  • status_out_of_memory = 3 - Out of memory.

  • status_internal_error = 4 - Internal parser error.

  • status_unrecognized_tag = 5 - Unrecognized tag encountered.

  • status_bad_pi = 6 - Malformed processing instruction.

  • status_bad_comment = 7 - Malformed comment.

  • status_bad_cdata = 8 - Malformed CDATA section.

  • status_bad_doctype = 9 - Malformed document type declaration.

  • status_bad_pcdata = 10 - Malformed PCDATA section.

  • status_bad_start_element = 11 - Malformed start element tag.

  • status_bad_attribute = 12 - Malformed attribute.

  • status_bad_end_element = 13 - Malformed end element tag.

  • status_end_element_mismatch = 14 - Start/end element tag mismatch.

  • status_append_invalid_root = 15 - Cannot append nodes to the specified root.

  • status_no_document_element = 16 - No document element found.

xpath_value_type

XPath expression return type.

Values:
  • xpath_type_none = 0 - Unknown or no type.

  • xpath_type_node_set = 1 - Node set (collection of XML nodes).

  • xpath_type_number = 2 - Floating-point number.

  • xpath_type_string = 3 - Character string.

  • xpath_type_boolean = 4 - Boolean value.

7.3.3. Handled structures

xml_node
xml_node.name(): string

Returns the node name (the tag name for an element node).

xml_node.value(): string

Returns the node value as a string.

xml_node._type(): xml_node_type

Returns the node type (element, pcdata, cdata, comment, etc.).

xml_node.empty(): bool

Returns true if the node handle is empty (null).

Properties:
  • name : string - The tag name of this element node.

  • value : string - The raw text value of this node.

  • _type : xml_node_type - The node type (see xml_node_type enumeration).

  • empty : bool - True if this handle is empty (not bound to any node).

Lightweight handle to a DOM node (element, text, comment, etc.). Does not own memory; the owning xml_document must outlive it.

xml_text
xml_text.empty(): bool

Returns true if the text handle is empty (null).

xml_text.get(): string

Returns the text content as a raw string.

Properties:
  • empty : bool - True if this text object is empty (the element has no text content).

  • get : string - The text content as a string.

Accessor for the text content of an element node. Provides typed read and write operations.

xpath_node

A single result from an XPath query. Wraps either an xml_node or an xml_attribute together with its parent.

xml_attribute
xml_attribute.name(): string

Returns the attribute name.

xml_attribute.value(): string

Returns the attribute value as a string.

xml_attribute.empty(): bool

Returns true if the attribute handle is empty (null).

Properties:
  • name : string - The attribute name.

  • value : string - The attribute value as a string.

  • empty : bool - True if this handle is empty (not bound to any attribute).

Handle to a single XML attribute (name-value pair) on an element node.

xpath_variable_set

A set of named XPath variables (bool, number, string) that can be bound to a compiled xpath_query.

xml_document

Owns the entire DOM tree. Must stay alive while any xml_node or xml_attribute obtained from it is in use.

xpath_node_set

An ordered collection of xpath_node results returned by XPath evaluation. Supports indexed access and sorting.

xpath_query

A compiled XPath 1.0 expression. Compile once and evaluate many times for efficiency.

xml_parse_result
xml_parse_result.description(): string

Returns a human-readable description of the parse result status.

Properties:
  • description : string - Human-readable description of the parsing result.

Result of a parsing operation. Contains the status code, encoding, and character offset of the first error.

Fields:
  • status : xml_parse_status - The parsing status code (see xml_parse_status enumeration).

  • offset : int64 - Character offset in the source where the error occurred (0 on success).

  • encoding : xml_encoding - The detected or specified document encoding.
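A raw offset is rarely what a user wants to see in an error message, so it is commonly converted into a line and column. The following standalone C++ sketch shows one way to do this. It assumes the offset counts bytes from the start of the same buffer that was parsed, and SourcePos and locate are hypothetical names, not part of the module.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// 1-based position inside a text buffer.
struct SourcePos {
    std::size_t line;
    std::size_t column;
};

// Hypothetical helper: converts a byte offset into a line/column pair.
// Offsets past the end of the buffer are clamped to the end.
inline SourcePos locate(const std::string& source, std::size_t offset) {
    if (offset > source.size()) offset = source.size();
    SourcePos pos{1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (source[i] == '\n') {
            ++pos.line;      // a newline starts the next line
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

Given the text "<a>\n  <b>\n</a>" and an offset of 11, locate reports line 3, column 2, which is the '/' of the closing tag. Columns here count bytes, so multi-byte UTF-8 sequences advance the column by more than one.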

7.3.4. Document operations

document_as_node(document: xml_document?): xml_node

Returns the document as an xml_node, allowing direct node operations on the document root.

load_document(doc: xml_document?; filename: string; result: xml_parse_result): bool

Loads an XML document from a file. Populates result with parse status and error offset.

load_string(doc: xml_document?; content: string; result: xml_parse_result; options: uint): bool

Parses an XML string into the document. Uses the given parse options flags.

reset(document: xml_document?)

Resets the document, removing all nodes and freeing memory.

save_file(document: xml_document const?; filename: string; indent: string; flags: uint; encoding: xml_encoding): bool

Saves the document to a file with the specified indentation, flags, and encoding.

Arguments:
  • document : xml_document? implicit

  • filename : string implicit

  • indent : string implicit

  • flags : uint

  • encoding : xml_encoding

save_string(document: xml_document const?; indent: string; flags: uint; encoding: xml_encoding): string

Serializes the entire document to a string with the specified formatting.

xml_document const?.document_element(document: xml_document const?): xml_node

Returns the root element of the document (i.e. the outermost element).

7.3.5. Node lookup

7.3.5.1. append_child

append_child(node: xml_node; type: xml_node_type): xml_node

Appends a new child node of the given type (e.g. element, pcdata, comment) to the node.

append_child(node: xml_node; name: string): xml_node

child(node: xml_node; name: string): xml_node

Returns the first child element with the given name, or an empty node handle if not found.

Arguments:
  • node : xml_node implicit

  • name : string implicit

7.3.5.2. find_child_by_attribute

find_child_by_attribute(node: xml_node; element_name: string; attr_name: string; attr_value: string): xml_node

Finds the first child element that has an attribute matching the given name and value.

Arguments:
  • node : xml_node implicit

  • element_name : string implicit

  • attr_name : string implicit

  • attr_value : string implicit

find_child_by_attribute(node: xml_node; attr_name: string; attr_value: string): xml_node

first_element_by_path(node: xml_node; path: string): xml_node

Navigates a slash-separated element path (e.g. a/b/c) and returns the target node.

Arguments:
  • node : xml_node implicit

  • path : string implicit
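To make the idea of a slash-separated element path concrete, the following standalone C++ sketch walks such a path over a toy tree. It models only plain element names separated by a delimiter and is not the module's implementation; Elem, first_child_named, and walk_path are hypothetical names, and a null pointer stands in for an empty node handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a DOM element; not the module's xml_node.
struct Elem {
    std::string name;
    std::vector<Elem> children;
};

// Returns the first child with the given name, or nullptr if there is none.
inline const Elem* first_child_named(const Elem& parent, const std::string& name) {
    for (const Elem& c : parent.children) {
        if (c.name == name) return &c;
    }
    return nullptr;
}

// Walks a delimiter-separated path one component at a time.
// Returns nullptr as soon as a component has no matching child.
inline const Elem* walk_path(const Elem& start, const std::string& path,
                             char delimiter = '/') {
    const Elem* cur = &start;
    std::size_t begin = 0;
    while (cur != nullptr && begin <= path.size()) {
        std::size_t end = path.find(delimiter, begin);
        if (end == std::string::npos) end = path.size();
        if (end > begin) {  // skip empty components such as a trailing delimiter
            cur = first_child_named(*cur, path.substr(begin, end - begin));
        }
        begin = end + 1;
    }
    return cur;
}
```

With a tree root → a → b → c, walk_path(root, "a/b/c") yields the element named c, while walk_path(root, "a/x") yields a null pointer, mirroring the empty handle a failed lookup produces.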

7.3.5.3. prepend_child

prepend_child(node: xml_node; name: string): xml_node

Prepends a new child element with the given name to the node.

Arguments:
  • node : xml_node implicit

  • name : string implicit

prepend_child(node: xml_node; type: xml_node_type): xml_node

7.3.5.4. remove_child

remove_child(node: xml_node; child: xml_node): bool

Removes the given child node from this node.

remove_child(node: xml_node; name: string): bool

xml_node.first_child(node: xml_node): xml_node

Returns the first child node of this element.

xml_node.last_child(node: xml_node): xml_node

Returns the last child node of this element.

7.3.6. Node navigation

next_sibling(node: xml_node; name: string): xml_node

Returns the next sibling element with the given name, or an empty node if not found.

Arguments:
  • node : xml_node implicit

  • name : string implicit

path(node: xml_node; delimiter: string): string

Returns the absolute path of the node from the document root, using the given delimiter.

Arguments:
  • node : xml_node implicit

  • delimiter : string implicit

previous_sibling(node: xml_node; name: string): xml_node

Returns the previous sibling element with the given name, or an empty node if not found.

Arguments:
  • node : xml_node implicit

  • name : string implicit

print_to_string(node: xml_node; indent: string; flags: uint; encoding: xml_encoding): string

Serializes the node (and its subtree) to an XML string with the specified formatting.

xml_node.next_sibling(node: xml_node): xml_node

Returns the next sibling node in document order.

xml_node.parent(node: xml_node): xml_node

Returns the parent node of this element.

xml_node.previous_sibling(node: xml_node): xml_node

Returns the previous sibling node in document order.

xml_node.root(node: xml_node): xml_node

Returns the root node of the document this node belongs to.

xpath_node.parent(xpath_node: xpath_node): xml_node

Returns the parent element of this XPath result node.

7.3.7. Node mutation

insert_child_after(node: xml_node; name: string; after: xml_node): xml_node

Inserts a new child element with the given name after the specified sibling.

insert_child_before(node: xml_node; name: string; before: xml_node): xml_node

Inserts a new child element with the given name before the specified sibling.

remove_children(node: xml_node): bool

Removes all child nodes from this node.

7.3.7.1. set_name

set_name(node: xml_node; name: string): bool

Changes the name (tag) of the node or attribute.

Arguments:
  • node : xml_node implicit

  • name : string implicit

set_name(attribute: xml_attribute; name: string): bool

7.3.7.2. set_value

set_value(attribute: xml_attribute; value: string): bool

Sets the value of the node or attribute. The attribute overloads accept string, int, uint, float, double, or bool; the node overload accepts a string.

set_value(attribute: xml_attribute; value: int): bool
set_value(node: xml_node; value: string): bool
set_value(attribute: xml_attribute; value: float): bool
set_value(attribute: xml_attribute; value: double): bool
set_value(attribute: xml_attribute; value: uint): bool
set_value(attribute: xml_attribute; value: bool): bool

7.3.8. Attribute access

append_attribute(node: xml_node; name: string): xml_attribute

Adds a new attribute with the given name at the end of the node’s attribute list.

Arguments:
  • node : xml_node implicit

  • name : string implicit

attribute(node: xml_node; name: string): xml_attribute

Returns the attribute with the given name, or an empty attribute handle if not found.

Arguments:
  • node : xml_node implicit

  • name : string implicit

insert_attribute_after(node: xml_node; name: string; after: xml_attribute): xml_attribute

Inserts a new attribute with the given name after the specified attribute.

insert_attribute_before(node: xml_node; name: string; before: xml_attribute): xml_attribute

Inserts a new attribute with the given name before the specified attribute.

prepend_attribute(node: xml_node; name: string): xml_attribute

Adds a new attribute with the given name at the beginning of the node’s attribute list.

Arguments:
  • node : xml_node implicit

  • name : string implicit

7.3.8.1. remove_attribute

remove_attribute(node: xml_node; attribute: xml_attribute): bool

Removes the specified attribute handle from the node. Returns true if the attribute was found and removed.

remove_attribute(node: xml_node; name: string): bool

remove_attributes(node: xml_node): bool

Removes all attributes from the node.

xml_attribute.next_attribute(attribute: xml_attribute): xml_attribute

Returns the next attribute in the element’s attribute list.

xml_attribute.previous_attribute(attribute: xml_attribute): xml_attribute

Returns the previous attribute in the element’s attribute list.

xml_node.first_attribute(node: xml_node): xml_attribute

Returns the first attribute of this element node.

xml_node.last_attribute(node: xml_node): xml_attribute

Returns the last attribute of this element node.

xpath_node.attribute(xpath_node: xpath_node): xml_attribute

Returns the attribute associated with this XPath result node, if any.

7.3.9. Copy and move

7.3.9.1. append_copy

append_copy(node: xml_node; proto: xml_attribute): xml_attribute

Appends a copy of the given attribute to the end of the node’s attribute list, or a deep copy of the given node as the last child.

append_copy(node: xml_node; proto: xml_node): xml_node

append_move(node: xml_node; moved: xml_node): xml_node

Moves the given node to become the last child of this node.

7.3.9.2. prepend_copy

prepend_copy(node: xml_node; proto: xml_attribute): xml_attribute

Prepends a copy of the given attribute to the start of the node’s attribute list, or a deep copy of the given node as the first child.

prepend_copy(node: xml_node; proto: xml_node): xml_node

prepend_move(node: xml_node; moved: xml_node): xml_node

Moves the given node to become the first child of this node.

7.3.10. Value reading

7.3.10.1. as_bool

as_bool(text: xml_text; default_value: bool): bool

Returns the attribute or text value as a bool, or default_value if conversion fails.

Arguments:
  • text : xml_text implicit

  • default_value : bool

as_bool(attribute: xml_attribute; default_value: bool): bool

7.3.10.2. as_double

as_double(text: xml_text; default_value: double): double

Returns the attribute or text value as a double, or default_value if conversion fails.

Arguments:
  • text : xml_text implicit

  • default_value : double

as_double(attribute: xml_attribute; default_value: double): double

7.3.10.3. as_float

as_float(text: xml_text; default_value: float): float

Returns the attribute or text value as a float, or default_value if conversion fails.

Arguments:
  • text : xml_text implicit

  • default_value : float

as_float(attribute: xml_attribute; default_value: float): float

7.3.10.4. as_int

as_int(attribute: xml_attribute; default_value: int): int

Returns the attribute or text value as an int, or default_value if conversion fails.

as_int(text: xml_text; default_value: int): int

as_int64(text: xml_text; default_value: int64): int64

Returns the text value as a 64-bit signed integer, or default_value if conversion fails.

Arguments:
  • text : xml_text implicit

  • default_value : int64

7.3.10.5. as_string

as_string(text: xml_text; default_value: string): string

Returns the attribute or text value as a string, or default_value if empty.

Arguments:
  • text : xml_text implicit

  • default_value : string implicit

as_string(attribute: xml_attribute; default_value: string): string

7.3.10.6. as_uint

as_uint(attribute: xml_attribute; default_value: uint): uint

Returns the attribute or text value as a uint, or default_value if conversion fails.

as_uint(text: xml_text; default_value: uint): uint

as_uint64(text: xml_text; default_value: uint64): uint64

Returns the text value as a 64-bit unsigned integer, or default_value if conversion fails.

Arguments:
  • text : xml_text implicit

  • default_value : uint64

child_value(node: xml_node; name: string): string

Returns the text content of the first child element with the given name.

Arguments:
  • node : xml_node implicit

  • name : string implicit

xml_node.child_value(node: xml_node): string

Returns the text content of the first PCDATA/CDATA child of this node.

xml_node.text(node: xml_node): xml_text

Returns an xml_text accessor for the text content of this element.

xml_text.data(text: xml_text): xml_node

Returns the data node that holds the actual character data for this xml_text.

7.3.11. Value writing

7.3.11.1. set

set(text: xml_text; value: int): bool

Sets the text content or XPath variable value. The text overloads accept string, bool, float, double, and signed or unsigned 8-, 16-, 32-, and 64-bit integers; the XPath variable overloads accept bool, double, or string.

set(text: xml_text; value: string): bool
set(text: xml_text; value: double): bool
set(text: xml_text; value: uint): bool
set(text: xml_text; value: int8): bool
set(text: xml_text; value: bool): bool
set(text: xml_text; value: uint8): bool
set(text: xml_text; value: float): bool
set(text: xml_text; value: int64): bool
set(text: xml_text; value: uint16): bool
set(text: xml_text; value: uint64): bool
set(variables: xpath_variable_set?; name: string; value: bool): bool
set(variables: xpath_variable_set?; name: string; value: double): bool
set(text: xml_text; value: int16): bool
set(variables: xpath_variable_set?; name: string; value: string): bool

7.3.12. XPath compilation and evaluation

evaluate_boolean(query: xpath_query const?; node: xml_node): bool

Evaluates the compiled XPath query against the given node and returns a bool result.

evaluate_node(query: xpath_query const?; node: xml_node): xpath_node

Evaluates the compiled XPath query and returns the first matching xpath_node.

evaluate_node_set(query: xpath_query const?; node: xml_node): xpath_node_set?

Evaluates the compiled XPath query and returns all matching nodes as an xpath_node_set.

evaluate_number(query: xpath_query const?; node: xml_node): double

Evaluates the compiled XPath query against the given node and returns a numeric result.

evaluate_string(query: xpath_query const?; node: xml_node): string

Evaluates the compiled XPath query against the given node and returns a string result.

7.3.12.1. xpath_compile

xpath_compile(query: string; variables: xpath_variable_set?): xpath_query?

Compiles an XPath expression string into an xpath_query. Optionally accepts an xpath_variable_set for parameterized queries.

xpath_compile(query: string): xpath_query?

xpath_query const?.result_description(query: xpath_query const?): string

Returns a human-readable error description if the XPath query failed to compile.

xpath_query const?.result_offset(query: xpath_query const?): int

Returns the character offset in the query string where the compilation error occurred.

xpath_query const?.return_type(query: xpath_query const?): xpath_value_type

Returns the XPath result type (node_set, number, string, or boolean).

7.3.13. XPath selection

7.3.13.1. select_node

select_node(node: xml_node; query: xpath_query const?): xpath_node

Selects the first node matching the XPath query string or compiled query.

select_node(node: xml_node; query: string): xpath_node

7.3.13.2. select_nodes

select_nodes(node: xml_node; query: xpath_query const?): xpath_node_set?

Selects all nodes matching the XPath query and returns them as an xpath_node_set.

select_nodes(node: xml_node; query: string): xpath_node_set?

xpath_node.node(xpath_node: xpath_node): xml_node

Returns the xml_node associated with this XPath result, if any.

7.3.14. XPath node set operations

at(set: xpath_node_set const?; index: int): xpath_node

Returns the xpath_node at the given zero-based index in the node set.

sort(set: xpath_node_set?; reverse: bool)

Sorts the xpath_node_set in document order (or reverse document order if reverse is true).

xpath_node_set const?.empty(set: xpath_node_set const?): bool

Returns true if the xpath_node_set contains no results.

xpath_node_set const?.first(set: xpath_node_set const?): xpath_node

Returns the first xpath_node in the set (in document order).

xpath_node_set const?.size(set: xpath_node_set const?): int

Returns the number of xpath_node entries in the node set.

7.3.15. Construction and RAII

7.3.15.1. using

using(arg0: block<(xml_document):void>)

Constructs a temporary object of the type the block expects (xml_document, xpath_variable_set, or xpath_node_set), passes it to the block, then destroys it automatically.

using(arg0: block<(xpath_variable_set):void>)
using(arg0: block<(xpath_node_set):void>)

xml_document(): xml_document

Constructs a new empty xml_document.

xpath_node_set(): xpath_node_set

Constructs a new empty xpath_node_set.

xpath_variable_set(): xpath_variable_set

Constructs a new empty xpath_variable_set.

7.3.16. Handle validity and comparison

xml_attribute!=(attr_a: xml_attribute; attr_b: xml_attribute): bool

Returns true if two xml_attribute handles refer to different attributes.

xml_attribute.hash_value(attribute: xml_attribute): uint64

Returns a hash value for the attribute handle, usable as a table key.

xml_attribute.ok(attribute: xml_attribute): bool

Returns true if the attribute handle is valid (non-null).

xml_attribute==(attr_a: xml_attribute; attr_b: xml_attribute): bool

Returns true if two xml_attribute handles refer to the same attribute.

xml_node!=(node_a: xml_node; node_b: xml_node): bool

Returns true if two xml_node handles refer to different DOM nodes.

xml_node.hash_value(node: xml_node): uint64

Returns a hash value for the node handle, usable as a table key.

xml_node.offset_debug(node: xml_node): int

Returns the byte offset of this node in the original parsed XML source.

xml_node.ok(node: xml_node): bool

Returns true if the node handle is valid (non-null).

xml_node==(node_a: xml_node; node_b: xml_node): bool

Returns true if two xml_node handles refer to the same DOM node.

xml_text.ok(text: xml_text): bool

Returns true if the text handle is valid (non-null).

xpath_node.ok(xpath_node: xpath_node): bool

Returns true if the xpath_node result is valid (non-null).

xpath_query const?.ok(query: xpath_query const?): bool

Returns true if the XPath query compiled successfully.
