10. LINQ-fold patterns — what _fold(...) recognizes
This is a catalog of every chain shape that the _fold macro
splices into a specialized loop. Each row names a chain pattern, the
splice arm in daslib/linq_fold.das it dispatches to, and a short
note on the emitted loop’s shape.
Pipelines that don’t match any row here fall through to the default
fold_linq_default path, which lowers via the standard
iterator-materializing surface (the same code path as the
non-spliced linq calls).
For the corresponding tutorial, see LINQ over DECS. For
nanosecond-level cost comparisons across the SQL / Array / Decs lanes,
see benchmarks/sql/results.md in the source tree.
10.1. How dispatch works
_fold walks the chain inside-out (terminator first), flattens the
ExprCall spine via flatten_linq, normalizes adjacent
where/select pairs and order |> reverse shapes via the pre-passes
described below, and hands the result to try_splice_patterns in
daslib/linq_fold.das. That dispatcher walks a single global
splice_patterns table (17 rows, one per arm listed in this
document) twice:
Decs adapter pass. Runs only when
extract_decs_bridge(top) != null(i.e. the source isfrom_decs_template(...)). Each row’srequirespredicates gate the match; rows with anarray_sourcepredicate fail here and fall through. First emit that returns non-null wins.Array adapter pass. Runs only when
extract_decs_bridge(top) == null, i.e. the source is not a decs eager bridge. Top is firstpeel_each-unwrapped. Decs chains never reach this pass: if the Decs pass above cascades on every row, control falls through tofold_linq_defaultinstead. (This is deliberate —peel_eachdoes not strip the eager-bridgeExprInvoke, so without this gate thedecs_sourcepredicate would still succeed on a decs source in the Array pass and decs-only rows could match and emit viaSourceAdapter::Array, silently dropping adapter-specific captures likeupstream_join.)
If neither pass emits, the Theme 6 perf-warn fires for
from_decs_template chains (telling the user the bridge will
materialize and tier-2 will run on the buffer), then control falls
through to fold_linq_default (the iterator-materializing tier-2
path) and finally to a raw passthrough.
Note
Labels of the form plan_<X> (e.g. plan_distinct,
plan_decs_reverse) in the catalog below refer to the
pre-PR-E per-planner stubs that were collapsed into the unified
dispatcher. Each label now corresponds to a row (or pair of rows)
in splice_patterns whose requires predicates and emit
function carry the same logic. The names are kept here because
they read more naturally than row names like
order_buffer_helper_dispatch — see daslib/linq_fold.das
for the precise row each catalog entry maps to.
10.2. Pre-dispatch normalizations
A few chain rewrites fire at the start of the relevant planners (right
after flatten_linq, before the per-arm pattern-match) so the rest of
that planner’s logic sees the normalized shape and a single arm covers
what would otherwise be many lookalike chains:
_order_by(K).reverse()→_order_by_descending(K)(and the three symmetric flips:order_by_descending → order_by,order → order_descending,order_descending → order). Applied bynormalize_order_reverse, called from everyplan_*order_familyandplan_*reverseplanner right afterflatten_linq. The registry pointer is swapped to the flipped variant in place — theExprCallarg list is identical for ascending/descending order variants, so no AST clone is needed. Iterative: a chain like_order_by(K).reverse().reverse()collapses to_order_by(K)in two passes._select(f) |> _select(g)(N consecutive) →_select(g(f(_))). Applied bycollapse_chained_selects, called fromplan_zip,plan_distinct,plan_decs_distinct,plan_reverse,plan_decs_reverse,plan_decs_join(alsocollapse_chained_wheresper PR D2), and fromplan_order_family/plan_decs_order_family(which now accept a leading_select— see the next bullet). Mirrors how chained_wherealready compose via&&. Composition takes the INNER lambda’s structure (preserves param type), renames its bound param to a freshqn("cs", at)name to avoidapply_templaterecursive substitution when both lambdas share the boost-side_desugar, then overwrites its body with outer’s body where outer’s param is substituted by the renamed-inner body. Chain backlink rewired so subsequent planner passes see the shortened AST. Gated on!has_sideeffects(innerBody)— collapsing would shift evaluation count when outer references its param zero or many times (cascade always evaluates inner once per element). Chains with%/// user-call inner cascade to tier-2 (output remains correct). Also bails (cascades) when either selector is not a peelable single-arg, single-returnExprMakeBlocklambda — multi-statement projection bodies, captured/non-trivial lambda shapes, and function-pointer arguments all skip collapse.plan_loop_or_count,plan_group_by_core, andplan_decs_unrollalready handle chained selects natively via theirintermediateBinds/ chain-info machinery and don’t need the pre-pass.source |> _select(f) |> <order/distinct/take>— a leading_select(f)that projects the source is absorbed into the source walk. An optional leadingsrcselslot on the fourwrap_source_loop-based order rows (streaming_min/bounded_heap/order_then_plain_distinct/fused_prefilter— not the source-directbuffer_helper_dispatch) captures the_select;run_splice_adapterthen wraps the source adapter inProjectedSourceAdapter(linq_fold_common), which bindsprojName = f(rawElem)atop the per-element body and delegateswrap_source_loop/wrap_invoketo the inner adapter. Every emit seesf(rawElem)as the element with no emit-fn change, and XML field-pruning is preserved (f’sit.<field>reads reach the inner materializer). Without it the chain bails to tier-2 (materialize-all + sort-all); the bench m3f/m4 lanes hid the gap by pre-projecting their source array, the XML lane can’t. The slot is optional — chains with no leading_selectare byte-identical.
10.3. Source-side entry points
Source |
Recognized by |
Note |
|---|---|---|
|
|
Strips the |
|
pattern |
Two-source zip. The three-argument form |
|
|
Surfaces a |
|
|
Runtime component-name list form. Same decs splices as the template form. |
|
|
Optional source — only when the |
|
|
In-tree source — recognized by name plus a table-typed argument ( |
|
|
In-tree source — the adapter is compiled in unconditionally (no |
10.4. Array-source patterns
Chain shape |
Splice arm |
Notes |
|---|---|---|
|
|
O(1) length read when the source has a known length. |
|
|
Single counter, no allocation; one pass over the array. |
|
|
Single-pass scalar reduce; |
|
|
Multi-output reduce; one scalar per output kept in the loop’s state. |
|
|
Early-exit terminator; loop breaks on first match ( |
|
|
Boolean fast-path; loop exits on first hit ( |
|
|
Bounded counter/accumulator; loop exits at N matches. |
|
|
Take cap ticks unconditionally; |
|
|
|
|
|
Insert-loop straight into the result table — no intermediate array. A |
|
|
Single |
|
|
|
|
|
Theme 3 Phase 3 (audit C1/C5). The bounded-heap path gains a leading or middle |
|
|
Bounded-heap / streaming-min holds the raw element; projection |
|
|
Materializes + sorts. No bounded-heap shortcut. |
|
|
Multi-key orderby with per-key direction ( |
|
|
Single-hash set lane; |
|
|
Dedup table is built unconditionally so |
|
|
Theme 8 (audit 3b). The where_+order fused-loop path generalizes: when upstream |
|
pattern |
Per-key bucket reducer; single hash, one entry per group. PR D1: reducer dispatch is now a |
|
pattern |
HAVING filter on the bucket reference (pre-aggregate); can lift hidden reducer slots referenced by |
|
pattern |
HAVING filter on the constructed post-aggregate tuple (predicate references |
|
pattern |
Theme 3 Phase 2 (audit C2). Inline-cmp |
|
|
Single loop |
|
|
Catch-all path for chains with pre-reverse |
|
|
Theme 8 (audit 2a). Array source only ( |
|
|
The exact complement of R-2a ( |
|
|
Reverse is identity for a count, so one forward pass increments a counter — no buffer, no reverse. The projection still fires per match (side-effect parity). Works on iterator and array sources (for-loop body, no indexed access). |
10.5. Decs-source patterns
Every array pattern above has a decs mirror that walks each archetype
of T’s template instead of iterating the array. The body shape is
identical — only the source iteration changes.
Note
As of PR C, the four plan_decs_* planners (_reverse /
_distinct / _order_family / _unroll) are thin pattern-table
stubs that reuse the same pattern rows as their array-side siblings
(plan_reverse_patterns / plan_distinct_patterns /
plan_order_family_patterns / plan_loop_or_count_patterns)
with a SourceAdapter::Decs adapter swap. The 7 array-side emit
archetypes consume the adapter (adapter_bind_name selects
it vs decs_tup for lambda peeling; adapter_wrap_source_loop
dispatches for (it in src) vs for_each_archetype + build_decs_inner_for_pruned;
adapter_wrap_invoke dispatches the outer invoke wrap). For
plan_decs_unroll (which feeds emit_loop_or_count_lane), the
Decs-arm dispatch (emit_loop_or_count_lane_decs) reconstructs a
calls array from captures and routes to the existing
emit_decs_* lane fns unchanged (state hoist above
for_each_archetype stays per-adapter; see masterplan D1).
Two decs-specific fast paths preserved: emit_decs_count_archsize
(bare count()) and emit_decs_reverse_skip_into_tail
(reverse |> take(N) |> to_array). Row 4 of
plan_order_family_patterns (buffer_helper_dispatch) is gated
to Array adapter via array_source — decs cascades to Row 3
(fused_prefilter) which materializes the buffer, matching the
imperative decs behavior. reverse |> distinct[_by] on decs
sources now fuses via the source-generic R-2b forward keep-last row
(emit_reverse_distinct_forward_keeplast, gated non_array_source)
— one table-overwrite emit shared by decs / XML / iterators, not a
parallel decs fn (closes masterplan D6).
As of PR D3, the GroupBySourceAdapter shim (a parallel adapter
used only by plan_group_by_core) is gone — group_by’s three
source shapes (Array / Decs / DecsJoin) all flow
through the same SourceAdapter variant as every other planner.
plan_group_by_core calls adapter_wrap_source_loop and
adapter_wrap_invoke directly. The decs-join branch of
adapter_wrap_source_loop carries the inline hash-collect +
probe + per-pair result-lam bind body shared with
emit_decs_join.
Chain shape (decs source) |
Splice arm |
Notes |
|---|---|---|
|
|
Sums |
|
|
Counter loop over the per-archetype walk. The bare |
|
|
Per-archetype accumulator; pruner keeps only the components read by |
|
|
Walk lane reads one component per loop iteration; element_at uses cumulative-size short-circuit. Bare |
|
|
Reads the last non-empty archetype’s |
|
|
Boolean fast-path; walks until first match or end. |
|
|
Concatenates archetype slices; one allocation sized by |
|
|
Single |
|
|
Same heap pattern as the array variant; buffer size N. |
|
|
Decs mirror of the array-side |
|
|
Decs mirror of |
|
|
Streaming-min/max with key. |
|
|
Single-hash set lane mirroring |
|
|
Decs mirror of the array-side predicate-distinct splice. Same dedup-unconditional / counter-gated-on-P shape across archetypes. |
|
|
Whole-archetype skip + early-exit. For indexable sources the boundary archetype is random-indexed ( |
|
|
Decs mirror of |
|
pattern |
Shared bucket-reducer with the array path; differs only in the per-element source. |
|
pattern |
Decs mirror of the array-side post-aggregate HAVING. Same predicate-on-output-tuple semantics. |
|
pattern |
Decs mirror of the array-side ORDER BY splice (Theme 3 Phase 2 C2). Shares the same in-place inline-cmp sort tail; only the bucket-fill source differs. |
|
pattern |
Theme 3 Phase 1 cross-arm composition. |
|
|
Hoists |
|
|
Decs mirror of the array-side |
Note
Several decs rows above (including this one) label the splice arm as plan_decs_unroll even though, post-PR-C, dispatch flows through the shared plan_loop_or_count pattern row (emit_loop_or_count_lane → emit_loop_or_count_lane_decs → emit_decs_*). The plan_decs_unroll label is retained for table-row consistency across the pre- and post-unification decs entries.
10.5.1. Decs-decs equi-join
plan_decs_join is the hashed equi-join splice over two
from_decs_template sources. It collects the right side into a
table<KEY; array<TUPB>> in one for_each_archetype pass, then
walks the left side and probes via table.get. The key must be a
primitive (int* / uint* / float / double / bool /
string); tuple keys cascade to the standard join_impl.
Chain shape |
Splice arm |
Notes |
|---|---|---|
|
pattern |
Hash-fill + probe; |
|
pattern |
Hash-fill + probe; |
|
pattern |
Single bind of the join result per matched pair, then projection. |
|
pattern |
Bind join result, evaluate predicate, gate |
|
pattern |
Pre-join filter on srcA, fused into the per-archetype probe as |
|
|
Cross-arm composition. |
10.5.2. Array-array equi-join
emit_array_join is the array-source mirror of emit_decs_join —
hashed equi-join over two array / iterator sources. Algorithm is
identical (collect srcb into table<KEY; array<TUPB>> in one pass,
then walk srca and probe via table.get) but the lead iteration
comes from the adapter (wrap_source_loop / bind_name /
invoke_param_type), so any direct-return loop source rides it —
ArrayAdapter frames a plain for (elem in src), TableAdapter
its pruned slot walk (vs for_each_archetype + build_decs_inner_for
on the decs side). Both sources bind as invoke parameters (2-source
wrap, mirrors Zip). Same primitive equi-key gate as the decs side;
non-primitive keys cascade to join_impl_const. When srcB is a
table walked on its bare key, the internal hash is skipped entirely —
see the table-source row above and the probe row below.
Chain shape |
Splice arm |
Notes |
|---|---|---|
|
pattern |
Hash-fill + probe; |
|
pattern |
Implicit |
|
pattern |
Single bind of the join result per matched pair, then
projection. |
|
pattern |
Bind join result, evaluate predicate, gate |
|
pattern |
Pre-join filter on srcA, fused into the probe loop as
|
|
probe mode ( |
Table-srcB probe: the b-key selector IS the table key, so no
hash and no build loop — srcB binds the user’s table itself
(const param) and the per-A probe is a key lookup. Unique table
keys ⇒ bucket ≤ 1 ⇒ probe ≡ hash semantics exactly (b-key is a
bare field read, so skipping its per-B evaluation is
unobservable). Usage-pruned like the point-lookup fold:
count-no-where / key-only shapes probe |
|
pattern |
Table lead: same emitter, lead loop framed by
|
|
pattern |
Table lead group_by: |
|
pattern |
C# GroupJoin (outer): one result row per srcA row — |
|
|
Cross-arm composition. |
10.6. XML-source patterns
An unsafe(from_xml_node(node[, name], type<Row>)) source folds through
XmlAdapter (modules/dasPUGIXML/daslib/linq_fold_xml.das), loaded only when
the pugixml module is linked. Unlike a hard-coded source row, the adapter
rides every pattern row the array / decs planners expose — try_splice_patterns
runs with no onlyRow restriction, and per-row requires predicates plus the
adapter’s capability hooks decide what fuses. Three mechanics make the emitted loop
differ from the array and decs lanes.
Single flat DOM walk, forward-only. The adapter emits one while over the
node’s child elements (first_child / next_sibling, or child(node, name)
for the 3-arg named overload), like the array lane and unlike decs’s two-level
archetype walk. But an XML node has no random index, so XML matches the
non_array_source rows (e.g. the R-2b forward keep-last reverse-distinct) and is
excluded from the array_source-only rows (Row 5 buffer_helper_dispatch
direct-helper, R-2a backward-index reverse-distinct). Bare order_by / order
therefore cascades to the fused_prefilter row (materialized buffer), not the
direct daslib helper — the same fall-through decs takes.
Field-pruning (pass 2b). Before emitting the per-element body the chain is
scanned (XmlRowUsageScanner) for the Row fields it actually reads. Only
those attributes are read — each via read_xml_field into a scalar local, with
the body’s it.<field> rewritten to that local and the Row struct dropped
entirely. Unread fields are never touched, so a chain that reads only numeric
fields runs alloc-free (the per-string clone_string is the materialization
cost). Three outcomes:
Pruned — body reads only
it.<field>scalars: onelet xf_<f> = read_xml_field(...)per referenced field, struct dropped.Whole-row escape — body references the bind
itas a whole value (to_array, identity_select(_), pass-to-user-fn): falls back to the fullbuild_xml_row.Guarded escape — the whole row escapes only inside a bare
if (cond) { … }: the predicate’s fields are read cheaply viapeek_xml_field(borrowedstring#, no clone) and the fullbuild_xml_rowruns only for matching elements.
Deferred materialization. A buffered reducer (order / take, distinct_by)
holds (key, xml_node) handle surrogates rather than built rows, and runs
build_xml_row only for the K survivors at return — defers_materialization()
is true, current_handle_expr is the per-element xcur node, and
materialize_handle emits the deferred build_xml_row. A 1000-element document
feeding order_by(K).take(10) builds 10 rows, not 1000.
Chain shape (XML source) |
Splice arm |
Notes |
|---|---|---|
|
|
The base lane — the same emit archetypes as the array side, over the field-pruned DOM walk. |
|
|
One handle held in |
|
|
Heap of N |
|
|
|
|
|
Dedup set over the key; the kept slot stores the node handle ( |
|
|
Forward keep-last table-overwrite (no backward index); the slot stores the |
|
|
Backward DOM walk ( |
bare |
|
The last element is the first the backward walk reaches: one |
|
pattern |
Hashed equi-join: srcB (an in-memory array) collected into |
|
|
Cross-arm: the join’s per-pair result feeds the bucket update directly — one pass, no intermediate join array. Same v1 constraints as the array / decs cross-arm (primitive key, no segments between |
|
|
Per-key bucket reducer over the DOM walk; shares the array path’s reducer dispatch and the trailing HAVING / ORDER BY extensions. |
|
leading- |
The leading projection is absorbed into the source walk and field-pruning is preserved — |
Defers to tier-2 (the XmlAdapter hook returns null, so the chain cascades):
Group join (
_group_join/join … into) —emit_join_hookreturns null for thegroup_joinliteral.Non-primitive join keys / non-array srcB — the same gate as the array / decs join (tuple keys cascade to
join_impl).Correlated nested-collection flatten (
from o … from l in o.lines) —from_xml_nodereads scalar attributes only; there is no nested collection to flatten.Mixed-source operators (
union/except/intersect/concat) — fall back exactly as for array / decs sources.
10.7. Zip patterns
Chain shape |
Splice arm |
Notes |
|---|---|---|
|
pattern |
Fuses to a single index-loop over the shorter side. |
|
pattern |
Three-source zip; same loop shape with three reads per iteration. |
|
pattern |
Theme 1 (audit 7a). The 3-arg form |
|
pattern |
|
|
pattern |
Early-exit terminator on the zipped pair. |
|
pattern |
The 2-arg |
|
pattern |
Theme 8 (audit C4). |
10.8. What falls back
The default path (fold_linq_default) fires when none of the
plan_* arms recognize the chain. This is the standard linq surface
— iterators materialize, callbacks fire, and there is no fusion.
Common cases that fall back:
Mixed-source operators like
union(a, b),except(a, b),intersect(a, b),concat(a, b)after the first source has been transformed (e.g.each(a)._select(F).union(b)).Joins other than decs-decs equi-join:
_left_join/_right_join/_full_outer_join/_cross_joindon’t splice; array-source_joinalso falls back. Only the decs-decs primitive-key_joinshape catalogued above splices (viaplan_decs_join); tuple keys, non-primitive keys, mixed array/decs sources, or chain ops beyond a single trailing_where/_selectall cascade tojoin_impl.Aggregations on lazy groupings:
_group_by_lazy(K)._select(F)with a non-bucket-reducing_select.Selector-based ``to_table(key, elementSelector)`` — the 3-arg form keeps its tier-2 generic; only the selector-free
to_table()terminator splices (see the table-buffer materializer row above).Chained ``_select(f) |> _select(g)`` with an impure inner (
_ % N,_ / N, user-call inner that the typer can’t prove pure). Thecollapse_chained_selectspre-pass is gated on!has_sideeffects(innerBody)because collapsing would shift evaluation count when outer references its param zero or many times. Pure inner (_._field,_ + K,_ * K, etc.) collapses transparently and the downstream planner sees a single_select.
When a chain falls back, behavior is identical to writing the same
chain without _fold — correct, but not splice-fused.
10.8.1. Decs-bridge fall-off diagnostic
When a from_decs_template source survives _fold dispatch
without any tier-1 planner (decs or array-side) claiming it, the
bridge materializes into a temp res array before
fold_linq_default runs on top — an EXTRA allocation beyond
whatever cascade follows. LinqFold.visit detects this case right
before falling through to fold_linq_default: it destructures
flatten_linq(call.arguments[0]) into (top, calls) and fires
only when calls is non-empty (a real cascade is about to run) and
extract_decs_bridge(top) is non-null (the source IS a
from_decs_template bridge). Bare
_fold(from_decs_template(...)) with no chain ops is skipped —
there’s no cascade, just the bridge’s own materialization. When
fired, a *warning* goes to the compiler log naming the call
site:
user.das:42:8: *warning* `_fold`: from_decs_template source
survived dispatch — no `plan_decs_*` arm claimed this chain, so
the bridge materializes a temp `res` array and the tier-2 cascade
runs on the materialized buffer. Rewrite the chain to a
recognized decs shape (see
doc/source/reference/linq_fold_patterns.rst), or suppress with
`options _no_linq_perf_warn = true`.
The fix is usually to reorder ops so the chain matches a row in the
Decs section above (e.g. push _select past _skip_while /
_take_while since their predicates run on the source tuple, not
the projected value). Suppress per file with options
_no_linq_perf_warn = true for tests that intentionally exercise
cascade behavior as regression guards.
10.9. See also
LINQ over DECS — using
from_decs_templateand_foldover decs entities.LINQ — Language-Integrated Query — the underlying linq surface (
where,select,order_by, etc.) and the_foldmacro.LINQ fold macros (_fold / _old_fold) — the
linq_foldmodule reference (the two public macros_foldand_old_fold).LINQ — the full linq surface.
Boost module for LINQ — the shorthand macros (
_where,_select,_order_by, etc.) and the surrounding macro family.