10. LINQ-fold patterns — what _fold(...) recognizes

This is a catalog of every chain shape that the _fold macro splices into a specialized loop. Each row names a chain pattern, the splice arm in daslib/linq_fold.das it dispatches to, and a short note on the emitted loop’s shape.

Pipelines that don’t match any row here fall through to the default fold_linq_default path, which lowers via the standard iterator-materializing surface (the same code path as the non-spliced linq calls).

For the corresponding tutorial, see LINQ over DECS. For nanosecond-level cost comparisons across the SQL / Array / Decs lanes, see benchmarks/sql/results.md in the source tree.

10.1. How dispatch works

_fold walks the chain inside-out (terminator first), flattens the ExprCall spine via flatten_linq, normalizes adjacent where/select pairs and order |> reverse shapes via the pre-passes described below, and hands the result to try_splice_patterns in daslib/linq_fold.das. That dispatcher matches against a single global splice_patterns table (17 rows, one per arm listed in this document) in one of two adapter passes. The passes are gated on the source and are mutually exclusive, so exactly one of them runs for a given chain:

  1. Decs adapter pass. Runs only when extract_decs_bridge(top) != null (i.e. the source is from_decs_template(...)). Each row’s requires predicates gate the match; rows with an array_source predicate fail here and fall through. First emit that returns non-null wins.

  2. Array adapter pass. Runs only when extract_decs_bridge(top) == null, i.e. the source is not a decs eager bridge. top is first unwrapped with peel_each. Decs chains never reach this pass: if every row cascades in the Decs pass above, control falls through to fold_linq_default instead. (This is deliberate: peel_each does not strip the eager-bridge ExprInvoke, so without this gate the decs_source predicate would still succeed on a decs source in the Array pass, and decs-only rows could match and emit via SourceAdapter::Array, silently dropping adapter-specific captures such as upstream_join.)

If neither pass emits, the Theme 6 perf-warn fires for from_decs_template chains (telling the user the bridge will materialize and tier-2 will run on the buffer), then control falls through to fold_linq_default (the iterator-materializing tier-2 path) and finally to a raw passthrough.
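The gating can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the daslang dispatcher: Chain, Row, ROWS, ops_are, and fold are hypothetical names, the source_kind tag stands in for extract_decs_bridge(top) != null, and emit returns a label string instead of spliced AST. It shows the properties described above: exactly one adapter pass runs per chain, a row whose requires predicates fail falls through, the first non-null emit wins, and a chain that no row accepts reaches fold_linq_default.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chain:
    source_kind: str  # hypothetical tag: "array" or "decs"
    ops: List[str]    # flattened chain, e.g. ["where", "count"]


@dataclass
class Row:
    name: str
    requires: List[Callable[[Chain, str], bool]]  # all must hold for the row to match
    emit: Callable[[Chain, str], Optional[str]]   # spliced code (a label here) or None


def array_source(chain: Chain, adapter: str) -> bool:
    return adapter == "Array"


def ops_are(*ops: str) -> Callable[[Chain, str], bool]:
    return lambda chain, adapter: chain.ops == list(ops)


# Two toy rows; the real splice_patterns table has 17.
ROWS = [
    Row("length_shortcut", [array_source, ops_are("count")],
        lambda c, a: "emit_length_shortcut"),
    Row("counter_lane", [ops_are("where", "count")],
        lambda c, a: f"emit_counter_lane[{a}]"),
]


def try_splice_patterns(chain: Chain, rows: List[Row]) -> Optional[str]:
    # Stand-in for extract_decs_bridge(top) != null: only one pass ever runs.
    adapter = "Decs" if chain.source_kind == "decs" else "Array"
    for row in rows:
        if all(pred(chain, adapter) for pred in row.requires):
            out = row.emit(chain, adapter)
            if out is not None:
                return out  # first non-null emit wins
    return None  # every row cascaded


def fold(chain: Chain, rows: List[Row] = ROWS) -> str:
    spliced = try_splice_patterns(chain, rows)
    return spliced if spliced is not None else "fold_linq_default"
```

With only these two toy rows, a bare decs count() falls through to the default path because array_source fails in the Decs pass; the real table has a decs-specific row for that shape.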

Note

Labels of the form plan_<X> (e.g. plan_distinct, plan_decs_reverse) in the catalog below refer to the pre-PR-E per-planner stubs that were collapsed into the unified dispatcher. Each label now corresponds to a row (or pair of rows) in splice_patterns whose requires predicates and emit function carry the same logic. The names are kept here because they read more naturally than row names like order_buffer_helper_dispatch — see daslib/linq_fold.das for the precise row each catalog entry maps to.

10.2. Pre-dispatch normalizations

A few chain rewrites fire at the start of the relevant planners (right after flatten_linq, before the per-arm pattern-match) so the rest of that planner’s logic sees the normalized shape and a single arm covers what would otherwise be many lookalike chains:

  • _order_by(K).reverse() → _order_by_descending(K) (and the three symmetric flips: order_by_descending → order_by, order → order_descending, order_descending → order). Applied by normalize_order_reverse, called from every plan_*order_family and plan_*reverse planner right after flatten_linq. The registry pointer is swapped to the flipped variant in place; the ExprCall arg list is identical for the ascending and descending order variants, so no AST clone is needed. The rewrite is iterative: a chain like _order_by(K).reverse().reverse() collapses to _order_by(K) in two passes.

  • _select(f) |> _select(g) (N consecutive) → _select(g(f(_))). Applied by collapse_chained_selects, called from plan_zip, plan_distinct, plan_decs_distinct, plan_reverse, plan_decs_reverse, plan_decs_join (which also runs collapse_chained_wheres, per PR D2), and from plan_order_family / plan_decs_order_family (which now accept a leading _select; see the next bullet). This mirrors how chained _where calls already compose via &&. Composition takes the INNER lambda’s structure (preserving its param type), renames its bound param to a fresh qn("cs", at) name to avoid apply_template recursive substitution when both lambdas share the boost-side _ desugar, then overwrites its body with the outer body, in which the outer param is substituted by the renamed inner body. The chain backlink is rewired so subsequent planner passes see the shortened AST. The rewrite is gated on !has_sideeffects(innerBody): collapsing would change how many times the inner body is evaluated whenever the outer body references its param zero times or more than once, while the cascade always evaluates the inner body exactly once per element. Chains whose inner body contains %, /, or a user call therefore cascade to tier-2 (the output remains correct). Collapse also bails (cascades) when either selector is not a peelable single-arg, single-return ExprMakeBlock lambda: multi-statement projection bodies, captured or otherwise non-trivial lambda shapes, and function-pointer arguments all skip it. plan_loop_or_count, plan_group_by_core, and plan_decs_unroll already handle chained selects natively through their intermediateBinds / chain-info machinery and don’t need the pre-pass.

  • source |> _select(f) |> <order/distinct/take>: a leading _select(f) that projects the source is absorbed into the source walk. An optional leading srcsel slot on the four wrap_source_loop-based order rows (streaming_min / bounded_heap / order_then_plain_distinct / fused_prefilter, but not the source-direct buffer_helper_dispatch) captures the _select; run_splice_adapter then wraps the source adapter in ProjectedSourceAdapter (linq_fold_common), which binds projName = f(rawElem) atop the per-element body and delegates wrap_source_loop / wrap_invoke to the inner adapter. Every emit sees f(rawElem) as the element with no emit-fn change, and XML field-pruning is preserved (f’s it.<field> reads reach the inner materializer). Without the slot the chain bails to tier-2 (materialize-all + sort-all); the bench m3f/m4 lanes hid the gap by pre-projecting their source array, which the XML lane can’t do. The slot is optional, so chains with no leading _select emit byte-identical code.
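The first two normalizations can be sketched over a toy chain representation. Everything here is hypothetical: ops are (name, arg) tuples, a select argument is a (function, pure) pair where pure stands in for !has_sideeffects, and composition wraps closures, whereas the real collapse_chained_selects substitutes lambda bodies in the AST.

```python
from typing import List, Tuple

# Each order-family op and its flipped variant (same argument list).
FLIP = {
    "order_by": "order_by_descending", "order_by_descending": "order_by",
    "order": "order_descending", "order_descending": "order",
}

Op = Tuple[str, object]


def normalize_order_reverse(chain: List[Op]) -> List[Op]:
    """Rewrite <order-family>(K) |> reverse() as the flipped op, until no pair is left."""
    chain = list(chain)
    changed = True
    while changed:
        changed = False
        for i in range(len(chain) - 1):
            op, arg = chain[i]
            if op in FLIP and chain[i + 1][0] == "reverse":
                chain[i:i + 2] = [(FLIP[op], arg)]  # keep the argument, drop reverse
                changed = True
                break
    return chain


def collapse_chained_selects(chain: List[Op]) -> List[Op]:
    """Fuse consecutive ("select", (fn, pure)) ops; keep them apart if the inner fn is impure."""
    out: List[Op] = []
    for op, arg in chain:
        if op == "select" and out and out[-1][0] == "select":
            inner_fn, inner_pure = out[-1][1]
            outer_fn, outer_pure = arg
            if inner_pure:
                def composed(x, f=inner_fn, g=outer_fn):
                    return g(f(x))
                # Inner is pure, so the composed body is as pure as the outer one.
                out[-1] = ("select", (composed, outer_pure))
                continue
        out.append((op, arg))
    return out
```

The purity flag carried by a composed select is the outer one, because the inner body is only composed when it is pure; a third select in a row can therefore still collapse into the result.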

10.3. Source-side entry points

Source

Recognized by

Note

each(array<T>)

peel_each

Strips the each wrapper; subsequent chain plans see the raw array<T> source.

zip(a, b) / zip(a, b, sel)

pattern zip_general (emit fn emit_zip)

Two-source zip. The three-argument form zip(a, b, sel) is pre-lowered to zip(a, b) |> _select(sel-as-tuple) so the standard zip+select fusion fires (closes the dot-product idiom).

from_decs_template(type<T>)

plan_decs_unroll etc.

Surfaces a [decs_template] schema. Decs splices fire.

from_decs(...)

plan_decs_unroll etc.

Runtime component-name list form. Same decs splices as the template form.

unsafe(from_xml_node(node[, name], type<Row>))

extract_xml_source (XmlAdapter, modules/dasPUGIXML/daslib/linq_fold_xml.das)

Optional source, available only when the pugixml module is linked (require ?pugixml + static_if (typeinfo builtin_module_exists(pugixml))). Emits an inlined DOM child-element walk in place of the generator, and field-prunes the per-element materialization (pass 2b): the chain body is scanned for the Row fields it reads, and only those attributes are read via read_xml_field into scalar locals. Unread fields (notably string fields, whose clone_string is the alloc cost) are never touched, so a float-only chain runs alloc-free and, under JIT, beats the equivalent SQLite query. A whole-row escape (to_array / identity _select(_) / pass-to-fn) routes to the full build_xml_row instead. The XmlAdapter rides every pattern row (try_splice_patterns runs with no onlyRow restriction); per-row requires predicates and the adapter’s capability hooks (can_join / can_group_by / defers_materialization / the non_array_source gate) decide what fuses, and a shape it can’t fuse cascades to tier-2. See XML-source patterns for the full fuse/defer breakdown. unsafe is required (the source is [unsafe_outside_of_for]) and the node is passed by value (var root; _fold’s macro-arg inference skips the const&→value copy).

unsafe(each_kv(tab)) / keys(tab) / values(tab)

extract_table_source (TableAdapter, daslib/linq_fold_table.das)

In-tree source, recognized by name plus a table-typed argument (table<K;V> / table<K>), so an unrelated user-defined keys function never fires it. The kv lane (each_kv) binds kv.key / kv.value and usage-prunes the walk: a chain touching only .value walks values(tab) alone, and one touching only .key (or neither) walks keys(tab) alone. That is half the slot-skip work of the zipped two-iterator form, which is emitted only when both sides (or the whole pair) are read. A whole-pair escape binds a named-tuple copy, so the kv lane fuses copyable value types only: a non-copyable-valued each_kv falls through, and the surviving instantiation concept-asserts (error 31400; the keys/values lanes still fuse such tables). Bare count() / long_count() folds to an O(1) length(tab). A plain distinct over raw keys/kv elements is dropped before matching, because keys are unique by construction; only uniqueness-preserving prefix ops allow the drop, so a preceding select keeps the distinct, and the values lane always keeps it. Point-lookup folds (try_table_point_lookup): a key-equality where (kv.key == X, or bare k == X on the keys lane, in either operand order; the predicate forms any(p) / count(p) too) against a loop-invariant, side-effect-free X folds the whole walk to an O(1) probe. any and keys-lane contains(X) become key_exists(tab, X); count becomes key_exists ? 1 : 0; first / first_or_default (± one trailing select) become a tab?[X] probe with the scan’s exact semantics (panic on a missing first, an eagerly bound default value otherwise). The key-equality may carry residual conjuncts (kv.key == X && residual…) as long as the key-equality is the leftmost conjunct; a leading run of consecutive where calls is AND-merged first, so where(key == X) |> where(res) is the same shape. The probe evaluates the residual on the probed element only, and a false residual routes to the same miss path. No purity gate is needed on the residual: because keys are unique, && short-circuiting means it runs at most once on either path.
Anything else keeps the scan: a compound predicate whose leftmost conjunct is not the key-equality, other comparison operators, or an X that reads the binder or has side effects (the scan evaluates X once per element, the probe only once). order_by / take / first observe the table’s unspecified slot order, exactly like a hand-written for (k, v in keys(t), values(t)) loop. Joins fuse on either side (can_join is on; the adapter rides the shared emit_array_join through its own wrap_source_loop). A table lead walks its pruned slot iterator(s) as the probe loop. A table in the srcB slot joined on its bare key (d.key on the kv lane, or the bare element on a keys(set) source) skips the join’s internal table<KEY; array<TUPB>> entirely and probes the user’s table per lead row (join_keyb_is_bare_key + build_join_probe_pieces); because table keys are unique, the probe matches hash-join semantics exactly. The probe is itself usage-pruned: count-no-where and key-only shapes stay on key_exists, value shapes bind the matched value by reference from a tab?[k] pointer (no copy), and only a whole-pair use binds the kv tuple. A non-bare b-key keeps the hashed build over the kv iterator; group_join (outer, whose result consumes the whole bucket) always keeps it. group_by fuses (can_group_by is on; build_group_by_adapter hands plan_group_by_core a fresh TableAdapter, so the bucket-fill loop is the usage-pruned slot walk, e.g. a group key over kv.value.brand walks values(tab) alone), but only for the plain-lead shape: join |> group_by over a table lead declines (the upstream-join arm returns null), and reverse has no backward slot walk, so those shapes cascade to tier-2 (see benchmarks/sql/LINQ_TO_TABLE.md). to_table() sinks fuse via the table-buffer materializer row in Array-source patterns: the chain inserts straight into the result table. A bare each_kv(tab).to_table() is a reserve-ahead table clone through the fused walk, and a keys(tab) chain lands in the table<K> set form.
unsafe is required at an unfused chain head (the sources are [unsafe_outside_of_for]); fused chains rewrite the head before inference.

unsafe(from_json(jv, type<Row>))

extract_json_source (JsonAdapter, daslib/linq_fold_json.das)

In-tree source. The adapter is compiled in unconditionally (no static_if gate, unlike XML’s pugixml one), but a program only pulls JSON into scope by requiring json / json_boost itself. extract_json_source matches a from_json whose first argument is a json::JsonValue?, so in a JSON-less program it returns null and the chain falls to the array tier. The adapter pulls in no json dependency: it emits from_json / read_json_field by name (resolved at the user’s splice site, the same way linq_fold_decs emits for_each_archetype; from_JV is emitted only for a non-struct element type). It emits an inlined for (e in jv.value as _array) walk in place of the generator, and field-prunes the per-element materialization (pass 2b): only the keys the chain reads are pulled via read_json_field by name. Unread keys (notably string fields, whose materialization clones) are never touched, so a scalar-only chain skips nearly all of the full per-row build (a 3.6× speedup over full materialization; see benchmarks/micro/json_source_shapes.das). A whole-row escape reads every top-level field by name (emit_full_row_by_name), so a custom whole-row from_JV(Row) override is not honored (Option B: this is a flat query source, not a deserializer; for that, materialize the array with an explicit from_JV first). unsafe is required (the source is [unsafe_outside_of_for]). Deferred materialization mirrors XML: order/distinct/take buffer a cheap (orderKey, JsonValue?) surrogate and materialize only the K survivors, by name (emit_full_row_by_name), so a struct survivor reads each field by key; only a non-struct Row falls back to outBind <- from_JV(handle, type<Row>).
The JsonAdapter also fuses join / join |> group_by (emit_join_hook + JsonJoinAdapter off build_group_by_adapter’s upstream-join arm), reusing the array-join machinery (build_join_standalone_pieces / build_join_adapter_pieces): srcB is collected into a table<KEY; array<TUPB>> and the field-pruned array walk is the probe side, so the join key reads only its own field per element (e.g. read_json_field(jcur, "brand", …)). Standalone group_join and a trailing where / select / count over group-join rows defer to tier-2, mirroring XML.
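The table point-lookup fold relies on one fact: table keys are unique, so at most one slot can satisfy the key-equality. A Python sketch, with a dict standing in for table<K;V>, x in tab for key_exists, and tab[x] for the tab?[X] probe; all function names are hypothetical. x is passed already evaluated, which is exactly the loop-invariant, side-effect-free case the fold requires.

```python
from typing import Callable, Dict, Hashable, Tuple

Residual = Callable[[Hashable, object], bool]


def always(k, v) -> bool:
    return True


def scan_first(tab: Dict, x, residual: Residual = always) -> Tuple:
    # Unfused shape: walk every slot; the key-equality is the leftmost conjunct.
    for k, v in tab.items():
        if k == x and residual(k, v):
            return (k, v)
    raise LookupError("first: no matching element")


def probe_first(tab: Dict, x, residual: Residual = always) -> Tuple:
    # Folded shape: keys are unique, so only the slot for x can match.
    if x in tab:
        v = tab[x]
        if residual(x, v):  # the residual sees the probed element only
            return (x, v)
    raise LookupError("first: no matching element")  # same miss path as the scan


def scan_count(tab: Dict, x, residual: Residual = always) -> int:
    return sum(1 for k, v in tab.items() if k == x and residual(k, v))


def probe_count(tab: Dict, x, residual: Residual = always) -> int:
    # Mirrors key_exists ? 1 : 0, with the residual folded into the condition.
    return 1 if x in tab and residual(x, tab[x]) else 0


def probe_any(tab: Dict, x, residual: Residual = always) -> bool:
    return probe_count(tab, x, residual) == 1
```

Because the residual is evaluated at most once on both paths, the sketch needs no purity check on it, matching the argument above.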

10.4. Array-source patterns

Chain shape

Splice arm

Notes

.count() / .long_count()

emit_length_shortcut

O(1) length read when the source has a known length.

._where(P).count() / .long_count()

plan_loop_or_count → emit_counter_lane

Single counter, no allocation; one pass over the array.

._where(P)._select(F).sum() / .average() / .min() / .max() / .aggregate(seed, op)

plan_loop_or_count → emit_accumulator_lane

Single-pass scalar reduce; where and select fuse into the body.

._where(P)._select(F).min_by(K) / .max_by(K) / .min_max() / .min_max_by(K) / .min_max_average() / .min_max_average_by(K)

plan_loop_or_count → emit_accumulator_lane

Multi-output reduce; one scalar per output kept in the loop’s state.

._where(P).first() / .first_or_default() / .last() / .last_or_default() / .single() / .single_or_default() / .element_at(N) / .element_at_or_default(N) / .aggregate(...)

plan_loop_or_count → emit_early_exit_lane

Early-exit terminator; loop breaks on first match (first), counts to N (element_at), or carries-last (last / aggregate).

._where(P).any() / .all(P) / .contains(V)

plan_loop_or_count → emit_any_empty_shortcut

Boolean fast-path; loop exits on first hit (any/contains) or first miss (all).

._where(P).take(N).count() / .sum()

plan_loop_or_count (counter / accumulator with takeExpr)

Bounded counter/accumulator; loop exits at N matches.

.take(N)._where(P).<terminator> (counter / accumulator / early-exit / array)

plan_loop_or_count (postTakeWhereCond gate)

Take cap ticks unconditionally; where gates only the per-element contribution. Preserves the “first N elements, then keep matching” semantic that where.take cannot express. Single trailing where only — skip / skip_while / take_while + where still cascade.

._where(P).take_while(P2).<...> / .skip_while(P2).<...>

plan_loop_or_count (predicate-driven ranges)

take_while exits on first non-match; skip_while toggles state.

._where(P)._select(K => V).to_table() (and bare / set forms)

plan_loop_or_count (table-buffer materializer)

Insert-loop straight into the result table — no intermediate array. A (k => v) tuple projection splits so key and value each evaluate once; other tuple projections bind to a local; a scalar chain lands in the table<K> set form. Reserve from O(1) source length on unfiltered walks. Duplicate keys keep the last occurrence (das insert semantics, not C#’s throw). The selector-based to_table(key, elementSelector) and decs sources keep the tier-2 path.

._order_by(K).first() / .first_or_default()

plan_order_family (streaming-min) → emit_streaming_min

Single var best + var seen, no buffer; one comparison per element.

._order_by(K).take(N).to_array()

plan_order_family (bounded-heap) → emit_bounded_heap

spliced_push_heap fill + replace, spliced_pop_heap on replace, order_inplace at end. Buffer of size N.

._distinct_by(K1)._order_by(K2).take(N).to_array() / ._order_by(K2).distinct().take(N).to_array() (plain distinct() mirror order accepted)

plan_order_family (bounded-heap + set-gate) → emit_bounded_heap

Theme 3 Phase 3 (audit C1/C5). The bounded-heap path gains a leading or middle distinct[_by] recognizer; per-element push/pop is gated by a set-insert on the distinct key (or whole element for plain distinct). Single source pass, no full distinct materialization. Position of distinct in the chain (before vs after _order_by) has no bearing on emission for the safe shapes — the set just gates the same heap update. Bails (cascades) on _order_by(K2).distinct_by(K1) because cascade semantics (“min-K2 per K1” — first K1 occurrence in sort order) cannot be honored by a source-walk set-gate, which would keep an arbitrary K1 representative; on distinct[_by] without take (would be silently dropped); and on take(N).distinct[_by]() (would dedup pre-take instead of post-take). Inline-able order key required (cascades otherwise). Composes with where_ (filter before distinct gate) and terminal _select (project ≤N heap survivors at return).

._order_by(K).take(N)._select(F).to_array() / .first()._select(F) / .first_or_default()._select(F)

plan_order_family (terminal _select) → emit_bounded_heap / emit_streaming_min

Bounded-heap / streaming-min holds the raw element; projection F runs ≤K times at return. Closes the natural “take top-K then project” idiom.

._order_by(K).to_array() / .order_by_descending(K).to_array() / .order(K).to_array() / .order_descending(K).to_array()

plan_order_family (full-sort fallback) → emit_buffer_helper_dispatch

Materializes + sorts. No bounded-heap shortcut.

._order_by_keys((K1, K2, …), descMask).to_array() / ._where(P)._order_by_keys((K1, K2), m).to_array()

plan_order_family (multi-key composite stable sort)

Multi-key orderby with per-key direction (descMask bit i → key i DESC; LSB = first key). With an inline-able tuple key and an upstream where (no take/first/distinct), emit_fused_prefilter builds one composite if-chain comparator (try_make_inline_cmp_keys) and emits a single stable_sort(buf, cmp) on the fused buffer, which gives C# OrderBy / ThenBy parity and is stable on full ties. Bare order_by_keys (no where) cascades to the eager order_by_keys op (also stable_sort-backed). Single-key _order_by is unchanged: it keeps the unstable order_inplace / sort path (no regression). take / first over a multi-key chain cascade to the eager op; multi-key is gated out of the bounded-heap and streaming-min rows, whose min/max-by-first-key collapse is wrong for a composite key. Capped at 4 keys (the eager less_masked supports at most 4 keys).

._distinct() / ._distinct_by(K) followed by .count() / .to_array()

plan_distinct → emit_hashtable_dedup

Single-hash set lane; count reads length(set).

._distinct() / ._distinct_by(K) followed by .count(P) / .long_count(P)

plan_distinct (predicate counter) → emit_hashtable_dedup

Dedup table is built unconditionally so distinct_by semantics keep FIRST occurrence per key; a separate var acc increments only when P matches that first occurrence. Mirrors tier-2 distinct.count(P) semantics (distinct-then-filter, not filter-then-distinct).

._distinct[_by](K1)._order_by[_descending](K2).to_array() / ._where(P)._distinct[_by](K1)._order_by(K2).to_array()

plan_order_family (fused-loop + set-gate) → emit_fused_prefilter

Theme 8 (audit 3b). The where_+order fused-loop path generalizes: when upstream distinct[_by] is present, declare var order_dset : table<...> and wrap the per-element push_clone with a set-gated if (!key_exists(...)) block. Single source pass + in-place sort, no distinct_by_to_array intermediate iterator setup. Composes with where_ (filter before distinct gate) and terminal _select (project at return). Bails (cascades) on distinct[_by] + order_by + first[_or_default] (streaming-min path has no dset hook) and on chains where take(N) is present (use the bounded-heap path via Theme 3 Phase 3 instead).

._group_by(K)._select(reduce).to_array()

pattern group_by_array (sub-codegen plan_group_by_corereducer_emitters lookup)

Per-key bucket reducer; single hash, one entry per group. PR D1: reducer dispatch is now a table<string; ReducerEmitterFn> lookup into named mk_reducer_* fns. Accepted reduce spellings per slot (is_bucket_reducer_call): bare _._1 |> <r>() for length / count / long_count / sum / min / max / first / average; inner-select _._1 |> select(L) |> <r>() and the equivalent direct selector _._1 |> <r>(L) (the 2-arg tier-2 overloads) for sum / min / max / average — an identity L canonicalizes to the bare form. first / count take no selector by design (their C# 2-arg forms are predicates). Untyped L params are fine on the bucket surface — the _select macro stamps the bucket element type before inference (BucketLambdaStamper), so fused and unfused chains accept the same spelling.

._group_by(K)._having(P)._select(...).to_array()

pattern group_by_array (sub-codegen plan_group_by_core)

HAVING filter on the bucket reference (pre-aggregate); can lift hidden reducer slots referenced by P but absent from the select.

._group_by(K)._select(reduce)._where(P).to_array() / .count()

pattern group_by_array (sub-codegen plan_group_by_core, trailing where as HAVING)

HAVING filter on the constructed post-aggregate tuple (predicate references _.AggField by name). Distinct from _having(P) and orthogonal — both can fire on the same chain.

._group_by(K)._select(reduce)._order_by(K2).to_array() / ._order_by_descending(K2).to_array()

pattern group_by_array (sub-codegen plan_group_by_core, trailing order_by as ORDER BY)

Theme 3 Phase 2 (audit C2). Inline-cmp sort(buf, ...) after the bucket-fill mutates the same output buffer in place — vs the tier-2 cascade’s separate order_by_inplace over a fresh allocation. v1: _order_by(K2) / _order_by_descending(K2) with inline-able key only; non-inline keys (side-effects, multi-stmt body) cascade. Composes with HAVING / _having(P).

.reverse().take(N)[._select(F)].to_array() (with no pre-reverse where / select)

plan_reverse R6 (backward-index walk) → emit_reverse_backward_index_walk

Single loop for k in 0..K indexes arr[len-1-k] and K push_clones into a srcElem-typed scratch buffer. When _select(F) is captured, build_terminal_select_tail then performs a post-loop projection pass into a separate projElem-typed output buffer (K projection push_clones). Two-buffer/two-pass mirrors the decs sibling emit_decs_reverse_skip_into_tail (PR #2915) and the R1-R4 catch-all: all source reads complete before any projection-side-effect runs, so impure _select behaves identically across the three paths. Skips the catch-all’s full-source push_clone walk (N → K raws) + reverse_inplace + resize. Fast path bails (cascades to R1-R4) when termsel’s call-result element type is unresolved at macro stage.

[._where(P)][._select(f)].reverse().take(N)._select(F).to_array() / .reverse()._select(F).first()

plan_reverse R1-R4 (terminal _select on catch-all) → emit_reverse_buffer_inplace / Rb (walk-and-overwrite scalar) → emit_reverse_walk_overwrite_scalar

Catch-all path for chains with pre-reverse _where / _select (R6 doesn’t accept those slots, cascades here). Projection runs ≤K times at return on the R1-R4 buffer or on the surviving last value. NOT accepted: reverse._select.take — user must reorder to reverse.take._select.

each(arr).reverse()._distinct[_by](K).to_array() (array source)

plan_reverse R-2a (backward index walk + set-gate) → emit_reverse_backward_walk_dset_gate

Theme 8 (audit 2a). Array source only (array_source predicate). Walks source backward via index (arr[len-1-k]), maintains var rev_dset : table<...> and gates push by set-insert on the dedup key (or whole element for plain distinct). LAST-per-key semantics preserved: backward walk picks first-seen-in-reversed-order = last-in-source occurrence, matching tier-2 reverse.distinct_by. Saves cascade’s reverse_to_array allocation AND second distinct_by_inplace pass. v1 implicit to_array only; pre-reverse _where / _select / take bail to cascade. Non-array (forward) sources take R-2b below.

src.reverse()._distinct[_by](K).to_array() (XML / decs / iterator source)

plan_reverse R-2b (forward keep-last table-overwrite) → emit_reverse_distinct_forward_keeplast

The exact complement of R-2a (non_array_source predicate): forward-only sources have no random index for the backward walk. One forward pass OVERWRITES var rdb_tab : table<key; (seq, val)> per element (so the slot ends at the last forward occurrence + its monotonic seq), then sorts survivors by descending seq and emits — output-identical to R-2a (descending forward-index of each last occurrence). Source-generic via emit_terminator_lane + wrap_source_loop: an XML source defers (val is the xml_node handle; build_xml_row runs only for the K survivors, field-pruned to the key), while decs / iterator store the full element and still win single-pass over the cascade’s reverse-buffer + second walk. Closes the decs m4 cell for this shape (D6).

[._where(P)][._select(F)].reverse().count()

plan_reverse Ra (counter) → emit_reverse_counter

Reverse is identity for a count, so one forward pass increments a counter — no buffer, no reverse. The projection still fires per match (side-effect parity). Works on iterator and array sources (for-loop body, no indexed access).
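Several of the array-side loop shapes in this table can be checked against their tier-2 meaning with a short Python sketch covering streaming-min, the bounded heap with an optional distinct[_by] set-gate, the predicate distinct counter, and the two reverse |> distinct_by rows (R-2a backward walk, R-2b forward keep-last). The function names are hypothetical; the bounded heap assumes numeric keys (Python's heapq is a min-heap, so it stores -key to keep the worst survivor at the root), and ties are broken by source position here, whereas the spliced bounded-heap path finishes with an unstable order_inplace.

```python
import heapq


def streaming_min_by(src, key):
    """order_by(key).first(): a running best plus a seen flag, no buffer."""
    seen, best, best_key = False, None, None
    for elem in src:
        k = key(elem)
        if not seen or k < best_key:  # strict <: the earliest minimum wins
            seen, best, best_key = True, elem, k
    if not seen:
        raise LookupError("first: empty sequence")
    return best


def bounded_heap_top_n(src, key, n, dedup_key=None):
    """order_by(key).take(n) in one pass, optionally gated by a distinct[_by] set."""
    heap = []  # entries (-key, -seq, elem): heap[0] is the current worst survivor
    seen = set()
    for seq, elem in enumerate(src):
        if dedup_key is not None:
            d = dedup_key(elem)
            if d in seen:
                continue  # set-gate: only the first occurrence per dedup key competes
            seen.add(d)
        entry = (-key(elem), -seq, elem)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif n > 0 and entry > heap[0]:  # strictly better than the worst survivor
            heapq.heapreplace(heap, entry)
    # Final sort of the <= n survivors by (key, source position).
    return [e[2] for e in sorted(heap, key=lambda e: (-e[0], -e[1]))]


def distinct_by_count_pred(src, key, pred):
    """distinct_by(key).count(pred): dedup is unconditional, so the FIRST occurrence decides."""
    dedup, acc = set(), 0
    for elem in src:
        k = key(elem)
        if k in dedup:
            continue
        dedup.add(k)
        if pred(elem):
            acc += 1
    return acc


def reverse_distinct_by_backward(arr, key):
    """R-2a: index the array backward; the first key seen is the last source occurrence."""
    seen, out = set(), []
    for k in range(len(arr)):
        elem = arr[len(arr) - 1 - k]
        d = key(elem)
        if d not in seen:
            seen.add(d)
            out.append(elem)
    return out


def reverse_distinct_by_forward_keeplast(src, key):
    """R-2b: one forward pass overwriting (seq, elem) per key, then sort by descending seq."""
    last = {}
    for seq, elem in enumerate(src):
        last[key(elem)] = (seq, elem)
    return [elem for _, elem in sorted(last.values(), key=lambda p: p[0], reverse=True)]
```

For the input [(1, "a"), (1, "b"), (2, "b")] with pred "second field is b", distinct_by_count_pred returns 1, not the 2 that filter-then-distinct would give: (1, "a") is the first occurrence of key 1 and it fails the predicate.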

10.5. Decs-source patterns

Every array pattern above has a decs mirror that walks each archetype of T’s template instead of iterating the array. The body shape is identical — only the source iteration changes.
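The claim that only the source iteration changes can be sketched in Python with a hypothetical structure-of-arrays model: Archetype holds a size and one column per component, make_decs_loop plays the role of for_each_archetype plus a pruned inner loop (reading only the listed components), and the three shortcut functions loosely mirror the arch.size count, last-archetype random index, and cumulative-size element_at shapes described in the table below. None of these names are the decs API, and decs_element_at indexes directly once it finds the right archetype, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Archetype:
    size: int
    columns: Dict[str, list]  # component name -> column of length `size`


def array_loop(src: Sequence, body: Callable) -> None:
    for elem in src:  # the Array adapter's source walk
        body(elem)


def make_decs_loop(comps: List[str]):
    """Decs adapter walk, pruned to the components the chain reads."""
    def decs_loop(archetypes: List[Archetype], body: Callable) -> None:
        for arch in archetypes:
            if arch.size == 0:
                continue  # empty archetypes are skipped
            cols = [arch.columns[c] for c in comps]
            for i in range(arch.size):
                body(tuple(col[i] for col in cols))
    return decs_loop


def sum_select_where(loop, src, pred, sel):
    """One shared body for where(pred) |> select(sel) |> sum(); only `loop` differs."""
    acc = 0

    def body(elem):
        nonlocal acc
        if pred(elem):
            acc += sel(elem)

    loop(src, body)
    return acc


def decs_count_archsize(archetypes):
    # Bare count(): sum the archetype sizes, no per-entity walk.
    return sum(arch.size for arch in archetypes)


def decs_last_random_index(archetypes, comps):
    # Bare last(): read [size-1] of the last non-empty archetype.
    for arch in reversed(archetypes):
        if arch.size > 0:
            return tuple(arch.columns[c][arch.size - 1] for c in comps)
    raise LookupError("last: empty sequence")


def decs_element_at(archetypes, comps, n):
    # element_at(n): skip whole archetypes by size until n falls inside one.
    if n < 0:
        raise IndexError("element_at: out of range")
    for arch in archetypes:
        if n < arch.size:
            return tuple(arch.columns[c][n] for c in comps)
        n -= arch.size
    raise IndexError("element_at: out of range")
```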

Note

As of PR C, the four plan_decs_* planners (_reverse / _distinct / _order_family / _unroll) are thin pattern-table stubs that reuse the same pattern rows as their array-side siblings (plan_reverse_patterns / plan_distinct_patterns / plan_order_family_patterns / plan_loop_or_count_patterns) with a SourceAdapter::Decs adapter swap. The 7 array-side emit archetypes consume the adapter (adapter_bind_name selects it vs decs_tup for lambda peeling; adapter_wrap_source_loop dispatches for (it in src) vs for_each_archetype + build_decs_inner_for_pruned; adapter_wrap_invoke dispatches the outer invoke wrap). For plan_decs_unroll (which feeds emit_loop_or_count_lane), the Decs-arm dispatch (emit_loop_or_count_lane_decs) reconstructs a calls array from captures and routes to the existing emit_decs_* lane fns unchanged (state hoist above for_each_archetype stays per-adapter; see masterplan D1).

Two decs-specific fast paths preserved: emit_decs_count_archsize (bare count()) and emit_decs_reverse_skip_into_tail (reverse |> take(N) |> to_array). Row 4 of plan_order_family_patterns (buffer_helper_dispatch) is gated to Array adapter via array_source — decs cascades to Row 3 (fused_prefilter) which materializes the buffer, matching the imperative decs behavior. reverse |> distinct[_by] on decs sources now fuses via the source-generic R-2b forward keep-last row (emit_reverse_distinct_forward_keeplast, gated non_array_source) — one table-overwrite emit shared by decs / XML / iterators, not a parallel decs fn (closes masterplan D6).

As of PR D3, the GroupBySourceAdapter shim (a parallel adapter used only by plan_group_by_core) is gone — group_by’s three source shapes (Array / Decs / DecsJoin) all flow through the same SourceAdapter variant as every other planner. plan_group_by_core calls adapter_wrap_source_loop and adapter_wrap_invoke directly. The decs-join branch of adapter_wrap_source_loop carries the inline hash-collect + probe + per-pair result-lam bind body shared with emit_decs_join.
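The decs-join body described above (hash-collect, probe, per-pair result-lambda bind) feeding a group_by bucket fill can be sketched in Python with plain lists and dicts. The function names are hypothetical and sum stands in for whichever reducer the select asks for; the point is that joined rows are never materialized, and the unfused reference exists only to check that the output matches.

```python
from collections import defaultdict


def fused_join_group_by_sum(src_a, src_b, key_a, key_b, result, group_key, value):
    """join |> group_by |> select(sum) without materializing the joined rows."""
    # Hash-collect: srcB into KEY -> list of b rows (the table<KEY; array<TUPB>> analogue).
    buckets = defaultdict(list)
    for b in src_b:
        buckets[key_b(b)].append(b)
    # Probe + bucket fill: one lookup per lead row, one reducer slot per group.
    groups = {}
    for a in src_a:
        for b in buckets.get(key_a(a), ()):  # .get does not insert into the defaultdict
            row = result(a, b)  # per-pair result-lambda bind
            g = group_key(row)
            groups[g] = groups.get(g, 0) + value(row)
    return groups


def unfused_reference(src_a, src_b, key_a, key_b, result, group_key, value):
    """Materialize the join first, then group: the shape the fused body avoids."""
    joined = [result(a, b) for a in src_a for b in src_b if key_a(a) == key_b(b)]
    groups = {}
    for row in joined:
        groups[group_key(row)] = groups.get(group_key(row), 0) + value(row)
    return groups
```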

Chain shape (decs source)

Splice arm

Notes

from_decs_template(type<T>).count() (bare, no chain ops, no predicate)

plan_decs_unroll → emit_decs_count_archsize

Sums arch.size across archetypes; skips the per-entity walk entirely. Returns int — the += site truncates past INT_MAX, so chain long_count() instead (different splice arm — see next row) when an int64-safe total is required.

from_decs_template(type<T>).long_count() (bare); from_decs_template(type<T>)._where(P).count() / .long_count(); from_decs_template(type<T>).count(P) / .long_count(P)

plan_decs_unroll → emit_decs_accumulator

Counter loop over the per-archetype walk. The bare long_count() shape does NOT use the arch.size shortcut above, because that emitter returns int only. The count(P) / long_count(P) forms reach this arm via the Theme 4 root-cause fix to extract_decs_bridge: forExpr.iteratorVariables is unpopulated when no chain op forces an inference pass over the bridge’s inner for-loop, so these chains previously bailed. The bridge now recovers the iterator names from mkTup.values (peeling the ExprRef2Value wrap).

from_decs_template(...)._select(F).sum() / .average() / .min() / .max() / .aggregate(...)

plan_decs_unroll → emit_decs_accumulator

Per-archetype accumulator; pruner keeps only the components read by F.

from_decs_template(...).first() / .first_or_default() / .last() / .last_or_default() / .single() / .single_or_default() / .element_at(N) / .element_at_or_default(N) / .aggregate(...)

plan_decs_unroll → emit_decs_walk_lane / emit_decs_element_at

Walk lane reads one component per loop iteration; element_at uses cumulative-size short-circuit. Bare .last() / .last_or_default() (no _where / _select / range) over indexable sources take the random-index row below instead of this walk.

from_decs_template(...).last() / .last_or_default(D), bare (no _where / _select / range)

emit_decs_last_random_index

Reads the last non-empty archetype’s [size-1] directly (get_ro(arch, comp, def)[idx]) — O(num_archetypes), no per-entity walk. for_each_archetype visits in order + skips empties, so the last overwrite is the global-last; behavior-identical to the walk lane. Indexable sources only — a [decs_template] field with a default-init compiles to get_default_ro (an iterator), so decs_can_random_index returns false and the chain cascades to the emit_decs_walk_lane row above.

from_decs_template(...).any() / .all(P) / .contains(V)

plan_decs_unroll → emit_decs_early_exit

Boolean fast-path; walks until first match or end.

from_decs_template(...).to_array()

plan_decs_unroll → emit_decs_to_array

Concatenates archetype slices; one allocation sized by sum(arch.size).

from_decs_template(...)._order_by(K).first() / .first_or_default()

plan_decs_order_family (streaming-min)

Single var best + var seen across archetypes.

from_decs_template(...)._order_by(K).take(N).to_array()

plan_decs_order_family (bounded-heap)

Same heap pattern as the array variant; buffer size N.

from_decs_template(...)._distinct_by(K1)._order_by(K2).take(N).to_array() / ._order_by(K2).distinct().take(N).to_array() (plain distinct() mirror order accepted)

plan_decs_order_family (bounded-heap + set-gate)

Decs mirror of the array-side plan_order_family distinct extension (Theme 3 Phase 3, audit C1/C5). The decs hoisted-buffer bounded-heap path gains the same set-gate inside the per-archetype loop body. Same bail conditions as the array variant: _order_by(K2).distinct_by(K1), distinct[_by] without take, and take(N).distinct[_by]() all cascade.

from_decs_template(...)._order_by(K).take(N)._select(F).to_array()

plan_decs_order_family (terminal _select)

Decs mirror of plan_order_family’s terminal _select: the heap holds the raw element, and the projection runs ≤N times at return.

from_decs_template(...).min_by(K) / .max_by(K)

plan_decs_unroll → emit_decs_min_max_by

Streaming-min/max with key.

from_decs_template(...)._distinct() / ._distinct_by(K)

plan_decs_distinct

Single-hash set lane mirroring plan_distinct.

from_decs_template(...)._distinct() / ._distinct_by(K) followed by .count(P) / .long_count(P)

plan_decs_distinct (predicate counter)

Decs mirror of the array-side predicate-distinct splice. Same dedup-unconditional / counter-gated-on-P shape across archetypes.

from_decs_template(...).reverse().take(N)[._select(F)].to_array()

plan_decs_reverse (skip-into-tail; extended for terminal _select in PR #2915; boundary random-index added later)

Whole-archetype skip plus early exit. For indexable sources the boundary archetype is random-indexed (get_ro(arch, comp, def)[idx] over [skipsLeft .. size) via build_decs_index_collect) instead of walking its head, so the boundary costs O(K) on a single archetype rather than O(N). Iterator sources (a [decs_template] field with a default-init → get_default_ro) fall back to the partial-archetype skip-counter walk. When a trailing _select(F) is captured (no pre-reverse _where / _select), the K reversed survivors are projected into a separate buffer typed by termsel’s call-result element type. This saves the catch-all’s N push_clones, its full reverse_inplace, and its separate projection pass. Bails (cascades to R1-R4) when termsel’s call-result element type is unresolved at macro stage.

from_decs_template(...).reverse()._select(F).first()

plan_decs_reverse (Rb walk-and-overwrite scalar with terminal _select)

Decs mirror of plan_reverse’s Rb walk-and-overwrite scalar. Projection applies to the surviving last value at return.

from_decs_template(...)._group_by(K)._select(reduce).to_array()

pattern group_by_decs (sub-codegen plan_group_by_core)

Shared bucket-reducer with the array path; differs only in the per-element source.

from_decs_template(...)._group_by(K)._select(reduce)._where(P).to_array() / .count()

pattern group_by_decs (sub-codegen plan_group_by_core, trailing where as HAVING)

Decs mirror of the array-side post-aggregate HAVING. Same predicate-on-output-tuple semantics.

from_decs_template(...)._group_by(K)._select(reduce)._order_by(K2).to_array() / ._order_by_descending(K2).to_array()

pattern group_by_decs (sub-codegen plan_group_by_core, trailing order_by as ORDER BY)

Decs mirror of the array-side ORDER BY splice (Theme 3 Phase 2 C2). Shares the same in-place inline-cmp sort tail; only the bucket-fill source differs.

from_decs_template(A)._join(from_decs_template(B), ka, kb, result)._group_by(K)._select(reduce).to_array() / .count()

pattern group_by_decs with upstream_join slot (isDecsJoin adapter; cross-arm — see Decs-decs equi-join)

Theme 3 Phase 1 cross-arm composition. emit_decs_join’s hashB-collect + srcA-probe feeds plan_group_by_core’s bucket update directly — one pass, no intermediate join array. Composes with the C2 trailing order_by extension above when applied to the join+group_by output.

from_decs_template(...)._take_while(P).<...> / ._skip_while(P).<...>

plan_decs_unroll (predicate-driven ranges)

Hoists skippingName state across archetypes.

from_decs_template(...).take(N)._where(P).<terminator> (counter / accumulator / early-exit / array / walk)

plan_decs_unroll (postTakeWhereCond gate in emit_decs_terminator_lane)

Decs mirror of the array-side postTakeWhereCond gate (Theme 2 5c). Take cap ticks unconditionally per element of the per-archetype walk; the trailing where gates only the per-element contribution. Predicate peels against chainInfo.finalBind so it composes with leading _where / _select chains and skip / skip_while / take_while ranges. Lands uniformly across all 6 emit_decs_* paths because the gate wraps spec.perElement once in the shared lane.

Note

Several decs rows above (including this one) label the splice arm as plan_decs_unroll even though, post-PR-C, dispatch flows through the shared plan_loop_or_count pattern row (emit_loop_or_count_lane → emit_loop_or_count_lane_decs → emit_decs_*). The plan_decs_unroll label is retained for table-row consistency across the pre- and post-unification decs entries.
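The skip-into-tail arithmetic of the reverse().take(N) row can be sketched in Python. This illustrates the loop shape only; it is not the daslang the macro emits. The sketch makes three assumptions: each archetype is modeled as a plain list standing in for one component column, every column is indexable (the get_ro case, not get_default_ro), and the function name reverse_take_tail is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reverse_take_tail(archetypes: Sequence[Sequence[T]], n: int) -> List[T]:
    """Sketch of reverse().take(n) over archetype-split storage.

    Each inner sequence stands in for one archetype's component column.
    """
    total = sum(len(arch) for arch in archetypes)
    skips_left = max(total - n, 0)  # elements that precede the kept tail
    tail: List[T] = []
    for arch in archetypes:
        size = len(arch)
        if size == 0:
            continue  # empty archetypes are skipped outright
        if skips_left >= size:
            skips_left -= size  # whole-archetype skip: no per-entity walk
            continue
        # Boundary archetype: random-index [skips_left, size) instead of
        # walking its head; every later archetype starts at index 0.
        for idx in range(skips_left, size):
            tail.append(arch[idx])
        skips_left = 0
    tail.reverse()  # only the <= n survivors are reversed
    return tail
```

For example, reverse_take_tail([[1, 2, 3], [], [4, 5], [6]], 4) random-indexes [2 .. 3) of the first archetype, skips the empty one, and returns [6, 5, 4, 3].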

10.5.1. Decs-decs equi-join

plan_decs_join is the hashed equi-join splice over two from_decs_template sources. It collects the right side into a table<KEY; array<TUPB>> in one for_each_archetype pass, then walks the left side and probes via table.get. The key must be a primitive (int* / uint* / float / double / bool / string); tuple keys cascade to the standard join_impl.
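The build-then-probe shape can be sketched in Python. The sketch assumes each source is a list of archetypes and each archetype is a list of row tuples. The helper walk stands in for for_each_archetype, and a Python dict plays the role of table<KEY; array<TUPB>>. The names walk, build_table, hash_join and hash_join_count are all hypothetical. The count variant shows the bucket-length bump that avoids a per-pair result call.

```python
from collections import defaultdict


def walk(archetypes):
    """Flatten archetype-split storage (stand-in for for_each_archetype)."""
    for arch in archetypes:
        yield from arch


def build_table(src_b, kb):
    """Build side: one pass over srcB into key -> bucket of rows."""
    table = defaultdict(list)
    for b in walk(src_b):
        table[kb(b)].append(b)
    return table


def hash_join(src_a, src_b, ka, kb, result):
    table = build_table(src_b, kb)
    out = []
    for a in walk(src_a):                  # probe side: walk srcA once
        for b in table.get(ka(a), ()):     # .get never inserts a missing key
            out.append(result(a, b))       # result applied per matched pair
    return out


def hash_join_count(src_a, src_b, ka, kb):
    table = build_table(src_b, kb)
    count = 0
    for a in walk(src_a):
        count += len(table.get(ka(a), ()))  # add the bucket length; no result call
    return count
```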

Chain shape

Splice arm

Notes

from_decs_template(A) |> _join(from_decs_template(B), ka, kb, result) |> count()

pattern decs_join_general (emit fn emit_decs_join)

Hash-fill + probe; count bumped by bucket length per hit. No per-pair invoke.

from_decs_template(A) |> _join(...) |> to_array()

pattern decs_join_general (emit fn emit_decs_join)

Hash-fill + probe; result lambda inlined at the push site (no per-pair invoke into join_impl).

from_decs_template(A) |> _join(...) |> _select(F) |> to_array()

pattern decs_join_general (terminal _select)

Single bind of the join result per matched pair, then projection.

from_decs_template(A) |> _join(...) |> _where(P) |> count() / to_array()

pattern decs_join_general (trailing _where)

Bind join result, evaluate predicate, gate count++ / push_clone. Composes with the trailing _select form (filter then project, single bind per pair).

from_decs_template(A) |> _where(P) |> _join(...) (leading _where)

pattern decs_join_general (leading _where slot)

Pre-join filter on srcA, fused into the per-archetype probe as if (P(a)) { <probe> } — no intermediate filtered array. Same shared wrap as the array side (build_join_standalone_pieces).

from_decs_template(A) |> _join(...) |> _group_by(K) |> _select(reduce) |> count() / to_array()

plan_decs_group_by (isDecsJoin adapter, Theme 3 C3)

Cross-arm composition. plan_decs_group_by recognizes a trailing join upstream of group_by_lazy and builds an adapter that emits hashB-collect + srcA-probe + per-pair result-lam bind as the per-element source loop; that bind feeds plan_group_by_core’s tab?[uk] ?? dummy bucket update directly. Single pass, no intermediate join array. v1 constraints: count / to_array terminator only; primitive equi-key (same guard as plan_decs_join); no segments between join and group_by_lazy; HAVING (trailing _where after the reducer _select) defers to v2.
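The one-pass composition in the last row can be illustrated with a Python sketch. Here the join's per-pair result goes straight into a group reducer instead of an intermediate join array. For brevity the sources are flattened to plain lists, the reducer is assumed to be a per-key sum, and join_group_sum is a hypothetical name.

```python
from collections import defaultdict


def join_group_sum(src_a, src_b, ka, kb, result, group_key, value):
    """One pass: join pairs feed the group buckets directly."""
    table = defaultdict(list)
    for b in src_b:
        table[kb(b)].append(b)                    # hash-collect srcB
    sums = {}                                     # group key -> running sum
    for a in src_a:                               # srcA probe
        for b in table.get(ka(a), ()):
            row = result(a, b)                    # per-pair result bind
            gk = group_key(row)
            sums[gk] = sums.get(gk, 0) + value(row)  # bucket update, no join array
    return sums
```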

10.5.2. Array-array equi-join

emit_array_join is the array-source mirror of emit_decs_join: a hashed equi-join over two array / iterator sources. The algorithm is identical (collect srcb into table<KEY; array<TUPB>> in one pass, then walk srca and probe via table.get), but the lead iteration comes from the adapter (wrap_source_loop / bind_name / invoke_param_type), so any direct-return loop source can use it. ArrayAdapter frames a plain for (elem in src), and TableAdapter frames its pruned slot walk (vs for_each_archetype + build_decs_inner_for on the decs side). Both sources bind as invoke parameters (a 2-source wrap that mirrors Zip). The same primitive equi-key gate as the decs side applies; non-primitive keys cascade to join_impl_const. When srcB is a table walked on its bare key, the internal hash is skipped entirely; see the table-source row above and the probe row below.
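The table-srcB probe mode can be pictured in Python with a dict whose key is the b-key. This sketch assumes the join's b-key selector is exactly the dict key (the bare-key case); probe_join and probe_count are hypothetical names. Because dict keys are unique, each probe yields at most one pair. That is why the result matches a hashed join over the dict's items.

```python
def probe_join(src_a, tab, ka, result):
    """Join against a dict whose key is the b-key: no build loop."""
    out = []
    for a in src_a:
        k = ka(a)
        if k in tab:                               # unique keys: 0 or 1 match
            out.append(result(a, (k, tab[k])))    # whole (key, value) pair bound
    return out


def probe_count(src_a, tab, ka):
    """Count-no-where shape: key existence only; the value is never read."""
    return sum(1 for a in src_a if ka(a) in tab)
```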

Chain shape

Splice arm

Notes

arrA |> _join(arrB, on, into) |> count()

pattern array_join_general (emit fn emit_array_join)

Hash-fill + probe; count bumped by bucket length per hit. No per-pair invoke (count-no-where bucket-length fast path).

arrA |> _join(arrB, on, into) (no explicit terminator)

pattern array_join_general (emit fn emit_array_join)

Implicit to_array lane: hash-fill + probe; result lambda inlined at the push site. Note: select’s array overload returns array<...> directly, so the chain types as an array without a trailing to_array() call.

arrA |> _join(arrB, ...) |> _select(F) or with trailing |> to_array()

pattern array_join_general (terminal _select)

Single bind of the join result per matched pair, then projection. resultType extracted from the select lambda’s body type (not from selCall._type.firstType, which may stay as an unresolved typedecl(...) when no to_array() forces resolution).

arrA |> _join(arrB, ...) |> _where(P) |> count() / to_array()

pattern array_join_general (trailing _where)

Bind join result, evaluate predicate, gate count++ / push_clone. Composes with the trailing _select form (filter then project, single bind per pair).

arrA |> _where(P) |> _join(arrB, ...) (leading _where)

pattern array_join_general (leading _where slot)

Pre-join filter on srcA, fused into the probe loop as if (P(a)) { <probe> } — no intermediate filtered-srcA array (vs. the tier-2 fallback, which materializes one). The optional lead_where slot precedes the join slot; a where after join is the separate trailing slot. Composes with the trailing _where / _select forms. Wrapping lives in the shared build_join_standalone_pieces, so decs / XML / JSON inherit it.

arrA |> _join(unsafe(each_kv(tab)), <on a == d.key>, ...) (or keys(set) with a bare-element key; any terminator/where/select form above)

probe mode (join_srcb_table_call + join_keyb_is_bare_key → build_join_probe_pieces)

Table-srcB probe: the b-key selector IS the table key, so no hash and no build loop — srcB binds the user’s table itself (const param) and the per-A probe is a key lookup. Unique table keys ⇒ bucket ≤ 1 ⇒ probe ≡ hash semantics exactly (b-key is a bare field read, so skipping its per-B evaluation is unobservable). Usage-pruned like the point-lookup fold: count-no-where / key-only shapes probe key_exists (value never touched), value shapes bind by reference from tab?[k], a whole-pair use binds the kv tuple. Non-bare b-keys and group_join keep the hashed build over the kv iterator. Composes with every lead the emitter serves (array lead, table lead — table×table probes both sides).

unsafe(each_kv(tabA)) |> _join(srcB, on, into) |> ... (table lead; keys / values lanes too)

pattern join_general → TableAdapter.emit_join_hook → emit_array_join

Table lead: same emitter, lead loop framed by TableAdapter.wrap_source_loop — the kv usage-pruner sees the whole probe body (key lambda + result + trailing where/select), so a join touching only c.value.* walks values(tab) alone. All srcB modes compose (hashed array/iterator srcB, table-srcB probe); group_join stays outer over every slot.

unsafe(each_kv(tab)) |> _group_by(K) |> _select(reduce) |> ... (keys / values lanes too; having / trailing where / order_by / count compose)

pattern group_by → TableAdapter.build_group_by_adapter → plan_group_by_core

Table lead group_by: build_group_by_adapter hands the planner a fresh TableAdapter, so the bucket-fill loop is framed by wrap_source_loop and the kv usage-pruner sees the whole accumulation body (key expr + reducer updates + upstream where/select segments) — a group key over kv.value.brand walks values(tab) alone. join |> group_by over a table lead declines (the upstream-join arm returns null) and cascades to tier-2.

arrA |> _group_join(arrB, on, into) (+ optional leading _where)

pattern join_general with the group_join literal (isGroupJoin)

C# GroupJoin (outer): one result row per srcA row — result(a, bucket) pushed once (no per-pair loop), plus if (!get(...)) { var empty; push result(a, empty) } so an unmatched srcA still surfaces (empty group). The join slot matcher is one_of ["join", "group_join"]; isGroupJoin threads through build_join_standalone_pieces, which rebinds the result lambda’s 2nd param to the whole bucket (array<TUPB>) so the per-group aggregate runs inside the result. Array / table leads only — decs / XML / JSON group joins defer to tier-2 (their emit_join_hook returns null for group_join); a trailing where / select / count over the group rows also defers, and a table srcB keeps the hashed build (the probe never serves group joins).

arrA |> _join(arrB, ...) |> _group_by(K) |> _select(reduce) |> count() / to_array()

plan_group_by_core via SourceAdapter.ArrayJoin (chunk N+2)

Cross-arm composition. emit_group_by’s Array branch recognizes a trailing join upstream of group_by_lazy and builds an ArrayJoin adapter; plan_group_by_core consumes it via adapter_wrap_source_loop’s ArrayJoin branch (plain for loops in lieu of for_each_archetype). Same v1 constraints as the decs-side cross-arm: primitive equi-key, no segments between join and group_by_lazy, HAVING defers to v2.
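The outer GroupJoin semantics of the _group_join row can be sketched in Python. The sketch produces one output row per srcA element, passes the whole bucket to the result once, and gives unmatched elements an empty group. The name group_join and the list-based sources are assumptions for illustration.

```python
from collections import defaultdict


def group_join(src_a, src_b, ka, kb, result):
    """Outer GroupJoin: exactly one output row per srcA element."""
    table = defaultdict(list)
    for b in src_b:
        table[kb(b)].append(b)
    out = []
    for a in src_a:
        bucket = table.get(ka(a))
        if bucket is not None:
            out.append(result(a, bucket))   # whole bucket handed to result once
        else:
            out.append(result(a, []))       # unmatched a still surfaces, empty group
    return out
```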

10.6. XML-source patterns

An unsafe(from_xml_node(node[, name], type<Row>)) source folds through XmlAdapter (modules/dasPUGIXML/daslib/linq_fold_xml.das), loaded only when the pugixml module is linked. Unlike a hard-coded source row, the adapter rides every pattern row the array / decs planners expose — try_splice_patterns runs with no onlyRow restriction, and per-row requires predicates plus the adapter’s capability hooks decide what fuses. Three mechanics make the emitted loop differ from the array and decs lanes.

Single flat DOM walk, forward-only. The adapter emits one while over the node’s child elements (first_child / next_sibling, or child(node, name) for the 3-arg named overload), like the array lane and unlike decs’s two-level archetype walk. But an XML node has no random index, so XML matches the non_array_source rows (e.g. the R-2b forward keep-last reverse-distinct) and is excluded from the array_source-only rows (Row 5 buffer_helper_dispatch direct-helper, R-2a backward-index reverse-distinct). Bare order_by / order therefore cascades to the fused_prefilter row (materialized buffer), not the direct daslib helper — the same fall-through decs takes.

Field-pruning (pass 2b). Before emitting the per-element body the chain is scanned (XmlRowUsageScanner) for the Row fields it actually reads. Only those attributes are read — each via read_xml_field into a scalar local, with the body’s it.<field> rewritten to that local and the Row struct dropped entirely. Unread fields are never touched, so a chain that reads only numeric fields runs alloc-free (the per-string clone_string is the materialization cost). Three outcomes:

  • Pruned — body reads only it.<field> scalars: one let xf_<f> = read_xml_field(...) per referenced field, struct dropped.

  • Whole-row escape — body references the bind it as a whole value (to_array, identity _select(_), pass-to-user-fn): falls back to the full build_xml_row.

  • Guarded escape — the whole row escapes only inside a bare if (cond) { }: the predicate’s fields are read cheaply via peek_xml_field (borrowed string#, no clone) and the full build_xml_row runs only for matching elements.

Deferred materialization. A buffered reducer (order / take, distinct_by) holds (key, xml_node) handle surrogates rather than built rows, and runs build_xml_row only for the K survivors at return — defers_materialization() is true, current_handle_expr is the per-element xcur node, and materialize_handle emits the deferred build_xml_row. A 1000-element document feeding order_by(K).take(10) builds 10 rows, not 1000.
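The handle-surrogate idea behind order_by(K).take(N) can be sketched in Python with heapq. The bounded heap stores (key, index, handle) entries, and the row builder runs only on the survivors. The sketch makes four assumptions:
- keys are numeric, because they are negated to get a max-heap;
- the element index serves as a stable tie-break;
- order_by_take_deferred is a hypothetical name;
- a handle is any Python object standing in for an xml_node.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

H = TypeVar("H")   # element handle (stand-in for an xml_node)
R = TypeVar("R")   # materialized row


def order_by_take_deferred(handles: Iterable[H], key: Callable[[H], float],
                           n: int, build: Callable[[H], R]) -> List[R]:
    """order_by(key).take(n) holding (key, handle) surrogates, not rows."""
    if n <= 0:
        return []
    # Max-heap of the n smallest (key, index) pairs, via negation. The unique
    # index breaks ties stably, so handles themselves are never compared.
    heap = []
    for i, h in enumerate(handles):
        entry = (-key(h), -i, h)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:              # smaller (key, index) than the current worst
            heapq.heapreplace(heap, entry)
    survivors = sorted(heap, key=lambda e: (-e[0], -e[1]))  # ascending (key, index)
    return [build(h) for _, _, h in survivors]  # build runs at most n times
```

With 1000 handles and n = 10, build is called exactly 10 times.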

Chain shape (XML source)

Splice arm

Notes

…<terminator> with _where / _select (count / long_count / sum / average / min / max / any / all / contains / first / last / single / element_at / take / take_while / skip_while / to_array)

loop_or_count_general (XmlAdapter swap)

The base lane — the same emit archetypes as the array side, over the field-pruned DOM walk. take / where / select fuse into the body.

._order_by(K).first() / .first_or_default()

plan_order_family (streaming-min) + deferral

One handle held in var best; build_xml_row runs once at return.

._order_by(K).take(N).to_array() / …._select(F).to_array()

plan_order_family (bounded-heap) + deferral

Heap of N (key, node) surrogates; build_xml_row (and any terminal _select) runs ≤N times at return.

._order_by(K).to_array() / ._where(P)._order_by(K).to_array()

plan_order_family (fused_prefilter) + deferral

array_source Row 5 excludes XML, so bare order cascades to the materialized-buffer row — the buffer holds node handles, rows built for survivors only.

._distinct_by(K).to_array() / ._distinct_by(K)._order_by(K2)…

plan_distinct / plan_order_family + deferral

Dedup set over the key; the kept slot stores the node handle (distinct_by defers; plain distinct over the whole row materializes per element).

.reverse()._distinct[_by](K).to_array()

plan_reverse R-2b (non_array_source) → emit_reverse_distinct_forward_keeplast

Forward keep-last table-overwrite (no backward index); the slot stores the xml_node handle, build_xml_row (field-pruned to the key) runs for the K survivors. The one row shared by decs / iterators / XML.

.reverse().take(N) [._select(F)].to_array()

plan_reverse skip-into-tail → XmlAdapter.emit_reverse_skip_into_tail

Backward DOM walk (last_child / previous_sibling, both O(1) in pugixml): collects only the last N element children — already in reverse order, so no reverse_inplace and no full forward buffer of all N handles. The forward-source analog of the array R6 backward-index walk. Profiled win: m5f reverse_take 88.9 → 0.0 ns/op. The named 3-arg from_xml_node(root, "tag", …) form has no last-named-child primitive, so it falls back to the buffer-all path (emit_reverse_buffer_inplace deferred materialize).

bare .last() / .last_or_default(d) / .reverse().first[_or_default]() (no where / range)

emit_early_exit_lane last branch / emit_reverse_walk_overwrite_scalar (Rb) → XmlAdapter.emit_reverse_last_backward

The last element is the first the backward walk reaches: one last_child step, build that row, return; no forward scan. Pre-/post-reverse _select projects the single survivor. Predicated [where] |> last deliberately keeps the forward walk: reverse DOM traversal is ~2× cache-hostile per node (profiled), so a match far from the end would regress. (Pre-existing, orthogonal: a row struct with a default-bearing field routes bare last() to tier-2 linq.das before reaching this hook.)

from_xml_node(…) |> _join(arrB, ka, kb, result) (+ optional leading / trailing _where, trailing _select; count / to_array / iterator)

pattern join_general → XmlAdapter.emit_join_hook

Hashed equi-join: srcB (an in-memory array) collected into table<KEY; array<TUPB>>, probed from the field-pruned DOM walk. Primitive equi-key only. Mirrors emit_array_join with srcA = the XML node.

from_xml_node(…) |> _join(arrB, …) |> _group_by(K) |> _select(reduce) |> to_array() / .count()

plan_group_by_core via XmlJoinAdapter

Cross-arm: the join’s per-pair result feeds the bucket update directly — one pass, no intermediate join array. Same v1 constraints as the array / decs cross-arm (primitive key, no segments between join and group_by, HAVING defers to v2).

from_xml_node(…) |> _group_by(K) |> _select(reduce) |> to_array() (+ HAVING / trailing order_by)

plan_group_by_core (XmlAdapter via build_group_by_adapter)

Per-key bucket reducer over the DOM walk; shares the array path’s reducer dispatch and the trailing HAVING / ORDER BY extensions.

source |> _select(f) |> <order/distinct/take>

leading-_select absorption (ProjectedSourceAdapter wrap)

The leading projection is absorbed into the source walk and field-pruning is preserved: f’s it.<field> reads still reach the materializer (see Pre-dispatch normalizations).
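The forward keep-last strategy of the R-2b reverse-distinct row can be sketched in Python. A single forward pass overwrites a per-key slot so the last occurrence wins, and the output order is recovered from the stored positions. Recording the position and sorting the survivors by it is an assumption of this sketch, not a description of emit_reverse_distinct_forward_keeplast. The function name is hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

H = TypeVar("H")   # element handle
R = TypeVar("R")   # materialized row


def reverse_distinct_by_keeplast(items: Iterable[H], key: Callable[[H], Hashable],
                                 build: Callable[[H], R]) -> List[R]:
    """reverse().distinct_by(key) computed with a single forward pass."""
    last: Dict[Hashable, Tuple[int, H]] = {}
    for pos, h in enumerate(items):
        last[key(h)] = (pos, h)        # overwrite: the last occurrence wins
    # The reversed stream meets each key first at its last forward position,
    # so output order is descending by that position.
    survivors = sorted(last.values(), key=lambda ph: ph[0], reverse=True)
    return [build(h) for _, h in survivors]   # one build per distinct key
```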

Defers to tier-2 (the XmlAdapter hook returns null, so the chain cascades):

  • Group join (_group_join / join into) — emit_join_hook returns null for the group_join literal.

  • Non-primitive join keys / non-array srcB — the same gate as the array / decs join (tuple keys cascade to join_impl).

  • Correlated nested-collection flatten (from o from l in o.lines) — from_xml_node reads scalar attributes only; there is no nested collection to flatten.

  • Mixed-source operators (union / except / intersect / concat) — fall back exactly as for array / decs sources.

10.7. Zip patterns

Chain shape

Splice arm

Notes

zip(a, b)._select(F).sum() / .count() / .average()

pattern zip_general (emit fn emit_zip)

Fuses to a single index-loop over the shorter side.

zip(a, b, c)._select(F).<terminator>

pattern zip_general (emit fn emit_zip)

Three-source zip; same loop shape with three reads per iteration.

zip(a, b, sel).<terminator> (3-arg, with selector lambda)

pattern zip_general (3-arg pre-lowered)

Theme 1 (audit 7a). The 3-arg form zip(a, b, sel) is pre-lowered by plan_zip to zip(a, b) |> _select(sel-as-tuple) before per-arm matching, so the standard zip+select fusion fires — the natural zip(xs, ys, $(x, y) => x * y) |> sum() dot-product idiom splices instead of cascading.

zip(a, b)._where(P)._select(F).<terminator>

pattern zip_general (chain ops via head c_chain + range slots)

where / select / take / skip / take_while / skip_while between zip and the terminator are all fused.

zip(a, b).first() / .first_or_default() / .aggregate(...)

pattern zip_general (early-exit / accumulator lanes delegate to emit_early_exit_lane / emit_accumulator_lane)

Early-exit terminator on the zipped pair.

zip(a, b)._select(F).count(P) / .long_count(P)

pattern zip_general (counter with separate predicate gate)

The 2-arg count(P) / long_count(P) form is captured into a dedicated counter-predicate gate emitted around acc++ inside the upstream where/select wrap, so eager where(W).select(F).count(P) ordering is preserved (W filters first, then F runs once per surviving element, then P decides whether to count). With _select, the predicate peels against the projected value via a vproj bind. Length-shortcut is suppressed when P is present (the counter loop runs).

zip(a, b)[._select(F)|._where(P)|...].reverse().<terminator>

pattern zip_general (trailing reverse slot)

Theme 8 (audit C4). reverse accepted as the last chain op between zip’s chain and the terminator. Array lane emits _::reverse_inplace($i(bufName)) before return; counter / accumulator (sum/min/max/avg) / any / all / contains lanes treat reverse as a no-op (mathematical identity). Bails (cascades) on first / first_or_default (NOT identity under reverse) and when reverse is not the last chain op (anything after would see the reversed stream and change semantics vs cascade).
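The ordering guarantee of the count(P) row can be sketched as a fused Python loop over the shorter side: W filters first, F runs once per survivor, and P gates only the counter. The sketch assumes the sources are indexable lists, and zip_where_select_count is a hypothetical name. The emitted daslang loop is not shown here.

```python
def zip_where_select_count(a, b, where, select, pred):
    """Fused zip(a, b) |> where(W) |> select(F) |> count(P)."""
    acc = 0
    for i in range(min(len(a), len(b))):   # one index loop over the shorter side
        x, y = a[i], b[i]
        if not where(x, y):                # W filters first
            continue
        v = select(x, y)                   # F runs once per surviving pair
        if pred(v):                        # P gates only the counter
            acc += 1
    return acc
```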

10.8. What falls back

The default path (fold_linq_default) fires when none of the plan_* arms recognize the chain. This is the standard linq surface — iterators materialize, callbacks fire, and there is no fusion.

Common cases that fall back:

  • Mixed-source operators like union(a, b), except(a, b), intersect(a, b), concat(a, b) after the first source has been transformed (e.g. each(a)._select(F).union(b)).

  • Joins other than the primitive-key equi-joins catalogued above: _left_join / _right_join / _full_outer_join / _cross_join don’t splice. Only the _join / _group_join shapes listed in the Decs-decs and Array-array equi-join tables and in the XML rows splice (decs-decs via plan_decs_join; array, table and XML leads via emit_array_join and the adapters’ emit_join_hook). Tuple keys, non-primitive keys, mixed array/decs sources, and chain shapes not listed there cascade to join_impl (join_impl_const on the array side).

  • Aggregations on lazy groupings: _group_by_lazy(K)._select(F) with a non-bucket-reducing _select.

  • Selector-based to_table(key, elementSelector): the 3-arg form keeps its tier-2 generic implementation; only the selector-free to_table() terminator splices (see the table-buffer materializer row above).

  • Chained _select(f) |> _select(g) with an impure inner (_ % N, _ / N, or a user-call inner that the typer can’t prove pure). The collapse_chained_selects pre-pass is gated on !has_sideeffects(innerBody) because collapsing would change the inner's evaluation count when the outer references its param zero or many times. A pure inner (_._field, _ + K, _ * K, etc.) collapses transparently, and the downstream planner sees a single _select.
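The evaluation-count hazard behind the collapse_chained_selects gate can be shown in a few lines of Python. This illustrates the general substitution problem, not the daslang pre-pass itself, and the helper names are hypothetical. A counter stands in for the inner's side effect, and the outer body references its parameter twice.

```python
class Counter:
    """Counts inner-selector calls (stands in for an observable side effect)."""
    def __init__(self):
        self.n = 0


def make_inner(counter):
    def inner(x):
        counter.n += 1
        return x + 1
    return inner


def two_selects(xs, inner):
    mapped = [inner(x) for x in xs]        # _select(f): inner runs once per element
    return [t * t for t in mapped]         # _select(g): outer uses its param twice


def collapsed(xs, inner):
    return [inner(x) * inner(x) for x in xs]  # g's body with f substituted inline
```

Both forms produce [4, 9, 16] for [1, 2, 3], but the collapsed one calls inner six times instead of three. With an inner that can fail (such as _ / N) and an outer that ignores its parameter, collapsing would instead drop evaluations.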

When a chain falls back, behavior is identical to writing the same chain without _fold — correct, but not splice-fused.

10.8.1. Decs-bridge fall-off diagnostic

When a from_decs_template source survives _fold dispatch without any tier-1 planner (decs or array-side) claiming it, the bridge materializes into a temp res array before fold_linq_default runs on top — an EXTRA allocation beyond whatever cascade follows. LinqFold.visit detects this case right before falling through to fold_linq_default: it destructures flatten_linq(call.arguments[0]) into (top, calls) and fires only when calls is non-empty (a real cascade is about to run) and extract_decs_bridge(top) is non-null (the source IS a from_decs_template bridge). Bare _fold(from_decs_template(...)) with no chain ops is skipped — there’s no cascade, just the bridge’s own materialization. When fired, a *warning* goes to the compiler log naming the call site:

user.das:42:8: *warning* `_fold`: from_decs_template source
survived dispatch — no `plan_decs_*` arm claimed this chain, so
the bridge materializes a temp `res` array and the tier-2 cascade
runs on the materialized buffer. Rewrite the chain to a
recognized decs shape (see
doc/source/reference/linq_fold_patterns.rst), or suppress with
`options _no_linq_perf_warn = true`.

The fix is usually to reorder ops so the chain matches a row in the Decs section above (e.g. push _select past _skip_while / _take_while since their predicates run on the source tuple, not the projected value). Suppress per file with options _no_linq_perf_warn = true for tests that intentionally exercise cascade behavior as regression guards.

10.9. See also