8.7.6. dasLLAMA-06 — The Architecture Registry
How does one load_model run Llama, Qwen, Phi, Gemma, and gpt-oss? Through
the architecture registry. This tutorial needs no model file and no -jit —
it’s about the registry, not inference:
daslang.exe tutorials/dasLLAMA/06_add_an_arch.das
8.7.6.1. The registry
Requiring dasllama/dasllama registers every supported architecture (each
dasllama_arch_*.das self-registers at [init]). load_model reads
the GGUF’s general.architecture string and looks it up in the registry. An
unknown architecture panics and reports the list of registered names, which
is safer than silently feeding the weights through the wrong forward path.
let names <- arch_names()
print("{length(names)} registered architectures: {join(names, ", ")}\n")
// -> 9 registered architectures: qwen2, qwen2moe, llama, gemma2, gemma3, ...
8.7.6.2. Anatomy of an ArchDesc
An architecture is data, not a fork of the engine:
configure
    A function setting Config flags: rope style, extra norms, sliding-window
    patterns, MoE shape. It runs before dim/vocab are known. The real qwen3
    sets rope_neox + qk_norm and nothing else.
blocks
    The forward-pass kernels. std_blocks() is the standard attention + FFN
    pair; MoE (qwen2moe) and gpt-oss swap in their own.
chat
    The chat template as data: per-turn part lists (chat_special /
    chat_text / chat_content) plus stop tokens.
Most new model families are just Config flags over std_blocks — that
is the Tier-1/Tier-2 design the registry exists for.
8.7.6.3. Register your own
Suppose a llama-shaped model ships with an architecture string the registry doesn’t know. Mapping it onto the existing machinery takes no engine change:
def private configure_my_llama(var c : Config) {
    c.rope_neox = true
}

var d = ArchDesc(name = "my_llama", configure = @@configure_my_llama)
d.blocks = std_blocks()
d.chat.add_bos = true
d.chat.user.parts <- [chat_special("<|im_start|>"), chat_text("user\n"),
    chat_content(), chat_special("<|im_end|>"), chat_text("\n")]
d.chat.assistant_open.parts <- [chat_special("<|im_start|>"), chat_text("assistant\n")]
d.chat.stop <- ["<|im_end|>"]
register_arch("my_llama", d)
From here, load_model on a GGUF whose general.architecture says
my_llama would configure, load, and chat through this descriptor.
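One way to confirm that the registration took effect is to look for the new name in arch_names(). The following is a minimal sketch, not the tutorial's own code. It assumes that arch_names() also reports architectures registered at runtime and that `options gen2` matches the brace syntax used above. The chat template is left out for brevity.

```daslang
options gen2
require dasllama/dasllama

def private configure_my_llama(var c : Config) {
    c.rope_neox = true
}

[export]
def main() {
    // Same descriptor as above, minus the chat template.
    var d = ArchDesc(name = "my_llama", configure = @@configure_my_llama)
    d.blocks = std_blocks()
    register_arch("my_llama", d)

    // Assumption: arch_names() lists runtime registrations too.
    let names <- arch_names()
    var found = false
    for (n in names) {
        if (n == "my_llama") {
            found = true
        }
    }
    print("my_llama registered: {found}\n")
}
```

Like the tutorial itself, this check needs no model file. It only exercises the registry, so it can run before any GGUF with the new architecture string is available.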
8.7.6.4. Adding a real architecture to the module
The in-tree recipe (see modules/dasLLAMA/API_REWORK.md’s model-support
waves for worked examples, from Qwen3’s two flags to gpt-oss’s
MoE-with-sinks):
1. modules/dasLLAMA/dasllama/dasllama_arch_<name>.das: configure + blocks +
   chat template, self-registered in an [init] function.
2. require it from the dasllama umbrella; add it to the module’s CMake AOT
   file lists and .das_module descriptor.
3. Validate token-for-token against llama.cpp on a real GGUF before trusting
   any output: greedy continuations must match id-for-id.
4. Registry-count tests (tests/dasLLAMA/test_arch_registry.das) hardcode the
   architecture count, so bump them.
See also
Full source: tutorials/dasLLAMA/06_add_an_arch.das