8.7.6. dasLLAMA-06 — The Architecture Registry

How does one load_model run Llama, Qwen, Phi, Gemma, and gpt-oss? Through the architecture registry. This tutorial needs no model file and no -jit — it’s about the registry, not inference:

daslang.exe tutorials/dasLLAMA/06_add_an_arch.das

8.7.6.1. The registry

Requiring dasllama/dasllama registers every supported architecture (each dasllama_arch_*.das self-registers at [init]). load_model reads the GGUF’s general.architecture string and looks it up — an unknown arch panics with the registered list, better than feeding weights through the wrong path.

let names <- arch_names()
print("{length(names)} registered architectures: {join(names, ", ")}\n")
// -> 9 registered architectures: qwen2, qwen2moe, llama, gemma2, gemma3, ...

8.7.6.2. Anatomy of an ArchDesc

An architecture is data, not a fork of the engine:

configure

A function setting Config flags — rope style, extra norms, sliding-window patterns, MoE shape. It runs before dim/vocab are known. The real qwen3 sets rope_neox + qk_norm and nothing else.

blocks

The forward-pass kernels. std_blocks() is the standard attention + FFN pair; MoE (qwen2moe) and gpt-oss swap in their own.

chat

The chat template as data: per-turn part lists (chat_special / chat_text / chat_content) plus stop tokens.

Most new model families are just Config flags over std_blocks — that is the Tier-1/Tier-2 design the registry exists for.

8.7.6.3. Register your own

Suppose a llama-shaped model ships with an architecture string the registry doesn’t know. Mapping it onto the existing machinery takes no engine change:

def private configure_my_llama(var c : Config) {
    c.rope_neox = true
}

var d = ArchDesc(name = "my_llama", configure = @@configure_my_llama)
d.blocks = std_blocks()
d.chat.add_bos = true
d.chat.user.parts <- [chat_special("<|im_start|>"), chat_text("user\n"),
    chat_content(), chat_special("<|im_end|>"), chat_text("\n")]
d.chat.assistant_open.parts <- [chat_special("<|im_start|>"), chat_text("assistant\n")]
d.chat.stop <- ["<|im_end|>"]
register_arch("my_llama", d)

From here, load_model on a GGUF whose general.architecture says my_llama would configure, load, and chat through this descriptor.

8.7.6.4. Adding a real architecture to the module

The in-tree recipe (see modules/dasLLAMA/API_REWORK.md’s model-support waves for worked examples, from Qwen3’s two flags to gpt-oss’s MoE-with-sinks):

  1. modules/dasLLAMA/dasllama/dasllama_arch_<name>.das — configure + blocks + chat template, self-registered in an [init] function.

  2. require it from the dasllama umbrella; add it to the module’s CMake AOT file lists and .das_module descriptor.

  3. Validate token-for-token against llama.cpp on a real GGUF before trusting any output — greedy continuations must match id-for-id.

  4. Registry-count tests (tests/dasLLAMA/test_arch_registry.das) hardcode the architecture count — bump them.