.. _tutorial_dasLLAMA_add_an_arch: ======================================= dasLLAMA-06 — The Architecture Registry ======================================= .. index:: single: Tutorial; dasLLAMA single: Tutorial; Architecture Registry How does one ``load_model`` run Llama, Qwen, Phi, Gemma, and gpt-oss? Through the architecture registry. This tutorial needs no model file and no ``-jit`` — it's about the registry, not inference:: daslang.exe tutorials/dasLLAMA/06_add_an_arch.das The registry ============ Requiring ``dasllama/dasllama`` registers every supported architecture (each ``dasllama_arch_*.das`` self-registers at ``[init]``). ``load_model`` reads the GGUF's ``general.architecture`` string and looks it up — an unknown arch panics with the registered list, better than feeding weights through the wrong path. .. code-block:: das let names <- arch_names() print("{length(names)} registered architectures: {join(names, ", ")}\n") // -> 9 registered architectures: qwen2, qwen2moe, llama, gemma2, gemma3, ... Anatomy of an ArchDesc ====================== An architecture is data, not a fork of the engine: ``configure`` A function setting ``Config`` flags — rope style, extra norms, sliding-window patterns, MoE shape. It runs before dim/vocab are known. The real qwen3 sets ``rope_neox`` + ``qk_norm`` and nothing else. ``blocks`` The forward-pass kernels. ``std_blocks()`` is the standard attention + FFN pair; MoE (qwen2moe) and gpt-oss swap in their own. ``chat`` The chat template as data: per-turn part lists (``chat_special`` / ``chat_text`` / ``chat_content``) plus stop tokens. Most new model families are just ``Config`` flags over ``std_blocks`` — that is the Tier-1/Tier-2 design the registry exists for. Register your own ================= Suppose a llama-shaped model ships with an architecture string the registry doesn't know. Mapping it onto the existing machinery takes no engine change: .. code-block:: das def private configure_my_llama(var c : Config) { c.rope_neox = true } var d = ArchDesc(name = "my_llama", configure = @@configure_my_llama) d.blocks = std_blocks() d.chat.add_bos = true d.chat.user.parts <- [chat_special("<|im_start|>"), chat_text("user\n"), chat_content(), chat_special("<|im_end|>"), chat_text("\n")] d.chat.assistant_open.parts <- [chat_special("<|im_start|>"), chat_text("assistant\n")] d.chat.stop <- ["<|im_end|>"] register_arch("my_llama", d) From here, ``load_model`` on a GGUF whose ``general.architecture`` says ``my_llama`` would configure, load, and chat through this descriptor. Adding a real architecture to the module ======================================== The in-tree recipe (see ``modules/dasLLAMA/API_REWORK.md``'s model-support waves for worked examples, from Qwen3's two flags to gpt-oss's MoE-with-sinks): 1. ``modules/dasLLAMA/dasllama/dasllama_arch_.das`` — configure + blocks + chat template, self-registered in an ``[init]`` function. 2. ``require`` it from the ``dasllama`` umbrella; add it to the module's CMake AOT file lists and ``.das_module`` descriptor. 3. Validate token-for-token against llama.cpp on a real GGUF before trusting any output — greedy continuations must match id-for-id. 4. Registry-count tests (``tests/dasLLAMA/test_arch_registry.das``) hardcode the architecture count — bump them. .. seealso:: Full source: :download:`tutorials/dasLLAMA/06_add_an_arch.das <../../../../tutorials/dasLLAMA/06_add_an_arch.das>`