Model to use (model ID, alias, or path to .gguf file)
Optional gpu: GPU layers to offload (-1 = all, 0 = CPU only)
Optional context: Context size override
Optional hugging: HuggingFace access token for gated models (like Gemma 3). Can also be set via the HF_TOKEN environment variable.
Optional enable: Enable thinking/reasoning mode for models that support it (Qwen3, DeepSeek R1)
Options for engine initialization
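
The options described here could be modeled roughly as follows. This is a minimal sketch, not the library's actual API: the property names appear truncated in this listing, so `gpuLayers`, `contextSize`, `huggingFaceToken`, and `enableThinking` are assumed spellings, and `resolveEngineOptions` is a hypothetical helper. The validation rules and the precedence of an explicit token over HF_TOKEN are also assumptions.

```typescript
// Hypothetical shape: the full property names are assumptions,
// not the library's confirmed API.
interface EngineOptions {
  model: string;              // model ID, alias, or path to a .gguf file
  gpuLayers?: number;         // -1 = all layers, 0 = CPU only
  contextSize?: number;       // context size override
  huggingFaceToken?: string;  // access token for gated models
  enableThinking?: boolean;   // thinking/reasoning mode, where supported
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: validates the numeric options and applies the
// HF_TOKEN fallback. The environment is passed in explicitly so the
// function stays easy to test.
function resolveEngineOptions(options: EngineOptions, env: Env = {}): EngineOptions {
  const { gpuLayers, contextSize } = options;

  if (gpuLayers !== undefined && (!Number.isInteger(gpuLayers) || gpuLayers < -1)) {
    throw new RangeError("gpuLayers must be -1 (all), 0 (CPU only), or a positive integer");
  }
  if (contextSize !== undefined && (!Number.isInteger(contextSize) || contextSize <= 0)) {
    throw new RangeError("contextSize must be a positive integer");
  }

  // Assumption: an explicit token wins; otherwise fall back to HF_TOKEN.
  const token = options.huggingFaceToken ?? env.HF_TOKEN;
  return token === undefined ? { ...options } : { ...options, huggingFaceToken: token };
}

// Usage: offload all layers, enable thinking mode, take the token from the environment.
const resolved = resolveEngineOptions(
  { model: "./models/example.gguf", gpuLayers: -1, enableThinking: true },
  { HF_TOKEN: "hf_example_token" },
);
```

In a Node.js program, `process.env` could be passed as the second argument so that HF_TOKEN is picked up from the real environment.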