native-llm - v0.2.0

    Class LLMEngine

    Native LLM Engine

    Provides text generation using llama.cpp with Metal GPU acceleration.

    Index

    Constructors

    Methods

    • Check whether the current platform is supported

      Returns boolean
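
      No example accompanies the platform check above, and its exact method name is not shown on this page. A minimal guard, with `isSupported()` as a hypothetical name for the documented check:

```typescript
// Pure helper: turn the boolean platform check into a user-facing message.
function platformMessage(supported: boolean): string {
  return supported
    ? "native engine available (Metal GPU on Apple Silicon)"
    : "platform not supported by native-llm"
}

// Hypothetical usage; replace `isSupported` with the real static method name:
// console.log(platformMessage(LLMEngine.isSupported()))
```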

    • initialize(): Initialize the engine and load the model

      Downloads the model from HuggingFace if not cached locally. Uses Metal GPU acceleration on Apple Silicon.

      Returns Promise<void>

      Throws Error if model download or loading fails

      const engine = new LLMEngine({ model: "gemma-3n-e4b" })
      await engine.initialize()
    • generate(options): Generate text from a prompt

      Automatically initializes the engine if not already done. For thinking-mode models (Qwen3, DeepSeek), applies appropriate settings.

      Parameters

      • options: GenerateOptions

        Generation options including prompt, maxTokens, temperature

      Returns Promise<GenerateResult>

      Generation result with text, token counts, and performance metrics

      const result = await engine.generate({
        prompt: "Explain quantum computing",
        maxTokens: 200,
        temperature: 0.7
      })
      console.log(result.text)
      console.log(`${result.tokensPerSecond.toFixed(1)} tok/s`)
    • generateStreaming(options, onToken): Generate text with streaming token-by-token output

      Same as generate() but calls onToken for each generated token, enabling real-time display of responses.

      Parameters

      • options: GenerateOptions

        Generation options including prompt, maxTokens, temperature

      • onToken: TokenCallback

        Callback invoked for each generated token

      Returns Promise<GenerateResult>

      Generation result with text, token counts, and performance metrics

      const result = await engine.generateStreaming(
        { prompt: "Write a haiku" },
        (token) => process.stdout.write(token)
      )
    • chat(messages, options?): Generate text using chat message format

      Supports multi-turn conversations with system, user, and assistant messages. Automatically manages chat history within the session.

      Parameters

      • messages: { role: "system" | "user" | "assistant"; content: string }[]

        Array of chat messages with role and content

      • Optional options: Omit<GenerateOptions, "prompt" | "systemPrompt">

        Optional generation options (maxTokens, temperature, etc.)

      Returns Promise<GenerateResult>

      Generation result with assistant's response

      const result = await engine.chat([
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "What is 2+2?" }
      ])
      console.log(result.text) // "4"
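
      Because the session manages history automatically, a follow-up turn only needs to send the new user message. A sketch of that pattern, written against a minimal structural view of the documented `chat()` signature (the `ChatEngine` interface below is an illustration, not part of the library):

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string }

// Minimal structural view of the chat() method documented above.
interface ChatEngine {
  chat(messages: ChatMessage[]): Promise<{ text: string }>
}

// Since the session keeps history, the second call passes only the new turn.
async function followUp(engine: ChatEngine): Promise<string> {
  await engine.chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is 2+2?" },
  ])
  const reply = await engine.chat([{ role: "user", content: "And doubled?" }])
  return reply.text
}
```

      Pass a real LLMEngine instance as `engine`; it satisfies the interface structurally.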
    • Get information about the current model

      Returns model metadata including name, parameters, context length, supported languages, and benchmark scores.

      Returns
          | {
              name: "Gemma 3n E2B";
              repo: "unsloth/gemma-3n-E2B-it-GGUF";
              file: "gemma-3n-E2B-it-Q4_K_M.gguf";
              parameters: "5B→2B";
              quantization: "Q4_K_M";
              contextLength: 32768;
              languages: readonly [
                  "en",
                  "de",
                  "fr",
                  "es",
                  "it",
                  "pt",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
                  "zh",
              ];
              description: "Ultra-efficient edge model, ~2GB RAM";
              requiresAuth: false;
              benchmarks: { mmlu: 64; arena: 1250 };
          }
          | {
              name: "Gemma 3n E4B";
              repo: "unsloth/gemma-3n-E4B-it-GGUF";
              file: "gemma-3n-E4B-it-Q4_K_M.gguf";
              parameters: "8B→4B";
              quantization: "Q4_K_M";
              contextLength: 32768;
              languages: readonly [
                  "en",
                  "de",
                  "fr",
                  "es",
                  "it",
                  "pt",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
                  "zh",
              ];
              description: "Best edge model, ~3GB RAM";
              requiresAuth: false;
              benchmarks: { mmlu: 75; arena: 1300 };
          }
          | {
              name: "Gemma 3 27B";
              repo: "unsloth/gemma-3-27b-it-GGUF";
              file: "gemma-3-27b-it-Q4_K_M.gguf";
              parameters: "27B";
              quantization: "Q4_K_M";
              contextLength: 131072;
              languages: readonly [
                  "en",
                  "de",
                  "fr",
                  "es",
                  "it",
                  "pt",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
                  "zh",
              ];
              description: "Maximum quality, 128K context, ~18GB RAM";
              benchmarks: { mmlu: 77; arena: 1338 };
          }
          | {
              name: "GPT-OSS 20B";
              repo: "unsloth/gpt-oss-20b-GGUF";
              file: "gpt-oss-20b-Q4_K_M.gguf";
              parameters: "21B (3.6B active)";
              quantization: "Q4_K_M";
              contextLength: 131072;
              languages: readonly ["en"];
              description: "OpenAI's open model, MoE, ~16GB RAM";
              benchmarks: { mmlu: 82; arena: 1340 };
          }
          | {
              name: "Phi-4 14B";
              repo: "bartowski/phi-4-GGUF";
              file: "phi-4-Q4_K_M.gguf";
              parameters: "14B";
              quantization: "Q4_K_M";
              contextLength: 16384;
              languages: readonly ["en"];
              description: "Microsoft's reasoning-focused, excellent for STEM";
              benchmarks: { mmlu: 84; arena: 1320 };
          }
          | {
              name: "Qwen3 4B";
              repo: "unsloth/Qwen3-4B-GGUF";
              file: "Qwen3-4B-Q4_K_M.gguf";
              parameters: "4B";
              quantization: "Q4_K_M";
              contextLength: 32768;
              languages: readonly [
                  "en",
                  "zh",
                  "de",
                  "fr",
                  "es",
                  "pt",
                  "it",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
              ];
              description: "Thinking mode, 100+ languages, ~3GB RAM";
              thinkingMode: "qwen";
              benchmarks: { mmlu: 76; arena: 1300 };
          }
          | {
              name: "Qwen3 8B";
              repo: "unsloth/Qwen3-8B-GGUF";
              file: "Qwen3-8B-Q4_K_M.gguf";
              parameters: "8B";
              quantization: "Q4_K_M";
              contextLength: 32768;
              languages: readonly [
                  "en",
                  "zh",
                  "de",
                  "fr",
                  "es",
                  "pt",
                  "it",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
              ];
              description: "Thinking mode, excellent multilingual, ~5GB RAM";
              thinkingMode: "qwen";
              benchmarks: { mmlu: 81; arena: 1350 };
          }
          | {
              name: "Qwen3 14B";
              repo: "unsloth/Qwen3-14B-GGUF";
              file: "Qwen3-14B-Q4_K_M.gguf";
              parameters: "14B";
              quantization: "Q4_K_M";
              contextLength: 32768;
              languages: readonly [
                  "en",
                  "zh",
                  "de",
                  "fr",
                  "es",
                  "pt",
                  "it",
                  "nl",
                  "pl",
                  "ru",
                  "ja",
                  "ko",
              ];
              description: "Thinking mode, top multilingual, ~9GB RAM";
              thinkingMode: "qwen";
              benchmarks: { mmlu: 84; arena: 1380 };
          }
          | {
              name: "Qwen 2.5 Coder 7B";
              repo: "bartowski/Qwen2.5-Coder-7B-Instruct-GGUF";
              file: "Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf";
              parameters: "7B";
              quantization: "Q4_K_M";
              contextLength: 131072;
              languages: readonly ["en"];
              description: "Optimized for code generation";
              benchmarks: { mmlu: 66; arena: 1250 };
          }
          | {
              name: "DeepSeek R1 Distill 7B";
              repo: "bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF";
              file: "DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf";
              parameters: "7B";
              quantization: "Q4_K_M";
              contextLength: 131072;
              languages: readonly ["en", "zh"];
              description: "Strong reasoning with chain-of-thought";
              thinkingMode: "deepseek";
              benchmarks: { mmlu: 72; arena: 1300 };
          }
          | {
              name: "DeepSeek R1 Distill 14B";
              repo: "bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF";
              file: "DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf";
              parameters: "14B";
              quantization: "Q4_K_M";
              contextLength: 131072;
              languages: readonly ["en", "zh"];
              description: "Best reasoning model, shows thinking";
              thinkingMode: "deepseek";
              benchmarks: { mmlu: 79; arena: 1350 };
          }
          | {
              name: string;
              repo: string;
              file: string;
              parameters: string;
              quantization: string;
              contextLength: number;
              languages: string[];
              description: string;
          }

      Model information object
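
      The page does not show this accessor's name (`getModelInfo()` below is an assumed name). A small formatter over the fields common to every branch of the return union above:

```typescript
// Fields shared by every branch of the return union above.
interface ModelInfo {
  name: string
  parameters: string
  quantization: string
  contextLength: number
  languages: readonly string[]
}

// Pure helper: one-line summary of a model-info object.
function summarize(info: ModelInfo): string {
  return `${info.name} (${info.parameters}, ${info.quantization}, ` +
    `${info.contextLength} ctx, ${info.languages.length} languages)`
}

// Hypothetical usage; `getModelInfo` is an assumed accessor name:
// console.log(summarize(engine.getModelInfo()))
```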

    • Reset the chat session

      Clears all conversation history, starting fresh for new conversations. The model remains loaded; use dispose() to fully unload.

      Returns void
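
      The reset method's name is not shown on this page (`resetChat()` below is an assumed name). A sketch of the documented pattern: answer unrelated questions with the model kept loaded, clearing history between them so turns do not contaminate each other:

```typescript
// Structural sketch of the documented session API; `resetChat` is an
// assumed name for the reset method described above.
interface SessionEngine {
  chat(messages: { role: "system" | "user" | "assistant"; content: string }[]): Promise<{ text: string }>
  resetChat(): void
}

// Answer independent questions, resetting history between turns.
async function answerIndependently(engine: SessionEngine, questions: string[]): Promise<string[]> {
  const answers: string[] = []
  for (const q of questions) {
    const result = await engine.chat([{ role: "user", content: q }])
    answers.push(result.text)
    engine.resetChat() // clears history only; model stays loaded
  }
  return answers
}
```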

    • dispose(): Clean up resources and unload the model

      Releases GPU memory and cleans up native resources. Call this when done with the engine to prevent memory leaks.

      Returns Promise<void>

      const engine = new LLMEngine({ model: "gemma-3n-e4b" })
      try {
        await engine.initialize()
        const result = await engine.generate({ prompt: "Hello" })
      } finally {
        await engine.dispose()
      }
    • Static listModels(): List all available curated models

      Returns (
          { id: string } & (
              | {
                  name: "Gemma 3n E2B";
                  repo: "unsloth/gemma-3n-E2B-it-GGUF";
                  file: "gemma-3n-E2B-it-Q4_K_M.gguf";
                  parameters: "5B→2B";
                  quantization: "Q4_K_M";
                  contextLength: 32768;
                  languages: readonly [
                      "en",
                      "de",
                      "fr",
                      "es",
                      "it",
                      "pt",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                      "zh",
                  ];
                  description: "Ultra-efficient edge model, ~2GB RAM";
                  requiresAuth: false;
                  benchmarks: { mmlu: 64; arena: 1250 };
              }
              | {
                  name: "Gemma 3n E4B";
                  repo: "unsloth/gemma-3n-E4B-it-GGUF";
                  file: "gemma-3n-E4B-it-Q4_K_M.gguf";
                  parameters: "8B→4B";
                  quantization: "Q4_K_M";
                  contextLength: 32768;
                  languages: readonly [
                      "en",
                      "de",
                      "fr",
                      "es",
                      "it",
                      "pt",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                      "zh",
                  ];
                  description: "Best edge model, ~3GB RAM";
                  requiresAuth: false;
                  benchmarks: { mmlu: 75; arena: 1300 };
              }
              | {
                  name: "Gemma 3 27B";
                  repo: "unsloth/gemma-3-27b-it-GGUF";
                  file: "gemma-3-27b-it-Q4_K_M.gguf";
                  parameters: "27B";
                  quantization: "Q4_K_M";
                  contextLength: 131072;
                  languages: readonly [
                      "en",
                      "de",
                      "fr",
                      "es",
                      "it",
                      "pt",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                      "zh",
                  ];
                  description: "Maximum quality, 128K context, ~18GB RAM";
                  benchmarks: { mmlu: 77; arena: 1338 };
              }
              | {
                  name: "GPT-OSS 20B";
                  repo: "unsloth/gpt-oss-20b-GGUF";
                  file: "gpt-oss-20b-Q4_K_M.gguf";
                  parameters: "21B (3.6B active)";
                  quantization: "Q4_K_M";
                  contextLength: 131072;
                  languages: readonly ["en"];
                  description: "OpenAI's open model, MoE, ~16GB RAM";
                  benchmarks: { mmlu: 82; arena: 1340 };
              }
              | {
                  name: "Phi-4 14B";
                  repo: "bartowski/phi-4-GGUF";
                  file: "phi-4-Q4_K_M.gguf";
                  parameters: "14B";
                  quantization: "Q4_K_M";
                  contextLength: 16384;
                  languages: readonly ["en"];
                  description: "Microsoft's reasoning-focused, excellent for STEM";
                  benchmarks: { mmlu: 84; arena: 1320 };
              }
              | {
                  name: "Qwen3 4B";
                  repo: "unsloth/Qwen3-4B-GGUF";
                  file: "Qwen3-4B-Q4_K_M.gguf";
                  parameters: "4B";
                  quantization: "Q4_K_M";
                  contextLength: 32768;
                  languages: readonly [
                      "en",
                      "zh",
                      "de",
                      "fr",
                      "es",
                      "pt",
                      "it",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                  ];
                  description: "Thinking mode, 100+ languages, ~3GB RAM";
                  thinkingMode: "qwen";
                  benchmarks: { mmlu: 76; arena: 1300 };
              }
              | {
                  name: "Qwen3 8B";
                  repo: "unsloth/Qwen3-8B-GGUF";
                  file: "Qwen3-8B-Q4_K_M.gguf";
                  parameters: "8B";
                  quantization: "Q4_K_M";
                  contextLength: 32768;
                  languages: readonly [
                      "en",
                      "zh",
                      "de",
                      "fr",
                      "es",
                      "pt",
                      "it",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                  ];
                  description: "Thinking mode, excellent multilingual, ~5GB RAM";
                  thinkingMode: "qwen";
                  benchmarks: { mmlu: 81; arena: 1350 };
              }
              | {
                  name: "Qwen3 14B";
                  repo: "unsloth/Qwen3-14B-GGUF";
                  file: "Qwen3-14B-Q4_K_M.gguf";
                  parameters: "14B";
                  quantization: "Q4_K_M";
                  contextLength: 32768;
                  languages: readonly [
                      "en",
                      "zh",
                      "de",
                      "fr",
                      "es",
                      "pt",
                      "it",
                      "nl",
                      "pl",
                      "ru",
                      "ja",
                      "ko",
                  ];
                  description: "Thinking mode, top multilingual, ~9GB RAM";
                  thinkingMode: "qwen";
                  benchmarks: { mmlu: 84; arena: 1380 };
              }
              | {
                  name: "Qwen 2.5 Coder 7B";
                  repo: "bartowski/Qwen2.5-Coder-7B-Instruct-GGUF";
                  file: "Qwen2.5-Coder-7B-Instruct-Q4_K_M.gguf";
                  parameters: "7B";
                  quantization: "Q4_K_M";
                  contextLength: 131072;
                  languages: readonly ["en"];
                  description: "Optimized for code generation";
                  benchmarks: { mmlu: 66; arena: 1250 };
              }
              | {
                  name: "DeepSeek R1 Distill 7B";
                  repo: "bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF";
                  file: "DeepSeek-R1-Distill-Qwen-7B-Q4_K_M.gguf";
                  parameters: "7B";
                  quantization: "Q4_K_M";
                  contextLength: 131072;
                  languages: readonly ["en", "zh"];
                  description: "Strong reasoning with chain-of-thought";
                  thinkingMode: "deepseek";
                  benchmarks: { mmlu: 72; arena: 1300 };
              }
              | {
                  name: "DeepSeek R1 Distill 14B";
                  repo: "bartowski/DeepSeek-R1-Distill-Qwen-14B-GGUF";
                  file: "DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf";
                  parameters: "14B";
                  quantization: "Q4_K_M";
                  contextLength: 131072;
                  languages: readonly ["en", "zh"];
                  description: "Best reasoning model, shows thinking";
                  thinkingMode: "deepseek";
                  benchmarks: { mmlu: 79; arena: 1350 };
              }
          )
      )[]

      Array of model information objects

      const models = LLMEngine.listModels()
      models.forEach(m => console.log(`${m.id}: ${m.name} (${m.parameters})`))
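
      Building on the example above, the returned objects can be filtered on any field of the element type, for instance context length. A small sketch:

```typescript
// Pure predicate over the element type shown above: models advertising
// a 128K-token (131072) context window.
function isLongContext(m: { contextLength: number }): boolean {
  return m.contextLength >= 131072
}

// Usage against the real curated list:
// const longCtx = LLMEngine.listModels().filter(isLongContext)
// longCtx.forEach(m => console.log(`${m.id} (${m.contextLength} ctx)`))
```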
    • Static getModelForUseCase(useCase): Get the recommended model for a specific use case

      Parameters

      • useCase:
            | "fast"
            | "balanced"
            | "quality"
            | "edge"
            | "multilingual"
            | "reasoning"
            | "code"
            | "longContext"

        One of: fast, balanced, quality, edge, multilingual, reasoning, code, longContext

      Returns
          | "gemma-3n-e2b"
          | "gemma-3n-e4b"
          | "gemma-3-27b"
          | "gpt-oss-20b"
          | "phi-4"
          | "qwen3-4b"
          | "qwen3-8b"
          | "qwen3-14b"
          | "qwen-2.5-coder-7b"
          | "deepseek-r1-7b"
          | "deepseek-r1-14b"

      Model ID string

      const modelId = LLMEngine.getModelForUseCase("code")
      const engine = new LLMEngine({ model: modelId })