llama.cpp provider

Local LLM inference provider backed by the open-source llama.cpp project, which provides a C/C++ runtime and server tooling for running models locally.

Recent stories

1 linked story

release | PRIMARY | 2026-05-16

llama.cpp provider adds in-process AI SDK support with tool calling

A new llama.cpp provider lets AI SDK apps run llama.cpp inference directly inside a Node process, without a separate server, while exposing reasoning, tool calling, image inputs, and prompt caching. Because there is no standalone inference server to start or manage, local deployment is simpler for AI SDK apps that use llama.cpp bindings.