llm-cpp


[![Star History Chart](https://api.star-history.com/svg?repos=Mattbusel/llm-cpp&type=Date)](https://star-history.com/#Mattbusel/llm-cpp)


A suite of 26 single-header C++17 libraries for integrating large language models into native applications. Each library is a self-contained .hpp file -- drop in what you need, define one implementation macro, and ship. No Python, no SDKs, no package manager required.


Start Here

Just want to call an LLM? → llm-stream

Building a chatbot? → llm-chat + llm-retry

Building RAG? → llm-rag + llm-embed + llm-rank

Need production observability? → llm-log + llm-trace + llm-cost


The Suite

Core -- foundational primitives every LLM app needs

| Library | Description | Deps |
| --- | --- | --- |
| llm-stream | Stream OpenAI & Anthropic responses via SSE | libcurl |
| llm-retry | Retry with exponential backoff + circuit breaker | None |
| llm-cost | Token counting + cost estimation for 6 models | None |
| llm-cache | LRU response cache -- skip identical API calls | None |
| llm-format | JSON schema enforcement + structured output | None |
| llm-json | Recursive-descent JSON parser and builder | None |
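To make the retry technique concrete: the core of what llm-retry's table entry describes is capped exponential backoff. Below is a standalone sketch of that idea, not llm-cpp code; the 200 ms base and 10 s cap are made-up defaults, and real-world retry layers usually add random jitter (omitted here so the function stays deterministic).

```cpp
#include <algorithm>
#include <cstdint>

// Capped exponential backoff: delay = base * 2^attempt, clamped to `cap`.
// A production retry layer would typically add random jitter on top;
// it is left out here to keep the function deterministic and testable.
std::int64_t backoff_ms(int attempt,
                        std::int64_t base = 200,
                        std::int64_t cap = 10'000) {
    std::int64_t delay = base;
    // Stop doubling once we reach the cap, which also prevents overflow.
    for (int i = 0; i < attempt && delay < cap; ++i) delay *= 2;
    return std::min(delay, cap);
}
```

For attempts 0, 1, 2, ... this yields 200 ms, 400 ms, 800 ms, and so on, flattening out at the cap.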

Data -- move, retrieve, and reshape information

| Library | Description | Deps |
| --- | --- | --- |
| llm-embed | Text embeddings + cosine similarity + vector store | libcurl |
| llm-rag | Retrieval-augmented generation pipeline | libcurl |
| llm-rank | BM25 + LLM passage reranking, hybrid mode | libcurl† |
| llm-compress | Context compression: truncate, sliding window, summarize | None* |
| llm-parse | Offline HTML/markdown parsing, chunking, TextStats | None |
| llm-batch | Batch processing with thread pool, rate limiting, checkpointing | libcurl |
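The similarity measure llm-embed's table entry names is cosine similarity: the dot product of two vectors divided by the product of their magnitudes. As a standalone sketch (not llm-embed's own code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros,
// since the angle is undefined in that case.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0, na = 0, nb = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    if (na == 0 || nb == 0) return 0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Parallel vectors score 1, orthogonal vectors score 0, which is why it works well for ranking embedding matches regardless of vector magnitude.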

Ops -- observe, test, and operate at scale

| Library | Description | Deps |
| --- | --- | --- |
| llm-log | Structured JSONL logging for every LLM call | None |
| llm-trace | RAII span tracing with OTLP JSON export | None |
| llm-pool | Concurrent request pool with priority queue + rate limiting | None |
| llm-mock | Mock LLM provider for unit testing -- zero network | None |
| llm-eval | N-run evaluation + consistency scoring + model comparison | libcurl |
| llm-ab | A/B testing with Welch t-test and Cohen d | libcurl |
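The statistics llm-ab's table entry names are standard: Welch's t statistic compares two sample means without assuming equal variances, the Welch-Satterthwaite formula gives its degrees of freedom, and Cohen's d is a pooled-variance effect size. A standalone sketch of those formulas (not llm-ab's code):

```cpp
#include <cmath>
#include <vector>

double mean(const std::vector<double>& x) {
    double s = 0;
    for (double v : x) s += v;
    return s / x.size();
}

// Unbiased sample variance (divides by n - 1).
double sample_var(const std::vector<double>& x) {
    double m = mean(x), s = 0;
    for (double v : x) s += (v - m) * (v - m);
    return s / (x.size() - 1);
}

// Welch's t statistic: (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2).
double welch_t(const std::vector<double>& a, const std::vector<double>& b) {
    double va = sample_var(a) / a.size();
    double vb = sample_var(b) / b.size();
    return (mean(a) - mean(b)) / std::sqrt(va + vb);
}

// Welch-Satterthwaite degrees of freedom for the statistic above.
double welch_df(const std::vector<double>& a, const std::vector<double>& b) {
    double va = sample_var(a) / a.size();
    double vb = sample_var(b) / b.size();
    return (va + vb) * (va + vb) /
           (va * va / (a.size() - 1) + vb * vb / (b.size() - 1));
}

// Cohen's d effect size with a simple pooled standard deviation.
double cohen_d(const std::vector<double>& a, const std::vector<double>& b) {
    double pooled = std::sqrt((sample_var(a) + sample_var(b)) / 2);
    return (mean(a) - mean(b)) / pooled;
}
```

With these pieces, an A/B harness only needs to collect per-variant scores and look up the p-value for (t, df).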

App -- build complete user-facing features

| Library | Description | Deps |
| --- | --- | --- |
| llm-chat | Multi-turn conversation manager with token-budget truncation | libcurl |
| llm-agent | Tool-calling agent loop (OpenAI function calling) | libcurl |
| llm-vision | Multimodal image+text for OpenAI and Anthropic | libcurl |
| llm-template | Mustache-style prompt templating | None |
| llm-router | Route prompts to the right model by complexity | None |
| llm-guard | PII detection + prompt injection scoring -- fully offline | None |
| llm-audio | Whisper transcription, translation, and TTS | libcurl |
| llm-finetune | Fine-tuning job lifecycle: upload, create, poll, manage models | libcurl |

\* llm-compress requires libcurl only for the optional Summarize strategy.
† llm-rank requires libcurl only for LLM-based reranking; local BM25 mode has zero deps.
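"Mustache-style templating" (llm-template's entry above) means substituting `{{key}}` placeholders in a prompt string. Here is a minimal standalone sketch of that substitution, not llm-template's actual API; real Mustache also supports sections and escaping, which are out of scope here.

```cpp
#include <map>
#include <string>

// Replace each {{key}} in `tmpl` with its value from `vars`.
// Placeholders whose key is missing are left in place unchanged.
std::string render(const std::string& tmpl,
                   const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("{{", pos);
        if (open == std::string::npos) { out += tmpl.substr(pos); break; }
        std::size_t close = tmpl.find("}}", open + 2);
        if (close == std::string::npos) { out += tmpl.substr(pos); break; }
        out += tmpl.substr(pos, open - pos);               // literal text
        std::string key = tmpl.substr(open + 2, close - open - 2);
        auto it = vars.find(key);
        out += (it != vars.end()) ? it->second
                                  : tmpl.substr(open, close - open + 2);
        pos = close + 2;
    }
    return out;
}
```

For example, `render("Hello, {{name}}!", {{"name", "Ada"}})` produces `"Hello, Ada!"`.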


Quickstart

Libraries compose naturally. Here is a production-ready pattern using llm-log, llm-retry, and llm-stream together:

#include <cstdlib>
#include <iostream>
#include <string>

#define LLM_LOG_IMPLEMENTATION
#include "llm_log.hpp"

#define LLM_RETRY_IMPLEMENTATION
#include "llm_retry.hpp"

#define LLM_STREAM_IMPLEMENTATION
#include "llm_stream.hpp"

int main() {
    llm::Logger logger("calls.jsonl");

    llm::Config cfg;
    const char* key = std::getenv("OPENAI_API_KEY");
    if (!key) { std::cerr << "OPENAI_API_KEY is not set\n"; return 1; }
    cfg.api_key = key;
    cfg.model = "gpt-4o-mini";

    const std::string prompt = "Explain backpressure in one paragraph.";
    auto log_id = logger.log_request(prompt, cfg.model);

    auto result = llm::with_retry<std::string>([&]() -> std::string {
        std::string output;
        llm::stream_openai(prompt, cfg,
            [&](std::string_view tok) { std::cout << tok << std::flush; output += tok; },
            [](const llm::StreamStats& s) {
                std::cout << "\n[" << s.token_count << " tokens, "
                          << s.tokens_per_sec << " tok/s]\n";
            });
        return output;
    });

    logger.log_response(log_id, result);
}

Another example -- guard, route, and chat together:

#include <cstdlib>
#include <iostream>
#include <string>

#define LLM_GUARD_IMPLEMENTATION
#include "llm_guard.hpp"

#define LLM_ROUTER_IMPLEMENTATION
#include "llm_router.hpp"

#define LLM_CHAT_IMPLEMENTATION
#include "llm_chat.hpp"

int main() {
    std::string user_input = "Summarize my meeting notes.";  // placeholder input

    // 1. Check input for PII / injection
    auto guard = llm::scan(user_input);
    if (!guard.safe) user_input = guard.scrubbed;

    // 2. Route to the right model
    llm::RouterConfig rcfg;
    rcfg.strategy = llm::RoutingStrategy::Balanced;
    rcfg.models = {{"gpt-4o-mini", 0.15, 0.5, 0.7, 40},
                   {"gpt-4o", 5.0, 1.0, 0.9, 100}};
    auto decision = llm::Router(rcfg).route(user_input);

    // 3. Send with conversation memory
    llm::ChatConfig ccfg;
    ccfg.api_key = std::getenv("OPENAI_API_KEY");
    ccfg.model = decision.model_name;
    llm::Conversation conv(ccfg);
    std::cout << conv.chat(user_input) << "\n";
}

Installation

Each library is a single .hpp file. Copy what you need:

# Core


curl -O https://raw.githubusercontent.com/Mattbusel/llm-stream/main/include/llm_stream.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-retry/main/include/llm_retry.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cost/main/include/llm_cost.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cache/main/include/llm_cache.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-format/main/include/llm_format.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-json/main/include/llm_json.hpp

# Data


curl -O https://raw.githubusercontent.com/Mattbusel/llm-embed/main/include/llm_embed.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rag/main/include/llm_rag.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rank/main/include/llm_rank.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-compress/main/include/llm_compress.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-parse/main/include/llm_parse.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-batch/main/include/llm_batch.hpp

# Ops


curl -O https://raw.githubusercontent.com/Mattbusel/llm-log/main/include/llm_log.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-trace/main/include/llm_trace.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-pool/main/include/llm_pool.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-mock/main/include/llm_mock.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-eval/main/include/llm_eval.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-ab/main/include/llm_ab.hpp

# App


curl -O https://raw.githubusercontent.com/Mattbusel/llm-chat/main/include/llm_chat.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-agent/main/include/llm_agent.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-vision/main/include/llm_vision.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-template/main/include/llm_template.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-router/main/include/llm_router.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-guard/main/include/llm_guard.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-audio/main/include/llm_audio.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-finetune/main/include/llm_finetune.hpp

In exactly one .cpp file per library, define the implementation macro before including:

#define LLM_STREAM_IMPLEMENTATION
#define LLM_RETRY_IMPLEMENTATION
#define LLM_LOG_IMPLEMENTATION
#include "llm_stream.hpp"
#include "llm_retry.hpp"
#include "llm_log.hpp"

All other translation units just #include without the macro.
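This is the stb-style single-header idiom: the header always exposes declarations, and the definitions behind the implementation macro get compiled into exactly the one translation unit that defines it. A toy sketch with a hypothetical `mini_add` library (not part of llm-cpp) shows the mechanism in a single file; in a real project the region between the dashed comments would live in `mini_add.hpp`.

```cpp
// Defined in exactly one .cpp, *before* the include, so this translation
// unit receives the function definitions.
#define MINI_ADD_IMPLEMENTATION

// ---- contents of mini_add.hpp ----
#pragma once
int mini_add(int a, int b);                       // declaration: always visible

#ifdef MINI_ADD_IMPLEMENTATION
int mini_add(int a, int b) { return a + b; }      // definition: compiled once
#endif
// ---- end mini_add.hpp ----
```

Every other .cpp just does `#include "mini_add.hpp"` without the macro and sees only the declaration, so the linker finds exactly one definition.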


Requirements

| Requirement | Detail |
| --- | --- |
| C++ standard | C++17 or later |
| Compiler | GCC, Clang, MSVC -- all supported |
| External deps | libcurl for network libraries (see tables above); all others have zero deps |
| Build system | Any: works with CMake, Make, Bazel, MSVC, plain g++ |

License

All 26 libraries: MIT -- Copyright (c) 2026 Mattbusel.

