llm-cpp


[![Star History Chart](https://api.star-history.com/svg?repos=Mattbusel/llm-cpp&type=Date)](https://star-history.com/#Mattbusel/llm-cpp)


A suite of 26 single-header C++17 libraries for integrating large language models into native applications. Each library is a self-contained .hpp file -- drop in what you need, define one implementation macro, and ship. No Python, no SDKs, no package manager required.


Start Here

Just want to call an LLM? → llm-stream

Building a chatbot? → llm-chat + llm-retry

Building RAG? → llm-rag + llm-embed + llm-rank

Need production observability? → llm-log + llm-trace + llm-cost


The Suite

Core -- foundational primitives every LLM app needs

| Library | Description | Deps |
| --- | --- | --- |
| llm-stream | Stream OpenAI & Anthropic responses via SSE | libcurl |
| llm-retry | Retry with exponential backoff + circuit breaker | None |
| llm-cost | Token counting + cost estimation for 6 models | None |
| llm-cache | LRU response cache -- skip identical API calls | None |
| llm-format | JSON schema enforcement + structured output | None |
| llm-json | Recursive-descent JSON parser and builder | None |
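To make the retry technique concrete: the core of what llm-retry's table entry describes is capped exponential backoff. Below is a standalone sketch of that idea, not llm-cpp code; the 200 ms base and 10 s cap are made-up defaults, and real-world retry layers usually add random jitter (omitted here so the function stays deterministic).

```cpp
#include <algorithm>
#include <cstdint>

// Capped exponential backoff: delay = base * 2^attempt, clamped to `cap`.
// A production retry layer would typically add random jitter on top;
// it is left out here to keep the function deterministic and testable.
std::int64_t backoff_ms(int attempt,
                        std::int64_t base = 200,
                        std::int64_t cap = 10'000) {
    std::int64_t delay = base;
    // Stop doubling once we reach the cap, which also prevents overflow.
    for (int i = 0; i < attempt && delay < cap; ++i) delay *= 2;
    return std::min(delay, cap);
}
```

For attempts 0, 1, 2, ... this yields 200 ms, 400 ms, 800 ms, and so on, flattening out at the cap.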

Data -- move, retrieve, and reshape information

| Library | Description | Deps |
| --- | --- | --- |
| llm-embed | Text embeddings + cosine similarity + vector store | libcurl |
| llm-rag | Retrieval-augmented generation pipeline | libcurl |
| llm-rank | BM25 + LLM passage reranking, hybrid mode | libcurl† |
| llm-compress | Context compression: truncate, sliding window, summarize | None* |
| llm-parse | Offline HTML/markdown parsing, chunking, TextStats | None |
| llm-batch | Batch processing with thread pool, rate limiting, checkpointing | libcurl |
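The similarity measure llm-embed's table entry names is cosine similarity: the dot product of two vectors divided by the product of their magnitudes. As a standalone sketch (not llm-embed's own code):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two equal-length vectors:
// dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros,
// since the angle is undefined in that case.
double cosine(const std::vector<double>& a, const std::vector<double>& b) {
    double dot = 0, na = 0, nb = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    if (na == 0 || nb == 0) return 0;
    return dot / (std::sqrt(na) * std::sqrt(nb));
}
```

Parallel vectors score 1, orthogonal vectors score 0, which is why it works well for ranking embedding matches regardless of vector magnitude.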

Ops -- observe, test, and operate at scale

| Library | Description | Deps |
| --- | --- | --- |
| llm-log | Structured JSONL logging for every LLM call | None |
| llm-trace | RAII span tracing with OTLP JSON export | None |
| llm-pool | Concurrent request pool with priority queue + rate limiting | None |
| llm-mock | Mock LLM provider for unit testing -- zero network | None |
| llm-eval | N-run evaluation + consistency scoring + model comparison | libcurl |
| llm-ab | A/B testing with Welch t-test and Cohen d | libcurl |
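The statistics llm-ab's table entry names are standard: Welch's t statistic compares two sample means without assuming equal variances, the Welch-Satterthwaite formula gives its degrees of freedom, and Cohen's d is a pooled-variance effect size. A standalone sketch of those formulas (not llm-ab's code):

```cpp
#include <cmath>
#include <vector>

double mean(const std::vector<double>& x) {
    double s = 0;
    for (double v : x) s += v;
    return s / x.size();
}

// Unbiased sample variance (divides by n - 1).
double sample_var(const std::vector<double>& x) {
    double m = mean(x), s = 0;
    for (double v : x) s += (v - m) * (v - m);
    return s / (x.size() - 1);
}

// Welch's t statistic: (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2).
double welch_t(const std::vector<double>& a, const std::vector<double>& b) {
    double va = sample_var(a) / a.size();
    double vb = sample_var(b) / b.size();
    return (mean(a) - mean(b)) / std::sqrt(va + vb);
}

// Welch-Satterthwaite degrees of freedom for the statistic above.
double welch_df(const std::vector<double>& a, const std::vector<double>& b) {
    double va = sample_var(a) / a.size();
    double vb = sample_var(b) / b.size();
    return (va + vb) * (va + vb) /
           (va * va / (a.size() - 1) + vb * vb / (b.size() - 1));
}

// Cohen's d effect size with a simple pooled standard deviation.
double cohen_d(const std::vector<double>& a, const std::vector<double>& b) {
    double pooled = std::sqrt((sample_var(a) + sample_var(b)) / 2);
    return (mean(a) - mean(b)) / pooled;
}
```

With these pieces, an A/B harness only needs to collect per-variant scores and look up the p-value for (t, df).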

App -- build complete user-facing features

| Library | Description | Deps |
| --- | --- | --- |
| llm-chat | Multi-turn conversation manager with token-budget truncation | libcurl |
| llm-agent | Tool-calling agent loop (OpenAI function calling) | libcurl |
| llm-vision | Multimodal image+text for OpenAI and Anthropic | libcurl |
| llm-template | Mustache-style prompt templating | None |
| llm-router | Route prompts to the right model by complexity | None |
| llm-guard | PII detection + prompt injection scoring -- fully offline | None |
| llm-audio | Whisper transcription, translation, and TTS | libcurl |
| llm-finetune | Fine-tuning job lifecycle: upload, create, poll, manage models | libcurl |

\* llm-compress requires libcurl only for the optional Summarize strategy.
† llm-rank requires libcurl only for LLM-based reranking; local BM25 mode has zero deps.
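"Mustache-style templating" (llm-template's entry above) means substituting `{{key}}` placeholders in a prompt string. Here is a minimal standalone sketch of that substitution, not llm-template's actual API; real Mustache also supports sections and escaping, which are out of scope here.

```cpp
#include <map>
#include <string>

// Replace each {{key}} in `tmpl` with its value from `vars`.
// Placeholders whose key is missing are left in place unchanged.
std::string render(const std::string& tmpl,
                   const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (pos < tmpl.size()) {
        std::size_t open = tmpl.find("{{", pos);
        if (open == std::string::npos) { out += tmpl.substr(pos); break; }
        std::size_t close = tmpl.find("}}", open + 2);
        if (close == std::string::npos) { out += tmpl.substr(pos); break; }
        out += tmpl.substr(pos, open - pos);               // literal text
        std::string key = tmpl.substr(open + 2, close - open - 2);
        auto it = vars.find(key);
        out += (it != vars.end()) ? it->second
                                  : tmpl.substr(open, close - open + 2);
        pos = close + 2;
    }
    return out;
}
```

For example, `render("Hello, {{name}}!", {{"name", "Ada"}})` produces `"Hello, Ada!"`.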


Quickstart

Libraries compose naturally. Here is a production-ready pattern using llm-log, llm-retry, and llm-stream together:

#include <cstdlib>
#include <iostream>
#include <string>

#define LLM_LOG_IMPLEMENTATION
#include "llm_log.hpp"

#define LLM_RETRY_IMPLEMENTATION
#include "llm_retry.hpp"

#define LLM_STREAM_IMPLEMENTATION
#include "llm_stream.hpp"

int main() {
    llm::Logger logger("calls.jsonl");

    llm::Config cfg;
    const char* key = std::getenv("OPENAI_API_KEY");
    if (!key) { std::cerr << "OPENAI_API_KEY is not set\n"; return 1; }
    cfg.api_key = key;
    cfg.model = "gpt-4o-mini";

    const std::string prompt = "Explain backpressure in one paragraph.";
    auto log_id = logger.log_request(prompt, cfg.model);

    auto result = llm::with_retry<std::string>([&]() -> std::string {
        std::string output;
        llm::stream_openai(prompt, cfg,
            [&](std::string_view tok) { std::cout << tok << std::flush; output += tok; },
            [](const llm::StreamStats& s) {
                std::cout << "\n[" << s.token_count << " tokens, "
                          << s.tokens_per_sec << " tok/s]\n";
            });
        return output;
    });

    logger.log_response(log_id, result);
}

Another example -- guard, route, and chat together:

#include <cstdlib>
#include <iostream>
#include <string>

#define LLM_GUARD_IMPLEMENTATION
#include "llm_guard.hpp"

#define LLM_ROUTER_IMPLEMENTATION
#include "llm_router.hpp"

#define LLM_CHAT_IMPLEMENTATION
#include "llm_chat.hpp"

int main() {
    std::string user_input = "Summarize my meeting notes.";  // placeholder input

    // 1. Check input for PII / injection
    auto guard = llm::scan(user_input);
    if (!guard.safe) user_input = guard.scrubbed;

    // 2. Route to the right model
    llm::RouterConfig rcfg;
    rcfg.strategy = llm::RoutingStrategy::Balanced;
    rcfg.models = {{"gpt-4o-mini", 0.15, 0.5, 0.7, 40},
                   {"gpt-4o", 5.0, 1.0, 0.9, 100}};
    auto decision = llm::Router(rcfg).route(user_input);

    // 3. Send with conversation memory
    llm::ChatConfig ccfg;
    ccfg.api_key = std::getenv("OPENAI_API_KEY");
    ccfg.model = decision.model_name;
    llm::Conversation conv(ccfg);
    std::cout << conv.chat(user_input) << "\n";
}

Installation

Each library is a single .hpp file. Copy what you need:

# Core


curl -O https://raw.githubusercontent.com/Mattbusel/llm-stream/main/include/llm_stream.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-retry/main/include/llm_retry.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cost/main/include/llm_cost.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-cache/main/include/llm_cache.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-format/main/include/llm_format.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-json/main/include/llm_json.hpp

# Data


curl -O https://raw.githubusercontent.com/Mattbusel/llm-embed/main/include/llm_embed.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rag/main/include/llm_rag.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-rank/main/include/llm_rank.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-compress/main/include/llm_compress.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-parse/main/include/llm_parse.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-batch/main/include/llm_batch.hpp

# Ops


curl -O https://raw.githubusercontent.com/Mattbusel/llm-log/main/include/llm_log.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-trace/main/include/llm_trace.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-pool/main/include/llm_pool.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-mock/main/include/llm_mock.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-eval/main/include/llm_eval.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-ab/main/include/llm_ab.hpp

# App


curl -O https://raw.githubusercontent.com/Mattbusel/llm-chat/main/include/llm_chat.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-agent/main/include/llm_agent.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-vision/main/include/llm_vision.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-template/main/include/llm_template.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-router/main/include/llm_router.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-guard/main/include/llm_guard.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-audio/main/include/llm_audio.hpp
curl -O https://raw.githubusercontent.com/Mattbusel/llm-finetune/main/include/llm_finetune.hpp

In exactly one .cpp file per library, define the implementation macro before including:

#define LLM_STREAM_IMPLEMENTATION
#define LLM_RETRY_IMPLEMENTATION
#define LLM_LOG_IMPLEMENTATION
#include "llm_stream.hpp"
#include "llm_retry.hpp"
#include "llm_log.hpp"

All other translation units just #include without the macro.
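This is the stb-style single-header idiom: the header always exposes declarations, and the definitions behind the implementation macro get compiled into exactly the one translation unit that defines it. A toy sketch with a hypothetical `mini_add` library (not part of llm-cpp) shows the mechanism in a single file; in a real project the region between the dashed comments would live in `mini_add.hpp`.

```cpp
// Defined in exactly one .cpp, *before* the include, so this translation
// unit receives the function definitions.
#define MINI_ADD_IMPLEMENTATION

// ---- contents of mini_add.hpp ----
#pragma once
int mini_add(int a, int b);                       // declaration: always visible

#ifdef MINI_ADD_IMPLEMENTATION
int mini_add(int a, int b) { return a + b; }      // definition: compiled once
#endif
// ---- end mini_add.hpp ----
```

Every other .cpp just does `#include "mini_add.hpp"` without the macro and sees only the declaration, so the linker finds exactly one definition.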


Requirements

| Requirement | Detail |
| --- | --- |
| C++ standard | C++17 or later |
| Compiler | GCC, Clang, MSVC -- all supported |
| External deps | libcurl for network libraries (see tables above); all others have zero deps |
| Build system | Any: works with CMake, Make, Bazel, MSVC, plain g++ |

License

All 26 libraries: MIT -- Copyright (c) 2026 Mattbusel.

