lupe
A local-first command-line tool that turns piles of documents into a searchable, connected knowledge base on your own machine, replacing a brittle OCR and parsing pipeline with a handful of model calls.
- Live
- Open Source
Context
lupe, German for the loupe an investigator holds to a document, is a small and composable command-line tool that turns piles of documents into structured, searchable, connected knowledge on your own machine. It reads documents, extracts their entities, builds a knowledge graph, indexes everything, and then lets you ask questions of the whole corpus. It runs locally by default and reaches for a cloud model only when you ask it to. It is open-source under the MIT license.
The problem
The classical document pipeline is brittle and heavy. It stacks optical character recognition, deskewing, denoising, and hundreds of cleanup patterns, then regular expressions and named-entity rules, then hand-built parsers. Every layer is fragile and expensive to maintain. On top of that, most modern document tools assume your files leave your machine. The goal with lupe was to replace that brittle stack with a small number of model calls while keeping the documents private by default.
Approach
lupe consists of five composable stages. Each one takes plain files in and writes plain files out, so it can be used on its own or chained end to end.
- The read stage turns any page image into clean Markdown using a vision-language model, with no Tesseract, no deskewing or denoising, and no hand-written cleanup stage.
- The extract stage turns text into validated JSON entities and relationships, driven by a schema the user can define.
- The graph stage turns those entities into an Obsidian vault you can explore as a node graph, with Mermaid and GraphML exports alongside it.
- The index stage persists the text and entities and embeds them for search, using SQLite by default or Postgres with pgvector when you want it.
- The ask stage answers questions over the corpus, grounded in both the retrieved chunks and the entity graph, and returns citations.
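As a rough illustration of how the extract and graph stages fit together, the sketch below validates a small JSON payload against a user-supplied set of entity types and renders it as a Mermaid graph. The field names (`id`, `type`, `name`, `source`, `target`, `label`) and the functions `validate` and `to_mermaid` are hypothetical and chosen for this example; they are not lupe's actual schema or API.

```python
import json
from dataclasses import dataclass


# Hypothetical record shapes; lupe's real schema is user-defined and may differ.
@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    name: str


@dataclass(frozen=True)
class Relation:
    source: str
    target: str
    label: str


def validate(payload: dict, allowed_types: set[str]) -> tuple[list[Entity], list[Relation]]:
    """Reject unknown entity types and relations that point at missing entities."""
    entities = [Entity(**e) for e in payload["entities"]]
    for e in entities:
        if e.type not in allowed_types:
            raise ValueError(f"unknown entity type: {e.type!r}")
    known_ids = {e.id for e in entities}
    relations = [Relation(**r) for r in payload["relations"]]
    for r in relations:
        if r.source not in known_ids or r.target not in known_ids:
            raise ValueError(f"relation references unknown entity: {r}")
    return entities, relations


def to_mermaid(entities: list[Entity], relations: list[Relation]) -> str:
    """Render entities as nodes and relations as labeled edges in Mermaid syntax."""
    lines = ["graph LR"]
    for e in entities:
        lines.append(f'    {e.id}["{e.name}"]')
    for r in relations:
        lines.append(f"    {r.source} -->|{r.label}| {r.target}")
    return "\n".join(lines)


raw = """
{
  "entities": [
    {"id": "acme", "type": "organization", "name": "Acme GmbH"},
    {"id": "jdoe", "type": "person", "name": "Jane Doe"}
  ],
  "relations": [
    {"source": "jdoe", "target": "acme", "label": "works_for"}
  ]
}
"""
entities, relations = validate(json.loads(raw), {"person", "organization"})
diagram = to_mermaid(entities, relations)
print(diagram)
```

Because validation happens before anything is written, a malformed model response fails loudly at the stage boundary instead of leaking into the graph.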
A mode and weight system lets you choose where the compute runs (local, hybrid, or cloud) and how large the models are. lupe keeps only one local model resident at a time, frees it when the stage is done, and prints the expected memory cost before each run, so it never quietly takes over the machine.
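The one-model-at-a-time rule can be pictured as a small context manager. This is a minimal sketch under stated assumptions: the `ModelSlot` class, the role and weight names, and the memory figures in `MEMORY_GB` are all made up for illustration, and no real model is loaded.

```python
from contextlib import contextmanager

# Hypothetical memory estimates in GB, keyed by (role, weight); not lupe's real figures.
MEMORY_GB = {
    ("vision", "light"): 4.0,
    ("vision", "heavy"): 12.0,
    ("reasoning", "light"): 5.0,
}


class ModelSlot:
    """Holds at most one 'resident' model and announces its cost before loading."""

    def __init__(self):
        self.resident = None
        self.log = []

    @contextmanager
    def load(self, role, weight):
        if self.resident is not None:
            raise RuntimeError(f"{self.resident} is still resident; free it first")
        cost = MEMORY_GB[(role, weight)]
        message = f"loading {role}/{weight}: expect ~{cost:.1f} GB"
        self.log.append(message)
        print(message)  # announce the expected memory cost up front
        self.resident = (role, weight)
        try:
            yield self.resident
        finally:
            # Free the slot when the stage finishes, even if it raised.
            self.resident = None


slot = ModelSlot()
with slot.load("vision", "light"):
    pass  # the read stage would run here
with slot.load("reasoning", "light"):
    pass  # the extract stage would run here
```

Releasing the slot in a `finally` clause means a failed stage cannot leave a model pinned in memory.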
What’s interesting
Two decisions carry the project. The first is that a single vision-model call replaces an entire classical pipeline of optical character recognition and cleanup, which removes a whole category of brittle code. The second is that local-first is a real default rather than a slogan. Ollama runs everything offline, the cloud tier is strictly opt-in, the documents never leave the machine unless the user chooses otherwise, and there is no telemetry. The tool is also deliberately lightweight, with no torch, no transformers, and no langchain, because the heavy work is delegated to Ollama or to the cloud APIs.
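To give a feel for how little machinery the index and ask stages need once embeddings come from elsewhere, here is a standard-library-only sketch of storing chunks in SQLite and ranking them by cosine similarity. The table layout, the function names, and the toy three-dimensional vectors are assumptions for this example; real embeddings would come from Ollama or Voyage, and lupe's actual storage schema may look different.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks):
    """Store (source, text, embedding) rows; embeddings are kept as JSON text."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chunks (id INTEGER PRIMARY KEY, source TEXT, text TEXT, embedding TEXT)"
    )
    db.executemany(
        "INSERT INTO chunks (source, text, embedding) VALUES (?, ?, ?)",
        [(source, text, json.dumps(vec)) for source, text, vec in chunks],
    )
    db.commit()
    return db


def search(db, query_vec, k=2):
    """Return the top-k (score, source, text) rows; source doubles as the citation."""
    rows = db.execute("SELECT source, text, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), source, text) for source, text, emb in rows]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
db = build_index([
    ("a.md", "Invoice from Acme", [1.0, 0.0, 0.0]),
    ("b.md", "Contract with Beta", [0.0, 1.0, 0.0]),
    ("c.md", "Acme payment reminder", [0.9, 0.1, 0.0]),
])
hits = search(db, [1.0, 0.0, 0.0], k=2)
for score, source, text in hits:
    print(f"{score:.3f}  {source}  {text}")
```

A brute-force scan like this is only sensible for small corpora; the Postgres with pgvector option exists for cases where the similarity search should happen inside the database.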
Outcome
The result is a shipped, MIT-licensed command-line tool in Python that takes documents from raw page images all the way to an interrogable, cited knowledge base, entirely on your own hardware when you want it that way. It includes an offline demo that runs the whole pipeline with no models and no network, a guided setup that detects and pulls the models you need, and a preflight self-test that gives you one green light before a real run. The repository is private for now.
The stack is Python 3.11, a vision-language model and a reasoning model served through Ollama or Anthropic, embeddings from Voyage or Ollama, and SQLite or Postgres with pgvector for storage.