Provable

Drop a million documents.
Prove every answer.

Cited answers for questions. Ranked chunks for agents. Both signed, both verifiable, both on your machine.

your_corpus.py
from provable import Pipeline

p = Pipeline.from_documents("./pdfs/")
ans = p.query("data retention policy?")

ans.answer       # cited text
ans.citations    # source spans
ans.proof        # SHA-256 Merkle
3 lines · runs on your laptop · no GPU
1.2M docs indexed
12,847 searched per query
<40ms proof verify
0 data leaves your host
Works on any folder of documents
.pdf · .md · .txt · .docx · .html · .json · case law · .csv
Architecture

From folder to signed answer in four stages.

Every stage is deterministic, inspectable, and reproducible. No black-box embedding API. No vendor upload. No surprises.

01 · YOUR FOLDER
PDF · MD · TXT · DOCX. 100 to 100,000 files. Your machine.
02 · INDEX + SIGN
Sparse + dense index. SHA-256 per document. Merkle root committed. On-disk index + Merkle commit.
03 · RETRIEVE
Hybrid retrieval. Coverage gate. Calibrated confidence. Hybrid retrieval + gating.
04 · SIGNED OUTPUT
Cited answer with proof (PROOF A8B3C1D2…), verifiable in < 40 ms. Signed · independently verifiable.

proof = SHA-256(query ‖ sorted([SHA-256(doc_i) for doc_i ∈ retrieved]))
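The proof formula above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not Provable's implementation: the formula does not say how the query and the sorted hashes are serialized, so the UTF-8 encoding and the separator-free concatenation of hex digests are assumptions, and `compute_proof` and `verify_proof` are hypothetical names.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def compute_proof(query: str, retrieved_docs: list[bytes]) -> str:
    """proof = SHA-256(query || sorted(SHA-256(doc_i) for each retrieved doc))."""
    # Hash every retrieved document, then sort so retrieval order does not matter.
    doc_hashes = sorted(sha256_hex(doc) for doc in retrieved_docs)
    # Assumption: UTF-8 query bytes followed by the hex digests, no separator.
    payload = query.encode("utf-8") + "".join(doc_hashes).encode("ascii")
    return sha256_hex(payload)


def verify_proof(query: str, retrieved_docs: list[bytes], claimed: str) -> bool:
    """Re-derive the proof from the inputs and compare it to the claimed value."""
    return hmac.compare_digest(compute_proof(query, retrieved_docs), claimed)
```

Because the document hashes are sorted before hashing, the same query over the same set of documents yields the same proof regardless of ranking order, and changing a single byte of the query or of any document changes it.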
Three lines · any folder

The whole library is one import away.

No vendor upload. No external embedding API. Pure Python, runs on your machine.

from provable import Pipeline

pipeline = Pipeline.from_documents("./my_corpus/")
result   = pipeline.query("What does our compliance policy say about data retention?")

#  result.answer      → cited natural-language answer
#  result.citations   → exact source spans with doc IDs
#  result.proof       → SHA-256 Merkle commitment (verifiable in <40ms)
#  result.verdict     → "ANSWERED" or "ABSTAINED"

# Same query against a running Provable server
curl "https://your-host/api/query?q=What+does+our+compliance+policy+say"

# Independently verify any returned proof:
curl -X POST "https://your-host/api/verify" \
     -H "content-type: application/json" \
     -d '{"query": "...", "retrieved_doc_ids": [...], "proof_signature": "PROOF-..."}'

# → { "valid": true, "verify_ms": 1.8, "reason": "all hashes match" }

{
  "verdict":          "ANSWERED",
  "answer":           "Data retention policy requires 7-year storage...",
  "primary_doc_id":   "policies/retention-2024.pdf#p3",
  "citations": [
    { "doc_id": "policies/retention-2024.pdf#p3", "score": 9.41 },
    { "doc_id": "compliance/gdpr-summary.md",     "score": 7.22 }
  ],
  "proof_signature":  "PROOF-A8B3C1D2E4F5G6H7I8J9K0L1",
  "latency_ms":       113.4
}
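The response above carries the fields the verify call expects. As a rough sketch, the request body could be assembled like this. The field names come from the curl example; taking `retrieved_doc_ids` from the `doc_id` values under `citations` is an assumption, and `build_verify_payload` is a hypothetical helper, not part of the library.

```python
import json


def build_verify_payload(query: str, response: dict) -> str:
    """Assemble a JSON body for POST /api/verify from a parsed query response."""
    # Assumption: the IDs to verify are the "doc_id" values under "citations".
    doc_ids = [citation["doc_id"] for citation in response.get("citations", [])]
    return json.dumps(
        {
            "query": query,
            "retrieved_doc_ids": doc_ids,
            "proof_signature": response["proof_signature"],
        }
    )


# Sample response, copied from the example above (answer text omitted).
sample_response = {
    "verdict": "ANSWERED",
    "citations": [
        {"doc_id": "policies/retention-2024.pdf#p3", "score": 9.41},
        {"doc_id": "compliance/gdpr-summary.md", "score": 7.22},
    ],
    "proof_signature": "PROOF-A8B3C1D2E4F5G6H7I8J9K0L1",
}

body = build_verify_payload("What does our compliance policy say", sample_response)
```

The resulting string can be sent as the `-d` argument of the curl call, with the same `content-type: application/json` header.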
Two outputs · no third option

Cited answer, or honest abstention.

There is no "best guess." There is no fabricated citation. Every query ends in exactly one of the two verdicts.

VERDICT · ANSWERED
"What's the maximum adult dose of aspirin?"
Standard adult dose is 325–650 mg every 4–6 hours, not exceeding 4,000 mg per day.
✓ 3 cited sources ✓ proof verifiable in 31ms
PROOF-A8B3C1D2E4F5G6H7I8J9K0L1
VERDICT · ABSTAINED
"What is the boiling point of mercury?"
Your corpus does not contain this. Missing concepts: boiling, mercury. Closest available terms: "vapor pressure", "thermometer". Upload chemistry references to close the gap.
⚠ honest abstention ⚠ no fabrication
AWARE-1F2E3D4C5B6A7980
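To make the abstention above concrete, here is one way a coverage gate could decide between the two verdicts: check which content terms of the question appear in the corpus vocabulary and abstain when too few do. This is an illustrative stand-in, not Provable's actual gate. The stopword list, the `min_coverage` threshold, and the function names are all hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"what", "is", "the", "of", "a", "an", "s", "in", "to", "for", "does", "our"}
)


def content_terms(text: str) -> list[str]:
    """Lowercased word tokens, stopwords removed, in first-seen order."""
    seen: dict[str, None] = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token not in STOPWORDS:
            seen.setdefault(token, None)
    return list(seen)


def coverage_gate(query: str, corpus_vocabulary: set[str], min_coverage: float = 0.5) -> dict:
    """Return a verdict plus the query terms the corpus does not contain."""
    terms = content_terms(query)
    missing = [term for term in terms if term not in corpus_vocabulary]
    coverage = 1.0 if not terms else 1 - len(missing) / len(terms)
    verdict = "ANSWERED" if coverage >= min_coverage else "ABSTAINED"
    return {"verdict": verdict, "coverage": coverage, "missing": missing}


# Toy vocabulary standing in for an indexed corpus.
vocabulary = {"vapor", "pressure", "thermometer", "point"}
result = coverage_gate("What is the boiling point of mercury?", vocabulary)
```

With this toy vocabulary, only one of the three content terms is covered, so the gate abstains and reports `boiling` and `mercury` as the gap.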
Retrieval layer for agents

Your agent shouldn't read 5,000 documents.
Provable picks the 47 that matter.

Modern agents waste their context window on irrelevant text. Provable filters your corpus down to the chunks that actually matter for the decision. Cited, signed, ranked. Drop them straight into the system prompt. The agent decides on verified ground truth.

5,000 documents in corpus
47 relevant chunks fetched
1 verified decision
23ms end-to-end context build
94% context-window saved
100% chunks SHA-256 signed
01 · YOUR CORPUS
5,283 documents
02 · PROVABLE
47 chunks · 23 ms
retention-policy-2024.pdf · p3 · 98%
"All customer records must be retained for a minimum of 7 years from the date of last activity..."
gdpr-summary.pdf · p12 · 92%
"Personal data may be pseudonymized after 5 years if no further processing is necessary..."
ccpa-compliance-q3.pdf · §4.2 · 87%
"California residents may request deletion of personal information except where retention is required by law..."
+ 44 more chunks ranked & signed
03 · YOUR AGENT
Cited decision: Retain customer record for 7 years per retention-policy-2024.pdf#p3. Redact PII after year 5 per gdpr-summary.pdf#p12.
PROOF-A8B3C1D2E4F5G6H7
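The flow above ends with ranked, signed chunks going into the agent's system prompt. A minimal sketch of that last step follows. The `Chunk` type, its fields, and `build_system_prompt` are hypothetical; the page does not show what the library actually returns for agent use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical shape of one retrieved chunk."""
    source: str   # e.g. "retention-policy-2024.pdf#p3"
    text: str
    score: float  # relevance in [0, 1]


def build_system_prompt(chunks: list[Chunk], proof: str, max_chunks: int = 47) -> str:
    """Render the top-ranked chunks as numbered, citable sources."""
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)[:max_chunks]
    lines = ["Answer using only the numbered sources below. Cite them by number.", ""]
    for number, chunk in enumerate(ranked, start=1):
        lines.append(f"[{number}] {chunk.source} (score {chunk.score:.2f})")
        lines.append(chunk.text)
        lines.append("")
    # Carry the proof along so the agent's decision can reference it.
    lines.append(f"Retrieval proof: {proof}")
    return "\n".join(lines)


chunks = [
    Chunk("gdpr-summary.pdf#p12", "Personal data may be pseudonymized after 5 years...", 0.92),
    Chunk("retention-policy-2024.pdf#p3", "All customer records must be retained...", 0.98),
]
prompt = build_system_prompt(chunks, "PROOF-A8B3C1D2E4F5G6H7")
```

Numbering the sources gives the agent a compact way to cite, and capping the list keeps the prompt size predictable.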
Built for the work that can't afford a wrong answer

Point it at your industry's documents.

Healthcare

Dosing answers cited to the FDA monograph.

Clinical AI that cites. Audit trail verifiable in milliseconds.

Financial services

Every customer-facing answer becomes one cryptographic check.

Cited to the actual disclosure document. Supervision review becomes minutes, not hours.

Government & public benefit

Every eligibility decision defensible at appeal.

Cited to the actual regulation. The administrative record writes itself.

Your corpus never leaves your host.

On-prem or air-gapped · No external embedding API · Append-only signed audit log · Anyone can re-derive the proof
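One common way to make a log append-only in a checkable sense is a hash chain, where each entry commits to the hash of the entry before it. The sketch below illustrates that general technique only. It is not a description of Provable's log format, the function names are hypothetical, and a hash chain by itself is not a digital signature; signing the latest hash would be a separate step.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the record."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], record: dict) -> None:
    """Append a record, chaining it to the last entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev, "hash": entry_hash(prev, record)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; editing or reordering any entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"query": "data retention policy?", "verdict": "ANSWERED"})
append_entry(log, {"query": "boiling point of mercury?", "verdict": "ABSTAINED"})
```

Anyone holding a copy of the log can run `verify_chain` without trusting the host that produced it.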
Live demo

Try it on a sample corpus.

Pre-loaded with 518 documents across medical, legal, finance, science, and software. Drop your own folder anytime. Same engine.

How it works

Three steps. No magic.

01 · Index your folder.
Plain text, PDFs, any size. Stays on your machine.
02 · Ask a question.
Get a cited answer plus a signed proof anyone can verify.
03 · Get an honest "I don't know."
If the corpus can't answer, it names the gap. It will not fabricate.
terminal
$ provable index ./pdfs/
indexing 12,847 documents…
✓ sparse + dense built (4.7s)
✓ Merkle root: 4a3f…b91d
$ provable ask "data retention policy?"
ANSWERED Data must be retained for 7 years…
cited: policies/retention-2024.pdf · p3-4
proof: PROOF-A8B3C1D2E4F5… ✓ 31ms
$ provable ask "boiling point of mercury?"
ABSTAINED not in your corpus
missing: boiling, mercury
closest: vapor pressure, thermometer
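The indexing step in the terminal session reports a Merkle root. For readers who have not met the idea, here is a minimal sketch of computing one over per-document SHA-256 hashes. The tree layout is an assumption: the page does not specify it, and duplicating the last node on odd levels is just one common convention.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Raw SHA-256 digest."""
    return hashlib.sha256(data).digest()


def merkle_root(documents: list[bytes]) -> str:
    """Hex Merkle root over the SHA-256 hashes of the given documents."""
    if not documents:
        return sha256(b"").hex()  # assumption: empty corpus hashes the empty string
    level = [sha256(doc) for doc in documents]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd level: pair the last node with itself
        # Each parent is the hash of its two children concatenated.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


root = merkle_root([b"doc one", b"doc two", b"doc three"])
```

Changing any single document changes its leaf hash and therefore the root, which is what lets one short value commit to an entire index.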

Point it at your corpus.

One folder, any format. Cited answers, signed proofs. Honest abstentions.