macOS · indexes the knowledge · concept graph · chat

Your handwritten notes,
as a knowledge graph.

Engram turns Microsoft OneNote courses (exported as PDF) into an Obsidian-native Markdown vault on your Mac. It's an index, not a transcription: each page becomes a concise, searchable summary of its key ideas and formulas, with the original right below. Related notes link into a multi-layer concept graph, and you can now chat with it: answers are grounded in your notes and cited.

🔒 Local-first · nothing leaves your Mac 🪶 The raw page is always the source of truth ♻️ Incremental & idempotent

A research prototype by Kaiwen Bian

PROTOTYPE This is an early v0.2 prototype (v0.2 adds chat). It ingests Microsoft OneNote notebooks exported as PDF, and runs on macOS 14+ (Apple Silicon). See prototype status for what is and isn't validated yet.

From an opaque PDF to a navigable brain

You already write notes on an iPad with Apple Pencil in OneNote. Big notebooks can't be exported whole, so you export one section group at a time (e.g. one course) to PDF. Engram groups several such PDFs into a single domain — treated as one combined knowledge base — and turns it into something you can actually think with.

Before
CSE 234.pdf · DSC 120.pdf · …
one PDF per course · hundreds of pages · not searchable · no structure
After
📁 Data Science (domain)
📁 CSE 234 (section group)
📁 MLSys
📄 Auto Differentiation.md + .pdf
📁 CSE 150B
📁 MDP
📄 Value Iteration.md + .pdf
↳ ## Related → [[Auto Differentiation]]

A hierarchy that is also a graph

Domain → Section group → Section → Note gives you the tree. A holistic pass then distills the whole domain into a multi-layer concept map — Concepts → Themes → Areas — where nodes cluster by parent and connect by shared notes (solid) and model knowledge (dashed). The macOS app renders each layer as an interactive, force-directed graph — switch layers, drag a node, or click one to see the notes it covers.
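The solid "shared note" edges described above can be sketched in a few lines. This is a minimal illustration, not Engram's actual code: the function name `shared_note_edges`, the input shape (concept name mapped to a set of note titles), and the sample concept names are all assumptions made for the example.

```python
from itertools import combinations

def shared_note_edges(concept_notes: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Return {(concept_a, concept_b): shared_note_count} for every overlapping pair."""
    edges = {}
    # Sorting makes the pair order (and therefore the output) deterministic.
    for a, b in combinations(sorted(concept_notes), 2):
        shared = concept_notes[a] & concept_notes[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges

# Hypothetical concepts and note titles, for illustration only.
concepts = {
    "Bellman Equation": {"Value Iteration", "Policy Iteration"},
    "Contraction Mapping": {"Value Iteration"},
    "Reverse-Mode AD": {"Auto Differentiation"},
}
edges = shared_note_edges(concepts)
```

Here the first two concepts both cover the note "Value Iteration", so they get one edge; the third shares nothing and stays unconnected. Dashed knowledge edges come from the model and are not derivable this way.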

Legend: Theme / Concept node (by cluster) · note · shared-note edge (solid) · related / knowledge edge (dashed)
Conceptual mockup of the app's concept-graph view: nodes colored by cluster, same physics as the real GraphEngine (repulsion · springs · gravity · clustering).
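For readers unfamiliar with force-directed layouts, one simulation step with three of the forces named above (repulsion, springs, gravity) might look like the sketch below. It is not the GraphEngine implementation: the function `layout_step`, every constant, and the omission of the clustering force are assumptions chosen to keep the example short.

```python
import math

def layout_step(pos, edges, repulsion=0.05, spring=0.1, rest=1.0, gravity=0.02, dt=0.5):
    """Advance node positions by one step; all constants are illustrative guesses."""
    force = {n: [0.0, 0.0] for n in pos}
    nodes = list(pos)
    # Repulsion: every pair pushes apart, falling off with squared distance.
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-6
            f = repulsion / (d * d)
            force[a][0] += f * dx / d
            force[a][1] += f * dy / d
            force[b][0] -= f * dx / d
            force[b][1] -= f * dy / d
    # Springs: each edge pulls its endpoints toward the rest length.
    for a, b in edges:
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        d = math.hypot(dx, dy) or 1e-6
        f = spring * (d - rest)
        force[a][0] += f * dx / d
        force[a][1] += f * dy / d
        force[b][0] -= f * dx / d
        force[b][1] -= f * dy / d
    # Gravity: a weak pull toward the origin keeps the layout centered.
    for n in nodes:
        force[n][0] -= gravity * pos[n][0]
        force[n][1] -= gravity * pos[n][1]
    return {n: (pos[n][0] + dt * force[n][0], pos[n][1] + dt * force[n][1]) for n in nodes}

# Two linked nodes that start too far apart drift toward each other.
start = {"Bellman Equation": (-2.0, 0.0), "Contraction Mapping": (2.0, 0.0)}
after = layout_step(start, [("Bellman Equation", "Contraction Mapping")])
gap = math.dist(after["Bellman Equation"], after["Contraction Mapping"])
```

Repeating the step until positions settle gives the familiar layout: linked nodes sit near the spring's rest length, unlinked ones spread out.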

How a page becomes a note

Each note keeps its original page as a faithful per-note PDF leaf; above it sits a concise, searchable index entry — the page's key ideas and all its formulas — always one scroll from the original. Engram indexes the knowledge, not the page: nothing is invented, illegible parts are flagged, and papers / homeworks / long code are skipped.

Pop & Timing
handwritten · ruled paper preserved
domain: Skateboarding · section_group: Tricks · section: Ollies · kind: handwritten

Pop & Timing

## Mechanics

Snap the tail down, then slide the front foot up the board to level it out.

  • Pop comes from the back foot, not a jump.
  • Timing: pop → drag → level at the apex.
$$h \approx \tfrac{1}{2}\,v_{\text{pop}}\,t$$
![[Pop & Timing.pdf]]   # the raw leaf, embedded
  1. Parse: PDF → notes; recover Domain → Section group → Section → Note from the text layer (PyMuPDF, no model).
  2. Leaf: Split each note's original page(s) into a faithful per-note PDF + page images.
  3. Index: A vision model (Claude Code or local) writes a concise, searchable summary (key ideas + all formulas in LaTeX), not a transcription. Papers, homeworks & long code notebooks are filtered out.
  4. Roll-up: Section · section-group · domain overview notes (MOCs) wikilink the tree.
  5. Vault: Idempotent write: <Domain>/<SectionGroup>/<Section>/<Title>.md beside its .pdf.
  6. Link: Holistic domain-wide pass adds ## Related wikilinks, even across section groups.
  7. Concepts: A multi-layer concept map (Concepts → Themes → Areas), which is the graph view.
  8. Chat (v0.2): Once the graph exists, query a course/concept or synthesise a field; answers are grounded in your notes and cited.
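The Vault step above can be illustrated with a small sketch of an idempotent write. The path layout comes from the step itself; the helper names `note_path` and `write_if_changed`, and the "compare text before writing" strategy, are assumptions for illustration rather than Engram's actual implementation.

```python
from pathlib import Path
import tempfile

def note_path(vault: Path, domain: str, section_group: str, section: str, title: str) -> Path:
    """Build <Domain>/<SectionGroup>/<Section>/<Title>.md under the vault root."""
    return vault / domain / section_group / section / f"{title}.md"

def write_if_changed(path: Path, text: str) -> bool:
    """Write the note only when its content differs; return True if the file was written."""
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False  # identical content: repeating the run changes nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True

# Demo in a throwaway directory, using the sample note from this page.
with tempfile.TemporaryDirectory() as tmp:
    target = note_path(Path(tmp), "Skateboarding", "Tricks", "Ollies", "Pop & Timing")
    first = write_if_changed(target, "## Mechanics\n")
    second = write_if_changed(target, "## Mechanics\n")
    relative = target.relative_to(tmp).as_posix()
```

The first call creates the file and the second is a no-op, which is what makes re-running the pipeline safe. A real implementation would also need to sanitize titles containing path separators; that is left out here.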
New in v0.2

Talk to your knowledge base

Once the concept graph exists, you can chat with your notes. Context is pulled deterministically from the graph and your summaries — no embeddings — and every answer is grounded and cited (the notes it used become clickable sources), with anything beyond your notes clearly marked. Two modes: Query a single course or concept, or Synthesise a whole field.
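To make "pulled deterministically, no embeddings" concrete, here is a minimal sketch of how context for a Query could be assembled from a concept's notes. The data shapes, the function names `gather_context` and `build_prompt`, the prompt wording, the `max_notes` cap, and the sample summaries are all hypothetical; Engram's real selection logic is not described on this page.

```python
def gather_context(concept, concept_notes, summaries, max_notes=5):
    """Pick a concept's notes in sorted title order, so the same question sees the same context."""
    titles = sorted(concept_notes.get(concept, ()))[:max_notes]
    return [(title, summaries[title]) for title in titles if title in summaries]

def build_prompt(question, context):
    """Lay out each summary under its wikilink so the model can cite sources by title."""
    sources = "\n\n".join(f"[[{title}]]\n{summary}" for title, summary in context)
    return (
        "Answer from these notes and cite them as [[Title]].\n\n"
        f"{sources}\n\nQuestion: {question}"
    )

# Hypothetical graph slice and summaries, for illustration only.
concept_notes = {"Markov Decision Processes": {"Walking in Value Space", "Value Iteration"}}
summaries = {
    "Value Iteration": "Repeated Bellman backups until the values stop changing.",
    "Walking in Value Space": "The Bellman operator viewed as a contraction.",
}
context = gather_context("Markov Decision Processes", concept_notes, summaries)
prompt = build_prompt("How does value iteration converge, and why?", context)
```

Because selection is a plain lookup plus a sort, there is no similarity search to tune, and the titles placed in the prompt double as the list of clickable sources.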

💬 Query ✦ Synthesis Concept: Markov Decision Processes

How does value iteration converge, and why?

Thinking…

Value iteration converges because the Bellman operator is a γ-contraction in the sup-norm:

‖ℬV − ℬV′‖ ≤ γ‖V − V′‖,  γ < 1

so the value estimates form a Cauchy sequence that converges to the unique fixed point V*, at a geometric rate from any starting point.

Sources: 📄 Walking in Value Space · 📄 “Singularity”

▸ Beyond your notes — next: asynchronous / prioritized sweeping, sample-based VI.

  • Grounded & cited: Answers prefer what your notes say and link them as clickable [[sources]].
  • Marked extensions: Anything the model adds beyond your notes goes under a clear “Beyond your notes” line, never disguised.
  • Query vs Synthesis: Local questions about a concept/course, or big-picture questions across a whole field.
  • Your choice of model: Defaults to Claude (via your Claude Code login); switch models in Settings.
  • No internet yet: Uses the model's own knowledge; it can't browse the web or use external (MCP) tools yet.

What makes it different

Built for a personal knowledge base you actually trust — local, faithful, and reproducible.

🎛️

Your choice of model

Pick the reader in Settings: Claude Code (vision, via your login — best quality, default) or a local Qwen2.5-VL 7B/32B, fully on-device. The concept-map & cross-link reasoning runs through Claude Code too — all from a dropdown, no code.

🎯

An index, not an archive

Engram distills each note to a searchable digest of its key ideas and all formulas — so you can find and reconnect, then jump to the original page (the source of truth, embedded in every note). It filters out noise — attached papers, homeworks, long code — and never invents; illegible parts are flagged.

🕸️

Multi-layer concept graph

Beyond folders, a holistic pass builds a hierarchy of concept graphs — Concepts → Themes → Areas — that span section groups. Nodes cluster by parent; links are grounded shared-note edges plus dashed knowledge edges. Switch layers as tabs in the app.

💬

Chat — query & synthesis

Ask your knowledge base. Query a course or a concept, or synthesise a whole field. Context is pulled from the graph + your notes; answers are grounded and cited (clickable sources), with anything beyond your notes clearly marked. Your choice of Claude model.

♻️

Incremental, per section group

Re-add one course and only its changed or new notes are re-processed, keyed on a rendered-page fingerprint — the rest of the domain is untouched. An unchanged export is a true no-op (the model never even loads), and every run reports what changed.
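The change report described above can be sketched as a fingerprint comparison. How Engram actually fingerprints a rendered page is not stated here, so hashing the rendered bytes with SHA-256 is an assumption, as are the names `fingerprint` and `diff_notes` and the sample titles.

```python
import hashlib

def fingerprint(page_bytes: bytes) -> str:
    """Hash a note's rendered page bytes (assumed input: e.g. the rasterized page image)."""
    return hashlib.sha256(page_bytes).hexdigest()

def diff_notes(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare stored fingerprints with a fresh export and bucket every note title."""
    report = {"new": [], "modified": [], "unchanged": [], "removed": []}
    for title, data in sorted(current.items()):
        if title not in previous:
            report["new"].append(title)
        elif previous[title] != fingerprint(data):
            report["modified"].append(title)
        else:
            report["unchanged"].append(title)
    report["removed"] = sorted(set(previous) - set(current))
    return report

# Hypothetical state from an earlier run, then a fresh export of the same course.
stored = {"Value Iteration": fingerprint(b"page-v1"), "Old Note": fingerprint(b"gone")}
export = {"Value Iteration": b"page-v1", "Policy Iteration": b"page-new"}
report = diff_notes(stored, export)
needs_model = bool(report["new"] or report["modified"])
```

Only the "new" and "modified" buckets need the model; when both are empty the run can exit before loading it. An exact hash like this one flags any pixel-level difference, which matches the caveat that re-exported notes read as "modified".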

📝

Obsidian-native output

Plain Markdown on disk — frontmatter, [[wikilinks]], embedded PDFs, section · section-group · domain overviews. Open the folder in Obsidian, or paste big slices into a long-context LLM (Karpathy's "second brain").
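As a rough picture of what one such file contains, the sketch below assembles a note from frontmatter, a summary, Related wikilinks, and the embedded PDF. The frontmatter keys and the `![[Title.pdf]]` embed mirror the sample note on this page; the function `render_note`, the `# Title` heading, and the exact ordering of sections are assumptions, not Engram's real template.

```python
def render_note(title: str, meta: dict[str, str], summary: str, related: list[str]) -> str:
    """Assemble one Obsidian-style note: frontmatter, summary, Related links, embedded PDF."""
    lines = ["---"]
    lines += [f"{key}: {value}" for key, value in meta.items()]
    lines += ["---", "", f"# {title}", "", summary.strip(), ""]
    if related:
        lines += ["## Related", ""]
        lines += [f"- [[{name}]]" for name in related]
        lines.append("")
    lines.append(f"![[{title}.pdf]]")  # the raw per-note PDF leaf, embedded
    return "\n".join(lines) + "\n"

note = render_note(
    "Pop & Timing",
    {"domain": "Skateboarding", "section_group": "Tricks", "section": "Ollies", "kind": "handwritten"},
    "## Mechanics\n\nSnap the tail down, then slide the front foot up the board to level it out.",
    ["Value Iteration"],  # hypothetical related note, for illustration
)
```

Since the result is plain text, any Markdown tool can read it; Obsidian additionally resolves the `[[...]]` links and renders the embedded PDF.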

🖥️

Native macOS app

A SwiftUI app to add/replace a section-group PDF (with a change report), browse notes by section group → section, read each summary with LaTeX & code rendered beside its original page — and explore the multi-layer concept graph.

Run it locally

Python 3.11+ and uv on macOS 14+ (Apple Silicon). The first run downloads the local model (~5 GB).

terminal
# 1 · set up the environment (add --extra local for the on-device model)
uv sync --extra dev --extra local

# 2 · create a domain, then add a section-group PDF (one per course)
uv run python -m engram domain create "Data Science"
uv run python -m engram add "Data Science" "CSE 234.pdf"

# 3 · cross-link the whole domain (links span section groups)
uv run python -m engram link "Data Science"

# 4 · build the multi-layer concept map → the graph
uv run python -m engram concepts "Data Science" --layers 3

uv run python -m engram status

Output lands in ~/Engram/<Domain>/<SectionGroup>/<Section>/<Title>.{md,pdf} — open ~/Engram in Obsidian.

Prefer a GUI? Build the native app:

macapp
cd macapp && ./build_app.sh && open Engram.app
Full README & source on GitHub ↗

Prototype status — the honest part

Engram's whole ethos is honesty over polish. This is a working v0.2 that has run end-to-end on real multi-course domains (v0.1 was validated on a known notebook of 18 notes and 73 links), but it's early, and several things are heuristic or not yet field-validated.

✓ Works today

  • Domains: many section-group PDFs as one knowledge base
  • Searchable index entries — key ideas + all formulas (LaTeX)
  • Filters out papers, homeworks & long code notebooks
  • Model picker: Claude Code (vision) or local Qwen2.5-VL 7B/32B
  • Domain-wide cross-links + a multi-layer concept graph (shared across notebooks)
  • Chat (v0.2): grounded, cited query & synthesis
  • Incremental rebuilds + a native macOS app

⚠ Prototype caveats

  • OneNote PDF export only — no other capture sources yet
  • macOS 14+ on Apple Silicon (~16 GB RAM for the 7B model)
  • Structure parsing is heuristic & locale-dependent (needs an English fallback)
  • Concept / cross-link / chat reasoning runs through your Claude Code login
  • Chat has no internet / MCP tools yet — model's own knowledge only
  • Re-exports aren't pixel-stable, so re-added notes read as "modified"
  • One-way only (OneNote → vault); never written back