Engram documentation
Engram turns Microsoft OneNote courses (exported as PDF) into a local, Obsidian-native index of your knowledge — searchable summaries of each page's key ideas and formulas, organized as a hierarchy that is also a concept graph.
Overview
Engram is an index and connector, not a transcriber. For each note it writes a concise, searchable summary of the page's key ideas and all its formulas, and keeps the original page right below as the source of truth. It then links related notes and distils a multi-layer concept graph — so you can find an idea, see what it connects to, and jump back to the page.
- Local-first or Claude — read pages with the on-device model or with Claude Code.
- Focused — it filters out noise (papers, homeworks, long code notebooks).
- Obsidian-native — plain Markdown + embedded PDFs + [[wikilinks]].
- Incremental — re-add a course and only changed notes are re-processed.
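One way to picture the incremental behaviour is a content-hash check per note. The sketch below is an assumption, not Engram's actual mechanism: the function name `changed_notes`, the `seen_hashes` dictionary, and the choice of SHA-256 over the raw PDF bytes are all hypothetical.

```python
import hashlib
from pathlib import Path


def changed_notes(pdf_paths, seen_hashes):
    """Return the per-note PDFs whose bytes differ from the previous run.

    seen_hashes maps a path string to the SHA-256 hex digest recorded last
    time; it is updated in place. Both names are hypothetical.
    """
    changed = []
    for path in map(Path, pdf_paths):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if seen_hashes.get(str(path)) != digest:
            # New or modified note: queue it and remember the new digest.
            changed.append(path)
            seen_hashes[str(path)] = digest
    return changed
```

Under this assumption any byte-level difference counts as a change, which would be consistent with the limitation listed at the end of this page: a fresh OneNote export of the same notes reads as "modified."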
Recommended start ★
Bringing in your whole library on day one means committing hours of model time before you know whether the summaries, model, and filters are right for you. Try one course first, look at the results, then scale.
- Export one course to PDF (one section group; see Getting notes out) and put the .pdf on your Mac.
- Add it to a domain (the domain is auto-created):
  uv run python -m engram add "Data Science" "CSE 234.pdf"
- Open the Mac app and read a few notes: are the summaries capturing the right ideas and formulas? Is anything you care about getting skipped?
- Tune in Settings (gear): pick the transcription model (Claude Code = best; local 7B = fast/offline) and adjust the filters.
- Happy? Add the rest of your courses, then build the graph:
  uv run python -m engram link "Data Science"
  uv run python -m engram concepts "Data Science" --layers 3
Install & requirements
- macOS 14+ on Apple Silicon (16 GB RAM for the 7B local model; 64 GB for 32B).
- Python 3.11+ with uv.
- Claude Code (for the default transcription + the concept/link reasoning) — or skip it and run fully local.
- Obsidian to browse the vault (optional but intended).
- A OneNote PDF export (made with the OneNote Windows app).
# set up; --extra local adds the on-device vision model (mlx-vlm)
uv sync --extra dev --extra local
cd macapp && ./build_app.sh && open Engram.app # the GUI
How notes are organized
Engram mirrors OneNote's structure with one extra grouping level:
Domain e.g. "Data Science" (many courses, one knowledge base)
└─ Section group e.g. "CSE 234" (one exported PDF = one course)
└─ Section e.g. "MLSys"
└─ Note a OneNote page → <Title>.md + <Title>.pdf (the leaf)
On disk: ~/Engram/<Domain>/<SectionGroup>/<Section>/<Title>.{md,pdf}.
The split per-note PDF leaf is the source of truth; the .md
summary is a searchable projection above it.
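The layout above can be expressed as a small path helper. This is an illustrative sketch rather than Engram's code: the function name `note_paths` and its `vault` parameter are hypothetical, and it assumes note titles are used as file names unchanged (no sanitising of characters such as `/`).

```python
from pathlib import Path


def note_paths(domain, section_group, section, title, vault=Path.home() / "Engram"):
    """Return (summary .md path, source .pdf path) for one note.

    Hypothetical helper mirroring
    ~/Engram/<Domain>/<SectionGroup>/<Section>/<Title>.{md,pdf}.
    """
    folder = Path(vault) / domain / section_group / section
    # Build the file names by concatenation so titles containing dots stay intact.
    return folder / f"{title}.md", folder / f"{title}.pdf"


md_path, pdf_path = note_paths("Data Science", "CSE 234", "MLSys", "Example note")
print(md_path)
print(pdf_path)
```

Both files share a folder and a stem, so the summary and its source page can always be found from one another.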
Getting notes out of OneNote
OneNote can't reliably export a whole large notebook, so export per section group
(one course). The repo's windows-export/ helper automates it:
- export_onenote.ps1 (Windows) drives the OneNote desktop COM API to export each section group to PDF (handles nested groups + on-demand sync).
- merge_sections.py (Mac) merges any section-by-section exports into one <Course>.pdf:
  uv run python windows-export/merge_sections.py "~/exports/Data Science"
The result is one <Course>.pdf per section group, ready for engram add.
An index, not a transcription
Each note becomes a concise, searchable digest: every key concept, definition, and term, plus all formulas verbatim in LaTeX — not a word-for-word copy. Routine examples and arithmetic are summarized; the full page is one scroll below. The summary never invents content, and illegible parts are flagged.
Transcription models
Choose the model that reads your pages in Settings → Transcription model (or via the CLI). All produce the same index-style summary.
| Option | What it is | Best for |
|---|---|---|
| claude-code | Claude vision via your Claude Code login (default) | Best quality & math; no API key. Watch rate limits on big batches. |
| local-7b | Qwen2.5-VL 7B on-device (mlx-vlm) | Fast, free, fully offline/private. |
| local-32b | Qwen2.5-VL 32B on-device | Better local quality; ~3-4× slower, needs ~64 GB. |
uv run python -m engram config set transcribe local-7b # or claude-code / local-32b
The concept-map & cross-link reasoning runs through Claude Code too (text only) — pick the model under Settings → Concept & cross-link model.
Filters — what gets indexed
Engram indexes knowledge, not everything. By default it skips (and reports — never silently drops) content that's noise for an index. Skipped notes still exist in the source PDF. Toggle these in Settings or the config.
| Filter | Skips | Config |
|---|---|---|
| Papers | attached papers & printouts/scans | skip_papers |
| Homeworks | homework / hw / assignment / lab / project | skip_homework |
| Long code | code notebooks over N pages | skip_code_over_pages |
| Custom | any section/title substring you add (e.g. "lit") | skip_section_patterns |
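To make the table concrete, here is a minimal sketch of how such a filter pass could decide whether a note is skipped and why. Only the four config keys come from the table. The function `skip_reason`, the note fields (`is_paper`, `is_code`, `pages`), the whole-word keyword matching, and the 10-page threshold in the example are assumptions; how Engram actually detects papers or code notebooks is not documented here.

```python
import re

# Keywords from the Homeworks row; the real matching rules are not documented.
HOMEWORK_WORDS = {"homework", "hw", "assignment", "lab", "project"}


def skip_reason(note, config):
    """Return the config key of the filter that skips `note`, or None to index it.

    `note` is a hypothetical dict with title, section, pages, is_paper, is_code.
    """
    text = f"{note['section']} {note['title']}".lower()
    if config.get("skip_papers") and note.get("is_paper"):
        return "skip_papers"
    if config.get("skip_homework"):
        # Whole-word match so "hw" does not fire inside unrelated words.
        if set(re.findall(r"[a-z]+", text)) & HOMEWORK_WORDS:
            return "skip_homework"
    limit = config.get("skip_code_over_pages")
    if limit is not None and note.get("is_code") and note["pages"] > limit:
        return "skip_code_over_pages"
    for pattern in config.get("skip_section_patterns", []):
        if pattern.lower() in text:  # plain substring match, as the table describes
            return "skip_section_patterns"
    return None


config = {"skip_papers": True, "skip_homework": True,
          "skip_code_over_pages": 10, "skip_section_patterns": ["lit"]}
notes = [
    {"title": "Attention", "section": "MLSys", "pages": 3},
    {"title": "HW 2", "section": "MLSys", "pages": 4},
    {"title": "Training loop", "section": "Notebooks", "pages": 25, "is_code": True},
]
for note in notes:
    # Report every decision instead of silently dropping a note.
    print(note["title"], "->", skip_reason(note, config) or "indexed")
```

Returning the reason rather than a bare boolean is what lets a skipped note be reported instead of silently dropped.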
Cross-links
A holistic, domain-wide pass adds a ## Related section to each
note — links to related notes, including across section groups. Safe by construction
(no dangling links, symmetric, capped).
uv run python -m engram link "Data Science"
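The three guarantees (no dangling links, symmetric, capped) can be enforced mechanically once candidate pairs exist. The following is a sketch under stated assumptions, not Engram's implementation: the function `safe_links`, the `(score, note_a, note_b)` candidate format, and the default cap of 5 are all hypothetical.

```python
def safe_links(candidates, notes, cap=5):
    """Turn scored candidate pairs into a Related map that is safe by construction.

    candidates: iterable of (score, note_a, note_b); notes: set of existing note names.
    """
    related = {name: [] for name in notes}
    # Strongest pairs first; ties are broken by name so the result is deterministic.
    for _score, a, b in sorted(candidates, key=lambda c: (-c[0], c[1], c[2])):
        if a == b or a not in related or b not in related:
            continue  # no self-links and no links to notes that do not exist
        if b in related[a]:
            continue  # pair already linked
        if len(related[a]) >= cap or len(related[b]) >= cap:
            continue  # capped: the pair would overflow one side
        # Symmetric: both directions are always added together.
        related[a].append(b)
        related[b].append(a)
    return related
```

Adding both directions in one step, and only when both notes are under the cap, is what keeps symmetry and the cap from conflicting.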
Concept graph
Beyond folders, Engram distils the domain into a multi-layer graph of concepts (not files): Concepts → Themes → Areas. Each layer is a tab in the app; switch tabs to change granularity. Nodes are colored by cluster (their parent); edges are solid (two concepts share a note) or dashed (related by the model's general knowledge). Click a concept to see, and open, the notes it covers.
uv run python -m engram concepts "Data Science" --layers 3 # 3 tabs
In the app: the brain icon builds it (pick scope + layers); the Graph icon opens it.
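The solid/dashed distinction follows directly from which notes each concept covers. As a rough sketch (the function `classify_edges` and its inputs are hypothetical, and model-suggested pairs are simply passed in by the caller), solid edges can be derived from shared notes like this:

```python
from itertools import combinations


def classify_edges(concept_notes, model_pairs=()):
    """Map each concept pair to "solid" or "dashed".

    concept_notes: concept name -> set of note titles it covers.
    model_pairs: pairs the model considers related (supplied by the caller here).
    """
    edges = {}
    for a, b in combinations(sorted(concept_notes), 2):
        if concept_notes[a] & concept_notes[b]:
            edges[(a, b)] = "solid"  # the two concepts share at least one note
    for a, b in model_pairs:
        pair = tuple(sorted((a, b)))
        # Assumption: shared-note evidence wins over a model-only relation.
        edges.setdefault(pair, "dashed")
    return edges
```

Pairs are stored in sorted order so that (A, B) and (B, A) always refer to the same edge.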
Cross-notebook graphs
A concept graph can span one notebook or several. Building across notebooks (e.g. Data Science + Mathematics) surfaces concepts that connect across domains. In the brain builder, just check 2+ notebooks; on the CLI, pass more than one domain:
uv run python -m engram concepts "Data Science" "Mathematics" --layers 3
Combined maps are stored under ~/Engram/_concept_maps/<A + B>/; clicking a concept opens its note even if it lives in the other notebook. A combined map is shared — it opens from either member notebook's entrance, and a switcher lets you flip between a notebook's own map and any shared one.
Chat — query & synthesis
Once a concept graph exists, you can chat with your knowledge base. Context is
pulled deterministically from the graph and your notes (no embeddings) and the
answer is grounded and cited — it links the notes it drew on as
[[wikilinks]] (shown as clickable Sources), and anything the model adds
beyond your notes is put under a clearly-marked "Beyond your notes" section.
Two modes:
| Mode | For | Context it sees |
|---|---|---|
| Query | a specific question about one course or one concept | that concept's notes + its graph neighbours (or all of a course's note summaries) |
| Synthesis | a big-picture question about a whole field | the concept graph + overview MOCs + the field's note summaries (budgeted, most-central first) |
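The two context rules in the table can be sketched without any embeddings, using only the graph and sorted orderings. This is an illustration under assumptions, not the actual retrieval code: the function names, the use of node degree as "centrality", and a character-count budget are all hypothetical choices.

```python
def query_context(concept, concept_notes, edges):
    """Notes for `concept`, followed by the notes of its graph neighbours.

    concept_notes: concept -> list of note titles; edges: iterable of (a, b) pairs.
    """
    neighbours = sorted({b if a == concept else a for a, b in edges if concept in (a, b)})
    ordered = []
    for name in [concept, *neighbours]:
        for note in sorted(concept_notes.get(name, [])):
            if note not in ordered:  # keep the first occurrence, drop repeats
                ordered.append(note)
    return ordered


def synthesis_context(summaries, concept_notes, edges, budget_chars):
    """Pick note summaries most-central first, staying within a character budget."""
    degree = {c: 0 for c in concept_notes}
    for a, b in edges:
        for c in (a, b):
            if c in degree:
                degree[c] += 1
    # A note's centrality is the highest degree among the concepts that cover it.
    centrality = {}
    for concept, notes in concept_notes.items():
        for note in notes:
            centrality[note] = max(centrality.get(note, 0), degree[concept])
    picked, used = [], 0
    for note in sorted(centrality, key=lambda n: (-centrality[n], n)):
        size = len(summaries.get(note, ""))
        if used + size <= budget_chars:  # skip a summary that does not fit, try the next
            picked.append(note)
            used += size
    return picked
```

Because every ordering is a plain sort with a name tie-break, the same graph and notes always yield the same context, which is one way to read "deterministically" above.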
In the app: the Chat toolbar button opens it with Query / Synthesis tabs; you can also click a node in the graph and pick "Ask about this" to chat about that concept. The chat model defaults to Claude and is switchable in Settings.
# ask about one concept, or one course
uv run python -m engram chat query "Data Science" "How does value iteration converge?" --concept "Markov Decision Processes"
uv run python -m engram chat query "Data Science" "Give me an exam cheat-sheet" --course "DSC 120"
# a big question about a whole field (one or more notebooks)
uv run python -m engram chat synthesis "Mathematics" -q "What's the unifying story, and what's missing?"
The Mac app
- Sidebar: domains → section-group folders → sections → notes (folded by default).
- Note view: the Markdown summary with LaTeX math + code rendered (KaTeX), beside the original-page PDF.
- Add / Link / Concepts / Graph / Chat / Settings in the toolbar.
- Settings (gear): model dropdowns (transcription + concept + chat), filter toggles, Claude Code status.
- Graph: the multi-layer concept graph with granularity tabs; click a node for its notes — or to chat about it.
- Chat: Query / Synthesis modes with grounded, cited answers (enabled once a concept graph exists).
cd macapp && ./build_app.sh && open Engram.app
CLI reference
# domains
engram domain create "Data Science"
engram domain list
engram domain remove "Data Science"
# add / update one section group (a PDF) — incremental
engram add "Data Science" "CSE 234.pdf" [--group "CSE 234"]
# domain-wide cross-links
engram link "Data Science"
# multi-layer concept map (1 domain, or several for cross-notebook)
engram concepts "Data Science" ["Mathematics"] [--layers 3]
# chat (needs a concept graph): query a course/concept, or synthesise a field
engram chat query "Data Science" "…question…" (--concept "…" | --course "…")
engram chat synthesis "Mathematics" ["Data Science"] -q "…big question…"
# inspect structure (no model) / status / config
engram inspect "CSE 234.pdf"
engram status
engram config set transcribe claude-code # transcribe / reasoning_model / chat_model / skip_* / vault_path
engram config set-key sk-ant-… # optional Claude API key
Run as uv run python -m engram … from the repo root.
Honest limitations
- Structure parsing is heuristic & locale-dependent (note boundaries + the Chinese-locale 分区 … 的第 N 页 footer, roughly "page N of section …"); an English-locale fallback is future work.
- Re-exports aren't pixel-stable: a fresh OneNote export of the same notes reads as "modified," so cross-export incremental determinism isn't solved.
- Claude Code rate limits can throttle a few-hundred-note batch; switch to a local model if so.
- Concept extraction quality varies with the model; regenerate to re-roll.
- One-way (OneNote → vault); never written back.
Engram is an early v0.1 research prototype.