Core Architecture
How the system is structured internally — from how your files are stored to how PDFs stay pixel-perfect through the extraction pipeline.
Zero-loss PDF extraction
Most AI PDF tools convert your file into plain text before analysing it. In doing so they destroy fonts, tables, diagrams, and layout. The extracted output is a degraded approximation of the original.
This system takes a different approach. The AI reads your document to make a decision — which pages to keep. Once that decision is made, the AI is done. The engine goes back to the original raw PDF file, slices the approved page data byte-for-byte, and writes it into a new file. No rendering, no conversion, no re-encoding.
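The keep/slice split described above can be sketched as two separate steps: the AI emits only per-page verdicts, and the engine resolves them into an ordered list of pages to copy from the original bytes. This is an illustrative sketch, not the system's real API; the `Verdict` shape and `pagesToSlice` name are assumptions.

```typescript
// Hypothetical shape of the AI's output: one verdict per page.
interface Verdict {
  page: number;  // 1-based page number in the original PDF
  keep: boolean; // the AI's decision for this page
}

// Engine side: reduce the verdicts to an ordered list of pages to slice
// byte-for-byte from the original file. The AI never touches the bytes.
function pagesToSlice(verdicts: Verdict[]): number[] {
  return verdicts
    .filter(v => v.keep)
    .map(v => v.page)
    .sort((a, b) => a - b);
}
```

Because the engine only ever consumes page numbers, the original PDF's fonts, tables, and layout pass through untouched.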
Multi-file merge architecture
When multiple files are uploaded in a single job, each file is extracted independently in parallel. The AI analyses all files simultaneously, and the resulting keep/discard verdicts are merged into a unified ordered list before the output file is assembled.
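The merge step can be sketched as flattening per-file verdicts into one assembly plan, with files in upload order and pages in document order. The type and function names here are assumptions for illustration, not the system's actual interfaces.

```typescript
// Hypothetical per-file result after parallel analysis.
interface FileVerdict {
  file: string;        // source filename
  keptPages: number[]; // pages the AI approved, in document order
}

// One entry per output page in the assembled file.
interface PlanEntry {
  file: string;
  page: number;
}

// Merge the independent per-file verdicts into a single ordered plan.
// Upload order determines file order; page order is preserved per file.
function mergePlan(verdicts: FileVerdict[]): PlanEntry[] {
  return verdicts.flatMap(v =>
    v.keptPages.map(page => ({ file: v.file, page }))
  );
}
```

The assembler then walks this plan once, slicing each entry's page data from its source file.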
Example job: Paper-2022.pdf (22 pages), Paper-2023.pdf (24 pages), Paper-2024.pdf (20 pages).

Client-side storage
The Study Editor stores files locally using IndexedDB — a low-level key-value store built into every modern browser. Files persist between browser sessions and survive page refreshes.
| Storage type | What's stored | Cleared by |
|---|---|---|
| IndexedDB | PDF & Markdown files, annotations | Clearing site data or explicit delete |
| Session memory | Active selection, toolbar state | Page refresh |
| Server (none) | Nothing | N/A |
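A minimal sketch of how a file might be persisted with the browser's IndexedDB API. The database name, store name, and record shape are assumptions; only the `indexedDB.open` / `transaction` / `put` calls are the standard API.

```typescript
const DB_NAME = "study-editor"; // assumed database name
const STORE = "files";          // assumed object store

// Open (and on first use, create) the database.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => {
      // Key each record by filename.
      req.result.createObjectStore(STORE, { keyPath: "name" });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Persist a file's raw bytes; survives refreshes and browser restarts
// until the user clears site data or deletes the file explicitly.
async function saveFile(file: { name: string; bytes: Uint8Array }): Promise<void> {
  const db = await openDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).put(file);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```

Because nothing is sent to a server, deleting the record (or clearing site data) removes the only copy.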