Core Architecture
How the system is structured internally — from how your files are stored to how PDFs stay pixel-perfect through the extraction pipeline.
Zero-loss PDF extraction
Most AI PDF tools convert your file into plain text before analysing it. In doing so they destroy fonts, tables, diagrams, and layout. The extracted output is a degraded approximation of the original.
This system takes a different approach. The AI reads your document to make a decision — which pages to keep. Once that decision is made, the AI is done. The engine goes back to the original raw PDF file, slices the approved page data byte-for-byte, and writes it into a new file. No rendering, no conversion, no re-encoding.
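The keep/slice split described above can be sketched as two separate steps: the AI emits only per-page verdicts, and the engine resolves them into an ordered list of pages to copy from the original bytes. This is an illustrative sketch, not the system's real API; the `Verdict` shape and `pagesToSlice` name are assumptions.

```typescript
// Hypothetical shape of the AI's output: one verdict per page.
interface Verdict {
  page: number;  // 1-based page number in the original PDF
  keep: boolean; // the AI's decision for this page
}

// Engine side: reduce the verdicts to an ordered list of pages to slice
// byte-for-byte from the original file. The AI never touches the bytes.
function pagesToSlice(verdicts: Verdict[]): number[] {
  return verdicts
    .filter(v => v.keep)
    .map(v => v.page)
    .sort((a, b) => a - b);
}
```

Because the engine only ever consumes page numbers, the original PDF's fonts, tables, and layout pass through untouched.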
Multi-file merge architecture
When multiple files are uploaded in a single job, each file is extracted independently in parallel. The AI analyses all files simultaneously, and the resulting keep/discard verdicts are merged into a unified ordered list before the output file is assembled.
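The merge step can be sketched as flattening per-file verdicts into one assembly plan, with files in upload order and pages in document order. The type and function names here are assumptions for illustration, not the system's actual interfaces.

```typescript
// Hypothetical per-file result after parallel analysis.
interface FileVerdict {
  file: string;        // source filename
  keptPages: number[]; // pages the AI approved, in document order
}

// One entry per output page in the assembled file.
interface PlanEntry {
  file: string;
  page: number;
}

// Merge the independent per-file verdicts into a single ordered plan.
// Upload order determines file order; page order is preserved per file.
function mergePlan(verdicts: FileVerdict[]): PlanEntry[] {
  return verdicts.flatMap(v =>
    v.keptPages.map(page => ({ file: v.file, page }))
  );
}
```

The assembler then walks this plan once, slicing each entry's page data from its source file.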
Example job: Paper-2022.pdf (22 pages), Paper-2023.pdf (24 pages), Paper-2024.pdf (20 pages).

Client-side storage
The Study Editor stores files locally using IndexedDB — a low-level key-value store built into every modern browser. Files persist between browser sessions and survive page refreshes.
| Storage type | What's stored | Cleared by |
|---|---|---|
| IndexedDB | PDF & Markdown files, annotations | Clearing site data or explicit delete |
| Session memory | Active selection, toolbar state | Page refresh |
| Server (none) | Nothing | N/A |
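A minimal sketch of how a file might be persisted with the browser's IndexedDB API. The database name, store name, and record shape are assumptions; only the `indexedDB.open` / `transaction` / `put` calls are the standard API.

```typescript
const DB_NAME = "study-editor"; // assumed database name
const STORE = "files";          // assumed object store

// Open (and on first use, create) the database.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () => {
      // Key each record by filename.
      req.result.createObjectStore(STORE, { keyPath: "name" });
    };
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Persist a file's raw bytes; survives refreshes and browser restarts
// until the user clears site data or deletes the file explicitly.
async function saveFile(file: { name: string; bytes: Uint8Array }): Promise<void> {
  const db = await openDb();
  await new Promise<void>((resolve, reject) => {
    const tx = db.transaction(STORE, "readwrite");
    tx.objectStore(STORE).put(file);
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
  });
}
```

Because nothing is sent to a server, deleting the record (or clearing site data) removes the only copy.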