Feature Overview

A complete reference to every capability of the extraction engine. Understanding these features will help you get the most out of every job you run.

Smart question extraction

Standard PDF readers give you Ctrl+F, which only finds exact keyword matches. The extraction engine instead reads each page and matches on meaning. You can ask for "questions about circular motion" even if the word "motion" never appears on the page.

The AI builds an understanding of each page's role: is it a question, an answer key, a diagram explanation, a title slide, or a bibliography? This classification step is what lets it tell apart pages that cover the same topic but serve different purposes, such as a practice question and a worked example.
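To make the idea concrete, here is a minimal sketch of role-based filtering. It is purely illustrative: the Page class, the role labels, and the topic tags are hypothetical names chosen for this example, and the engine's actual internals may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    role: str    # hypothetical role label, e.g. "question" or "answer_key"
    topics: set  # hypothetical topic tags assigned to the page


def keep_pages(pages, wanted_roles, wanted_topic):
    """Keep pages whose role is wanted and whose topics include the target."""
    return [
        p for p in pages
        if p.role in wanted_roles and wanted_topic in p.topics
    ]


# Made-up classification results for a five-page document.
pages = [
    Page(1, "title_slide", set()),
    Page(2, "question", {"newtons_laws"}),
    Page(3, "worked_example", {"newtons_laws"}),
    Page(4, "question", {"circular_motion"}),
    Page(5, "answer_key", {"newtons_laws"}),
]

kept = keep_pages(pages, {"question"}, "newtons_laws")
print([p.number for p in kept])
```

Only page 2 survives: pages 3 and 5 cover the right topic but have the wrong role, and page 4 has the right role but the wrong topic.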

Example

"Only keep pages that contain practice questions about Newton's laws. Ignore worked examples, formula sheets, and the answer section."

The engine understands related concepts — asking for "thermodynamics" will also match pages that discuss "ideal gas laws" and "entropy" even if those exact words aren't in your prompt.

Smart bucketing (multi-file splitting)

Instead of filtering down to a single output, you can instruct the AI to reorganise a document into multiple named groups — each becoming a separate downloaded file.

Bucket prompt example

"Split this 200-page lab manual into three PDFs: 'Setup Instructions', 'Maintenance Procedures', and 'Troubleshooting Guide'."

This produces exactly three PDF downloads, each containing only the pages that match that bucket's description. Pages that don't match any bucket are discarded.
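The bucketing behaviour can be pictured with a short sketch. It assumes each page has already been assigned either one bucket name or nothing at all; the function name and the page labels below are hypothetical and only illustrate the "unmatched pages are discarded" rule.

```python
def split_into_buckets(page_labels, bucket_names):
    """Group page numbers by bucket name, dropping unmatched pages.

    page_labels maps a page number to its assigned bucket name,
    or to None when the page matched no bucket (hypothetical input).
    """
    buckets = {name: [] for name in bucket_names}
    for page, label in sorted(page_labels.items()):
        if label in buckets:  # None or unknown labels are discarded
            buckets[label].append(page)
    return buckets


names = ["Setup Instructions", "Maintenance Procedures", "Troubleshooting Guide"]

# Made-up labels for a six-page manual.
labels = {
    1: None,
    2: "Setup Instructions",
    3: "Setup Instructions",
    4: "Maintenance Procedures",
    5: "Troubleshooting Guide",
    6: None,
}

result = split_into_buckets(labels, names)
for name, page_numbers in result.items():
    print(name, page_numbers)
```

Each key in the result corresponds to one downloaded file. Pages 1 and 6 matched no bucket, so they appear in none of the outputs.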

Your browser may block multiple downloads. When a job generates two or more buckets, your browser will likely show a permissions prompt in the address bar. Click "Allow" to receive all your files.

PDF & Markdown export

Two fundamentally different output modes serve different use cases. Choose before hitting send.

Property	PDF mode	Markdown mode
Output file	.pdf	.md
Formatting	Pixel-perfect copy	Clean text, LaTeX math
Diagrams & images	Yes — preserved	No — text only
Editable text	No	Yes
Works in Editor	Yes	Yes
Notion / Obsidian	No	Yes
Cost	Lower per page	Slightly higher

Multi-file merging

You can upload multiple PDF files in the same job. The engine reads every page across all files, applies your prompt uniformly, and merges the matching pages into a single coherent output.

Example

Input files: Maths-2022-Paper1.pdf, Maths-2023-Paper1.pdf, Maths-2024-Paper1.pdf

Prompt: "extract all integration questions"

Output: integration-questions.pdf (14 pages across 3 papers)
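As a rough sketch of the merge step, the snippet below combines matching pages from several files into one ordered list. The page numbers are invented for illustration, and the ordering rule (files in upload order, pages in ascending order within each file) is an assumption rather than documented behaviour.

```python
def merge_matches(matches_by_file):
    """Flatten per-file matches into one ordered list of (file, page) pairs.

    matches_by_file maps a file name to the page numbers that matched
    the prompt. Files are visited in insertion (upload) order.
    """
    merged = []
    for name, page_numbers in matches_by_file.items():
        for page in sorted(page_numbers):
            merged.append((name, page))
    return merged


# Hypothetical matches for the prompt "extract all integration questions".
matches = {
    "Maths-2022-Paper1.pdf": [3, 4, 7, 8, 11],
    "Maths-2023-Paper1.pdf": [2, 5, 6, 9],
    "Maths-2024-Paper1.pdf": [1, 4, 5, 10, 12],
}

merged = merge_matches(matches)
print(len(merged), "pages across", len(matches), "papers")
```

Keeping the source file name next to each page number makes it easy to trace every page in the merged output back to the paper it came from.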
