Upload PDF
Multi-page PDFs supported. Processing runs on the server; download a styled .xlsx when ready.
- PyMuPDF extraction for text positions, fonts, colors, and embedded images
- Camelot table detection (lattice and stream modes) when vector lines are present
- OpenPyXL output with fills, fonts, borders, merges, and column sizing
- Optional OCR (pytesseract) for scanned pages when no text is found
- Optional ConvertAPI (cloud) engine for raw table extraction. It is not a visual clone of the PDF; use the Local engine for TRACES-style layout and colors
- Multi-page PDFs with per-page sections in the workbook
- Processing runs on your Python backend; files are stored under uploads/ and outputs/
- Review outputs before sharing; verify financial or legal documents manually
- Local engine: no external API. The ConvertAPI engine uploads the PDF to ConvertAPI and returns their XLSX; use it only if you accept their terms and pricing.
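As a rough illustration of how extracted text styling could be carried over to the workbook, the sketch below converts one PyMuPDF text span (the kind of dict found in `page.get_text("dict")` output) into keyword arguments for `openpyxl.styles.Font`. The helper name `span_to_font_kwargs` and the sample span are hypothetical and not taken from this tool's code. The flag bits and the integer sRGB color follow PyMuPDF's documented span format.

```python
def span_to_font_kwargs(span: dict) -> dict:
    """Translate a PyMuPDF span dict into kwargs for openpyxl.styles.Font.

    Hypothetical helper: the real backend may map styles differently.
    """
    flags = span.get("flags", 0)
    color = span.get("color", 0) & 0xFFFFFF  # sRGB packed as 0xRRGGBB
    return {
        # PDF font names rarely match installed Excel fonts exactly;
        # Excel substitutes a default when the name is unknown.
        "name": span.get("font", "Calibri"),
        "size": span.get("size", 11),
        "italic": bool(flags & 2),    # bit 1 marks italic text
        "bold": bool(flags & 16),     # bit 4 marks bold text
        "color": f"FF{color:06X}",    # openpyxl expects aRGB hex (opaque alpha)
    }


# Illustrative span; the values are made up for the example.
sample = {"font": "Helvetica-Bold", "size": 9.0, "flags": 16, "color": 0x1F4E79}
font_kwargs = span_to_font_kwargs(sample)
```

With OpenPyXL installed, `Font(**font_kwargs)` can be assigned to a cell's `font` attribute.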
Disclaimer
Pixel-perfect PDF fidelity in Excel is not always possible; results depend on the PDF. Always verify the spreadsheet against the source PDF.
- ConvertAPI turns PDF tables into spreadsheet cells; it does not reproduce the PDF’s full look (logos, fonts, colors, exact layout). For outputs styled closer to the source (e.g. TRACES 26AS), use the Local engine.
- ConvertAPI layout options: the "Multiple sheets" vs "One sheet" choice and the optional "Include formatting" setting map to ConvertAPI’s parameters. They adjust worksheets and extra content, not full PDF-style appearance.
- Vector/text PDFs give the best layout match; heavy scans rely on OCR quality
- Camelot may require Ghostscript for some files; see the Camelot documentation
- Very large PDFs may take longer; progress is shown during conversion
Frequently Asked Questions
Will the Excel file look exactly like the PDF?
We map coordinates, styles, images, and detected tables as closely as Excel allows. Complex PDF graphics may require manual touch-up.
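One way to picture the coordinate mapping is a tolerance-based snap of PDF positions onto a row and column grid. The sketch below is a simplified assumption about how such a mapping could work, not the tool's actual algorithm; the function names, tolerances, and sample items are all hypothetical.

```python
def cluster(values, tol):
    """Return bin anchors: a value starts a new bin when it lies more than
    `tol` beyond the first value of the current bin."""
    anchors = []
    for v in sorted(values):
        if not anchors or v - anchors[-1] > tol:
            anchors.append(v)
    return anchors


def nearest_index(anchors, v):
    """Index of the anchor closest to v (first one wins on ties)."""
    return min(range(len(anchors)), key=lambda i: abs(anchors[i] - v))


def to_grid(items, x_tol=5.0, y_tol=3.0):
    """Map (x0, y0, text) items to {(row, col): text}, both 1-based."""
    cols = cluster([x for x, _, _ in items], x_tol)
    rows = cluster([y for _, y, _ in items], y_tol)
    grid = {}
    for x, y, text in items:
        key = (nearest_index(rows, y) + 1, nearest_index(cols, x) + 1)
        # Items that land in the same cell are joined with a space.
        grid[key] = f"{grid[key]} {text}" if key in grid else text
    return grid


# Made-up positions in PDF points (top-left origin).
items = [
    (72.0, 100.0, "Item"), (200.0, 100.5, "Amount"),
    (72.5, 120.0, "Widget"), (201.0, 119.8, "42.00"),
]
grid = to_grid(items)
```

Small jitter in the extracted coordinates is absorbed by the tolerances, which is also why tightly packed or overlapping PDF graphics can end up in the wrong cell and need manual touch-up.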
Do you support scanned PDFs?
If no text is detected, the backend attempts OCR when pytesseract and Tesseract are installed on the server.
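The fallback decision described here could be sketched as follows. This is an assumption about the control flow, not the backend's actual code: the function names and the returned labels are hypothetical, and the check assumes the `tesseract` binary is on `PATH` (pytesseract can also be pointed at a binary through `pytesseract.pytesseract.tesseract_cmd`).

```python
import importlib.util
import shutil


def needs_ocr(page_text: str) -> bool:
    """A page needs OCR when extraction returned no visible text."""
    return not page_text.strip()


def ocr_available() -> bool:
    """True when both the pytesseract package and the tesseract binary exist."""
    has_module = importlib.util.find_spec("pytesseract") is not None
    has_binary = shutil.which("tesseract") is not None
    return has_module and has_binary


def choose_text_source(page_text: str) -> str:
    """Return 'embedded-text', 'ocr', or 'skip' for one page."""
    if not needs_ocr(page_text):
        return "embedded-text"
    return "ocr" if ocr_available() else "skip"
```

A page with extractable text never triggers OCR in this sketch, so installing Tesseract only affects scanned or image-only pages.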
What do I download?
A single .xlsx file built with OpenPyXL from the extracted structure.