From PDFs to AI-ready structured data: a deep dive