AI for precise long-text summarization that doesn't miss important details

  • While building AI applications, I noticed a perpetual need for a really good summarization engine for those applications to work well.

    Voice transcripts are a common use case: I often have a long transcript that I'd love to throw at an LLM to get insights for:

    1. Generating a summary of insights about X from a Zoom transcript, for learning, research, or sales prospecting

    2. Generating question-answer pairs from YouTube videos as a source of synthetic fine-tuning data

    3. Synthesising the most important facts from multiple long documents in Perplexity-style AI search

    After going down the rabbit hole of past and present solutions, I found that most projects rely on long-context LLMs, which I wasn't satisfied with. Studies have also suggested this isn't a long-term solution, since such LLMs seem unable to reliably extract facts from the early sections of a long text.

    Inspired by code agents, I decided to try an agent-based approach to summarization. Just as code agents improve significantly when run agentically rather than zero-shot, I implemented a hierarchical approach to generating detailed summaries of long documents such as research papers.
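    For readers unfamiliar with the idea, a hierarchical summarizer can be sketched roughly as a map-reduce over chunks: split the text into overlapping chunks, summarize each one, then repeatedly merge groups of summaries until a single summary remains. This is a minimal sketch under stated assumptions, not the implementation behind the results above. `chunk_text`, `hierarchical_summarize`, and `summarize_fn` are hypothetical names, `summarize_fn` stands in for whatever LLM call you use, and the chunk and group sizes are arbitrary. The agentic parts (tool use, self-checking) aren't described in this post, so they are left out.

```python
from typing import Callable, List

# Hypothetical helpers: a plain map-reduce sketch, NOT the original post's code.


def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> List[str]:
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def hierarchical_summarize(
    text: str,
    summarize_fn: Callable[[str], str],  # assumed LLM call: text in, summary out
    chunk_size: int = 2000,
    overlap: int = 200,
    group_size: int = 4,
) -> str:
    """Map: summarize every chunk. Reduce: merge summaries in groups until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each level shrinks")
    if len(text) <= chunk_size:
        return summarize_fn(text)
    # Map step: each chunk gets its own call, so no section of the text is skipped.
    summaries = [summarize_fn(c) for c in chunk_text(text, chunk_size, overlap)]
    # Reduce step: summarize groups of neighbouring summaries, level by level.
    while len(summaries) > 1:
        summaries = [
            summarize_fn("\n".join(summaries[i:i + group_size]))
            for i in range(0, len(summaries), group_size)
        ]
    return summaries[0]


if __name__ == "__main__":
    def truncate_summary(s: str) -> str:
        # Stand-in for an LLM call so the example runs offline: keep 100 chars.
        return s[:100]

    doc = "word " * 5000  # 25,000 characters
    print(len(hierarchical_summarize(doc, truncate_summary)))  # prints 100
```

    The overlap reduces the chance of splitting a key fact across a chunk boundary, and because every chunk is summarized independently, early sections get the same attention as later ones. With a real model you would also want to check that each merged group still fits in the context window.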

    Boy, I was blown away by the results. You can view them in the Google Docs I attached.

    What do you think of these summaries? I would love the community's feedback, especially on summarization techniques that have worked well in your experience and don't require significant setup.

    Disclaimer: I did not fact-check the figures in the summaries, so they may contain some hallucinations.

    Would the community be keen to see more of this project?