It's 2025, AI is everywhere, why can't something summarize podcasts for me?

  • I can do this locally.

    Run MacWhisper to transcribe, or for Apple Podcasts just copy and paste the transcript they already give you. On Linux there are CLI speech-to-text tools built on the same open Whisper models.

    Then use any model to summarize the text.

    I haven’t done this, but with the whisper CLI plus ollama or llama.cpp I could write a shell script to summarize every podcast in a directory. It’d take a few hours on my laptop, but it’d work and cost nothing but a little electricity.

    The same techniques could be used to sift through other archives of text or audio.
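A sketch of that batch job, written in Python rather than pure shell (the loop is the same either way) — it assumes the `whisper` (openai-whisper) and `ollama` CLIs are installed, and the model name and prompt wording are just illustrative:

```python
# Transcribe every audio file in a directory with the whisper CLI,
# then summarize each transcript with a local model through ollama.
import pathlib
import subprocess

def summarize_dir(podcast_dir: str, model: str = "llama3.2") -> None:
    # Only .mp3 shown; add extensions as needed.
    for audio in sorted(pathlib.Path(podcast_dir).glob("*.mp3")):
        txt = audio.with_suffix(".txt")
        if not txt.exists():
            # whisper writes <name>.txt next to the audio with these flags
            subprocess.run(
                ["whisper", str(audio), "--model", "base",
                 "--output_format", "txt", "--output_dir", podcast_dir],
                check=True)
        prompt = ("Summarize this podcast transcript in a few bullet points:\n\n"
                  + txt.read_text())
        # ollama reads the prompt from stdin and prints the summary
        summary = subprocess.run(["ollama", "run", model], input=prompt,
                                 capture_output=True, text=True,
                                 check=True).stdout
        audio.with_suffix(".summary.txt").write_text(summary)
```

Point it at a folder with `summarize_dir("~/Podcasts")` and let it grind overnight; already-transcribed files are skipped on reruns.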

  • I'm sure tools exist, but for a DIY solution that any LLM can help you piece together:

    - Silero VAD for chunking audio;

    - Whisper for transcribing;

    - phi3 or bart-large-cnn (which is fine-tuned for summarization) for summarizing;

    This entire stack can run with your machine in airplane mode once you've downloaded the models.
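One way that stack could be wired together — a sketch, assuming the `silero-vad`, `openai-whisper`, and `transformers` packages are installed; model choices and chunk sizes are illustrative, not prescriptive:

```python
# VAD-chunk the audio, transcribe the speech regions, then summarize
# the transcript piecewise (bart-large-cnn has a ~1024-token limit).

def chunk_text(text: str, max_words: int = 700) -> list[str]:
    """Split a transcript into word-bounded chunks the summarizer can fit."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def summarize_podcast(audio_path: str) -> str:
    import whisper                         # heavy imports kept local
    from silero_vad import (load_silero_vad, read_audio,
                            get_speech_timestamps)
    from transformers import pipeline

    # 1. VAD: keep only speech regions, skipping music and silence.
    vad = load_silero_vad()
    wav = read_audio(audio_path)           # 16 kHz mono tensor
    speech = get_speech_timestamps(wav, vad)  # [{'start': n, 'end': n}, ...]

    # 2. Transcribe each speech segment with whisper.
    asr = whisper.load_model("base")
    transcript = " ".join(
        asr.transcribe(wav[t["start"]:t["end"]].numpy())["text"]
        for t in speech)

    # 3. Summarize chunk by chunk, then join the partial summaries.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    parts = [summarizer(c, max_length=130, min_length=30)[0]["summary_text"]
             for c in chunk_text(transcript)]
    return "\n".join(parts)
```

Once the three models are downloaded, nothing here touches the network.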

  • Because where's the fun in that?