Audapolis: Edit audio files by transcript, not waveform

  • I remember when Adobe demoed this idea of editing waveforms by the recognized text back in 2016, and it was pretty mind-blowing for the time.

    https://youtu.be/I3l4XLZ59iw

    EDIT: I could also definitely see Audapolis being useful if you could integrate it into a podcast's post-processing flow (volume normalization, de-essing), where it would recognize certain verbal tics such as "ummmm..." and automatically remove them from the audio.
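
    A minimal sketch of that filler-removal idea, assuming the recognizer gives you word-level timestamps. The `Word` type, the `FILLER` pattern, and `keep_segments` are all hypothetical names for illustration, not anything from Audapolis itself:

```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    text: str     # recognized token, possibly with punctuation
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical filler pattern: "um", "ummm", "uh", "er", "erm", "hm", ...
FILLER = re.compile(r"^(u+m+|u+h+|e+r+m*|h+m+)$")


def is_filler(word: Word) -> bool:
    # Strip punctuation and case before matching, so "Ummmm..." counts too.
    token = re.sub(r"[^a-z]", "", word.text.lower())
    return bool(FILLER.match(token))


def keep_segments(words: list[Word], total_duration: float) -> list[tuple[float, float]]:
    """Return the (start, end) ranges of audio that remain after dropping fillers."""
    segments = []
    cursor = 0.0  # start of the audio we have not yet decided about
    for w in words:
        if is_filler(w):
            if w.start > cursor:
                segments.append((cursor, w.start))
            cursor = max(cursor, w.end)
    if cursor < total_duration:
        segments.append((cursor, total_duration))
    return segments


words = [
    Word("So", 0.0, 0.3),
    Word("ummmm...", 0.3, 0.9),
    Word("hello", 0.9, 1.4),
    Word("uh", 1.4, 1.6),
    Word("world", 1.6, 2.0),
]
print(keep_segments(words, 2.0))
```

    The resulting ranges could then be handed to whatever tool does the actual cutting; a regex this crude would also catch legitimate words like "er" in some languages, so a real pipeline would probably want a review step.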

  • A genuinely free alternative to Descript sounds very useful.

    I've always liked the idea of Descript and was considering building something similar before it came out. The problem is that my use case is a couple of videos a year, so it doesn't fit with an expensive monthly subscription.

  • I've spent some of my free time over the past couple of months working on something similar. It's in a decent state, but I need help from somebody who understands the .fcpxml format so that you can export your edits to DaVinci Resolve and FCP.

    Take a look at https://matcha.video

  • This is awesome to see as an open source project.

    This functionality is one of my favorites when editing videos in Descript. It’s so much easier than chopping up waveforms in Audacity.

  • This is pretty dated and doesn't support Whisper, which is currently the de facto speech recognition model.

  • Demo Video: https://pajowu.de/audapolis_intro.mp4

  • The other day I was using the Voice Memos app on iOS 18 and was surprised to find that it also supports editing the recording by transcript.

  • One of the hosts of a podcast that I listen to has had positive things to say about Descript.[0] Just mentioning it because he's been talking about it for a few years, so I expect it's had a good amount of feature development over time.

    [0] descript.com/

  • If the maintainer is reading, having a demo video would be nice.

  • Hindenburg also added this capability.

    > Hindenburg’s manuscript feature gives you a complete overview of your audio. You can select the text just as you would in a text document and watch as your edits are made in real-time. If you need to export your text in a specific format, no problem. Hindenburg supports the most common text and transcription export formats.

    https://hindenburg.com/

  • Nice, are there plans to notarize the Mac app?

    I built something similar here: https://bigwav.app

  • This looks great! Will try it out. I built a similar but very scrappy tool for the same use case last year.[0] I probably wouldn't have built it if I had found this.

    [0] https://github.com/geekodour/wscribe-editor

  • This really needs a video demo or at least a more in-depth text description of the features. Will download later to try, but I'm curious: does this just do simple hard cuts on the audio, or is there any AI magic for blending sentence timing, if that makes sense?

    A number of comments turned me on to Descript. I made a similar comment on another audio thread recently: it drives me absolutely insane that all audio tools with any AI are web-based monthly SaaS instead of offline, private, local-GPU tools sold as an upfront purchase.
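
    On the hard-cut question above: the thread doesn't say which approach Audapolis takes, but the difference between a hard cut and a simple non-AI blend can be sketched like this. The function names and the linear fade are assumptions for illustration, operating on plain lists of samples rather than any real audio library:

```python
def hard_cut(a: list[float], b: list[float]) -> list[float]:
    """Join two clips back to back; any level jump at the seam stays audible."""
    return a + b


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two clips, linearly fading the end of `a` into the start of `b`.

    `overlap` is the number of samples to blend; it is clamped so it never
    exceeds either clip. The result is `overlap` samples shorter than a hard cut.
    """
    overlap = max(0, min(overlap, len(a), len(b)))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    mixed = []
    for i in range(overlap):
        t = (i + 1) / (overlap + 1)  # fade weight, strictly between 0 and 1
        mixed.append(tail[i] * (1 - t) + b[i] * t)
    return head + mixed + b[overlap:]


loud = [1.0] * 4
quiet = [0.0] * 4
print(hard_cut(loud, quiet))      # abrupt step from 1.0 to 0.0
print(crossfade(loud, quiet, 3))  # gradual ramp across the seam
```

    A short crossfade like this only smooths clicks at the seam; it doesn't retime pauses between sentences, which is presumably where the "AI magic" would come in.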

  • Combine this with the tech to generate new audio matching the speaker's voice profile, and you've really got something cool.

  • That’s awesome!

    Is 1 emoji for each commit title a new trend?

  • This is exciting to see - it seems the last release was a year ago.

    Can anyone clarify if this project is active?

  • Call me a jerk, but anyone who is editing audio seriously probably wants the waveform, no?

  • Somewhat off-topic: I saw the funding note at the bottom - it’s pretty cool that the German government is giving some funding to projects like this. I wonder how much the US is doing in that regard, like if there’s a list of projects that tax dollars go towards.

  • IMHO you should really change the headline on this. I'm an audio person, and my first thought was "that's stupid, words are awful at describing sound". But then I looked, and editing transcriptions of voice recordings by word is actually a great idea. That was not the impression the headline gave me, FWIW!

  • And here I was expecting that I could edit the text and the app would change the audio file to say what I had typed...