Extreme video compression with prediction using pre-trained diffusion models

  • Extreme compression will be when you put in a movie and get a SORA prompt back that regenerates something close enough to the movie.

  • Ahhh, Sloot's digital coding system [1] is finally here ;).

    [1] https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System

  • How fast is this and how big is the decoder/encoder? The model weights are not accessible.

    From the description, it looks like it's only being tested with 128x128 frames, which implies that the speed is very low.

  • > It can be observed that our model outperforms them at low bitrates

    It can? Maybe I'm misunderstanding the graphs but it doesn't look like it to me?

  • Back in 2005, at my first job, a colleague of mine wrote video format converter software. He was considered a genius and the stereotype of an introverted software developer. He claimed that one day an entire movie could be compressed onto a single floppy disk. Everybody laughed and thought he was weird. He might be right after all.

  • Here's the research behind this: https://arxiv.org/html/2402.08934v1

    As a casual non-scholar, non-AI person trying to parse this though, it's infuriatingly convoluted. I was expecting a table of "given source file X, we got file size Y with quality loss Z", but while quality (SSIM/LPIPS) is compared to standard codecs like H.264, for the life of me I can't find any measure of how efficient the compression is here.

    Applying AI to image compression has been tried before though, with distinctly mediocre results: some may recall the Xerox debacle about 10 years ago, when it turned out copiers were helpfully "optimizing" images by replacing digits with other digits in invoices, architectural drawings, etc.

    https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...

  • It’s uncanny how much of the current stuff was predicted by the sitcom “Silicon Valley”.

  • It's important to remember that any compression gains must include the size of the decompressor which, I assume, will include an enormous diffusion model.

  • Does anyone remember the Sloot Digital Coding System? https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System

  • Can you share example videos?

  • > Extreme video compression with prediction using pre-trained diffusion models

    Is this more extreme than YouTube?

  • I wonder how a speed-focused variation would compare on quality with H.264, H.265, and AV1.

  • Middle-out.