Hacker News

Extreme video compression with prediction using pre-trained diffusion models

by john_g on 2/19/2024, 2:51:55 PM with 13 comments

by Animats on 2/19/2024, 10:39:39 PM
Extreme compression will be when you put in a movie and get a SORA prompt back that regenerates something close enough to the movie.
by ToJans on 2/20/2024, 12:25:51 AM
Ahhh, Sloot's digital coding system [1] is finally here ;).
[1] https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System
by userbinator on 2/19/2024, 11:29:52 PM
How fast is this and how big is the decoder/encoder? The model weights are not accessible.
From the description, it looks like it's only being tested with 128x128 frames, which implies that the speed is very low.
by IshKebab on 2/19/2024, 10:34:34 PM
> It can be observed that our model outperforms them at low bitrates
It can? Maybe I'm misunderstanding the graphs but it doesn't look like it to me?
by holoduke on 2/20/2024, 6:55:03 AM
Back in 2005, a colleague at my first job was writing video format converter software. He was considered a genius and the stereotype of an introverted software developer. He claimed that one day an entire movie could be compressed onto a single floppy disk. Everybody laughed and thought he was weird. He might be right after all.
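For a rough sense of what the floppy claim implies, the sketch below works out the bitrate budget. The 1,474,560-byte capacity is that of a standard "1.44 MB" floppy; the two-hour running time and 24 fps are assumptions chosen for illustration, not figures from the comment.

```python
# Bitrate budget for fitting a whole movie on one floppy disk.
# FLOPPY_BYTES is the capacity of a standard "1.44 MB" disk (1440 KiB).
# The movie length and frame rate used below are illustrative assumptions.
FLOPPY_BYTES = 1_474_560


def floppy_budget(duration_s: float, fps: float) -> tuple[float, float]:
    """Return (bits per second, bits per frame) available on one floppy."""
    total_bits = FLOPPY_BYTES * 8
    bits_per_second = total_bits / duration_s
    return bits_per_second, bits_per_second / fps


# Assumed: a 2-hour movie at 24 frames per second.
bitrate, bits_per_frame = floppy_budget(duration_s=2 * 60 * 60, fps=24)
print(f"{bitrate:.1f} bit/s, {bits_per_frame:.1f} bits per frame")
```

Under those assumptions the budget is about 1.6 kbit/s, or roughly 68 bits per frame, so nearly all of the picture content would have to come from the decoder rather than from the file.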
by resolutebat on 2/20/2024, 3:55:11 AM
Here's the research behind this: https://arxiv.org/html/2402.08934v1
As a casual non-scholar, non-AI person trying to parse this though, it's infuriatingly convoluted. I was expecting a table of "given source file X, we got file size Y with quality loss Z", but while quality (SSIM/LPIPS) is compared to standard codecs like H.264, for the life of me I can't find any measure of how efficient the compression is here.
Applying AI to image compression has been tried before though, with distinctly mediocre results: some may recall the Xerox debacle about 10 years ago, when it turned out copiers were helpfully "optimizing" images by replacing digits with others in invoices, architectural drawings, etc.
https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...
by sbalamurugan on 2/19/2024, 10:55:41 PM
It's uncanny how much of the current stuff has been predicted by the sitcom "Silicon Valley".
by LeoPanthera on 2/20/2024, 5:24:13 AM
It's important to remember that any compression gains must include the size of the decompressor which, I assume, will include an enormous diffusion model.
by smerik on 2/20/2024, 11:19:29 AM
Does anyone remember the Sloot Digital Coding System? https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System
by zaptrem on 2/19/2024, 9:28:13 PM
Can you share example videos?
by hulitu on 2/20/2024, 12:00:51 PM
> Extreme video compression with prediction using pre-trained diffusion models
Is this more extreme than YouTube?
by mjevans on 2/19/2024, 10:23:52 PM
I wonder how a speed-focused variation would compare on quality with H.264, H.265, and AV1.
by hoseja on 2/20/2024, 7:37:25 AM
Middle-out.