Hey folks! We just built a cost-effective, lightweight way to generate audiovisual summaries for videos.
* Processes videos up to 12x faster than realtime
* Costs less than $0.01 per minute of video
* Combines visual and audio components
The goal here is not to build a single end-to-end model, but something that can actually be used in production while preserving relatively high quality.
You can try it out yourself here: https://www.sievedata.com/functions/sieve/describe
How we built it: https://www.sievedata.com/blog/describe-video-summary-beta-l...
The code: https://github.com/sieve-community/describe