In case you missed it, here's a link to the full training example that you can run yourself: https://www.paperspace.com/console/jobs/jvqssfqawv5zn/logs
Inference example: https://www.paperspace.com/console/jobs/js4mqzm91fj2lg
Disclosure: I work on Paperspace
A step in the right direction for machine learning in science, but they could have done some more research into naming conflicts:
$ apt-cache show quilt
Package: quilt
[..]
Description-en: Tool to work with series of patches
 Quilt manages a series of patches by keeping track of the changes each of them makes. They are logically organized as a stack, and you can apply, un-apply, refresh them easily by traveling into the stack (push/pop).
 .
 Quilt is good for managing additional patches applied to a package received as a tarball or maintained in another version control system. The stacked organization is proven to be efficient for the management of very large patch sets (more than hundred patches). As matter of fact, it was designed by and for Linux kernel hackers (Andrew Morton, from the -mm branch, is the original author), and its main use by the current upstream maintainer is to manage the (hundreds of) patches against the kernel made for the SUSE distribution.
 .
 This package provides seamless integration into Debhelper or CDBS, allowing maintainers to easily add a quilt-based patch management system in their packages. The package also provides some basic support for those not using those tools. See README.Debian for more information.
$ zcat /usr/share/doc/quilt/changelog.gz | tail -n3
Version 0.26 (Tue Oct 21 2003) - Change summary not available
Was not aware of Quilt for hosting datasets. Is it the go-to in this area? What other alternatives are there?
It seems to me like the machine learning algorithm here is mostly learning how to add JPEG compression artifacts to images.
Isn't quilt just blurring the pixels to an extent?
Please please please don't kill our favorite plot device. Make sure the process takes exactly 3 days.
Oh, this resonates with me so much! I'm running 4 different DeepSpeech models right now, each using a differently processed version of the LibriSpeech dataset (MFCCs/fbanks/linear spectrograms, deltas? energy? padding? etc.), because the original DS papers didn't bother describing it and every implementation I found uses completely different methods and libraries.
Not to mention that every one of those implementations packages its preprocessed version into a different data format and then builds a different data pipeline (and I only looked at TensorFlow implementations).
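To make the point concrete, here's a rough sketch of how two "preprocess LibriSpeech" pipelines can diverge before training even starts. It's not taken from any of those repos; the librosa calls are real, but the parameter values (sample rate, window/hop sizes, filter counts) are illustrative guesses and the filename is just a LibriSpeech-style example.

import librosa
import numpy as np

def mfcc_features(path, n_mfcc=13):
    # Pipeline A: 16 kHz audio -> MFCCs stacked with delta and delta-delta coefficients.
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.vstack([mfcc,
                      librosa.feature.delta(mfcc),
                      librosa.feature.delta(mfcc, order=2)])

def fbank_features(path, n_mels=80):
    # Pipeline B: log-mel filterbanks with its own window/hop choices, no deltas.
    y, sr = librosa.load(path, sr=16000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=400,
                                         hop_length=160, n_mels=n_mels)
    return librosa.power_to_db(mel)

# Same utterance, two incompatible feature tensors:
# mfcc_features("61-70968-0000.flac").shape   -> (39, T)
# fbank_features("61-70968-0000.flac").shape  -> (80, T)

Feed either one to the "same" model and you get two DeepSpeech-on-LibriSpeech results that aren't really comparable, which is exactly the kind of thing better dataset/preprocessing versioning is supposed to fix.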