"This is an experiment [...] soundtrack was being read in the same way as the picture is – stopped 24 times per second? Would this be the ultimate flutter distortion?"
That bothered me like crazy - how did they scan the audio whilst the frame was physically stopped? Then I realized how it was done: the soundtrack for frame A is printed alongside frame B, far enough down the strip that it passes the sound head where the film moves at constant velocity - only the picture gate stops intermittently. (For 35 mm optical sound the standard advance is 21 frames.)
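A quick back-of-the-envelope sketch of that offset, assuming the standard 21-frame advance for 35 mm optical sound at 24 fps (the numbers are illustrative):

    FPS = 24
    OFFSET_FRAMES = 21  # standard 35 mm optical advance; 16 mm uses 26

    def sound_position(picture_frame):
        """Return the frame the audio is printed alongside, and the
        time advance in seconds."""
        return picture_frame + OFFSET_FRAMES, OFFSET_FRAMES / FPS

    print(sound_position(0))  # (21, 0.875) - just under a second ahead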
The film has been playing on endless repeat, with "original sound", at Disneyland's Main Street Cinema since the 1970s. https://en.wikipedia.org/wiki/Main_Street_Cinema
It would be interesting to see whether anyone has remarked on the sound quality over the past 50+ years, and whether the flutter has been noticeable.
I've noticed a lot of jitter in the video, even in "stabilized" versions. It seems like there was jitter in how the cels were placed on the backgrounds, in addition to jitter of the whole frame, so it's not easy to stabilize out. It would be cool to clean that up too with modern techniques - the global part is sketched below; the per-cel part is the harder problem.
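A minimal sketch of the whole-frame part, using OpenCV phase correlation to estimate a global translation per frame. This only removes frame jitter; the per-cel jitter would need per-region tracking or segmentation, which I'm treating as out of scope here:

    import cv2
    import numpy as np

    def stabilize(frames):
        """Align each BGR frame to the first by a global translation."""
        ref = np.float32(cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY))
        out = [frames[0]]
        for f in frames[1:]:
            g = np.float32(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY))
            # Estimated translation of this frame relative to the reference.
            (dx, dy), _ = cv2.phaseCorrelate(ref, g)
            m = np.float32([[1, 0, -dx], [0, 1, -dy]])  # undo the shift
            out.append(cv2.warpAffine(f, m, (f.shape[1], f.shape[0])))
        return out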
More speculatively, it seems to me like AI is approaching the level of image understanding where it could draw new in-between frames in a much, much smarter way than typical (terrible) frame interpolation. That would be a great project, and potentially commercially valuable: Steamboat Willie is animated at 24 FPS, but it's common for animated TV shows to be done at 8 FPS, and those could really benefit.
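For contrast, here is roughly what the "typical terrible" baseline looks like - a naive optical-flow midpoint warp (Farneback flow, backward-warp approximation). On cel animation's flat colors and hard edges this smears badly, which is exactly what a model with real image understanding could avoid:

    import cv2
    import numpy as np

    def midpoint_frame(a, b):
        """Synthesize a frame halfway between BGR frames a and b."""
        ga = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
        gb = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(ga, gb, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = ga.shape
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        # Crude backward warp: sample a at points half a flow step back.
        map_x = (xs - 0.5 * flow[..., 0]).astype(np.float32)
        map_y = (ys - 0.5 * flow[..., 1]).astype(np.float32)
        return cv2.remap(a, map_x, map_y, cv2.INTER_LINEAR)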
Very interesting indeed - reminds me of some of the work done by Jamie Howarth and co with the 'Plangent Process', which uses the bias tone recorded by analogue tape machines to correct flutter on high-sample-rate transfers from tape...
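The actual Plangent process is proprietary, but a toy version of the general idea - recover the transport's speed variation from a reference tone that should be constant, then resample to undo it - might look like this (everything here is an assumption about the approach, not their implementation):

    import numpy as np
    from scipy.signal import hilbert

    def deflutter(audio, tone, sr, nominal_hz):
        """audio/tone: 1-D float arrays of equal length; `tone` is the
        band-passed bias/pilot tone. Returns audio resampled onto a
        constant-speed timeline."""
        phase = np.unwrap(np.angle(hilbert(tone)))
        # Instantaneous frequency in Hz; in practice you'd smooth this.
        inst_hz = np.gradient(phase) * sr / (2 * np.pi)
        speed = inst_hz / nominal_hz       # >1 where the transport ran fast
        true_time = np.cumsum(speed) / sr  # warped sample times, seconds
        uniform = np.arange(true_time[0], true_time[-1], 1.0 / sr)
        return np.interp(uniform, true_time, audio)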
Neural upscaling would make it even better
I've often wondered: if multiple different prints of an older film survive, could they be aligned and averaged (or median-stacked) using modern ML or computational statistics? I get the sense that "unsupervised" methods are a bit underutilized in film and audio restoration, but I know next to nothing about this area.
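A sketch of the simplest version of that idea: align each print's copy of a frame to a reference (same phase-correlation trick as the stabilization sketch above), then take a per-pixel median. Dirt and scratches rarely land on the same pixel in two prints, so the median rejects them. Real prints would also need warping for weave and shrinkage; this assumes a translation is enough:

    import cv2
    import numpy as np

    def fuse_prints(frames):
        """frames: same-size grayscale float32 images of one film frame,
        one per surviving print. Returns the median-fused frame."""
        ref = frames[0]
        aligned = [ref]
        for f in frames[1:]:
            (dx, dy), _ = cv2.phaseCorrelate(ref, f)
            m = np.float32([[1, 0, -dx], [0, 1, -dy]])
            aligned.append(cv2.warpAffine(f, m, (f.shape[1], f.shape[0])))
        # Per-pixel median across aligned prints rejects print-specific damage.
        return np.median(np.stack(aligned), axis=0)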