The Unreasonable Effectiveness of Deep Feature Extraction

  • Deep feature extraction matters not only for image analysis but also for other areas, where specialized tools such as the following can be useful:

    o https://github.com/Featuretools/featuretools - Automated feature engineering, with a main focus on relational data and deep feature synthesis

    o https://github.com/blue-yonder/tsfresh - Automatic extraction of relevant features from time series

    o https://github.com/machinalis/featureforge - Creating and testing machine learning features, with a scikit-learn compatible API

    o https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last! The workflow engine allows for integrating feature training and data wrangling tasks with conventional ML

    o https://github.com/xiaoganghan/awesome-feature-engineering - Other resources related to feature engineering (video, audio, text)

  • As the author acknowledges, we might be living in a window of opportunity where big data firms are giving something away for free that may yet turn out to be a big part of their future IP. Grab it while you can.

    On a tangent, I really like the tone of voice in this article. Wide-eyed, optimistic, and forward-looking, while at the same time knowledgeable and practical. (Thanks!)

  • This is very interesting and timely for my work. I had been struggling to train a MobileNet CNN to classify human emotions ("in the wild") and couldn't get the model to converge. I tried reducing multiclass to binary models, e.g. angry|not_angry, but couldn't get past the 60-70% accuracy range.

    I switched to extracting features from an ImageNet-pretrained network and trained an XGBoost binary classifier, and boom... right out of the box I'm seeing ~88% accuracy.

    Also, the author's points about training speed and flexibility are a major plus for my work. Hope this helps others.
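    The recipe described above (frozen pretrained network as feature extractor, cheap model on top) can be sketched in miniature. Below, a fixed random projection stands in for the ImageNet-pretrained CNN and a hand-rolled linear probe stands in for XGBoost; both stand-ins, the synthetic data, and all dimensions are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a labelled image dataset: two classes of
# 64-dimensional "images" whose class-conditional means differ.
n, d = 400, 64
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.5).astype(int)
X[y == 1] += 0.5  # shift class-1 examples so the task is learnable

# "Frozen feature extractor": a fixed random projection + ReLU plays the
# role of the pretrained network's penultimate layer. Its weights are
# never updated -- exactly as in deep feature extraction.
W_frozen = rng.normal(size=(d, 128)) / np.sqrt(d)
features = np.maximum(X @ W_frozen, 0.0)

# Cheap downstream model: logistic regression by plain gradient descent
# (the commenter used an XGBoost binary classifier instead).
w, b = np.zeros(features.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    g = (p - y) / n
    w -= 0.1 * features.T @ g
    b -= 0.1 * g.sum()

acc = ((features @ w + b > 0).astype(int) == y).mean()
print(f"train accuracy on frozen features: {acc:.2f}")
```

    In practice the extractor would be a real pretrained model with its classification head removed, but the training loop on top stays this cheap.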

  • >But in the future, I think ML will look more like a tower of transfer learning. You'll have a sequence of models, each of which specializes the previous model, which was trained on a more general task with more data available.

    He's almost describing a future where we might buy/license pre-trained models from Google/Facebook/etc. that are trained on huge datasets, and then extend them with more specific training from other sources of data in order to end up with a model suited to the problem being solved.

    It also sounds like we can feed the model's learnings back into new models with new architectures as we discover better approaches later.

  • A few caveats here:

    - It works (that well) only for vision (for language it sort-of-works only at the word level: http://p.migdal.pl/2017/01/06/king-man-woman-queen-why.html)

    - "Do Better ImageNet Models Transfer Better?" https://arxiv.org/abs/1805.08974

    And if you want to play with transfer learning, here is a tutorial with a working notebook: https://deepsense.ai/keras-vs-pytorch-avp-transfer-learning/
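    On the first caveat, the word-level behaviour can be illustrated with the classic analogy arithmetic from the linked post. The three-dimensional vectors below are made up by hand purely for the demo; real word embeddings (word2vec, GloVe) are learned from corpora and have hundreds of dimensions:

```python
import numpy as np

# Hand-made toy "embeddings"; the coordinates loosely encode
# (royalty, maleness, femaleness). Entirely fabricated for illustration.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.1, 0.5, 0.5]),  # distractor word
}

def nearest(v, exclude):
    """Word whose embedding has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude),
               key=lambda w: cos(emb[w], v))

target = emb["king"] - emb["man"] + emb["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # -> queen
```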

  • Hi everyone! Author here. Let me know if you have any questions, this is one of my favorite subjects in the world to talk about.

  • Very interesting article! It answered some questions I've had for a long time.

    I'm curious about how this works in practice. Is it always good enough to take the outputs of the next-to-last layer as features? When doing quick iterations, I assume the images in the data set have been run through the big net as a preparation step, and the inputs to the net you're training are the features? Does the new net always need only one layer?

    What are some examples of where this worked well (besides the flowers mentioned in the article)?
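    One common answer to the "preparation step" question: yes, you run every image through the big network once, cache the penultimate-layer activations, and all later experiments start from that cache. A sketch of the pattern, with random vectors standing in for the cached activations (the shapes, file name, and ridge-classifier head are all illustrative assumptions):

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(1)

# Stand-in for pre-extracted features: in practice these would be the
# penultimate-layer activations of a pretrained network, computed once
# over the whole dataset.
feats = rng.normal(size=(1000, 64))
labels = (feats[:, 0] > 0).astype(int)  # toy labels tied to one feature

cache = os.path.join(tempfile.mkdtemp(), "features.npy")
np.save(cache, feats)  # the expensive forward passes happen only once

X = np.load(cache)  # every later experiment starts here -- no GPU needed
# Quick iteration: sweep the regularizer of a ridge least-squares head
# on the cached features; each fit is a single linear solve.
best = 0.0
for lam in (0.01, 0.1, 1.0):
    A = X.T @ X + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (2 * labels - 1))
    acc = ((X @ w > 0).astype(int) == labels).mean()
    best = max(best, acc)
print(f"best linear-head accuracy on cached features: {best:.2f}")
```

    Whether the next-to-last layer is always the right tap point varies; earlier layers sometimes transfer better when the target domain is far from ImageNet.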

  • It's hard to ask my question without sounding a bit naive :-) Back in the early nineties I did some work with convolutional neural nets, except that at that time we didn't call them "convolutional". They were just the neural nets that were not provably uninteresting :-) My biggest problem was that I didn't have enough hardware, so I put that kind of stuff on a shelf waiting for hardware to improve (which it did, but I never got back to that shelf).

    What I find a bit strange is the excitement that's going on. I find a lot of these results pretty expected. Or at least this is what I and anybody I talked to at the time seemed to think would happen. Of course, the thing about science is that sometimes you have to do the boring work of seeing if it does, indeed, work like that. So while I've been glancing sidelong at the ML work going on, it's been mostly a checklist of "Oh cool. So it does work. I'm glad".

    The excitement has really been catching me off guard, though. It's as if nobody else expected it to work like this. This in turn makes me wonder if I'm being stupidly naive. Normally I find when somebody thinks, "Oh it was obvious" it's because they had an oversimplified view of it and it just happened to superficially match with reality. I suspect that's the case with me :-)

    For those doing research in the area (and I know there are some people here), what have been the biggest discoveries/hurdles that we've overcome in the last 20 or 30 years? In retrospect, what were the biggest worries you had in terms of wondering if it would work the way you thought it might? Going forward, what are the most obvious hurdles that, if they don't work out, might slow down or halt our progress?

  • Contrast this with a similar writeup on some interesting observations about solving ImageNet with a network that only sees small patches (the largest is 33 px on a side):

    https://medium.com/bethgelab/neural-networks-seem-to-follow-...

  • The question to me is: can you do this with, e.g., a random forest too, or is it specific to neural nets?

  • This is probably naive, but I’m imagining something like the US Library of Congress providing these models in the future. E.g., some federally funded program to procure / create enormous data sets / train.

  • I'm wondering how this compares to transfer learning applied to the same model. That is, compare deep feature extraction plus a linear model at the end vs. just transferring the weights to the same model and retraining on your specific dataset.
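    The comparison can be sketched on a toy problem: one hidden layer plays the "pretrained" network, and the only difference between the two regimes is whether backprop is allowed to touch it. Everything here (data, dimensions, learning rate) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-class problem whose signal is nonlinear in the inputs,
# so the extractor's weights actually matter.
n, d, h = 500, 20, 32
X = rng.normal(size=(n, d))
y = (np.abs(X[:, 0]) + X[:, 1] > 1.0).astype(int)

def train(finetune, steps=3000, lr=0.2):
    """One-hidden-layer net. finetune=False freezes the hidden layer
    (deep feature extraction); finetune=True also updates it
    (transfer learning by retraining the whole model)."""
    r = np.random.default_rng(42)            # same "pretrained" init both times
    W1 = r.normal(size=(d, h)) / np.sqrt(d)
    w2, b2 = np.zeros(h), 0.0
    for _ in range(steps):
        Z = np.maximum(X @ W1, 0.0)          # ReLU features
        p = 1.0 / (1.0 + np.exp(-(Z @ w2 + b2)))
        g = (p - y) / n                      # dLoss/dLogits (mean logistic loss)
        if finetune:
            dZ = np.outer(g, w2) * (Z > 0)   # backprop into the extractor
            W1 -= lr * X.T @ dZ
        w2 -= lr * Z.T @ g
        b2 -= lr * g.sum()
    Z = np.maximum(X @ W1, 0.0)
    return (((Z @ w2 + b2) > 0).astype(int) == y).mean()

acc_frozen = train(finetune=False)
acc_ft = train(finetune=True)
print(f"frozen extractor + linear head: {acc_frozen:.2f}")
print(f"fine-tuned end to end:          {acc_ft:.2f}")
```

    Comparing the two printed numbers (and, in a real experiment, held-out accuracy and wall-clock cost) is exactly the study the comment proposes.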

  • From the article:

    >Where are things headed?

    >There's a growing consensus that deep learning is going to be a centralizing technology rather than a decentralizing one. We seem to be headed toward a world where the only people with enough data and compute to train truly state-of-the-art networks are a handful of large tech companies.

    This is terrifying, but it's the same conclusion that I've come to.

    I'm starting to feel more and more dread that this isn't how the future was supposed to be. I used to be so passionate about technology, especially about AI as the last solution in computer science.

    But these days, the most likely scenario I see for myself is moving out into the desert like Obi-Wan Kenobi. I'm just so weary. So unbelievably weary, day by day, in ever increasing ways.