Microsoft deletes face recognition database

  • (Throwaway due to my employment.)

    Meanwhile, this[0] was posted for some time on some of the entrances to the common buildings on Microsoft's campus. They seemed to be training an algorithm for "fairness" by taking video of employees entering via certain doors. You could "opt out" only by choosing another entrance, though.

    [0] https://imgur.com/a/qYuelxD

  • Same story, different actors: Company throws Machine Learning at a problem, gets a good data set for training, uncovers horrifying implications (the data set is racist, the tools enable abuses of power, the model is incomprehensible, the tool is spreading anti-vax messages to sell more ads).

  • Q: "Why is this database unavailable"

    A: "Because we don't have anyone to maintain it"

    News Media: "Microsoft deletes database".

    Does anyone else feel like today's news articles are written by gossip/tabloid writers?

  • > Last year Microsoft President Brad Smith asked the US Congress to take on the task of regulating the use of facial recognition systems because they had "broad societal ramifications and potential for abuse".

    > More recently, Microsoft rejected a request from police in California to use its face-spotting systems in body cameras and cars.

    Sounds like they are actually starting to get some principles and are standing up for them.

  • I'm surprised at the title. Microsoft only said the database was "unavailable" because someone left the company.

    Are there any citations or press releases of Microsoft saying it was deleted?

  • "Database" seems like the wrong word for the headline. Did they delete the training data (the photos and labels)? The models trained on the data? Both? The article only appears to talk about the training set.

    If they only deleted the training data, but not the ML models generated from them, then you get the worst of both worlds (people still using the models to do things, and no way to validate or improve the fairness of said models by adding or removing labelled training data).

  • I think this is Microsoft's way of limiting legal and, more importantly, public-image liability. These models can be used for bad things, like the profiling that's happening in <insert unethical organization/government here> right now. And Microsoft wants nothing to do with it.

    You can create an account at trillionpairs and access the downloads. It's free and instant.

    It's the Asian subset, 4 files, about 300 GB.

    http://trillionpairs.deepglint.com/overview

    Additional info:

    https://megapixels.cc/datasets/msceleb/

  • > but it added that it also included a lot of people who "must maintain an online presence for their professional lives".

    Maybe some of the images in the database were scraped from LinkedIn or GitHub.

  • Anyone have a bittorrent hash for the data set?