Ping me (b@kaggle.com) if you're interested in running this competition more formally on https://kaggle.com.
We've run hundreds of machine learning competitions & offer a real-time leaderboard to encourage competitive participation, a very active community of data scientists, and many other features that simplify running this type of challenge.
So basically they are asking people to build them an algorithm that will be a critical part of their business, in exchange for a free service that will be based on this algorithm. Right...
This is interesting, but given your parameters (predict the most friendships), all you're technically asking for is recall. I'll write an algorithm that has 100% recall: predict that all people become friends with each other.
If this is really a competition (and not just "Here, have fun with our dataset!"), you need to define the rules a little bit more clearly. How are you weighing recall vs. precision? Or are you just looking at % correct labels, where the only two labels possible are "FRIENDS" and "NOT FRIENDS"?
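To make the recall point concrete, here is a minimal sketch. The labels are made up for illustration (they are not from the contest data), and `precision_recall` is a hypothetical helper, not anything the organizers provide. It shows that predicting "FRIENDS" for every pair gives perfect recall while precision falls to the base rate.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means FRIENDS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 2 of 10 pairs actually became friends.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

# The degenerate "everyone becomes friends" predictor.
always_friends = [1] * len(y_true)

precision, recall = precision_recall(y_true, always_friends)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A scoring rule that only counts friendships found would rank this predictor first, which is why the weighting between the two matters.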
This would be a little more fun if there were a cash prize. No offense meant, Groupers look cool, but you'd probably get some more participation that way.
Hey HN, Grouper founder here. Let me know if you have any questions about the contest.
I'll be the first to say it: Your data is either incorrect, arbitrary, or we're missing some information here.
Why does everyone have "7.5" - 8 siblings and 7.5 - 8 "weekly workouts" and 7.5 - 8 platinum albums?
Mutual Information for the fields:
I(f_facebook_friends_count,members_became_friends) = 0.117320113379
I(m_facebook_friends_count,members_became_friends) = 0.113972809724
I(m_facebook_photos_count,members_became_friends) = 0.0449092782303
I(f_facebook_photos_count,members_became_friends) = 0.0426531483254
I(m_shoe_size,members_became_friends) = 0.00276175766018
I(m_height,members_became_friends) = 0.00255043390135
I(f_shoe_size,members_became_friends) = 0.00233148724025
I(m_age,members_became_friends) = 0.00198005768283
I(f_height,members_became_friends) = 0.0013606978915
I(m_weekly_workouts,members_became_friends) = 0.00123271513215
I(f_age,members_became_friends) = 0.00122660347743
I(m_platinum_albums,members_became_friends) = 0.00111710129455
I(f_number_of_pets,members_became_friends) = 0.00108593667378
I(f_pokemon_collected,members_became_friends) = 0.000880040104571
I(m_number_of_siblings,members_became_friends) = 0.000830295252089
I(f_platinum_albums,members_became_friends) = 0.000820683185117
I(m_number_of_pets,members_became_friends) = 0.000768855827053
I(m_pokemon_collected,members_became_friends) = 0.000720822383999
I(f_weekly_workouts,members_became_friends) = 0.000620666529567
I(f_number_of_siblings,members_became_friends) = 0.00019278884716
I(f_gender,members_became_friends) = 0.000124279429698
I(m_gender,members_became_friends) = 0.000124279429698
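For anyone who wants to reproduce numbers like these, here is a rough sketch of a plug-in mutual information estimate for two discrete columns. The estimator, any binning, and the log base used for the figures above are not stated, so this assumes already-discrete values and log base 2; continuous fields such as friend counts would need to be binned first, and the results will not necessarily match the list above.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two equal-length sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        # Only observed (x, y) pairs contribute, so p_xy is never zero here.
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Toy columns (hypothetical, not contest data).
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
print(independent, identical)
```

On the toy data, a field that is independent of the label scores about 0 bits, and a field that duplicates a balanced binary label scores about 1 bit.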
This reminds me of http://robrhinehart.com/?p=1005
The fact that the women are depicted as just three pairs of legs doesn't help, though.
Ok, let's make this more interesting. I'll pay $50 to the first person to de-anonymize their training set.
members_became_friends = 1/(1+ exp(-1297.88087 * f_shoe_size + m_shoe_size * m_facebook_friends_count - 11761.6138))
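If you want to try that formula, note that the exponent gets large enough to overflow a naive `exp` call. Below is a small sketch that evaluates it exactly as written (the minus sign applies only to the first term, by operator precedence) using a numerically stable form. The function name and the sample inputs are my own assumptions for illustration.

```python
import math


def members_became_friends(f_shoe_size, m_shoe_size, m_facebook_friends_count):
    """Evaluate 1 / (1 + exp(w)) for the posted formula without overflow."""
    w = (-1297.88087 * f_shoe_size
         + m_shoe_size * m_facebook_friends_count
         - 11761.6138)
    if w > 0:
        # Rewrite as exp(-w) / (1 + exp(-w)) so exp never sees a huge argument.
        e = math.exp(-w)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(w))


# Hypothetical inputs, not taken from the dataset.
low_friend_count = members_became_friends(8, 10, 500)
high_friend_count = members_became_friends(8, 10, 5000)
print(low_friend_count, high_friend_count)
```

With coefficients this large the output saturates at 0 or 1 for almost any input, so it behaves like a hard threshold rather than a probability.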
This is a cool challenge but the prize is definitely lacking. I think anyone capable of writing an algorithm of the caliber you're looking for isn't likely to participate. I could be wrong, but I think you're going to have to pony up some serious cash to get developers to take this seriously. Or you could go the more standard route and just hire someone to do the job.