HN is in the same cluster as 2ch, not TechCrunch, on Twitter

  • 2D projections of complex multidimensional data are extremely unreliable as to what adjacency means. Most apparent adjacencies, in particular, are artifacts of the chosen projection method.

  • I wonder if I could post a randomly generated graph, label it with HN-interested labels arbitrarily, and get a serious talk started on HN about nonexistent correlations.

  • TechCrunch reports on us. It is journalism for the spectators. The Twitter cluster of people sharing TC links is TC's audience, not participants in TC's subject matter.

    Why in blue hell would anyone on HN be sharing TC links? Intuitively it seems more likely that people who share HN links are discussing these matters directly.

  • Interesting parallel observation: when I worked for a regional newspaper some years ago, we rolled out products for the same demo as "mommy blog Twitter". We saw the same sort of isolated behavior - visitors to "mommy blog content" almost never strayed onto our mainstream products.

    The same sorts of products delivered to "puppy and kitty" people didn't have the same effect, though the level of vitriol in the comments was similar.

  • Considering nicovideo is anti-establishment media (it's owned by Kadokawa, which is an underdog media company with strong subculture roots) and that 2chan "summary sites" double as news sources for the anti-establishment these days, the association seems apt.

  • This is amazing, one of my favorite articles on HN ever.

    I'm really curious what the heck that "eye" is in the bottom right space of the clusters. Is it some cluster so radically orthogonal to all other content that it sits an order of magnitude farther away?

  • This is cool. How many sampled tweets did HN links appear in? How many sampled tweets did you have overall?

    I'm curious if a sampling error could explain why an English-language website like HN would get placed with the Japanese-language sites. StackOverflow isn't placed near any related sites either.

    If the weird results aren't from sampling artifacts, my best guess is that a lot of spambots must be linking to multiple legit sites regardless of relevance.

  • I really hope someday we get spambots that start off by trying to make useful contributions and only later, after building a following, start advertising scams.

    I'm confident that, given the right incentives, spam kings could discover conversational AI before any lab.

  • This is fantastic. Feature request: drag a rectangle over a group of dots, and see them as a text list of websites. As is, it's hard to see all the sites that are in a dense dot cluster.

  • Quran quotes being grouped with archive.org might be explained by the Internet Archive frequently being used to host Islamist materials.

  • The interactive version is powered by this dataset - http://pile-of-junk.s3.amazonaws.com/domain_similarity_tsne_... - processed by JavaScript inside the page: https://pile-of-junk.s3.amazonaws.com/twitter_scatter_10k.ht...

  • > Japanese social media twitter (which I'm labelling as "2ch", though it's not just 2ch) is almost completely distinct from what I'm calling "upstanding japanese twitter" (links to mainstream news sites like news24)

    I have no idea what the point of the headline is after reading the above part of the post.

  • That's interesting. Never would've made the connection myself, although now that I think about it, some of the most fascinating discussions I've read on HN involved Japanese work culture.

  • This is some fascinating analysis. And like the author, I am amazed that Twitter doesn't crack down harder on its spambots.

  • What is 2ch?

  • So HN is close to nico (Japanese YouTube) and pixiv (Japanese-centric art and fanart site)? Interesting.

  • What are all of the other twitters? There is so much undocumented space! I want to know what it all is!

  • Is the regex search in the demo not working for anyone else? (Tested in both Chrome and Firefox on Win7.)

  • Why does the hella.cheap site have an SSL cert with an unknown authority?
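
On the first comment's point that adjacency in a 2D projection can be an artifact: here is a minimal sketch of how one might check that, under loud assumptions. It does not use the article's data or its projection method. It generates random Gaussian points, "projects" them by simply keeping the first two coordinates (a deliberately crude stand-in, not t-SNE), and measures how often a point's nearest neighbor in the plot is also its nearest neighbor in the full space. The function names and parameters (`neighbor_agreement`, `n_points`, `n_dims`) are made up for illustration.

```python
import math
import random


def nearest_neighbor(points, i):
    """Return the index of the point closest to points[i], excluding i itself."""
    best_j, best_d = None, math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)  # Euclidean distance (Python 3.8+)
        if d < best_d:
            best_j, best_d = j, d
    return best_j


def neighbor_agreement(n_points=200, n_dims=20, seed=0):
    """Fraction of points whose nearest neighbor is the same before and after projection.

    Assumption: the "projection" just keeps the first two coordinates. Real
    methods (PCA, t-SNE, ...) behave differently, but the same check applies.
    """
    rng = random.Random(seed)
    # Hypothetical data: independent standard-normal coordinates.
    high = [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_points)]
    low = [p[:2] for p in high]  # crude 2D projection
    same = sum(
        nearest_neighbor(high, i) == nearest_neighbor(low, i)
        for i in range(n_points)
    )
    return same / n_points


if __name__ == "__main__":
    print(f"nearest-neighbor agreement: {neighbor_agreement():.2%}")
```

A low agreement fraction means that dots that look adjacent in the 2D plot are usually not each other's closest points in the original space. Running the same comparison on real data and a real projection would show how far any single "X is next to Y" reading can be trusted.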