The dark side of Graph Neural Networks

  • I am really interested in GNNs in the context of compilers.

    * Predicting the color of a node in a graph could be used, for example, for speculative devirtualization.

    * Predicting edge weights could give us a better static estimate of hot basic blocks (a minimal sketch follows this list).

    * Running performance experiments is as easy as running the benchmark and feeding some performance metric back to the GNN to learn from.
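
    To make the edge-weight idea a bit more concrete, here is a rough sketch using only numpy; the toy CFG, the block features, and the untrained weights are all invented for illustration, not taken from any real compiler pipeline:

      # Hypothetical sketch: score CFG edges with one round of message passing.
      # Graph, features, and weights below are illustrative only.
      import numpy as np

      rng = np.random.default_rng(0)

      # Toy CFG: 4 basic blocks, edges as (src, dst) pairs.
      edges = np.array([(0, 1), (1, 2), (2, 1), (1, 3)])  # blocks 1 and 2 form a loop

      # Per-block features, e.g. [instruction count, has_call, loop_depth].
      X = np.array([[3, 0, 0],
                    [5, 1, 1],
                    [8, 0, 1],
                    [2, 0, 0]], dtype=float)

      # One GNN layer: each block aggregates its predecessors' features.
      A = np.zeros((4, 4))
      A[edges[:, 1], edges[:, 0]] = 1.0            # A[dst, src] = 1
      W = rng.normal(size=(3, 3))                   # untrained weights, illustrative only
      H = np.tanh(A @ X @ W)                        # node embeddings after message passing

      # Edge "hotness" score: dot product of the endpoint embeddings.
      scores = (H[edges[:, 0]] * H[edges[:, 1]]).sum(axis=1)
      print(dict(zip(map(tuple, edges.tolist()), scores.round(3))))

    In a real setup the weights would of course be trained against profiling data rather than sampled at random.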

    Imagine this for debugging and IDEs too. I haven't played with Copilot, but I imagine that something like guessing the graph based on node labels and some edges is feasible using GNNs? This means the IDE could try to match the names of your variables, the control flow, and the name of the function to other known functions and could potentially point out differences, giving us a way to fix logic errors or suggest better algorithms. E.g., "Mmm... it looks like you are implementing your own bubble sort from scratch. Would you like to click here and use <insert better sort>."

    I am not an expert on GNNs, but if anyone has resources for learning about the state of the art of GNNs (a link to a literature review or something similar), do let me know.

  • Hypothesizing that graphs could lead to AGI is tantamount to equating part of the neocortex to the whole body. A model does not make reality, especially when the two are designed to work in a permanent feedback tandem.

    Since writing A Thousand Brains, Jeff Hawkins has revealed fascinating structures within the brain, a finite set of structure 'types' so to speak (families of similarly architected brain parts).

    Graphs are definitely part of the biological design, but in taking inspiration from nature to build our own beings, we should take notice that the real thing is vastly more complex, and investigate more exhaustively the ins and outs of real brain structures.

  • There is plenty in this article that is just wrong.

    1. GNNs are no more "sequential" than CNNs and are therefore just as parallelizable in this respect (caveat below). A single GNN layer simply aggregates the features of the connected neighbors, just as a CNN aggregates the values of nearby pixels, and this can be parallelized across the nodes/pixels (see the sketch after this list). The next layer depends on the output of the previous layer and is sequential in that sense, but that's true of all forms of neural networks. If other architectures have "won the hardware lottery" relative to GNNs, it's because GNNs depend heavily on sparse*dense multiplication. The real thing that makes GNNs hard to parallelize is that you have to partition the graph intelligently when splitting across machines, because there's a computational dependence between nodes and you don't want connected nodes to end up on different devices. In the CNN metaphor, that would be like needing to split a single image across multiple machines and still carry out the convolution.

    2. It's not true that pre-training doesn't work. It's very common to use unsupervised/self-supervised pre-training to, e.g., get node embeddings, which are then fine-tuned on a downstream task.

    3. It's true that the naive application of deep GNN architectures leads to problems like over-smoothing and the information bottleneck, but there are known solutions to each, and it's just rarely the case that you reasonably want/need information from far away in the graph except in special applications. In those cases, you likely want a different graph representation of the data rather than the perhaps obvious one.

    4. It's true that GNNs improperly applied to a problem, whether by choosing the wrong graph representation, a pathological architecture, or a problem that simply has no dependence between the data points, will perform poorly. But I don't think that's surprising, and I'm sure there are many problems where simply throwing a CNN at the data doesn't help either. Obviously, the modeling approach needs to fit the inductive priors of the problem.
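
    As a concrete illustration of point 1, a minimal sketch (assuming nothing beyond numpy/scipy) of a single GNN layer as a sparse-dense multiplication; each row of the result, i.e. one node's update, can be computed independently of the others, just like one pixel's output in a convolution:

      # One GNN layer as sparse(A) @ dense(X) @ dense(W); every row of the
      # result is independent of the others within the layer.
      import numpy as np
      import scipy.sparse as sp

      rng = np.random.default_rng(0)
      n_nodes, n_feats, n_out = 1000, 16, 32

      # Random sparse adjacency (the "graph") and dense node features.
      A = sp.random(n_nodes, n_nodes, density=0.01, format="csr", random_state=0)
      X = rng.normal(size=(n_nodes, n_feats))
      W = rng.normal(size=(n_feats, n_out))

      # One message-passing layer: aggregate neighbors, then transform.
      H = A @ X @ W            # sparse*dense, then dense*dense

      # The same row computed in isolation matches, i.e. nodes are independent
      # within a layer, just like pixels within a convolution.
      i = 42
      h_i = A[i] @ X @ W
      assert np.allclose(H[i], h_i)

    The partitioning problem mentioned above only shows up once the rows of A live on one machine while the feature rows they need are stored on another.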

  • Wow this is surprisingly wrong.

    ConvNets _are_ message-passing networks. Bitmaps can be seen as graphs, with pixels as nodes connected to their 8 neighbors (and themselves). Treat every neighbor as a connection of a different type and you can build a ConvNet out of heterogeneous graph convolutions.

    A 2D convolution operator is just an efficient implementation that takes this structure as a given and doesn't require the graph structure as another input.

    This means that the basic arguments of the article no longer hold. Yes, in some cases GNNs might be slower or harder to train, but it is not a general rule.
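
    A hedged sketch of the equivalence being claimed, with a made-up 6x6 image and a random 3x3 kernel: treat each of the 9 offsets as its own edge type with a scalar weight, and the message-passing sum matches the usual 2D (cross-)correlation:

      # Message passing on an 8-neighbor-plus-self grid graph vs. a 3x3 conv.
      # The image and kernel are random, purely for illustration.
      import numpy as np

      rng = np.random.default_rng(0)
      img = rng.normal(size=(6, 6))
      kernel = rng.normal(size=(3, 3))          # one scalar weight per edge type

      # "Graph" view: for each edge type (dy, dx), every pixel receives a message
      # from its neighbor at that offset, scaled by that edge type's weight.
      out_graph = np.zeros((4, 4))
      for dy in (-1, 0, 1):
          for dx in (-1, 0, 1):
              w = kernel[dy + 1, dx + 1]
              for y in range(4):
                  for x in range(4):
                      out_graph[y, x] += w * img[y + 1 + dy, x + 1 + dx]

      # Dense view: the usual 2D (cross-)correlation, "valid" region only.
      out_conv = np.zeros((4, 4))
      for y in range(4):
          for x in range(4):
              out_conv[y, x] = (img[y:y + 3, x:x + 3] * kernel).sum()

      assert np.allclose(out_graph, out_conv)

    The dense implementation just bakes the grid structure in instead of passing it around as an explicit edge list.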

  • There are many more open questions that we have not found the answer to -- the two blog posts [1 & 2] about our experience creating a GNN-based project are meant to spark a discussion and to clarify our own thinking on the topic.

    We are here to continue the discussion on HN! E.g.: has anyone encountered pretraining for Graph Neural Networks?

    [1] https://www.appliedexploration.com/p/graph-neural-networks-f... [2] https://www.appliedexploration.com/p/dark-side-of-graph-neur...

  • There are four questions at the end of the post.

    For sure, the second one is answered - it is possible to parallelize GNNs to billion-scale graphs while still using message passing. It requires rethinking how message passing is implemented, modifying objective functions so that they work in parallel, and changing ML infrastructure. You're not going to get to large graphs with generic distributed TensorFlow.

    I don't know if the third question is fully answered, but there are many approaches to preserving locality, either by changing architectures or changing objective functions.

    Also, errata: PinSage was developed for Pinterest, not Etsy (hence, not EtsySage).

  • I've found success using GNNs for point cloud classification by creating edges between each point and its k nearest neighbors.
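
    For anyone curious, a small sketch of that construction using only numpy/scipy; the point cloud and the choice of k here are arbitrary:

      # Build a k-nearest-neighbor edge list over a toy point cloud.
      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(0)
      points = rng.normal(size=(500, 3))        # toy point cloud, 500 points in 3D
      k = 8

      tree = cKDTree(points)
      # Query k+1 neighbors because the nearest neighbor of each point is itself.
      _, idx = tree.query(points, k=k + 1)
      src = np.repeat(np.arange(len(points)), k)
      dst = idx[:, 1:].ravel()                  # drop the self-neighbor column
      edges = np.stack([src, dst], axis=1)      # (num_points * k, 2) edge list for a GNN

      print(edges.shape)                        # (4000, 2)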

  • That was a good read. Some slightly different perspectives:

    It feels like the Wild West in GNNs right now, with a big paper or tool every few weeks. Interestingly, many of the issues discussed in the article are already addressed by components in modern OSS frameworks:

    - heterogeneity: RGCNs split weight matrices by relation type, with different versions in most frameworks now (a small sketch follows this list). Issues like over-smoothing are interesting too, as discussed.

    - scaling: sampling and various memory techniques are making it possible to handle massive graphs on single nodes, metapaths & other structures enable longer-range communication, etc. Especially impressive is the OSS scaling work by the cugraph+dgl teams.
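
    A tiny sketch of the RGCN idea from the list above, with made-up shapes and a hand-written two-relation graph: one weight matrix per relation type, and each node sums the per-relation aggregations:

      # RGCN-style layer: H = tanh( sum_r A_r X W_r ), one W_r per relation.
      # All shapes and edges below are illustrative only.
      import numpy as np

      rng = np.random.default_rng(0)
      n_nodes, n_in, n_out, n_rels = 5, 4, 6, 2

      X = rng.normal(size=(n_nodes, n_in))                 # node features
      W = rng.normal(size=(n_rels, n_in, n_out))           # one weight matrix per relation
      # One adjacency matrix per relation type (A_r[dst, src] = 1).
      A = np.zeros((n_rels, n_nodes, n_nodes))
      A[0, 1, 0] = A[0, 2, 1] = 1.0                        # relation 0 edges
      A[1, 3, 2] = A[1, 4, 3] = 1.0                        # relation 1 edges

      H = np.tanh(sum(A[r] @ X @ W[r] for r in range(n_rels)))
      print(H.shape)                                       # (5, 6)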

    At the same time, even with that, we're finding it's still too hard for non-academic teams to use this stuff in production/operational scenarios: model zoo (esp. some particularly important modeling areas not discussed here, like time), GPUs, clusters, data pipeline integrations, etc. Imagine having 100K cyber alerts aggregated every day or 50M users/transactions and wanting to score them... It's nowhere near as easy (yet) as something like adding a layer to BERT the way NLP people can. If that's more your speed:

    - We're working on OSS automl for graph ai, trying to get typical internal team pipelines to go from events data to decisions & UIs in a few lines. First out was for UMAP and we've been pushing on GNNs more recently, http://github.com/graphistry/pygraphistry . Elsewhere, we're also working on the MLOps side and some key modeling scenarios.

    - ... both graphistry + almost all our customers & partners are hiring here! If you're into data (analytics/mlops/dataeng/ds) or general js/python fullstack dev, there's a lot happening here for missions like supply chain, cyber, fraud, & misinfo. Would love to chat!

    EDIT: Two other things I've found interesting:

    -- GNNs are probably most researched by folks in the material sciences (chem, physics, ...), where they're really changing things like protein folding, with social networks maybe the next biggest area. We see equal practicality for other key problems (cyber, fraud, supply chain, ...), but much less academic work there, and I think that's because academics are at a significant data disadvantage vs. most industry teams. So even though there's big interest outside the academic areas, it's quite early days.

    -- Industrially, we're seeing GNNs promoted primarily by graph databases... but almost all graph databases are CPU-based rather than GPU-based, and in our polls of commercial GNN usage, it's typically not part of a graph DB pipeline but rather regular data lakes (parquet, ...) feeding into regular scalable GPU compute tiers.