Exploring graph embeddings: DeepWalk and Node2Vec
Exploring DeepWalk and Node2Vec to extract embeddings from graphs
A few months ago, my Twitter timeline started showing me tweets from researchers talking about the “new” deep learning revolution: the application of deep learning to graphs. The field was new to me, so I took the opportunity to learn more about it.
In this series of posts, I am going to share what I am learning about this field, starting from the beginning: obtaining a dense representation of nodes, similar to what we can do with word embeddings for text. But first, let’s take a step back and let me explain what a graph is.
What is a graph?
A graph can be defined as G = (V, E), where V is a set of nodes and E is a set of edges. An edge is a connection between two nodes; for example, nodes A and D share an edge. It is also important to note that a graph can be directed or undirected. The graph below, for instance, is undirected: if A is connected with D, then D is connected with A. One more thing: a graph can have node attributes and edge attributes, but they are not important for our purpose today.

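As a quick sketch, a small undirected graph like this one could be built in Python with networkx (the node names and the extra edges here are illustrative, they just mirror the example above):

```python
import networkx as nx

# Build a small undirected graph: V = {A, B, C, D}, E includes the edge A-D mentioned above
G = nx.Graph()
G.add_nodes_from(["A", "B", "C", "D"])
G.add_edges_from([("A", "D"), ("A", "B"), ("B", "C"), ("C", "D")])

print(G.number_of_nodes(), G.number_of_edges())  # 4 nodes, 4 edges
print(list(G.neighbors("A")))                    # ['D', 'B'] -- undirected, so D also has A as a neighbour
```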
Now that we know more or less what a graph is, we can try to extract node embeddings from it. As I said previously, the two methods covered here are similar in spirit to what Tomas Mikolov presented with Word2Vec, but first…
Why do we need node embeddings?🧐
Let’s imagine you need to solve one of the following scenarios:
- We have the interactions of users in a social network and we need to predict whether two users will be connected. Nodes represent users and edges represent when two users are “friends”. (Link prediction task)
- We have a citation network of research publications and we need to predict the topic of each publication. Nodes represent the publications and edges are citations from one publication to another. (Node classification task)
- We have a set of proteins that are classified as enzymes or non-enzymes. Nodes represent the amino acids and two nodes are connected by an edge if they are less than 6 Angstroms apart. (Graph classification task)
For all of these tasks we need a representation of the nodes. So, if we want to run a machine learning algorithm, we have to map the graph structure into a vector space. Here we can find two main methods: DeepWalk and Node2Vec.
DeepWalk
DeepWalk was presented by Stony Brook University researchers in the paper “DeepWalk: Online Learning of Social Representations” (2014).
It introduces the idea of using random walks for embedding generation. A random walk is a way of converting a graph into sequences of nodes that can then be used to train a Word2Vec model: starting from each node in the graph, we repeatedly jump to a random neighbour, producing a path of connected nodes. Once we have these random paths, we train a Word2Vec (skip-gram) model on them to obtain the node embeddings.
For learning purposes, you can find below an implementation of the algorithm. Notice that this code is not ready for large-scale applications; parallelization and memory improvements could be made.
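This is a minimal sketch rather than a production-ready version, assuming networkx for the graph and gensim for the skip-gram model; the walk length, number of walks per node, and embedding size are illustrative values:

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, walk_length):
    """Generate a single random walk starting from `start`."""
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]  # Word2Vec expects sequences of tokens (strings)

def deepwalk_embeddings(graph, walks_per_node=10, walk_length=40, dimensions=128, window=5):
    """Run several random walks per node and train a skip-gram model on them."""
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph.nodes())
        random.shuffle(nodes)
        for node in nodes:
            walks.append(random_walk(graph, node, walk_length))
    # sg=1 selects the skip-gram architecture, as in the DeepWalk paper
    model = Word2Vec(walks, vector_size=dimensions, window=window, sg=1, min_count=0, workers=1)
    return {node: model.wv[str(node)] for node in graph.nodes()}

# Usage on a toy graph
G = nx.karate_club_graph()
embeddings = deepwalk_embeddings(G)
print(len(embeddings), embeddings[0].shape)  # 34 nodes, each with a 128-dimensional vector
```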
Node2Vec
Node2Vec was presented by Stanford University researchers in the paper: “node2vec: Scalable Feature Learning for Networks” (2016).
This algorithm builds on the ideas presented in DeepWalk but goes a step further. Instead of purely uniform random walks, it biases the walks so that they interpolate between breadth-first (BFS) and depth-first (DFS) exploration. This behaviour is controlled by two parameters: P (the return parameter) and Q (the in-out parameter).
Roughly speaking, a large P makes the walk unlikely to revisit the node it just came from, so it explores further, while a small P keeps the walk close to where it started. Q behaves in the opposite direction: a small Q encourages exploration (DFS-like behaviour) and a large Q keeps the walk local (BFS-like behaviour). More details can be found in the original paper.
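To make the role of P and Q concrete, here is a small sketch of the (unnormalized) weight a Node2Vec walk assigns to each candidate next step, following the weighting scheme described in the paper (the function and variable names are my own):

```python
import networkx as nx

def node2vec_bias(graph: nx.Graph, prev, candidate, p: float, q: float) -> float:
    """Unnormalized weight for stepping to `candidate`, given that the walk
    arrived at its current node from `prev`."""
    if candidate == prev:
        return 1.0 / p              # step back to the previous node (return parameter P)
    if graph.has_edge(prev, candidate):
        return 1.0                  # candidate is one hop from `prev` (BFS-like, stays local)
    return 1.0 / q                  # candidate is two hops from `prev` (DFS-like, explores)
```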
We can test Node2Vec using PyTorch Geometric. This library implements a bunch of graph neural network architectures and utilities that speed up working with GNNs. To test it, I am going to use a small part of the tutorial proposed in the PyTorch Geometric documentation, which relies on the Cora dataset. The Cora dataset consists of 2708 scientific publications classified into seven classes, and its citation network contains 5429 links. Each publication in the dataset is described by a 0/1-valued word vector indicating the absence/presence of the corresponding word from the dictionary, which consists of 1433 unique words.
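Below is a condensed sketch of that setup, adapted from the PyTorch Geometric Node2Vec example; the hyperparameters are illustrative choices, not necessarily the ones used in the official tutorial:

```python
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import Node2Vec

# Load the Cora citation graph
dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = Node2Vec(
    data.edge_index, embedding_dim=128, walk_length=20,
    context_size=10, walks_per_node=10, p=1.0, q=1.0, sparse=True,
).to(device)

# The loader yields batches of positive and negative random walks
loader = model.loader(batch_size=128, shuffle=True)
optimizer = torch.optim.SparseAdam(list(model.parameters()), lr=0.01)

model.train()
for epoch in range(1, 101):
    total_loss = 0
    for pos_rw, neg_rw in loader:
        optimizer.zero_grad()
        loss = model.loss(pos_rw.to(device), neg_rw.to(device))
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch {epoch:03d}, loss: {total_loss / len(loader):.4f}')

# One 128-dimensional embedding per node
embeddings = model().detach().cpu()  # shape: [2708, 128]
```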
Once the model is trained, we have an embedding for each node in the graph, and each embedding is 128-dimensional. After training, we can save the embeddings and check in the embedding projector how “good” the representation is compared with the labels. For that, I am using the t-SNE algorithm to reduce the 128 dimensions down to 3, so we can plot it.
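If you prefer to do the reduction offline instead of inside the projector, a minimal sketch with scikit-learn’s TSNE (one possible way to obtain the 3-dimensional coordinates) would be:

```python
import numpy as np
from sklearn.manifold import TSNE

# `embeddings` is the [2708, 128] matrix produced by the previous snippet
emb = embeddings.numpy() if hasattr(embeddings, "numpy") else np.asarray(embeddings)

# Project the 128-dimensional vectors down to 3 dimensions for plotting
coords_3d = TSNE(n_components=3, random_state=42).fit_transform(emb)
print(coords_3d.shape)  # (2708, 3)
```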

As we can see, nodes with similar topics appear close together in the representation. Notice that we are applying a large dimensionality reduction in order to plot the embeddings in a visual space, so some samples may not be perfectly placed.
Now that we have our node embeddings, we can train machine learning algorithms to predict the topics. So, we don’t need the graph anymore ;)
Not all that glitters is gold 🏅
As we have seen, we can build node embeddings for our graphs, but these embeddings have several issues. Some of them are:
- The embeddings depend on the graph they were trained on, so if a new node appears we need to recompute them. This is not a big deal and there are ways to mitigate it, but it is definitely something to take into account, especially if the graph is going to change from time to time.
- We only use information about the connections between nodes, not the features of the nodes or edges. So, can we generate better embeddings by taking node and edge features into account? Yes, using graph neural networks and the message passing framework, but that is another post…
Conclusions📚
In this post, I explained how we can train node embeddings for our graphs. Once we have these embeddings, we can run machine learning models as usual. I focused on the practical side, but there is some interesting maths behind this technology; I found this video really helpful for learning more about it. I hope you found the post interesting.
About the author
Marcos Esteve is a machine learning engineer at RavenPack. At his work, Marcos develops machine and deep learning models for a wide variety of Natural Language Processing tasks. He is quite interested in multimodal tasks and building data science apps. Contact him on LinkedIn or Twitter.