Temporally Evolving Community Detection and Prediction in Heterogeneous Networks

This project was certified by the IBM Data Science Board.

We focus on the problem of combining link, content, and temporal analysis for community detection and prediction in evolving networks. Such temporal, content-rich networks occur in many real-life settings, such as bibliographic networks and question-answering forums.

We propose Chimera, a shared factorization model that simultaneously accounts for graph links, content, and temporal information.

We assume we have T graphs G1, ..., GT that form a time series. The graphs are defined over a fixed set of nodes N of cardinality n. At each timestamp t, the graph Gt may contain a different set of edges, which we represent by its n × n adjacency matrix At.

We also assume that for each timestamp t we have an n × d content matrix Ct. Ct contains one row per node, and each row holds d attribute values describing the content of that node at the t-th timestamp.

Therefore, the content and structure at the t-th timestamp can be fully represented by the triplet (N, At, Ct).
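To make this data model concrete, here is a minimal sketch that builds a synthetic time series of (At, Ct) pairs over a fixed node set, where At is the n × n adjacency matrix and Ct the n × d content matrix at timestamp t. The helper make_snapshots and its parameters are hypothetical, the data is random, and it assumes undirected, unweighted graphs without self-loops; real networks would be loaded from data instead.

```python
import numpy as np

def make_snapshots(n, d, T, edge_prob=0.1, seed=0):
    """Hypothetical helper: build T synthetic snapshots (A_t, C_t) over n fixed nodes."""
    rng = np.random.default_rng(seed)
    snapshots = []
    for _ in range(T):
        # Undirected graph (assumption): sample the strict upper triangle, then mirror it.
        upper = np.triu(rng.random((n, n)) < edge_prob, k=1)
        A_t = (upper | upper.T).astype(float)  # n x n adjacency matrix, no self-loops
        C_t = rng.random((n, d))               # n x d non-negative content matrix
        snapshots.append((A_t, C_t))
    return snapshots

nodes = list(range(50))  # the fixed node set N, with n = |N| = 50
series = make_snapshots(n=len(nodes), d=20, T=5)
A_1, C_1 = series[0]     # the pair (A_t, C_t) at the first timestamp
print(A_1.shape, C_1.shape)  # (50, 50) (50, 20)
```

Because N never changes, every timestamp shares the same row indexing, which is what allows the per-node factors to be compared across time.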

The optimization model that converts the temporal sequence of graphs and content into a multidimensional time series is based on a non-negative matrix factorization (NMF) framework. Although non-negativity is not essential, one advantage is that it leads to a more interpretable analysis.
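As an illustration only, one plausible form of such a non-negative shared objective is written out here. The exact Chimera objective is not stated in this summary, so the specific terms are assumptions: each At is approximated by Ut V^T and each Ct by Ut W^T, where Ut (n × k) is a per-timestamp factor, V (n × k) and W (d × k) are hypothetical factors shared across time, and the weights β and λ are hypothetical hyperparameters. The last term encourages each node's factors to change smoothly between consecutive timestamps.

```latex
\min_{\{U_t\} \ge 0,\; V \ge 0,\; W \ge 0}\;
  \sum_{t=1}^{T} \lVert A_t - U_t V^\top \rVert_F^2
  \;+\; \beta \sum_{t=1}^{T} \lVert C_t - U_t W^\top \rVert_F^2
  \;+\; \lambda \sum_{t=2}^{T} \lVert U_t - U_{t-1} \rVert_F^2,
\qquad U_t, V \in \mathbb{R}_{\ge 0}^{n \times k},\; W \in \mathbb{R}_{\ge 0}^{d \times k}
```

Sharing V and W across all timestamps is what makes the rows of different Ut live in a common latent space, so they can be read as a multidimensional time series.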

The matrix Ut is an n × k matrix specific to timestamp t. Each row of Ut contains the k-dimensional latent factors of the corresponding node at timestamp t, taking into account both structural and content information.

To achieve this goal, we use three sets of latent factor matrices in a shared factorization process, which combines content and structure in a holistic way.
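A minimal NumPy sketch of such a shared factorization follows. It assumes an illustrative objective, not necessarily the one used by Chimera: At ≈ Ut V^T and Ct ≈ Ut W^T with hypothetical shared factors V (n × k) and W (d × k), plus a temporal smoothing penalty λ‖Ut − Ut−1‖². The solver uses multiplicative updates obtained by splitting each gradient into its positive and negative parts, a standard NMF heuristic swapped in here for illustration. The function shared_nmf and the parameters beta, lam and n_iter are hypothetical names, and reading a node's community off the largest entry of its row is an assumed heuristic.

```python
import numpy as np

def shared_nmf(snapshots, k, beta=1.0, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical sketch: temporally smoothed shared NMF (assumed objective).

    snapshots: list of (A_t, C_t), A_t of shape (n, n), C_t of shape (n, d), all non-negative.
    Returns per-timestamp factors U (list of n x k), shared V (n x k) and W (d x k).
    """
    rng = np.random.default_rng(seed)
    T = len(snapshots)
    n, d = snapshots[0][1].shape
    U = [rng.random((n, k)) for _ in range(T)]
    V = rng.random((n, k))
    W = rng.random((d, k))
    for _ in range(n_iter):
        for t, (A_t, C_t) in enumerate(snapshots):
            # Negative part of the gradient w.r.t. U_t goes in the numerator ...
            num = A_t @ V + beta * (C_t @ W)
            # ... and the positive part goes in the denominator.
            den = U[t] @ (V.T @ V) + beta * (U[t] @ (W.T @ W))
            for s in (t - 1, t + 1):  # temporal neighbours of timestamp t
                if 0 <= s < T:
                    num = num + lam * U[s]
                    den = den + lam * U[t]
            U[t] = U[t] * num / (den + eps)
        # Shared factors pool evidence from every timestamp.
        UtU = sum(U_t.T @ U_t for U_t in U)
        V = V * sum(A_t.T @ U_t for (A_t, _), U_t in zip(snapshots, U)) / (V @ UtU + eps)
        W = W * sum(C_t.T @ U_t for (_, C_t), U_t in zip(snapshots, U)) / (W @ UtU + eps)
    return U, V, W

# Toy usage on random non-negative data (4 timestamps, 30 nodes, 8 attributes).
rng = np.random.default_rng(1)
snapshots = [((rng.random((30, 30)) < 0.1).astype(float), rng.random((30, 8)))
             for _ in range(4)]
U, V, W = shared_nmf(snapshots, k=3)
# Assumed heuristic: a node's community at time t is its largest latent factor.
communities = [np.argmax(U_t, axis=1) for U_t in U]
```

Because every update multiplies a non-negative matrix by a ratio of non-negative quantities, the factors stay non-negative throughout, which is what keeps each latent dimension interpretable as a soft community membership.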

The experimental results show that Chimera outperforms the baseline methods. Our experiments also show that the embeddings can be used efficiently to predict near-future communities, which opens new possibilities for further exploration.
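How the embeddings are turned into predictions is not detailed in this summary. As a stand-in, the sketch below assumes the simplest option: fit a straight line to each entry of the last few Ut matrices, extrapolate one step ahead, and assign each node to its largest predicted factor. The helpers forecast_embeddings and predict_communities and the window parameter are hypothetical; a real system might use a richer time-series model.

```python
import numpy as np

def forecast_embeddings(U_history, window=3):
    """Hypothetical helper: extrapolate U_{T+1} with a per-entry linear trend."""
    recent = np.stack(U_history[-window:])  # shape (w, n, k)
    w, n, k = recent.shape
    if w < 2:
        raise ValueError("need at least two timestamps to fit a trend")
    x = np.arange(w)
    # Fit one straight line per (node, factor) entry; rows are (slope, intercept).
    slope, intercept = np.polyfit(x, recent.reshape(w, n * k), deg=1)
    nxt = slope * w + intercept                   # value one step past the window
    return np.clip(nxt, 0.0, None).reshape(n, k)  # keep the factors non-negative

def predict_communities(U_history, window=3):
    """Predicted community of each node at the next timestamp (argmax heuristic)."""
    return np.argmax(forecast_embeddings(U_history, window), axis=1)

# Node 0 stays in community 1; node 1 drifts from community 1 towards community 0.
history = [
    np.array([[0.0, 1.0], [0.0, 1.0]]),
    np.array([[0.0, 1.0], [0.3, 0.85]]),
    np.array([[0.0, 1.0], [0.6, 0.7]]),
]
print(predict_communities(history))  # [1 0]
```

At the last observed timestamp node 1 still belongs to community 1 (0.7 > 0.6), but its trend predicts a switch to community 0, which is the kind of near-future change this prediction task targets.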

Chimera is a novel shared factorization model over time that simultaneously takes the link, content, and temporal information of networks into account, improving over state-of-the-art approaches for community detection. Chimera models and solves community detection efficiently.


This paper was published at ECML/PKDD 2018:

Ana Paula Appel, Renato Luiz de Freitas Cunha, Charu C. Aggarwal, Marcela Megumi Terakado: Temporally Evolving Community Detection and Prediction in Content-Centric Networks. ECML/PKDD (2) 2018: 3-18

It is also available on arXiv:

Ana Paula Appel, Renato Luiz de Freitas Cunha, Charu C. Aggarwal, Marcela Megumi Terakado: Temporally Evolving Community Detection and Prediction in Content-Centric Networks. CoRR abs/1807.06560 (2018)

You can find the GitHub repository for this project here!