
- Cet évènement est passé
Séminaire du CIMMUL | Samin Aref
février 21 @ 13 h 30 min - 14 h 30 min
Supervised and unsupervised comparison of 30 community detection algorithms reveal that proximity to globally optimal partitions matters
Samin Aref
University of Toronto
Résumé
Network clustering (descriptive community detection) is a fundamental network problem with extensive applications in various fields. The first part of this talk assesses the performance of various modularity-based algorithms in obtaining optimal partitions. Our analysis utilizes 104 networks, including both real-world instances from diverse contexts and modular graphs from two families of synthetic benchmarks. We analyze ten inexact modularity-based algorithms against the exact integer programming baseline that globally optimizes modularity. Our comparative analysis includes eight heuristics, two variants of a graph neural network algorithm, and an exact modularity-based method. Our findings reveal that the average modularity-based heuristic yields optimal partitions in only 43.9% of the 104 networks analyzed. Additionally, our analysis of three partition similarity metrics exposes substantial dissimilarities between high-modularity sub-optimal partitions and any optimal partition of the networks. We observe that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of the commonly used modularity-based methods: they rarely produce an optimal partition or a partition resembling an optimal partition even on networks with modular structures. If modularity is to be used for detecting communities in small and mid-sized networks, exact and approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits.
In the second part of the talk. we propose a specialized algorithm, called Bayan, which returns partitions with a guarantee of proximity to an optimal partition. At the core of the Bayan algorithm is a branch-and-cut scheme that solves an integer programming formulation of the problem to optimality or approximates it within a factor. Using structurally diverse networks, we compare 30 community detection methods including the Bayan algorithm. Our results show the distinctive accuracy and stability of globally maximum-modularity partitions in retrieving planted partitions at rates higher than most alternatives for a wide range of parameter settings in two standard benchmarks. Compared to the partitions from 29 other algorithms, maximum-modularity partitions have the best medians for description length, coverage, performance, average conductance, and well clusteredness. These advantages come at the cost of additional computations which Bayan makes possible for small networks (networks that have up to 3000 edges in their largest connected component). Bayan is several times faster than using open-source and commercial solvers for modularity maximization, making it capable of finding optimal partitions for instances that cannot be optimized by any other existing method. Our results point to a few well performing algorithms, among which Bayan stands out as the most reliable method for small networks. A Python implementation of the Bayan algorithm (bayanpy) is publicly available through the package installer for Python.
In collaboration with: Hriday Chheda (University of Toronto) and Mahdi Mostajabdaveh (Polytechnique Montréal)
Papers related to the talk:
arxiv.org/abs/2302.14698
arxiv.org/abs/2310.10898
arxiv.org/abs/2209.04562
Bio
Samin Aref is an assistant professor, teaching stream in data science at the Department of Mechanical and Industrial Engineering, University of Toronto (Canada, 2021-present). He holds a PhD in Computer Science from the University of Auckland (New Zealand, 2019) and an MSc in Industrial Engineering from Sharif University (Iran, 2014). He has worked as a visiting assistant professor at the University of British Columbia (Canada, 2022) and a research scientist at the Max Planck Institute for Demographic Research (Germany, 2018-2021). Samin’s research is at the intersections of network science, machine learning, operations research and optimization, data science, and social computing for which he has secured funding from several organizations in Germany, New Zealand, and Canada. Samin has contributed to the fundamental mathematics of analyzing groups and systems and has developed new methods for studying the migration and mobility of researchers in Russia, Mexico, Germany and the UK.
____
Le séminaire aura lieu au local 3860 du pavillon Alexandre-Vachon et en ligne.
Pour rejoindre la réunion Zoom :
https://ulaval.zoom.us/j/62680136430?pwd=eldBYjdNTG5QR2VxTTFqbVM4UGVRZz09
Meeting ID: 626 8013 6430
Passcode: 693150