Evaluating Overfit and Underfit in Models of Network Community Structure

Amir Ghasemian, Homa Hosseinmardi and Aaron Clauset

arXiv:1802.10582v1 [stat.ML] 28 Feb 2018

Abstract—A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network’s connectivity. Although many methods now exist, the recently proved No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or under-fit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a “ground truth” will produce misleading conclusions about general accuracy. As a result, little is known about how over- and under-fitting varies by algorithm and input. Here, we present a broad evaluation of over- and under-fit