Predicting Central Topics in a Blog Corpus from a Networks Perspective
Srayan Datta
University of Michigan
Ann Arbor, Michigan 48105
srayand@umich.edu
Abstract
In today’s content-centric Internet, blogs
are becoming increasingly popular and
important from a data analysis perspective.
According to Wikipedia, there were over
156 million public blogs on the Internet as
of February 2011. Blogs are a reflection
of our contemporary society. The contents
of different blog posts are important from
social, psychological, economic and political perspectives. Discovery of important topics in the blogosphere is an area
that remains largely unexplored. We propose
a procedure that uses probabilistic topic modeling and network centrality measures to identify the central topics in a blog corpus.
1 Introduction
This paper presents an algorithm to identify and
rank the most important topics given a set of blog
entries. The topics are identified using Latent
Dirichlet Allocation (LDA) (Blei