Thesis Defense

 

"Scalable Algorithms for Mining Graphs and Social Networks via Sampling"

Ahmad Mahmoody

Friday, April 28, 2017, at 2:30 P.M.

Lubrano Conference Room (CIT 4th Floor)

Sampling based and randomized algorithms are powerful tools in studying big data problems. In this thesis we provide sampling based algorithms for mining graphs and social network in three different contexts:

Detecting Valuable Information. Detecting new information and events in a dynamic system by probing individual nodes has many practical applications: discovering new webpages, analyzing influence properties in network, and detecting failure propagation in electronic circuits or infections in public drinkable water systems. In practice, it is infeasible for anyone but the owner of the network (if existent) to monitor all nodes at all times. We study the constrained setting when the observer can only probe a small set of nodes at each time step to check whether new pieces of information (items) have reached those nodes.

Centrality Maximization. Betweenness centrality (BWC) is a fundamental centrality measure in social network analysis. Given a large-scale network, how can we find the most central nodes? This question is of great importance to many key applications that rely on BWC, including community detection and understanding graph vulnerability. Despite the large amount of work on scalable approximation algorithm design for BWC, estimating BWC on large-scale networks remains a computational challenge. In this paper, we study the Centrality Maximization problem (CMP): given a graph G = (V,E) and a positive integer k, find a subset S of nodes V that maximizes BWC subject to the cardinality constraint |S| ≤ k. We present an efficient randomized algorithm that provides a (1 − 1/e − eps)-approximation with high probability, where eps> 0. Our results improve the current state-of-the-art result [127].

Influence Estimation. Social networks are important communication and information media. Individuals in a social network share information and influence each other through their social connections. Understanding social influence and information diffusion is a fundamental research endeavor and it has important applications in online social advertising, viral marketing, and trending topic prediction. We first study the Targeted-Influence problem (TIP): Given a network G = (V, E) and a model of influence, we want to be able to estimate in real-time (say in a few seconds per query) the influence of an arbitrary subset of users S over a arbitrary subset of users T, for any possible query (S; T). To do so, we allow an efficient preprocessing. Finally, we conclude the thesis by studying the problem of Conditional Influence Estimation (CIE) whose goal is to estimate the influence of a node given that a cascade started from that node has already reached a group of nodes (target nodes).

Host: Eli Upfal