Seminar on Probability and Computing, Chalmers, August 2010

Date: We will start from Aug. 2 with one or two meetings every day, the first between 10am and 12pm, and the second from 2pm to 4 pm, unless otherwise specified (see the schedule) We will meet in EDIT 3364 or EDIT 3320.

First meeting: on Aug. 2 in EDIT 3364, it will be particularly important to discuss logistics and a set of papers or topics to be covered. Participants are encouraged to bring their topics of interest.

This page is maintained by Matteo Riondato, matteo@cs.brown.edu .

Schedule of Talks
Date, Time, and Place	Speaker	Title	Abstract
Mon Aug 2 — 10 am — 3320		Introductory meeting.	Overview of topics and papers by Eli Upfal, Devdatt Dubhashi, and Peter Damaschke. Invitation to suggestions from others.
Tue Aug 3 — 10 am — 3364	Devdatt Dubhashi (Chalmers)	The Johnson-Lindenstrauss Lemma	The Johnson-Lindenstrauss Lemma is a fundamental measure concentration result that is the basis of dimension reducing techniques in many applications in machine learning and high dimensional search. I will present a unifying exposition of variants of the lemma due to Matousek and outline the close connections to recent developments in the field of compressive sensing.
Wed Aug 4 — 10 am — 3364	Osmar Zaiane (University of Alberta)	Social Network Analysis for the Assessment of Learning	Using computer-supported collaborative learning tools, learners interact forming relationships and complex flows of information. In a forum with very few learners it is customary to quickly collect thousands of messages in few months, and these are interrelated in intricate discussion threads. Assessing the participation and interaction between learners can become a daunting task. Social network analysis is a field of study attempting to understand and measure relationships between entities in networked information. Can social network analysis techniques and data mining techniques for information networks help examine and assess online interactions? We examine some work done in this area, particularly the application of community mining, and discuss some open problems pertaining to social network analysis in the e-learning domain.
Thu Aug 5 — 10 am — 3364	Matteo Riondato (Brown University)	Mining Top-K Frequent Itemsets Through Progressive Sampling	We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this purpose, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an upper bound on the sample size which guarantees that the top-K frequent itemsets mined from a random sample of that size approximate the actual top-K frequent itemsets, with probability larger than a specified value. We show that the upper bound is asymptotically tight when w is constant. Our main algorithmic contribution is a progressive sampling approach, combined with suitable stopping conditions, which on appropriate inputs is able to extract approximate top-K frequent itemsets from samples whose sizes are smaller than the general upper bound. In order to test the stopping conditions, this approach maintains the frequency of all itemsets encountered, which is practical only for small w. However, we show how this problem can be mitigated by using a variation of Bloom filters. A number of experiments conducted on both synthetic and real bench- mark datasets show that using samples substantially smaller than the original dataset (i.e., of size defined by the upper bound or reached through the progressive sampling approach) enable to approximate the actual top-K frequent itemsets with accuracy much higher than what analytically proved. Paper available on ArXiv.
Thu Aug 5 — 2 pm — 3364	Chiranjib Bhattacharyaa (Machine Learning Lab, Indian Institute of Science Bangalore)	Multiple Kernel Learning	This paper presents a novel algorithm and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ a block L1 norm regularization for promoting sparsity within RKHS norms of each group and L_q (q \ge 2) norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved with suitable choices of q and grouping of kernels \| hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits e cient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices. A detailed proof of convergence of the algorithm is also presented.
Fri Aug 6 — 10 am — 3364	Osmar Zaiane (University of Alberta)	A blessing of high dimensionality for outlier detection and class separation	Increasing data dimensionality causes an exponential increase in sparcity. This is known as the curse of dimensionality and is problematic for data clustering, outlier detection and many data analysis tasks. There are also blessings. Concentration of measure is such a blessing. We exploit this phenomenon to find and rank outliers in high dimensional data. On a second stage, we take advantage of this ranking to separate classes with different variances whether the classes are overlapping or even concentric.
Fri Aug 6 — 11 am — 3364	Chiranjib Bhattacharyaa (Machine Learning Lab, Indian Institute of Science Bangalore)	Randomized Algorithms for Large Scale SVM	We propose a randomized algorithm for training Support vector machines(SVMs) on large datasets. By using ideas from Random projections we show that the combinatorial dimension of SVMs is O(log n) with high probability. This estimate of combinatorial dimension is used to derive an iterative algorithm, called RandSVM, which at each step calls an existing solver to train SVMs on a randomly chosen subset of size O(log n). The algorithm has probabilistic guarantees and is capable of training SVMs with Kernels for both classification and regression problems. Experiments done on synthetic and real life data sets demonstrate that the algorithm scales up existing SVM learners, without loss of accuracy.
Fri Aug 6 — 1 pm — 3364	Henk Wymeersch (S2, Chalmers)	Probabilistic Graphical Models for Distributed Cooperative Localization Algorithms	Probabilistic graphical models are a powerful framework to develop algorithms in a rigorous and systematic fashion. They have found applications in many engineering disciplines, and have revealed connections between existing algorithms. In this talk, I will provide an introduction to graphical models, and briefly discuss connections with statistical physics, variational methods, and convex optimization. Then, we will move on to applications, and show how probabilistic graphical models can be used in the development of generic wireless receivers for digital communications, as well as for distributed algorithms in wireless networks.
Aug 9 — Aug 13		Institute of Mathematical Statistics Annual Meeting	Held at Chalmers this year. Some of the sessions may be interest for the partecipants of the seminar.
Wed Aug 11 — 3pm — 3364	Eli Upfal (Brown University)	The Perceptron Algorithm	Following up on the sequence of ML talks Eli will present the following papers: Yoav Freund and Robert E. Shapire: Large Margin Classification Using the Perceptron Algorithm. Shai Shalev-Shwartz and Yoram Singer: New Perspective on an Old Perceptron Algorithm You're most encouraged to come with a copy of the paper since we'll together try to understand the main ideas and how to explore them for further results.
Thu Aug 12 — 2pm — 3364	Peter Damaschke (Chalmers)	Nonadaptive group testing	Nonadaptive group testing (NGT) is a "boolean counterpart" of compressed sensing (CS). The goal is to reconstruct a boolean vector where at most k entries are 1, by group tests that only tell us whether there exists some 1 in a subset. The purpose of the talk is to relate NGT to CS, to review the principal complexity results for NGT, to discuss some extensions to 2-stage testing and to the case when k is not known in advance, and to explain open questions in our ongoing research.
Fri Aug 13 — 2pm — 3364	Eli Upfal (Brown University)	The Perceptron Algorithm and its variations	This is the continuation of what was presented on Wednesday.
Mon Aug 16 — 2pm — 3364	Devdatt Dubhashi (Chalmers)	The Multiplicative Weights Update Method: a Meta Algorithm and Applications	See this paper. Algorithms in varied fields use the idea of maintaining a distribution over a certain set and use the multiplicative update rule to iteratively change these weights. Their analysis are usually very similar and rely on an exponential potential function. We present a simple meta algorithm that unifies these disparate algorithms and drives them as simple instantiations of the meta algorithm.
Tue Aug 17 — 2pm — 3364	Eli Upfal (Brown University)	Infectious Random Walks	TBA
Tue Aug 24 — 3pm — MV:L15	Eli Upfal (Brown University)	Algorithms for Detecting Significantly Mutated Pathways in Cancer	TBA

List of Topics of Interest and Related Papers

Random Projections
- J. Matousek, On variants of the Johnson–Lindenstrauss lemma", Random Structures & Algorithms Volume 33, Issue 2 (September 2008)
- N. Ailon and B. Chazelle, "Faster Dimension Reduction", CACM, Volume 53 , Issue 2 (February 2010)
- R.G. Baraniuk and M.B. Wakin, "Random projections of smooth manifolds", Foundations of computationla mathematics, Volume 9, Number 1 / February, 2009
Concentration of Measure
- R. Kannan, "Two New Probability Inequalities and Concentration results", arXiv
Compressed Sensing
- Alfred M. Bruckstein, David L. Donoho, and Michael Elad "From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images" SIAM Rev. Volume 51, Issue 1, pp. 34-81 (2009)
- R. Baraniuk, M. Davenport, R. DeVore and M. Waken), A simple proof of the restricted isometry property for random matrices", Constructive Approximation, 28 (2008), 253-263.

Last Updated: Aug 19 2010