- Guy Blanc, Neha Gupta, Gregory Valiant and Paul Valiant.
*Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process*. arXiv:1904.09080, 2019.

We consider deep networks, trained via stochastic gradient descent to minimize L2 loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum over the data points, of the squared L2 norm of the gradient of the model with respect to the parameter vector, evaluated at each data point. We then leverage this general characterization, which holds for networks of any connectivity, width, depth, and choice of activation function, to show that for 2-layer ReLU networks of arbitrary width and L2 loss, when trained on one-dimensional labeled data (x_{1},y_{1}),...,(x_{n},y_{n}), the only stable solutions with zero training error correspond to functions that: 1) are linear over any set of three or more co-linear training points (i.e. the function has no extra "kinks"); and 2) change convexity the minimum number of times that is necessary to fit the training data. Additionally, for 2-layer networks of arbitrary width, with tanh or logistic activations, we show that when trained on a single d-dimensional point (x,y) the only stable solutions correspond to networks where the activations of all hidden units at the datapoint, and all weights from the hidden units to the output, take at most two distinct values, or are zero. In this sense, we show that when trained on "simple" data, models corresponding to stable parameters are also "simple"; in short, despite fitting in an over-parameterized regime where the vast majority of expressible functions are complicated and badly behaved, stable parameters reached by training with noise express nearly the "simplest possible" hypothesis consistent with the data. These results shed light on the mystery of why deep networks generalize so well in practice.
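
For reference, the implicit regularization term described in the abstract can be written out explicitly; the display below is a paraphrase of that quantity (with f_θ denoting the model and θ the parameter vector), not additional notation from the paper.

```latex
% Implicit regularization term near a zero-training-error point:
% the sum, over training inputs x_i, of the squared L2 norm of the
% gradient of the model output with respect to the parameters theta.
R(\theta) \;=\; \sum_{i=1}^{n} \left\| \nabla_{\theta} f_{\theta}(x_i) \right\|_2^2
```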

- Jasper C. H. Lee and Paul Valiant.
*Uncertainty about Uncertainty: Near-Optimal Adaptive Algorithms for Estimating Binary Mixtures of Unknown Coins*. arXiv:1904.09228, 2019.

Given a mixture between two populations of coins, "positive" coins that have (unknown and potentially different) probabilities of heads >=1/2+Delta and negative coins with probabilities <=1/2-Delta, we consider the task of estimating the fraction rho of coins of each type to within additive error epsilon. We introduce new techniques to show a fully-adaptive lower bound of Omega(rho/(epsilon^{2}Delta^{2})) samples (for constant probability of success). We achieve almost-matching algorithmic performance of O(rho(1+rho log(1/epsilon))/(epsilon^{2}Delta^{2})) samples, which matches the lower bound except in the regime where rho=omega(1/log(1/epsilon)). The fine-grained adaptive flavor of both our algorithm and lower bound contrasts with much previous work in distributional testing and learning.

- Gregory Valiant and Paul Valiant.
*Estimating the Unseen: Improved Estimators for Entropy and Other Properties*. Journal of the ACM, 64(6) pp. 37:1-37:41, 2017.

We show that a class of statistical properties of distributions, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most k distinct elements, these properties can be estimated accurately using a sample of size O(k/log k). For these estimation tasks, this performance is optimal, to constant factors. Complementing these theoretical results, we also demonstrate that our estimators perform exceptionally well, in practice, for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. The key step in our approach is to first use the sample to characterize the "unseen" portion of the distribution — effectively reconstructing this portion of the distribution as accurately as if one had a logarithmic factor larger sample. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: We seek to estimate the shape of the unobserved portion of the distribution. This work can be seen as introducing a robust, general, and theoretically principled framework that, for many practical applications, essentially amplifies the sample size by a logarithmic factor; we expect that it may be fruitfully used as a component within larger machine learning and statistical analysis systems.
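
As a point of reference for the "unseen" discussion above, here is a minimal Python sketch of the sample fingerprint (the summary statistics such estimators work from) and the classical Good-Turing estimate of the unobserved probability mass. The paper's estimators go further and reconstruct the shape of the unseen portion, which this sketch does not attempt.

```python
from collections import Counter

def fingerprint(sample):
    """Map each frequency j to the number of distinct elements seen exactly j times."""
    occurrences = Counter(sample)          # element -> number of occurrences
    return Counter(occurrences.values())   # frequency j -> number of elements seen j times

def good_turing_unseen_mass(sample):
    """Good-Turing estimate of the total probability mass of elements never observed."""
    f = fingerprint(sample)
    return f[1] / len(sample)              # (# elements seen exactly once) / (sample size)

# Example: a small sample with a few rare elements.
sample = ["a", "a", "a", "b", "b", "c", "d", "e"]
print(fingerprint(sample))                 # Counter({1: 3, 3: 1, 2: 1})
print(good_turing_unseen_mass(sample))     # 3/8 = 0.375
```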

- Gregory Valiant and Paul Valiant.
*An Automatic Inequality Prover and Instance Optimal Identity Testing*. SIAM Journal on Computing, 46(1) pp. 429-455, 2017.

We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete finite or countably infinite support, p=(p1, p2,...), how many samples (independent draws) must one obtain from an unknown distribution, q, to distinguish, with high probability, the case that p=q from the case that the total variation distance (L1 distance) ||p-q||_{1}≥ε? We resolve this question, up to constant factors, on an instance by instance basis: there exist universal constants c, c' and a function f(p, ε) on the known distribution p and error parameter ε such that our tester distinguishes p=q from ||p-q||_{1}≥ε using f(p, ε) samples with success probability >2/3, but no tester can distinguish p=q from ||p-q||_{1}≥cε when given c' f(p, ε) samples.

The analysis of our very simple testing algorithm involves several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities - generalizing Cauchy-Schwarz, Hölder's inequality, and the monotonicity of Lp norms. Our characterization is constructive and algorithmic, leveraging linear programming to prove or refute an inequality, which would otherwise have to be investigated, through trial and error, by hand. We hope the computational nature of our characterization will be useful to others, and facilitate analyses like the one here.
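
For concreteness, two classical members of the family of inequalities that the characterization generalizes; these statements are standard and are not quoted from the paper.

```latex
% Hölder's inequality, for nonnegative a_i, b_i and exponents r, s >= 1 with 1/r + 1/s = 1:
\sum_i a_i b_i \;\le\; \Big(\sum_i a_i^{r}\Big)^{1/r} \Big(\sum_i b_i^{s}\Big)^{1/s}
% Monotonicity of Lp norms, for 0 < r \le s:
\Big(\sum_i a_i^{s}\Big)^{1/s} \;\le\; \Big(\sum_i a_i^{r}\Big)^{1/r}
```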

- Gregory Valiant and Paul Valiant.
*Information Theoretically Secure Databases*. Electronic Colloquium on Computational Complexity TR16-078, 2016.

We introduce the notion of a database system that is information theoretically *secure in between accesses* - a database system with the properties that 1) users can efficiently access their data, and 2) while a user is not accessing their data, the user's information is *information theoretically* secure to malicious agents, provided that certain requirements on the maintenance of the database are realized. We stress that the security guarantee is information theoretic and everlasting: it relies neither on unproved hardness assumptions, nor on the assumption that the adversary is computationally or storage bounded.

We propose a realization of such a database system and prove that a user's stored information, in between times when it is being legitimately accessed, is information theoretically secure both to adversaries who interact with the database in the prescribed manner, as well as to adversaries who have installed a virus that has access to the entire database and communicates with the adversary.

The central idea behind our design of an information theoretically secure database system is the construction of a "re-randomizing database" that periodically changes the internal representation of the information that is being stored. To ensure security, these remappings of the representation of the data must be made sufficiently often in comparison to the amount of information that is being communicated from the database between remappings and the amount of local memory in the database that a virus may preserve during the remappings. While this changing representation provably foils the ability of an adversary to glean information, it can be accomplished in a manner transparent to the legitimate users, preserving how database users access their data. The core of the proof of the security guarantee relies on a new communication/data tradeoff for the problem of learning sparse parities from uniformly random n-bit examples.

- Jasper C.H. Lee and Paul Valiant.
*Optimizing Star-Convex Functions*. Proc. 57th Foundations of Computer Science (FOCS), 2016.

Star-convexity is a significant relaxation of the notion of convexity, that allows for functions that do not have (sub)gradients at most points, and may even be discontinuous everywhere except at the global optimum. We introduce a polynomial time algorithm for optimizing the class of star-convex functions, under no Lipschitz or other smoothness assumptions whatsoever, and no restrictions except exponential boundedness on a region about the origin, and Lebesgue measurability. The algorithm's performance is polynomial in the requested number of digits of accuracy and the dimension of the search domain. This contrasts with the previous best known algorithm of Nesterov and Polyak [2006] which has exponential dependence on the number of digits of accuracy, but only n^{ω} dependence on the dimension n (where ω is the matrix multiplication exponent), and which further requires Lipschitz second differentiability of the function.

Despite a long history of successful gradient-based optimization algorithms, star-convex optimization is a uniquely challenging regime because 1) gradients and/or subgradients often do not exist; and 2) even in cases when gradients exist, there are star-convex functions for which gradients provably provide no information about the location of the global optimum. We thus bypass the usual approach of relying on gradient oracles and introduce a new randomized cutting plane algorithm that relies only on function evaluations. Our algorithm essentially looks for structure at all scales, since, unlike with convex functions, star-convex functions do not necessarily display simpler behavior on smaller length scales. Thus, while our cutting plane algorithm refines a feasible region of exponentially decreasing volume by iteratively removing "cuts", unlike for the standard convex case, the structure to efficiently discover such cuts may not be found within the feasible region: our novel star-convex cutting plane approach discovers cuts by sampling the function exponentially far outside the feasible region.

We emphasize that the class of star-convex functions we consider is as unrestricted as possible: the class of Lebesgue measurable star-convex functions has theoretical appeal, introducing to the domain of polynomial-time algorithms a huge class with many interesting pathologies. We view our results as a step forward in understanding the scope of optimization techniques beyond the garden of convex optimization and local gradient-based methods.
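
For readers unfamiliar with the notion, star-convexity requires convexity along every ray through a global minimizer; the standard definition is recalled below for reference (it is not quoted from the paper).

```latex
% f is star-convex with star center x* (a global minimizer) if, for all x and all t in [0,1]:
f\big(t\,x^{*} + (1-t)\,x\big) \;\le\; t\,f(x^{*}) + (1-t)\,f(x)
```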

- Stephen Childress, Andrew Gilbert, and Paul Valiant.
*Eroding dipoles and vorticity growth for Euler flows in R^{3}: Axisymmetric flow without swirl*. Journal of Fluid Mechanics, 805, pp. 1-30, 2016.

A review of analyses based upon anti-parallel vortex structures suggests that structurally stable dipoles with eroding circulation may offer a path to the study of vorticity growth in solutions of Euler's equations in R^{3}. We examine here the possible formation of such a structure in axisymmetric flow without swirl, leading to maximal growth of vorticity as t^{4/3}. Our study suggests that the optimizing flow giving the t^{4/3} growth mimics an exact solution of Euler's equations representing an eroding toroidal vortex dipole which locally conserves kinetic energy. The dipole cross-section is a perturbation of the classical Sadovskii dipole having piecewise constant vorticity, which breaks the symmetry of closed streamlines. The structure of this perturbed Sadovskii dipole is analysed asymptotically at large times, and its predicted properties are verified numerically. We also show numerically that if mirror symmetry of the dipole is not imposed but axial symmetry maintained, an instability leads to breakup into smaller vortical structures.

- James Zou, Gregory Valiant, Paul Valiant, Konrad Karczewski, Siu On Chan, Kaitlin Samocha, Monkol Lek, Shamil Sunyaev, Mark Daly and Daniel G. MacArthur.
*Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects*. Nature Communications 7, Article 13293, 2016.

As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.

- Gregory Valiant and Paul Valiant.
*Instance Optimal Learning of Discrete Distributions*. Proc. 47th Symposium on Theory of Computing (STOC), 2015.

We consider the following basic learning task: given independent draws from an unknown distribution over a discrete support, output an approximation of the distribution that is as accurate as possible in L1 distance (equivalently, total variation distance, or "statistical distance"). Perhaps surprisingly, it is often possible to "de-noise" the empirical distribution of the samples to return an approximation of the true distribution that is significantly more accurate than the empirical distribution, without relying on any prior assumptions on the distribution. We present an instance optimal learning algorithm which optimally performs this de-noising for every distribution for which such a de-noising is possible. More formally, given n independent draws from a distribution p, our algorithm returns a labelled vector whose expected distance from p is equal to the minimum possible expected error that could be obtained by any algorithm, even one that is given the true unlabeled vector of probabilities of distribution p and simply needs to assign labels - up to an additive subconstant term that is independent of p and goes to zero as n gets large. This somewhat surprising result has several conceptual implications, including the fact that, for any large sample from a distribution over discrete support, prior knowledge of the rates of decay of the tails of the distribution (e.g. power-law type assumptions) is not significantly helpful for the task of learning the distribution. As a consequence of our techniques, we also show that given a set of n samples from an arbitrary distribution, one can accurately estimate the expected number of distinct elements that will be observed in a sample of any size up to n log n. This sort of extrapolation is practically relevant, particularly to domains such as genomics where it is important to understand how much more might be discovered given larger sample sizes, and we are optimistic that our approach is practically viable.

- Paul Valiant.
*Evolvability of Real Functions*. ACM Transactions on Computation Theory, 6(3): 12:1-12:19, 2014.

We formulate a notion of evolvability for functions with domain and range that are real-valued vectors, a compelling way of expressing many natural biological processes. We show that linear and fixed-degree polynomial functions are evolvable in the following dually-robust sense: There is a single evolution algorithm that, for all convex loss functions, converges for all distributions. It is possible that such dually-robust results can be achieved by simpler and more-natural evolution algorithms. Towards this end, we introduce a simple and natural algorithm that we call wide-scale random noise and prove a corresponding result for the L2 metric. We conjecture that the algorithm works for a more general class of metrics.

- Gregory Valiant and Paul Valiant.
*An Automatic Inequality Prover and Instance Optimal Identity Testing*. Proc. 55th Foundations of Computer Science (FOCS), 2014. ACM "Best of Computing" Notable Article for 2014

We consider the problem of verifying the identity of a distribution: Given the description of a distribution over a discrete finite or countably infinite support, p=(p1, p2,...), how many samples (independent draws) must one obtain from an unknown distribution, q, to distinguish, with high probability, the case that p=q from the case that the total variation distance (L1 distance) ||p-q||_{1}≥ε? We resolve this question, up to constant factors, on an instance by instance basis: there exist universal constants c, c' and a function f(p, ε) on the known distribution p and error parameter ε such that our tester distinguishes p=q from ||p-q||_{1}≥ε using f(p, ε) samples with success probability >2/3, but no tester can distinguish p=q from ||p-q||_{1}≥cε when given c' f(p, ε) samples.

The analysis of our very simple testing algorithm involves several hairy inequalities. To facilitate this analysis, we give a complete characterization of a general class of inequalities - generalizing Cauchy-Schwarz, Hölder's inequality, and the monotonicity of Lp norms. Our characterization is constructive and algorithmic, leveraging linear programming to prove or refute an inequality, which would otherwise have to be investigated, through trial and error, by hand. We hope the computational nature of our characterization will be useful to others, and facilitate analyses like the one here.

- Siu-On Chan, Ilias Diakonikolas, Gregory Valiant, and Paul Valiant.
*Optimal Algorithms for Testing Closeness of Discrete Distributions*. Proc. 25th Symposium on Discrete Algorithms (SODA), 2014.

We study the question of closeness testing for two discrete distributions. More precisely, given samples from two distributions p and q over an n-element set, we wish to distinguish whether p=q versus p is at least ε-far from q, in either L1 or L2 distance. Batu et al [2000] gave the first sub-linear time algorithms for these problems, which matched the lower bounds of Valiant [2011] up to a logarithmic factor in n, and a polynomial factor of ε.

In this work, we present simple testers for both the L1 and L2 settings, with sample complexity that is information-theoretically optimal, to constant factors, both in the dependence on n, and the dependence on ε; for the L1 testing problem we establish that the sample complexity is Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε^{2}}).
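
The flavor of such testers can be illustrated with the standard unbiased collision-style statistic for L2 closeness. The sketch below is generic (assuming Poissonized per-element counts), and is not a transcription of the exact tester and thresholds analyzed in the paper.

```python
import numpy as np

def l2_closeness_statistic(x_counts, y_counts):
    """Unbiased statistic for m^2 * ||p - q||_2^2 when x_counts[i] ~ Poisson(m * p_i)
    and y_counts[i] ~ Poisson(m * q_i) are independent per-element sample counts."""
    x = np.asarray(x_counts, dtype=float)
    y = np.asarray(y_counts, dtype=float)
    # (x_i - y_i)^2 - x_i - y_i has expectation m^2 (p_i - q_i)^2 for Poisson counts.
    return np.sum((x - y) ** 2 - x - y)

# Toy usage: identical distributions should give a statistic near 0 on average.
rng = np.random.default_rng(0)
p = np.full(100, 1 / 100)
m = 1000
x = rng.poisson(m * p)
y = rng.poisson(m * p)
print(l2_closeness_statistic(x, y))
```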

- Gregory Valiant and Paul Valiant.
*Estimating the Unseen: Improved Estimators for Entropy and other Properties*. Proc. Neural Information Processing Systems (NIPS) 2013.

Recently, we showed that a class of distributional properties, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most n distinct elements, these properties can be estimated accurately using a sample of size O(n/log n). We propose a novel modification of this approach and show: 1) theoretically, this estimator is optimal (to constant factors, over worst-case instances), and 2) in practice, it performs exceptionally well for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. Perhaps unsurprisingly, the key step in our approach is to first use the sample to characterize the "unseen" portion of the distribution. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: we seek to estimate the shape of the unobserved portion of the distribution. This approach is robust, general, and theoretically principled; we expect that it may be fruitfully used as a component within larger machine learning and data analysis systems.

- Constantinos Daskalakis, Ilias Diakonikolas, Rocco A. Servedio, Gregory Valiant, and Paul Valiant.
*Testing k-Modal Distributions: Optimal Algorithms via Reductions*. Proc. 24th Symposium on Discrete Algorithms (SODA), 2013.

We give highly efficient algorithms, and almost matching lower bounds, for a range of basic statistical problems that involve testing and estimating the L1 (total variation) distance between two k-modal distributions p and q over the discrete domain {1,...,n}. For each of four problems we give sub-logarithmic sample algorithms, and show that our algorithms have optimal sample complexity up to additive poly(k) and multiplicative polylog log n + polylog k factors.

As our main conceptual contribution, we introduce a new reduction-based approach for distribution-testing problems that lets us obtain all the above results in a unified way. Roughly speaking, this approach enables us to transform various distribution testing problems for k-modal distributions over {1,...,n} to the corresponding distribution testing problems for unrestricted distributions over a much smaller domain {1,...,L} where L=O(k log n).

- Georg Gottlob, Stephanie Tien Lee, Gregory Valiant, and Paul Valiant.
*Size and Treewidth Bounds for Conjunctive Queries*. Journal of the ACM, 59(3): 16:1-16:35, 2012.

This article provides new worst-case bounds for the size and treewidth of the result Q(D) of a conjunctive query Q applied to a database D. We derive bounds for the result size |Q(D)| in terms of structural properties of Q, both in the absence and in the presence of keys and functional dependencies. These bounds are based on a novel "coloring" of the query variables that associates a coloring number C(Q) to each query Q. Intuitively, each color used represents some possible entropy of that variable. Using this coloring number, we derive tight bounds for the size of Q(D) in case (i) no functional dependencies or keys are specified, and (ii) simple functional dependencies (keys) are given. These results generalize recent size-bounds for join queries obtained by Atserias et al. [2008]. In the case of arbitrary (compound) functional dependencies, we use tools from information theory to provide lower and upper bounds, establishing a close connection between size bounds and a basic question in information theory. Our new coloring scheme also allows us to precisely characterize (both in the absence of keys and with simple keys) the treewidth-preserving queries---the queries for which the treewidth of the output relation is bounded by a function of the treewidth of the input database. Finally, we give some results on the computational complexity of determining the size bounds, and of deciding whether the treewidth is preserved.
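
As a concrete instance of the kind of size bound being generalized (the join-query bound of Atserias et al. cited above), consider the triangle query; this standard example is included for orientation and is not taken from the article.

```latex
% Triangle join query:
Q(x,y,z) \;=\; R(x,y) \wedge S(y,z) \wedge T(z,x)
% If |R|, |S|, |T| \le N, the fractional-cover bound gives
|Q(D)| \;\le\; \sqrt{|R|\cdot|S|\cdot|T|} \;\le\; N^{3/2},
% which is tight up to constant factors.
```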

- Andrew McGregor and Paul Valiant.
*The Shifting Sands Algorithm*. Proc. 23rd Symposium on Discrete Algorithms (SODA), 2012.

We resolve the problem of small-space approximate selection in random-order streams. Specifically, we present an algorithm that reads the *n* elements of a set in random order and returns an element whose rank differs from the true median by at most *n*^{1/3+o(1)} while storing a constant number of elements and counters at any one time. This is optimal: it was previously shown that achieving better accuracy required poly(*n*) memory. However, it was conjectured that the lower bound was not tight and that a previous algorithm achieving an *n*^{1/2+o(1)} approximation was optimal. We therefore consider the new result a surprising resolution to a natural and basic question.

- Paul Valiant.
*Distribution Free Evolvability of Polynomial Functions over all Convex Loss Functions*. Proc. 3rd Innovations in Theoretical Computer Science (ITCS), 2012.

We formulate a notion of evolvability for functions with domain and range that are real-valued vectors, a compelling way of expressing many natural biological processes. We show that linear and fixed degree polynomial functions are evolvable in the following dually robust sense: There is a single evolution algorithm that for all convex loss functions converges for all distributions. Existing results suggest that for Boolean function evolution no correspondingly general result is possible.

It is possible that such dually robust results can be achieved by simpler and more natural evolution algorithms. In the second part of the paper we introduce a simple and natural algorithm that we call "wide-scale random noise" and prove a corresponding result for the L_2 metric. We conjecture that the algorithm works for more general classes of metrics.
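
The following is a loose illustrative sketch, in Python, of an evolution step in the spirit of "wide-scale random noise": mutations are proposed at magnitudes spanning a wide range of scales, and a mutant is retained only if it improves (here, empirical L2) performance. The actual algorithm, its scale distribution, and its tolerance rules are as specified in the paper and may differ from this sketch.

```python
import numpy as np

def wide_scale_noise_step(w, loss, rng, n_scales=20):
    """One illustrative mutation step: try Gaussian perturbations whose magnitudes
    span many orders of magnitude, and keep the best candidate (including the parent)."""
    candidates = [w]
    for k in range(n_scales):
        scale = 2.0 ** (k - n_scales // 2)   # scales from very small to very large
        candidates.append(w + scale * rng.standard_normal(w.shape))
    return min(candidates, key=loss)

# Toy usage: evolve a linear hypothesis toward a target under squared (L2) loss.
rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 0.5])
xs = rng.standard_normal((200, 3))
ys = xs @ target
loss = lambda w: np.mean((xs @ w - ys) ** 2)

w = np.zeros(3)
for _ in range(200):
    w = wide_scale_noise_step(w, loss, rng)
print(w, loss(w))
```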

- Gregory Valiant and Paul Valiant.
*The power of linear estimators*. Proc. 52nd Foundations of Computer Science (FOCS), 2011.

For a broad class of practically relevant distribution properties, which includes entropy and support size, nearly all of the proposed estimators have an especially simple form. Given a set of independent samples from a discrete distribution, these estimators tally the vector of summary statistics---the number of domain elements seen once, twice, etc. in the sample---and output the dot product between these summary statistics, and a fixed vector of coefficients. We term such estimators linear. This historical proclivity towards linear estimators is slightly perplexing, since, despite many efforts over nearly 60 years, all proposed such estimators have significantly suboptimal convergence, compared to the bounds shown in [Valiant and Valiant, 2011].

Our main result, in some sense vindicating this insistence on linear estimators, is that for any property in this broad class, there exists a near-optimal linear estimator. Additionally, we give a practical and polynomial-time algorithm for constructing such estimators for any given parameters.

While this result does not yield explicit bounds on the sample complexities of these estimation tasks, we leverage the insights provided by this result, to give explicit constructions of near-optimal linear estimators for three properties: entropy, L1 distance to uniformity, and for pairs of distributions, L1 distance.

Our entropy estimator, when given O(n/(c log n)) independent samples from a distribution of support at most n, will estimate the entropy of the distribution to within accuracy c, with probability of failure o(1/poly(n)). From the recent lower bounds given in [Valiant and Valiant, 2011], this estimator is optimal, to constant factor, both in its dependence on n, and its dependence on c. In particular, the inverse-linear convergence rate of this estimator resolves the main open question of [Valiant and Valiant, 2011], which left open the possibility that the error decreased only with the square root of the number of samples.

Our distance to uniformity estimator, when given O(m/(c^2 log m)) independent samples from any distribution, returns a c-accurate estimate of the L1 distance to the uniform distribution of support m. This is the first sublinear-sample estimator for this problem, and is constant-factor optimal, for constant c.

Finally, our framework extends naturally to properties of pairs of distributions, including estimating the L1 distance and KL-divergence between pairs of distributions. We give an explicit linear estimator for estimating L1 distance to accuracy c using O(n/(c^2 log n)) samples from each distribution, which is constant-factor optimal, for constant c.
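
The "linear estimator" form discussed above is easy to make concrete: compute the fingerprint (how many domain elements appear once, twice, ...) and take its dot product with a fixed coefficient vector. The sketch below uses the naive plug-in entropy coefficients purely to illustrate the form; the near-optimal coefficient vectors constructed in the paper are different.

```python
from collections import Counter
import math

def fingerprint(sample):
    """F[j] = number of distinct elements appearing exactly j times in the sample."""
    occurrences = Counter(sample)
    return Counter(occurrences.values())

def linear_estimator(sample, coeff):
    """A 'linear' property estimator: dot product of the fingerprint with fixed coefficients."""
    F = fingerprint(sample)
    return sum(coeff(j) * F_j for j, F_j in F.items())

# Illustration only: the plug-in (empirical) entropy expressed in linear-estimator form,
# with coefficient c_j = -(j/n) * log(j/n) for a sample of size n.
sample = list("abracadabra")
n = len(sample)
plugin_entropy = linear_estimator(sample, lambda j: -(j / n) * math.log(j / n))
print(plugin_entropy)
```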

- Gregory Valiant and Paul Valiant.
*Estimating the unseen: a sublinear-sample canonical estimator of distributions*. Merged with below paper in: Proc. 43rd Symposium on Theory of Computing (STOC), 2011.

We introduce a new approach to characterizing the unobserved portion of a distribution, which provides sublinear-sample additive estimators for a class of properties that includes entropy and distribution support size. Together with the lower bounds proven in the companion paper [VV'10a], this settles the longstanding question of the sample complexities of these estimation problems (up to constant factors). Our algorithm estimates these properties up to an arbitrarily small additive constant, using O(n/log n) samples; [VV'10a] shows that no algorithm on o(n/log n) samples can achieve this (where n is a bound on the support size, or in the case of estimating the support size, 1/n is a lower bound on the probability of any element of the domain). Previously, no explicit sublinear-sample algorithms for either of these problems were known. Additionally, our algorithm runs in time linear in the number of samples used.

- Gregory Valiant and Paul Valiant.
*A CLT and tight lower bounds for estimating entropy*. 2010.

We prove two new multivariate central limit theorems; the first relates the sum of independent distributions to the multivariate Gaussian of corresponding mean and covariance, under the earthmover distance metric (also known as the Wasserstein metric). We leverage this central limit theorem to prove a stronger but more specific central limit theorem for "generalized multinomial" distributions---a large class of discrete distributions, parameterized by matrices, that generalize binomial and multinomial distributions, and describe many distributions encountered in computer science. This central limit theorem relates a generalized multinomial distribution to a multivariate Gaussian distribution, discretized by rounding to the nearest lattice points. In contrast to the metric of our first central limit theorem, this bound is in terms of statistical distance, which immediately implies that any algorithm with input drawn from a generalized multinomial distribution behaves essentially as if the input were drawn from a discretized Gaussian with the same mean and covariance. Such tools in the multivariate setting are rare, and we hope this new tool will be of use to the community.

In the second part of the paper, we employ this central limit theorem to establish a lower bound of Omega(n/log n) on the sample complexity of additively estimating the entropy or support size of a distribution (where 1/n is a lower bound on the probability of any element in the domain). Together with the canonical estimator constructed in the companion paper [VV'10b], this settles the longstanding open question of the sample complexities of these estimation problems, up to constant factors. In particular, for any constants c_1 < log 2, and c_2 < 1/2, there is a family of pairs of distributions D,D' each of whose elements occurs with probability at least 1/n, whose entropies satisfy H(D)-H(D') > c_1, and whose support sizes differ by at least c_2 n, such that no algorithm on o(n/log n) samples can distinguish D from D' with probability greater than 2/3. For the problem of estimating entropy, we also provide a bound on the rate of convergence of an optimal estimator, showing that the sample complexity of estimating entropy to within additive c is Omega(n/(c log n)). The previous lower bounds on these sample complexities were n/2^Theta(Sqrt(log n)), for constant c. We explicitly exhibit such a family of pairs of distributions D,D' via a Laguerre polynomial construction that may be of independent interest.
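
For reference, the two metrics in which the central limit theorems above are stated; these are the standard definitions (for distributions μ, ν), not notation quoted from the paper.

```latex
% Earthmover (Wasserstein-1) distance: minimum expected transport cost over couplings of mu and nu.
W_1(\mu,\nu) \;=\; \inf_{\gamma \in \Gamma(\mu,\nu)} \; \mathbb{E}_{(X,Y)\sim\gamma}\,\|X - Y\|
% Statistical (total variation) distance:
d_{TV}(\mu,\nu) \;=\; \sup_{A} \,|\mu(A) - \nu(A)|
```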

- Arnab Bhattacharyya, Eldar Fischer, Ronitt Rubinfeld, and Paul Valiant.
*Testing Monotonicity of Distributions over General Partial Orders*. Proc. 2nd Innovations in Computer Science (ICS), 2011.

We investigate the number of samples required for testing the monotonicity of a distribution with respect to an arbitrary underlying partially ordered set. Our first result is a nearly linear lower bound for the sample complexity of testing monotonicity with respect to the poset consisting of a directed perfect matching. This is the first nearly linear lower bound known for a natural non-symmetric property of distributions. Testing monotonicity with respect to the matching reduces to testing monotonicity with respect to various other natural posets, showing corresponding lower bounds for these posets also. Next, we show that whenever a poset has a linear-sized matching in the transitive closure of its Hasse digraph, testing monotonicity with respect to it requires Omega(Sqrt(n)) samples. Previous such lower bounds applied only to the total order. We also give upper bounds to the sample complexity in terms of the chain decomposition of the poset. Our results simplify the known tester for the two dimensional grid and give the first sublinear bounds for higher dimensional grids and the Boolean cube.

- Jing Chen, Silvio Micali, and Paul Valiant.
*Robustly Leveraging Collusion in Combinatorial Auctions*. Proc. 1st Innovations in Computer Science (ICS), 2010, pp. 81-93.

Because of its devastating effects in auctions and other mechanisms, collusion is prohibited and legally prosecuted. Yet, colluders have always existed, and may continue to exist. We thus raise the following question for mechanism design: *What desiderata are achievable, and by what type of mechanisms, when any set of players who wish to collude are free to do so without any restrictions on the way in which they cooperate and coordinate their actions?*

In response to this question we put forward and exemplify the notion of a *collusion-leveraging mechanism*. In essence, this is a mechanism aligning its desiderata with the incentives of all its players, including colluders, to a significant and mutually beneficial extent. Of course such mechanisms may exist only for suitable desiderata.

In unrestricted combinatorial auctions, where classical mechanisms essentially guarantee 0 social welfare and 0 revenue in the presence of just two colluders, we prove that it is possible for collusion-leveraging mechanisms to guarantee that the sum of social welfare and revenue is always high, even when all players are collusive.

To guarantee better performance, collusion-leveraging mechanisms in essence "welcome" collusive players, rather than pretending they do not exist, raising a host of new questions at the intersection of cooperative and non-cooperative game theory.

- Constantinos Daskalakis, Grant Schoenebeck, Gregory Valiant, and Paul Valiant.
*On the Complexity of Nash Equilibria of Action-Graph Games*. Proc. 20th Symposium on Discrete Algorithms (SODA), 2009, pp. 710-719.

In light of much recent interest in finding a model of multi-player multi-action games that allows for efficient computation of Nash equilibria yet remains as expressive as possible, we investigate the computational complexity of Nash equilibria in the recently proposed model of *action-graph games* (AGGs). AGGs, introduced by Bhat and Leyton-Brown, are succinct representations of games that encapsulate both local dependencies as in graphical games, and partial indifference to other agents' identities as in anonymous games, which occur in many natural settings such as financial markets. This is achieved by specifying a graph on the set of actions, so that the payoff of an agent for selecting a strategy depends only on the number of agents playing each of the neighboring strategies in the action graph. We present a simple Fully Polynomial Time Approximation Scheme for computing mixed Nash equilibria of AGGs with constant degree, constant treewidth and a constant number of agent types (but an arbitrary number of strategies), and extend this algorithm to a broader set of instances. However, the main results of this paper are negative, showing that when either of the latter conditions are relaxed the problem becomes intractable. In particular, we show that even if the action graph is a tree but the number of agent-types is unconstrained, it is NP-complete to decide the existence of a pure-strategy Nash equilibrium and PPAD-complete to compute a mixed Nash equilibrium (even an approximate one). Similarly for AGGs with a constant number of agent types but unconstrained treewidth. These hardness results suggest that, in some sense, our FPTAS is as strong a positive result as one can expect. In the broader context of trying to pin down the boundary where the equilibria of multi-player games can be computed efficiently, these results complement recent hardness results for graphical games and algorithmic results for anonymous games.
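
A minimal sketch of the action-graph representation described above: each action's payoff is a function only of how many players chose it and each of its neighbors in the action graph. The data layout, payoff function, and names here are illustrative toys, not the paper's notation.

```python
from collections import Counter

# Action graph: for each action, the neighboring actions whose player counts matter.
neighbors = {
    "A": ["A", "B"],
    "B": ["A", "B", "C"],
    "C": ["B", "C"],
}

def payoff(action, counts):
    """Toy congestion-style payoff: depends only on the counts of neighboring actions."""
    return 10 - sum(counts.get(a, 0) for a in neighbors[action])

# A strategy profile for 5 players; each player's payoff depends only on the counts
# of players on actions neighboring their own action.
profile = ["A", "A", "B", "C", "C"]
counts = Counter(profile)
for i, action in enumerate(profile):
    print(f"player {i} plays {action}: payoff {payoff(action, counts)}")
```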

- Paul Valiant.
*Testing Symmetric Properties of Distributions*, MIT PhD Thesis (2008); previous version published in Proc. 40th Symposium on Theory of Computing (STOC), 2008, pp. 383-392.

We introduce the notion of a *Canonical Tester* for a class of properties on distributions, that is, a tester strong and general enough that "a distribution property in the class is testable if and only if the Canonical Tester tests it". We construct a Canonical Tester for the class of properties of one or two distributions that are symmetric and satisfy a certain weak continuity condition. Analyzing the performance of the Canonical Tester on specific properties resolves several open problems, establishing lower bounds that match known upper bounds: we show that distinguishing between entropy $<\alpha$ or $>\beta$ on distributions over $[n]$ requires $n^{\alpha/\beta- o(1)}$ samples, and distinguishing whether a pair of distributions has statistical distance $<\alpha$ or $>\beta$ requires $n^{1- o(1)}$ samples. Our techniques also resolve a conjecture about a property that our Canonical Tester does not apply to: distinguishing identical distributions from those with statistical distance $>\beta$ requires $\Omega(n^{2/3})$ samples.

- Silvio Micali and Paul Valiant.
*Leveraging Collusion in Combinatorial Auctions*. Manuscript, April 2008. ("MV2")

We investigate dominant-strategy auction mechanisms that, should a sufficiently informed coalition of players be present, exploit it so as to guarantee more efficiency and revenue than is otherwise possible.

(Coming from a cryptographic tradition and prizing extreme settings, we refrain from relying on weaker notions of equilibrium; the availability of Bayesian information; any restrictions on the number of players, goods, and colluders; or any restrictions on the type of valuations and the ability of the colluders.)

- Silvio Micali and Paul Valiant.
*Resilient Mechanisms for Truly Combinatorial Auctions*. Manuscript, November 2008. ("MV1")

Dominant-strategy truthfulness is traditionally considered the best possible solution concept in mechanism design, as it enables one to predict with confidence which strategies *independent* players will actually choose. Yet, as with any other form of equilibrium, it too can be extremely vulnerable to *collusion*. The problem of collusion is particularly evident for *unrestricted combinatorial auctions*, arguably the hardest type of auctions to design mechanisms for.

We thus investigate how much revenue can be guaranteed, in unrestricted combinatorial auctions, by dominant-strategy-truthful mechanisms that are *collusion-resilient* in a very strong sense; and obtain almost matching upper- and lower-bounds.

- Paul Valiant.
*Incrementally Verifiable Computation or Proofs of Knowledge Imply Time/Space Efficiency*. Proc. 5th Theory of Cryptography Conference (TCC), 2008, pp. 1-18. (Winner of best student paper award)

A probabilistically checkable proof (PCP) system enables proofs to be verified in time polylogarithmic in the length of a classical proof. Computationally sound (CS) proofs improve upon PCPs by additionally shortening the length of the transmitted proof to be polylogarithmic in the length of the classical proof.

In this paper we explore the ultimate limits of non-interactive proof systems with respect to time and space efficiency. We present a proof system where the prover uses space polynomial in the space of a classical prover and time essentially linear in the time of a classical prover, while the verifier uses time and space that are essentially constant. Further, this proof system is composable: there is an algorithm for merging two proofs of length k into a proof of the conjunction of the original two theorems in time polynomial in k, yielding a proof of length *exactly k*.

We deduce the existence of our proposed proof system by way of a natural new assumption about proofs of knowledge. In fact, a main contribution of our result is showing that knowledge can be "traded" for time and space efficiency in noninteractive proof systems. We motivate this result with an explicit construction of noninteractive CS proofs of knowledge in the random oracle model.

- Xi Chen, Shang-Hua Teng, and Paul Valiant.
*The approximation complexity of win-lose games*. Proc. 18th Symposium on Discrete Algorithms (SODA), 2007, pp. 159-168.

We further our algorithmic and structural understanding of Nash equilibria. Specifically:

-- We distill the hard core of the complexity of Nash equilibria, showing that even correctly computing a logarithmic number of bits of the equilibrium strategies of a two-player win-lose game is as hard as the general problem.

-- We prove the following structural result about Nash equilibria: “the set of approximate equilibria of a zero-sum game is convex.”

- Mythili Vutukuru, Paul Valiant, Swastik Kopparty, and Hari Balakrishnan.
*How to Construct a Correct and Scalable iBGP Configuration*, Proc. 25th Conference on Computer Communications (INFOCOM), 2006.

The Internet's current interdomain routing protocol, BGP (Border Gateway Protocol), has two modes of operation: eBGP (external BGP), used to exchange routing information between autonomous systems, and iBGP (internal BGP), used to propagate that information within an autonomous system (AS). In a “full mesh” iBGP configuration, every router has a BGP session with every border router in the AS. Because a full mesh configuration has a large number of iBGP sessions and does not scale well, configurations based on route reflectors are commonly used for intra-AS route dissemination. Unfortunately, route reflector configurations violate important correctness properties, including loop-free forwarding and complete visibility to all eBGP-learned best routes, especially in the face of router and link failures.

This paper presents and analyzes the first (to our knowledge) algorithm, BGPSep, to construct an iBGP session configuration that is both correct and more scalable than a full mesh. BGPSep uses the notion of a graph separator---a small set of nodes whose removal partitions a graph into connected components of roughly equal sizes---to choose route reflectors and iBGP sessions in a way that guarantees correctness. We evaluate an implementation of the BGPSep algorithm on several real-world and simulated network topologies and find that iBGP configurations generated by BGPSep have between 2.5x and 5x fewer iBGP sessions than a full mesh.
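
To make the separator idea concrete, here is one plausible shape of a separator-based recursion in Python: small components get a full iBGP mesh; otherwise the separator's routers are fully meshed, every remaining router peers with all of them, and the construction recurses on the components. This is a sketch of the general idea only, not the BGPSep specification (whose client assignments, base cases, and correctness conditions are given in the paper); `find_separator` is a hypothetical separator oracle.

```python
import networkx as nx

def _pair(u, v):
    # Represent an iBGP session as an unordered pair of routers.
    return tuple(sorted((u, v)))

def separator_based_ibgp(graph, find_separator, mesh_threshold=3):
    """Illustrative separator-based construction of iBGP sessions (sketch only)."""
    nodes = list(graph.nodes)
    if len(nodes) <= mesh_threshold:
        # Base case: full mesh among a small set of routers.
        return {_pair(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]}
    separator = set(find_separator(graph))      # hypothetical separator oracle
    sep = sorted(separator)
    # Fully mesh the separator routers (they act as route reflectors).
    sessions = {_pair(u, v) for i, u in enumerate(sep) for v in sep[i + 1:]}
    rest = graph.subgraph(n for n in nodes if n not in separator)
    for component in nx.connected_components(rest):
        # Every router outside the separator peers with all separator routers.
        sessions |= {_pair(u, s) for u in component for s in sep}
        # Recurse within each remaining component.
        sessions |= separator_based_ibgp(graph.subgraph(component), find_separator, mesh_threshold)
    return sessions
```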

- Timothy Abbott, Daniel Kane, and Paul Valiant.
*On the Complexity of Two-Player Win-Lose Games*. Proc. 46th Foundations of Computer Science (FOCS), 2005, pp. 113-122. (Co-winner of the best student paper award)

The efficient computation of Nash equilibria is one of the most formidable challenges in computational complexity today. The problem remains open for two-player games.

We show that the complexity of two-player Nash equilibria is unchanged when all outcomes are restricted to be 0 or 1. That is, win-or-lose games are as complex as the general case for two-player games.

- Paul Valiant.
*The Tensor Product of Two Codes Is Not Necessarily Robustly Testable*. Proc. 9th International Workshop on Randomization and Computation (RANDOM), 2005, pp. 472-481.

There has been significant interest lately in the task of constructing codes that are testable with a small number of random probes. Ben-Sasson and Sudan show that the repeated tensor product of codes leads to a general class of locally testable codes. One question that is not settled by their work is the local testability of a code generated by a single application of the tensor product. Special cases of this question have been studied in the literature in the form of "tests for bivariate polynomials", where the tensor product has been shown to be locally testable for certain families of codes. However the question remained open for the tensor product of generic families of codes. Here we resolve the question negatively, giving families of codes whose tensor product does not have good local testability properties.

- Mart de Graaf and Paul Valiant.
*Polynomial Representations of Symmetric Partial Boolean Functions*. SIAM J. Discrete Math 19(2):481-488 (2005).

For Boolean polynomials in Z_p of sufficiently low degree we derive a relation expressing their values on one level set in terms of their values on another level set. We use this relation to derive linear upper and lower bounds, tight to within constant factor, on the degrees of various *approximate majority functions*, namely functions that take the value 0 on one level set, the value 1 on a different level set, and arbitrary 0-1 values on other Boolean inputs. We show sub-linear upper bounds in the case of moduli that are not prime powers.