For example, N(mu,sigma^2) represents the family of normal distributions, where mu corresponds to the mean of a normal distribution and ranges over the real numbers (R), and sigma^2 corresponds to the variance and ranges over the positive real numbers (R^+). mu and sigma^2 are called parameters, with R and R^+ their corresponding parameter spaces. The two-dimensional space R x R^+ is the parameter space for the family of normal distributions. By fixing mu = 0, we restrict attention to the subfamily of zero-mean normal distributions. In the following, we use theta to denote a real-valued parameter (or a vector of real-valued parameters theta = (theta_1,...,theta_k)) with parameter space Omega_theta (in the vector case, Omega_theta = Omega_{theta_1} x ... x Omega_{theta_k}).
Bayes' rule gives the posterior distribution of theta given the observed data x = (x_1,...,x_n):

                            f_n(x|theta) xi(theta)
    xi(theta|x) = -------------------------------------------------
                  Integral_{Omega_theta} f_n(x|theta) xi(theta) dtheta

where f_n is the probability function for sequences of length n.
    f_n(x|theta) = f(x_1|theta) f(x_2|theta) ... f(x_n|theta)

The function f_n is often called the likelihood function. Since the denominator in Bayes' rule is independent of theta, we have the following proportionality.
    xi(theta|x) ~ f_n(x|theta) xi(theta)
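To make the proportionality concrete, it can be evaluated numerically: compute f_n(x|theta) xi(theta) on a grid of theta values and normalize, with the normalization playing the role of the integral in the denominator. The sketch below is a hypothetical Bernoulli example; the data, the uniform prior, and the grid resolution are illustrative assumptions, not taken from the text.

    import numpy as np

    # Discretize the parameter space Omega_theta = (0, 1) for a Bernoulli
    # success probability theta (assumed example, not from the text).
    theta = np.linspace(0.001, 0.999, 999)
    prior = np.full_like(theta, 1.0 / theta.size)   # xi(theta): uniform prior

    x = np.array([1, 0, 1, 1, 0, 1, 1, 1])          # assumed observed sample

    # Likelihood f_n(x|theta) = prod_i theta^{x_i} (1 - theta)^{1 - x_i}
    likelihood = theta ** x.sum() * (1 - theta) ** (len(x) - x.sum())

    # xi(theta|x) ~ f_n(x|theta) xi(theta); normalizing replaces the integral.
    posterior = likelihood * prior
    posterior /= posterior.sum()

    print(theta[np.argmax(posterior)])   # posterior mode, near 6/8 = 0.75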
If the observations are obtained one at a time, we can update the posterior distribution as follows.

    xi(theta|x_1) ~ f(x_1|theta) xi(theta)
    xi(theta|x_1,x_2) ~ f(x_2|theta) xi(theta|x_1)
    ...
    xi(theta|x_1,...,x_n) ~ f(x_n|theta) xi(theta|x_1,...,x_{n-1})
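The same computation can be carried out sequentially. A sketch under the same assumed Bernoulli setup, processing one observation at a time and renormalizing after each step; the final grid posterior matches the batch computation above.

    import numpy as np

    theta = np.linspace(0.001, 0.999, 999)
    posterior = np.full_like(theta, 1.0 / theta.size)   # start from the prior

    for obs in [1, 0, 1, 1, 0, 1, 1, 1]:                # assumed observations
        # xi(theta|x_1,...,x_i) ~ f(x_i|theta) xi(theta|x_1,...,x_{i-1})
        posterior *= theta if obs == 1 else (1 - theta)
        posterior /= posterior.sum()                    # renormalize each step

    print(theta[np.argmax(posterior)])                  # same mode as before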
    probability distribution    corresponding family of
    generating the sample       conjugate prior distributions
    ------------------------    -----------------------------
    Poisson                     gamma
    normal                      normal
    exponential                 gamma
    binomial                    beta
    multinomial                 Dirichlet
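Conjugacy means the posterior remains in the prior's family, with hyperparameters updated in closed form. For the Poisson-gamma pair in the table: a gamma(a, b) prior (shape a, rate b) on the Poisson rate theta yields a gamma(a + sum_i x_i, b + n) posterior. The sketch below uses assumed hyperparameters and data for illustration.

    import numpy as np
    from scipy import stats

    a, b = 2.0, 1.0                     # assumed prior shape and rate
    x = np.array([3, 5, 2, 4, 6])       # assumed Poisson observations

    # Conjugate update: gamma prior + Poisson likelihood -> gamma posterior.
    a_post = a + x.sum()                # shape: a + sum of the observations
    b_post = b + len(x)                 # rate:  b + number of observations

    posterior = stats.gamma(a_post, scale=1.0 / b_post)
    print(posterior.mean())             # (a + sum x)/(b + n) = 22/6 ~ 3.67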
If a is an estimate for theta and L(theta,a) is a real-valued loss function, the expected loss of choosing a before observing any data is
    E(L(theta,a)) = Integral_{Omega_theta} L(theta,a) xi(theta) dtheta

or, after observing the data x,
    E(L(theta,a)|x) = Integral_{Omega_theta} L(theta,a) xi(theta|x) dtheta

The squared-error loss function is perhaps the most common loss function used in practice.
    L(theta,a) = (theta - a)^2

A Bayes estimator delta^* is an estimator such that
    E(L(theta,delta^*(x))|x) = min_a E(L(theta,a)|x)
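For squared-error loss, E(L(theta,a)|x) is quadratic in a and is minimized at a = E(theta|x), so the Bayes estimator is the posterior mean. The sketch below checks this numerically on the assumed Bernoulli grid posterior from earlier, minimizing the expected posterior loss over a grid of candidate actions.

    import numpy as np

    # Grid posterior for the assumed Bernoulli example (uniform prior,
    # 6 successes in 8 trials), as computed above.
    theta = np.linspace(0.001, 0.999, 999)
    posterior = theta ** 6 * (1 - theta) ** 2
    posterior /= posterior.sum()

    # E(L(theta,a)|x) for the squared-error loss, over candidate actions a.
    exp_loss = [np.sum((theta - a) ** 2 * posterior) for a in theta]

    bayes_estimate = theta[np.argmin(exp_loss)]
    posterior_mean = np.sum(theta * posterior)
    print(bayes_estimate, posterior_mean)   # both near 7/10 = 0.7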
In many practical cases, we are forced to choose a family of distributions without knowing whether the distribution generating the observed samples belongs to that family. A robust estimator is one that produces a good estimate even when the generating distribution is not in the chosen family (see Chapter 9 of [DeGroot, 1986]).
Suppose that T = r(X) is a statistic and t is any value of T. Let f_n(x|theta)|_{r(x)=t} denote the conditional joint distribution of x given that r(x) = t. In general, f_n(x|theta)|_{r(x)=t} will depend on theta. If it does not depend on theta, then T is called a sufficient statistic. A sufficient statistic summarizes all of the information in a random sample, so that knowledge of the individual values in the sample is irrelevant in searching for a good estimator for theta. For example, if the generating distribution is a zero-mean normal distribution, then the sample variance is a sufficient statistic for estimating sigma^2. Sufficient statistics are often used when a maximum likelihood or Bayes estimator is unsuitable. The sample mean and sample variance are said to be jointly sufficient statistics for the mean and variance of normal distributions.
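To see sufficiency at work in the zero-mean normal example: the joint density factors as f_n(x|sigma^2) = (2 pi sigma^2)^{-n/2} exp(-sum_i x_i^2 / (2 sigma^2)), which depends on the sample only through n and t = sum_i x_i^2. The sketch below uses two assumed samples of equal size with equal sums of squares; their likelihood functions for sigma^2 coincide exactly.

    import numpy as np

    def loglik_zero_mean_normal(x, sigma2):
        # log f_n(x|sigma^2); depends on x only through n = len(x)
        # and the sufficient statistic t = sum(x_i^2).
        n, t = len(x), np.sum(np.square(x))
        return -0.5 * n * np.log(2 * np.pi * sigma2) - t / (2 * sigma2)

    # Two assumed samples of equal size with the same sum of squares (25):
    x = np.array([3.0, 4.0])
    y = np.array([5.0, 0.0])

    for s2 in (0.5, 1.0, 4.0, 10.0):
        # Identical values at every sigma^2: the samples carry the same
        # information about sigma^2, as sufficiency requires.
        print(s2, loglik_zero_mean_normal(x, s2), loglik_zero_mean_normal(y, s2))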