$$\Pr(B_S, D) = \int_{B_P} \Pr(D \mid B_S, B_P)\, f(B_P \mid B_S)\, \Pr(B_S)\, dB_P \tag{1}$$

where $B_P$ is a conditional probability assignment over the variables in $B_S$ and $f$ is a conditional probability density function.
$$\Pr(D \mid B_S, B_P) = \prod_{i=1}^{m} \Pr(C_i \mid B_S, B_P) \tag{2}$$

where $C_i$ is the $i$th case and $\Pr(C_i \mid B_S, B_P)$ can be computed directly from the network structure $B_S$ and the conditional probability assignment $B_P$.
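As a concrete reading of Eq. (2), the following Python sketch computes $\log \Pr(D \mid B_S, B_P)$ for a fully specified network. The representation (a `parent_sets` list and a `theta` table keyed by parent instantiations) is our own choice, not notation from Cooper and Herskovits; logs are used so the product over many cases does not underflow.

```python
import math

def log_prob_case(case, parent_sets, theta):
    # log Pr(C | B_S, B_P): a case factors into one term per variable,
    # Pr(x_i = case[i] | Parents(x_i) = w), read off from theta.
    # case        -- tuple of ints, one value per variable
    # parent_sets -- parent_sets[i] is the tuple of indices of Parents(x_i)
    # theta       -- theta[i][w][k] = Pr(x_i = v_ik | Parents(x_i) = w)
    total = 0.0
    for i, parents in enumerate(parent_sets):
        w = tuple(case[p] for p in parents)
        total += math.log(theta[i][w][case[i]])
    return total

def log_prob_data(data, parent_sets, theta):
    # log Pr(D | B_S, B_P) = sum over the m cases (Eq. 2, in log form)
    return sum(log_prob_case(c, parent_sets, theta) for c in data)
```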
Let $Z = \{x_1, \ldots, x_n\}$ be the set of variables and $D$ the database of $m$ cases. Each variable $x_i$ has $r_i$ possible value assignments $\{v_{i1}, \ldots, v_{ir_i}\}$. Let $\mathrm{Parents}(x_i)$ denote the parents of $x_i$ in a particular Bayesian network structure, let $w_{ij}$ be the $j$th unique instantiation of $\mathrm{Parents}(x_i)$ relative to the ordering of cases in $D$, and let $q_i$ be the number of such instantiations.
Define $N_{ijk}$ to be the number of cases in $D$ in which variable $x_i$ has the value $v_{ik}$ and $\mathrm{Parents}(x_i)$ is instantiated as $w_{ij}$. Let

$$N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.$$

Given assumptions (i)-(iv), Cooper and Herskovits show that
$$\Pr(B_S, D) = \Pr(B_S) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \tag{3}$$

which can be computed in $O(mn)$ time when $r = \max_i r_i$ and $u = \max_i |\mathrm{Parents}(x_i)|$ are bounded by constants and $\Pr(B_S)$ can be computed in $O(unr)$ time.
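To make Eq. (3) concrete, here is a minimal Python sketch (assuming the same case representation as above) that tallies the $N_{ijk}$ counts for one variable and returns the logarithm of that variable's factor in Eq. (3). The function name `family_log_score` is ours, not from the paper; log space is used because the factorials in Eq. (3) overflow for even modest $m$.

```python
from collections import defaultdict
from math import lgamma

def family_log_score(data, i, parents, r):
    # N_ijk: for each observed parent instantiation w_j, count the cases
    # in which Parents(x_i) = w_j and x_i takes its kth value.
    # r[i] is the number of values of x_i.
    counts = defaultdict(lambda: [0] * r[i])
    for case in data:
        counts[tuple(case[p] for p in parents)][case[i]] += 1

    # Log of variable i's factor in Eq. (3); lgamma(n + 1) = log n!.
    # Instantiations with N_ij = 0 contribute log 1 = 0, so iterating
    # over the instantiations actually observed in D suffices.
    score = 0.0
    for n_ijk in counts.values():
        n_ij = sum(n_ijk)
        score += lgamma(r[i]) - lgamma(n_ij + r[i])   # (r_i-1)!/(N_ij+r_i-1)!
        score += sum(lgamma(n + 1) for n in n_ijk)    # prod_k N_ijk!
    return score
```

Summing `family_log_score` over all $i$ and adding $\log \Pr(B_S)$ gives $\log \Pr(B_S, D)$.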
The probability of a structure given the data is

$$\Pr(B_S \mid D) = \frac{\Pr(B_S, D)}{\Pr(D)} = \frac{\Pr(B_S, D)}{\sum_{B_S \in Q} \Pr(B_S, D)} \tag{4}$$

where $Q$ is the set of possible belief-network structures. For comparing two structures, $\Pr(B_S, D)$ is sufficient. Note, however, that enumerating $Q$ is impractical for most $Z$. Similarly, finding $\max_{B_S} \Pr(B_S \mid D)$ (i.e., inferring the most probable network from the data) is problematic unless the search can somehow be restricted to a small subset of $Q$.
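Since Eq. (3) factors over variables, comparing two structures reduces to comparing summed family scores. A toy illustration, reusing the `family_log_score` sketch above (the data, structures, and function names are invented for illustration):

```python
def structure_log_score(data, parent_sets, r, log_prior=0.0):
    # log Pr(B_S, D) per Eq. (3); log_prior stands in for log Pr(B_S)
    # (zero here, i.e. the comparison effectively assumes a uniform prior).
    return log_prior + sum(family_log_score(data, i, ps, r)
                           for i, ps in enumerate(parent_sets))

# Three binary variables with x2 = x0 OR x1 in every case:
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)] * 5
r = [2, 2, 2]
b1 = [(), (), (0, 1)]   # structure with arcs x0 -> x2 <- x1
b2 = [(), (), ()]       # structure with no arcs
print(structure_log_score(data, b1, r) > structure_log_score(data, b2, r))  # True
```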
Cooper and Herskovits suggest imposing an ordering on the variables in $Z$: if $x_i$ precedes $x_j$ in the ordering, structures with an arc from $x_j$ to $x_i$ are not allowed. The resulting subset of $Q$ is considerably smaller than $Q$ but still too large to enumerate explicitly. Instead, Cooper and Herskovits propose a greedy heuristic algorithm called K2. For each node, K2 starts by assuming the node has no parents and then incrementally adds, as a parent, the predecessor in the ordering whose addition most increases the probability of the resulting structure, stopping when no addition yields an improvement or when a prespecified limit on the number of parents is reached (see the sketch below).
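The following Python sketch of the K2 loop follows the description above; it reuses the hypothetical `family_log_score`, and details such as tie-breaking are our own choices rather than Cooper and Herskovits' implementation. Because Eq. (3) factors over families, each node's parent set can be chosen independently once the ordering is fixed.

```python
def k2(data, order, r, max_parents):
    # Greedy K2 search: arcs are allowed only from earlier to later nodes
    # in `order`; each node greedily accumulates the predecessor whose
    # addition most improves its family score.
    parent_sets = [()] * len(r)
    for pos, i in enumerate(order):
        parents = ()
        best = family_log_score(data, i, parents, r)
        improved = True
        while improved and len(parents) < max_parents:
            improved = False
            # try each predecessor in the ordering not already a parent
            candidates = [z for z in order[:pos] if z not in parents]
            scored = [(family_log_score(data, i, parents + (z,), r), z)
                      for z in candidates]
            if scored:
                s, z = max(scored)
                if s > best:   # keep only the single best improvement
                    best, parents, improved = s, parents + (z,), True
        parent_sets[i] = parents
    return parent_sets
```

On the toy data above, `k2(data, [0, 1, 2], r, max_parents=2)` recovers the arcs from $x_0$ and $x_1$ into $x_2$.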
The above equations can, in principle, be extended to handle missing data and hidden variables by marginalizing out the missing information. Straightforward implementations of this extension are not practical, however.