|
|
BNGenerator Version 0.3A generator for random Bayesian network
|
|
|
Decision Making Lab Escola Politécnica - University of São Paulo |
|
|
|
|
|
|
|
|
|
Thanks for the visit; you're visitor
since January 13 2003.
Last up date: September 29 2004.
BNGenerator is a program that generates random Bayesian networks, guaranteeing that the distribution of generated networks is (asymptotically) uniform (we do not generate networks from data!!).The program is coded in Java and is freely distributed under the GNU license.
The program accepts constraints on the maximum degree for nodes and on the maximum number of arcs in the network; these constraints limit how "connected" the networks are. Also, it is possible to control the induced width of the network, leading to more "realistic" networks. Other constraints (for example, maximum number of parents for each node) could be coded with relatively simple modifications in the program; the theoretical analysis of the algorithm and the source code are available, and these modifications should not be too hard.
BNGenerator was coded by
Jaime Shinsuke Ide
, as a result of research
conducted by Jaime S. Ide and
Fabio G. Cozman
. Ideas and results were based on the work of
G. Melançon
and M. Bousquet-Melou, and some parts of BNGenerator were directly
inspired by the DagAlea
software. The research was funded by
FAPESP
.
There is a huge number of directed acyclic graphs for any reasonably large number of nodes. Even though there are expressions that would allow one to compute the number of directed acyclic graphs, there does not seem to exist a simple method that would allow us to sample uniformly from this huge space. Basically, there does not seem to be a method to map the integers (from 1 to some large number M) to the directed acyclic graphs with N nodes. But even if we knew how to sample from the space of all possible directed acyclic graphs, that knowledge would not give us a practical solution to the problem of generating random Bayesian networks. It has been observed that in practice Bayesian networks are rather sparse graphs, in the sense that nodes have a few parents and children (that is, the degree of the nodes is not too large). Now, our problem then is to generate uniformly distributed samples from the class of directed acyclic graphs with restrictions on node degree (or possibly, restrictions on the number of parents, number of children, number of arcs, etc). This is a hard problem.
Our approach, and the proofs that our algorithms for generating random directed acyclic graphs are correct, are in the paper
Jaime Shinsuke Ide , Fabio Gagliardi Cozman . Generating random Bayesian networks , Brazilian Symposium on Artificial Intelligence , Recife, Pernambuco, Brazil, 2002.To generate random directed acyclic graphs, BNGenerator uses MCMC methods (Markov chain Monte Carlo). We define a set of possible transitions for a graph (that is, a set of possible modifications for the graph) and we select one transition at random. Given the characteristics of the possible transitions and the selection process, a random directed acyclic graph with uniform distribution is produced after several transitions. In fact, the result is asymptotic: in the limit of infinitely many transitions, a uniformly distributed sample will be produced. In the paper indicated previously, we present the transitions and selection process used by BNGenerator, and prove that they have the right properties: after many transitions, we obtain a uniform random Bayesian network. In theory we would have to wait for infinitely many iterations, but in practice we stop at a large number. Empirical studies suggest that for a graph with N nodes, we should perform at least 4 * N * N transitions. That is the number of transitions that BNGenerator adopts by default, but the user can specify a different number.
When we have a random directed acyclic graph, it is easy to generate a number V for each node. We just sample the uniform distribution between 2 and some maximum allowed number Vm . Once we have the number of values for all variables in the graph, we can generate probability distributions by sampling from Dirichlet distributions. See the paper indicated previously for a more detailed discussion.
BNGenerator starts from a very simple Bayesian network (a chain of nodes where each node has at most a parent and a child) and keeps performing transitions. The user can specify how many transitions are performed before a network is saved in file. After a network is saved, BNGenerator can keep performing transitions and saving more networks.
An important point is that we differentiate between generic graphs and
graphs that do have a single path (directed or undirected) between any
two nodes. The latter graphs are said singly connected, and also
called polytrees. The distinction is important in Bayesian networks
because singly connected networks have interesting properties that do not
generalize to other classes of networks. It should be noted that no guaranteed
algorithm for generating random polytrees seems to exist at the moment (that
is, an algorithm that can guarantee the distribution of the generated samples).
BNGenerator can produce random singly connected Bayesian networks if the
user wishes to do so.
We eventually found that the most appropriate quantity to control is the induced width of networks. The induced width of a network can be easily explained and understood; it conveys the algorithmic complexity of inferences and, indirectly, it captures how dense the network is. Our tests indicate (rather subjectively) that a network with low induced width ''looks like'' real networks in the literature. Besides, it makes sense to control induced width, as we are usually interested in comparing algorithms or parameterizing results with respect to the complexity of the underlying network, and induced width is the main indicator of such complexity. Unfortunately, the generation of random graphs with constraints on induced width is significantly more involved than the generation of graphs with constraints on node degree and number of edges. In the paper above, we report on new algorithms that accomplish generation of graphs with simultaneous constraints on all these quantities: induced width, node degree, and number of edges.
- Jaime Shinsuke Ide , Fabio Gagliardi Cozman and F.T. Ramos . Generating Random Bayesian Networks with Constraints on Induced Width . Proceedings of the 16th European Conference on Artificial Intelligence (ECAI-04), IOS Press, Amsterdam, p. 323-327, 2004.
- Jaime Shinsuke Ide , Fabio Gagliardi Cozman . Generation of Random Bayesian Networks with Constraints on Induced Width, with Applications to the Average Analysis od d-Connectivity, Quasi-random Sampling, and Loopy Propagation. University of São Paulo. Technical Report June 2003.
Finding the exact value of induced width is NP-hard problem with no easy solution. Then, in order to get more efficiency, we compute as ''induced width'' a value obtained with a particular efficient heuristic. We observed that the heuristic approximates very well the exact value of the induced width.
For working with the new version of BNGenerator, you need in addition an adapted version of EBayes (embayes.jar ) for computing the induced width of generated networks.
Please, if you have downloaded BNGenerator , contact Jaime Shinsuke Ide , in order to notice you for new updates!
The colt.jar library is distributed by CERN , with the following notice:
packages cern.colt* , cern.jet*, cern.clhepThe MersenneTwister pseudo-random number generator is distributed with the BSD license.
Written by Wolfgang Hoschek. Check the Colt home page for more info.
Copyright © 1999 CERN - European Organization for Nuclear Research.
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. CERN makes no representations about the suitability of this software for any purpose. It is provided "as is" without expressed or implied warranty.
If you make modifications to the source files and want to compile the program again, you can use the compiler provided by the Sun distribution of Java 2. Assuming all source files and the library colt.jar are in the same directory, you can compile the program as follows:
BNGenerator is run as follows (assuming all files and the colt.jar library are in the same directory):
Every option has a default value that is assumed if the option is not specified. The options are:
This research was conducted at the
Decision Making Lab
at the University of Sao
Paulo
, with support from the funding agency
FAPESP
; both institutions are in Sao Paulo, Brazil.