Tutorial: Probabilistic modeling with Bayesian networks in bnlearn

Lecture notes by Sara Taheri

Installing bnlearn

Open RStudio and type the following in the console:

install.packages("bnlearn")
install.packages("Rgraphviz")

If you experience problems installing Rgraphviz, try the following
script:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rgraphviz")

## Warning: package 'Rgraphviz' was built under R version 3.4.2

## Warning: package 'graph' was built under R version 3.4.2

## Warning: package 'BiocGenerics' was built under R version 3.4.2

Alternatively, you can use RStudio's install dialog (Tools > Install
Packages): type "bnlearn" in the window that appears and click the
Install button, then do the same for the other package.

Understanding the directed acyclic graph representation

In this part, we introduce the survey data set and show how to
visualize it with the bnlearn package.

The survey data set

The survey data set contains information about the usage of different
transportation systems, with a focus on cars and trains, across
different social groups. It includes the following factors:

  • Age (A): It is recorded as young (young) for individuals
    below 30 years, adult (adult) for individuals between 30 and
    60 years old, and old (old) for people older than 60.

  • Sex (S): The biological sex of the individual, recorded as male
    (M) or female (F).

  • Education (E): The highest level of education or training
    completed by the individual, recorded as either high school
    (high) or university degree (uni).

  • Occupation (O): It is recorded as an employee (emp) or a
    self-employed (self) worker.

  • Residence (R): The size of the city the individual lives in,
    recorded as small (small) or big (big).

  • Travel (T): The means of transport favoured by the individual,
    recorded as car (car), train (train) or other (other).

Travel is the target of the survey, the quantity of interest whose
behaviour is under investigation.

Visualizing a Bayesian network

We can represent the relationships between the variables in the survey
data by a directed acyclic graph (DAG), where each node corresponds to
a variable in the data and each arc represents a direct dependence
between a pair of variables.

In bnlearn, we can graphically represent the relationships between the
variables in the survey data like this:

# empty graph
library(bnlearn)
## Warning: package 'bnlearn' was built under R version 3.4.4

## 
## Attaching package: 'bnlearn'

## The following object is masked from 'package:BiocGenerics':
## 
##     score

## The following object is masked from 'package:stats':
## 
##     sigma
dag <- empty.graph(nodes = c("A","S","E","O","R","T"))
arc.set <- matrix(c("A", "E",
                    "S", "E",
                    "E", "O",
                    "E", "R",
                    "O", "T",
                    "R", "T"),
                  byrow = TRUE, ncol = 2,
                  dimnames = list(NULL, c("from", "to")))
arcs(dag) <- arc.set
nodes(dag)
## [1] "A" "S" "E" "O" "R" "T"
arcs(dag)
##      from to 
## [1,] "A"  "E"
## [2,] "S"  "E"
## [3,] "E"  "O"
## [4,] "E"  "R"
## [5,] "O"  "T"
## [6,] "R"  "T"

Plotting the DAG

In this section we discuss the ways we can display Bayesian networks
visually.

You can either use the simple plot function or the graphviz.plot
function from the Rgraphviz package.

# plot dag with plot function
plot(dag)

# plot dag with graphviz.plot function. Default layout is dot
graphviz.plot(dag, layout = "dot")

# plot dag with graphviz.plot function. change layout to "fdp"
graphviz.plot(dag, layout = "fdp")

# plot dag with graphviz.plot function. change layout to "circo"
graphviz.plot(dag, layout = "circo")

Highlighting specific nodes

If you want to change the colour of the nodes or the edges of your
graph, you can do so easily by passing a highlight argument to the
graphviz.plot function.

Let’s assume that we want to change the color of all the nodes and edges
of our dag to blue.

hlight <- list(nodes = nodes(dag), arcs = arcs(dag),
               col = "blue", textCol = "blue")
pp <- graphviz.plot(dag, highlight = hlight)

The look of the arcs can be customised as follows using the
edgeRenderInfo function from Rgraphviz.

edgeRenderInfo(pp) <- list(col = c("S~E" = "black", "E~R" = "black"),
                           lwd = c("S~E" = 3, "E~R" = 3))

Attributes being modified (i.e., col for the colour and lwd for the line
width) are specified again as the elements of a list. For each
attribute, we specify a list containing the arcs we want to modify and
the value to use for each of them. Arcs are identified by labels of the
form parent∼child, e.g., S → E is S~E.

Similarly, we can highlight nodes with nodeRenderInfo. We set their
colour and the colour of the node labels to black and their background
to grey.

nodeRenderInfo(pp) <- list(col = c("S" = "black", "E" = "black", "R" = "black"),
                           fill = c("E" = "grey"))

Once we have made all the desired modifications, we can plot the DAG
again with the renderGraph function from Rgraphviz.

renderGraph(pp)

The directed acyclic graph as a representation of joint probability

The DAG represents a factorization of the joint probability
distribution into a set of local (conditional) probability
distributions, one for each variable. In this section we show how to
specify these distributions for a DAG by hand, as well as how to
estimate the parameters of the conditional probability distributions
from data using maximum likelihood estimation or Bayesian estimation.

Specifying the probability distributions on your own

Given the DAG, the joint probability distribution of the survey data
variables factorizes as follows:

Pr(A, S, E, O, R, T) = Pr(A) Pr(S) Pr(E | A, S) Pr(O | E) Pr(R | E) Pr(T | O, R).

A.lv <- c("young", "adult", "old")
S.lv <- c("M", "F")
E.lv <- c("high", "uni")
O.lv <- c("emp", "self")
R.lv <- c("small", "big")
T.lv <- c("car", "train", "other")

A.prob <- array(c(0.3,0.5,0.2), dim = 3, dimnames = list(A = A.lv))
S.prob <- array(c(0.6,0.4), dim = 2, dimnames = list(S = S.lv))
E.prob <- array(c(0.75, 0.25, 0.72, 0.28, 0.88, 0.12,
                  0.64, 0.36, 0.70, 0.30, 0.90, 0.10),
                dim = c(2, 3, 2), dimnames = list(E = E.lv, A = A.lv, S = S.lv))
O.prob <- array(c(0.96, 0.04, 0.92, 0.08), dim = c(2, 2),
                dimnames = list(O = O.lv, E = E.lv))
R.prob <- array(c(0.25, 0.75, 0.2, 0.8), dim = c(2, 2),
                dimnames = list(R = R.lv, E = E.lv))
T.prob <- array(c(0.48, 0.42, 0.10, 0.56, 0.36, 0.08,
                  0.58, 0.24, 0.18, 0.70, 0.21, 0.09),
                dim = c(3, 2, 2), dimnames = list(T = T.lv, O = O.lv, R = R.lv))
cpt <- list(A = A.prob, S = S.prob, E = E.prob, O = O.prob, R = R.prob, T = T.prob)
# custom cpt table
cpt
## $A
## A
## young adult   old 
##   0.3   0.5   0.2 
## 
## $S
## S
##   M   F 
## 0.6 0.4 
## 
## $E
## , , S = M
## 
##       A
## E      young adult  old
##   high  0.75  0.72 0.88
##   uni   0.25  0.28 0.12
## 
## , , S = F
## 
##       A
## E      young adult old
##   high  0.64   0.7 0.9
##   uni   0.36   0.3 0.1
## 
## 
## $O
##       E
## O      high  uni
##   emp  0.96 0.92
##   self 0.04 0.08
## 
## $R
##        E
## R       high uni
##   small 0.25 0.2
##   big   0.75 0.8
## 
## $T
## , , R = small
## 
##        O
## T        emp self
##   car   0.48 0.56
##   train 0.42 0.36
##   other 0.10 0.08
## 
## , , R = big
## 
##        O
## T        emp self
##   car   0.58 0.70
##   train 0.24 0.21
##   other 0.18 0.09

Now that we have defined both the DAG and the local distribution
corresponding to each variable, we can combine them to form a
fully-specified BN. We combine the DAG we stored in dag and a list
containing the local distributions, which we will call cpt, into an
object of class bn.fit called bn.

# fit cpt table to network
bn <- custom.fit(dag, cpt)
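
As a quick sanity check (a minimal sketch), we can inspect the local
distribution of a single node in the fitted network and count its free
parameters with the nparams function:

# inspect the CPT of Travel in the fully-specified network
bn$T
# count the free parameters of the network
nparams(bn)
## [1] 21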

Estimating parameters of conditional probability tables

For the hypothetical survey described in this chapter, we have assumed
to know both the DAG and the parameters of the local distributions
defining the BN. In this scenario, BNs are used as expert systems,
because they formalise the knowledge possessed by one or more experts in
the relevant fields. However, in most cases the parameters of the local
distributions will be estimated (or learned) from an observed sample.

Let’s read the survey data:

survey <- read.table("data/survey.txt", header = TRUE)
head(survey)
##       A     R    E   O S     T
## 1 adult   big high emp F   car
## 2 adult small  uni emp M   car
## 3 adult   big  uni emp F train
## 4 adult   big high emp M   car
## 5 adult   big high emp M   car
## 6 adult small high emp F train

In the case of this survey, and of discrete BNs in general, the
parameters to estimate are the conditional probabilities in the local
distributions. They can be estimated, for example, with the
corresponding empirical frequencies in the data set, e.g.,

$\hat{Pr}(O = emp | E = high) = \frac{\hat{Pr}(O = emp, E = high)}{\hat{Pr}(E = high)}= \frac{\text{number of observations for which O = emp and E = high}}{\text{number of observations for which E = high}}$

This yields the classic frequentist and maximum likelihood estimates. In
bnlearn, we can compute them with the bn.fit function. bn.fit
complements the custom.fit function we used in the previous section;
the latter constructs a BN using a set of custom parameters specified by
the user, while the former estimates the same from the data.

bn.mle <- bn.fit(dag, data = survey, method = "mle")
bn.mle
## 
##   Bayesian network parameters
## 
##   Parameters of node A (multinomial distribution)
## 
## Conditional probability table:
##  adult   old young 
## 0.472 0.208 0.320 
## 
##   Parameters of node S (multinomial distribution)
## 
## Conditional probability table:
##      F     M 
## 0.402 0.598 
## 
##   Parameters of node E (multinomial distribution)
## 
## Conditional probability table:
##  
## , , S = F
## 
##       A
## E          adult       old     young
##   high 0.6391753 0.8461538 0.5384615
##   uni  0.3608247 0.1538462 0.4615385
## 
## , , S = M
## 
##       A
## E          adult       old     young
##   high 0.7194245 0.8923077 0.8105263
##   uni  0.2805755 0.1076923 0.1894737
## 
## 
##   Parameters of node O (multinomial distribution)
## 
## Conditional probability table:
##  
##       E
## O            high        uni
##   emp  0.98082192 0.92592593
##   self 0.01917808 0.07407407
## 
##   Parameters of node R (multinomial distribution)
## 
## Conditional probability table:
##  
##        E
## R            high       uni
##   big   0.7178082 0.8666667
##   small 0.2821918 0.1333333
## 
##   Parameters of node T (multinomial distribution)
## 
## Conditional probability table:
##  
## , , R = big
## 
##        O
## T              emp       self
##   car   0.58469945 0.69230769
##   other 0.19945355 0.15384615
##   train 0.21584699 0.15384615
## 
## , , R = small
## 
##        O
## T              emp       self
##   car   0.54700855 0.75000000
##   other 0.07692308 0.25000000
##   train 0.37606838 0.00000000

Note that we assume we know the structure of the network, so dag is an
input of the bn.fit function.
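
As a sanity check, we can reproduce the estimates for a single node
directly from the empirical frequencies in the data and compare them
with the corresponding entries of bn.mle; the two should match exactly,
since the maximum likelihood estimates for a discrete BN are just the
empirical conditional frequencies. A minimal sketch:

# Pr(O | E) computed by hand from the observed frequencies
prop.table(table(survey$O, survey$E), margin = 2)
# the same parameters as estimated by bn.fit
coef(bn.mle$O)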

As an alternative, we can also estimate the same conditional
probabilities in a Bayesian setting, using their posterior
distributions. In this case, the method argument of bn.fit must
be set to “bayes”.

bn.bayes <- bn.fit(dag, data = survey, method = "bayes", iss = 10)

The estimated posterior probabilities are computed from a uniform prior
over each conditional probability table. The iss optional argument,
whose name stands for imaginary sample size (also known as equivalent
sample size), determines how much weight is assigned to the prior
distribution compared to the data when computing the posterior. The
weight is specified as the size of an imaginary sample supporting the
prior distribution.
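
To see the effect of iss in practice, we can compare the posterior
estimates of a single conditional probability table under a weak and a
strong prior; larger values of iss pull the estimates further towards
the uniform prior. A minimal sketch:

bn.bayes.weak <- bn.fit(dag, data = survey, method = "bayes", iss = 1)
bn.bayes.strong <- bn.fit(dag, data = survey, method = "bayes", iss = 100)
# posterior estimates of Pr(O | E) under the two priors
coef(bn.bayes.weak$O)
coef(bn.bayes.strong$O)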

Fitting a DAG to data and predicting the value of an unobserved variable

# predicting a variable in the test set.
training = bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
                  gaussian.test[1:2000, ])
test = gaussian.test[2001:nrow(gaussian.test), ]
predicted <- predict(training, node = "A", data = test, method = "bayes-lw")
head(predicted)
## [1] 3.3892418 0.5683240 0.6184728 1.6928813 1.0246984 1.0535917

About the bayes-lw method: the predicted values are computed by
averaging likelihood weighting simulations performed using all the
available nodes as evidence (with the exception, of course, of the node
whose values we are predicting). If the variable being predicted is
discrete, the predicted level is the one with the highest conditional
probability. If the variable is continuous, the predicted value is the
expected value of the conditional distribution.
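
The same approach works for discrete networks. The sketch below (reusing
dag and the survey data from the previous sections, and assuming an
80/20 split is sensible for this sample size) predicts the Travel
variable for held-out rows; since T is discrete, the predicted level is
the most probable one:

# hold out the last 20% of the survey data
train.idx <- seq_len(round(0.8 * nrow(survey)))
fitted.survey <- bn.fit(dag, data = survey[train.idx, ])
survey.test <- survey[-train.idx, ]
# predict Travel from all the other variables and compare with the truth
pred.T <- predict(fitted.survey, node = "T", data = survey.test,
                  method = "bayes-lw")
table(predicted = pred.T, observed = survey.test$T)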

Conditional independence in Bayesian networks

Using a DAG structure we can investigate whether a variable is
conditionally independent from another variable given a set of variables
from the DAG. If the variables depend directly on each other, there will
be a single arc connecting the nodes corresponding to those two
variables. If the dependence is indirect, there will be two or more arcs
passing through the nodes that mediate the association.

If X and Y are separated by Z, we say that X and Y
are conditionally independent given Z and denote it with,

$X \perp\!\!\!\perp_G Y \mid Z$.

Conditioning on Z is equivalent to fixing the values of its elements,
so that they are known quantities.

Definition (MAPS). Let M be the dependence structure of the
probability distribution P of data D, that is, the set of
conditional independence relationships linking any triplet X, Y,
Z of subsets of D. A graph G is a dependency map (or D-map)
of M if there is a one-to-one correspondence between the random
variables in D and the nodes V of G such that for all disjoint
subsets X, Y, Z of D we have

$X \perp\!\!\!\perp_P Y \mid Z \Rightarrow X \perp\!\!\!\perp_G Y \mid Z$.

Similarly, G is an independency map (or I-map) of M if

$X \perp\!\!\!\perp_P Y \mid Z \Leftarrow X \perp\!\!\!\perp_G Y \mid Z$.

G is said to be a perfect map of M if it is both a D-map and an I-map,
that is

$X \perp\!\!\!\perp_P Y \mid Z \Leftrightarrow X \perp\!\!\!\perp_G Y \mid Z$,

and in this case G is said to be faithful or isomorphic to M.

Definition. A variable V is a collider, or the centre of a v-structure,
if it has two incoming arcs, i.e., two parents whose arcs converge on V.

[Figure: two example graphs in which V is a collider.]

You can find all the V structures of a DAG:

vstructs(dag)
##      X   Z   Y  
## [1,] "A" "E" "S"
## [2,] "O" "T" "R"

Note that conditioning on a collider induces dependence even though the
parents aren’t directly connected.

Definition (d-separation) If G is a directed graph in which X,
Y and Z are disjoint sets of vertices, then X and Y are
d-connected by Z in G if and only if there exists an undirected
path U between some vertex in X and some vertex in Y such that
for every collider C on U, either C or a descendant of C is in Z,
and no non-collider on U is in Z.

X and Y are d-separated by Z in G if and only if they
are not d-connected by Z in G.

We assume that graphical separation ($\perp\!\!\!\perp_G$) implies
probabilistic independence ($\perp\!\!\!\perp_P$) in a Bayesian network.

We can investigate whether two nodes in a bn object are d-separated
using the dsep function. dsep takes three arguments, x, y and z,
corresponding to X, Y and Z; the first two must be the names
of two nodes being tested for d-separation, while the latter is an
optional d-separating set. So, for example,

dsep(dag, x = "S", y = "R")
## [1] FALSE
dsep(dag, x = "O", y = "R")
## [1] FALSE
dsep(dag, x = "S", y = "R", z = "E")
## [1] TRUE
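
Conditioning on a collider works the other way around. In the
v-structure A → E ← S of our DAG, the two parents are d-separated
marginally, but conditioning on their common child E d-connects them:

dsep(dag, x = "A", y = "S")
## [1] TRUE
dsep(dag, x = "A", y = "S", z = "E")
## [1] FALSE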

Markov Property, Equivalence Classes and CPDAGs

Definition (Local Markov property) Each node Xi is
conditionally independent of its non-descendants given its parents.

Compared to the previous decomposition, it highlights the fact that
parents are not completely independent from their children in the BN; a
trivial application of Bayes’ theorem to invert the direction of the
conditioning shows how information on a child can change the
distribution of the parent.

Second, assuming the DAG is an I-map also means that serial and
divergent connections result in equivalent factorisations of the
variables involved. It is easy to show that the serial connection
Xi → Xj → Xk and the divergent connection Xi ← Xj → Xk encode the same
conditional independence statement,
$X_i \perp\!\!\!\perp X_k \mid X_j$, and are therefore equivalent. As a
result, we can have BNs with different arc sets that encode the same
conditional independence relationships and represent the same global
distribution in different (but probabilistically equivalent) ways. Such
DAGs are said to belong to the same equivalence class.

Skeleton of a network, CPDAGs and equivalence classes

The skeleton of a network is the undirected graph obtained by ignoring
the direction of every arc. Here is the skeleton of the DAG for the
survey data set.

graphviz.plot(skeleton(dag))

Theorem (Equivalence classes). Two DAGs defined over the same set of
variables are equivalent if and only if they have the same skeleton
(i.e., the same underlying undirected graph) and the same v-structures.

data(learning.test)
learn.net1 = empty.graph(names(learning.test))
learn.net2 = empty.graph(c("A","B","C","D","E","F"))
modelstring(learn.net1) = "[A][C][F][B|A][D|A:C][E|B:F]"
arc.set2 <- matrix(c("B", "A",
                    "A", "D",
                    "C", "D",
                    "B", "E",
                    "F", "E"),
                  byrow = TRUE, ncol = 2,
                  dimnames = list(NULL, c("from", "to")))
arcs(learn.net2) <- arc.set2
graphviz.compare(learn.net1,learn.net2)

score(learn.net1, learning.test, type = "loglik")
## [1] -23832.13
score(learn.net2, learning.test, type = "loglik")
## [1] -23832.13
# type = "loglik" returns the log-likelihood of the data given the DAG
# and the MLE of the parameters
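
Since learn.net1 and learn.net2 have the same skeleton and the same
v-structures, they belong to the same equivalence class and therefore
map to the same CPDAG, which we can verify directly:

# equivalent DAGs have identical CPDAGs
all.equal(cpdag(learn.net1), cpdag(learn.net2))
## [1] TRUE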

In other words, the only arcs whose directions are important are those
that are part of one or more v-structures.

The skeleton of a DAG and its v-structures identify the equivalence
class the DAG belongs to, which is represented by the completed
partially directed acyclic graph (CPDAG). We can obtain it from a DAG
with the cpdag function.

X <- paste("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
           "[X8|X3:X7][X9|X2:X7][X10|X1:X9]", sep = "")
dag2 <- model2network(X)
par(mfrow = c(1,2))
graphviz.plot(dag2)
graphviz.plot(cpdag(dag2))

Moral Graphs

In the previous section we introduced an alternative graphical
representation of the DAG underlying a BN: the CPDAG of the equivalence
class the BN belongs to. Another graphical representation that can be
derived from the DAG is the moral graph.

The moral graph is an undirected graph that is constructed as follows:

  1. connecting the non-adjacent nodes in each v-structure with an
    undirected arc;

  2. ignoring the direction of the other arcs, effectively replacing them
    with undirected arcs.

This transformation is called moralisation because it “marries”
non-adjacent parents sharing a common child. In the case of the example
dag2, we can create the moral graph with the moral function as
follows:

graphviz.plot(moral(dag2))

Moralisation has several uses. First, it provides a simple way to
transform a BN into the corresponding Markov network, a graphical model
using undirected graphs instead of DAGs to represent dependencies.

In a Markov network, we say that $X \perp\!\!\!\perp_G Y \mid \mathbf{Z}$
if every path between X and Y contains some node $Z \in \mathbf{Z}$.
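
For the survey network, moralisation marries the parents in its two
v-structures (A → E ← S and O → T ← R), so the moral graph is the
skeleton plus the undirected edges A − S and O − R. A minimal sketch
comparing the two side by side:

par(mfrow = c(1, 2))
graphviz.plot(skeleton(dag))  # left: the skeleton
graphviz.plot(moral(dag))     # right: the moral graph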

Plotting Conditional Probability Distributions

Plotting the conditional probabilities associated with a conditional
probability table or a query is also useful for diagnostic and
exploratory purposes. Such plots can be difficult to read when a large
number of conditioning variables is involved, but nevertheless they
provide useful insights for most synthetic and real-world data sets.

As far as conditional probability tables are concerned, bnlearn
provides functions to plot barcharts (bn.fit.barchart) and dot plots
(bn.fit.dotplot) from bn.fit objects. Both functions are based on
the lattice package. For example, let's look at the conditional plot of
Pr(T | R, O):

bn.fit.barchart(bn.mle$T, main = "Travel",
                xlab = "Pr(T | R,O)", ylab = "")
## Loading required namespace: lattice
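
The corresponding dot plot can be produced with bn.fit.dotplot, which
takes the same arguments:

bn.fit.dotplot(bn.mle$T, main = "Travel",
               xlab = "Pr(T | R,O)", ylab = "")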
