# Tutorial: Probabilistic modeling with Bayesian networks in bnlearn

Lecture notes by Sara Taheri

## Installing bnlearn

Open RStudio and in console type:

```
install.packages("bnlearn")
install.packages("Rgraphviz")
```

If you experience problems installing **Rgraphviz**, try the following script:

```
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("Rgraphviz")
```

```
## Warning: package 'Rgraphviz' was built under R version 3.4.2
## Warning: package 'graph' was built under R version 3.4.2
## Warning: package 'BiocGenerics' was built under R version 3.4.2
```

Alternatively, install from RStudio's interface: type “bnlearn” in the install window that appears and click on the Install button. Do the same thing for the other package.

## Understanding the directed acyclic graph representation

In this part, we introduce the survey data set and show how we can visualize it with the bnlearn package.

### The *survey data* dataset

The survey data set contains information about the usage of different transportation systems, with a focus on cars and trains, for different social groups. It includes these factors:

- **Age (A):** recorded as *young* (**young**) for individuals below 30 years, *adult* (**adult**) for individuals between 30 and 60 years old, and *old* (**old**) for people older than 60.
- **Sex (S):** the biological sex of the individual, recorded as *male* (**M**) or *female* (**F**).
- **Education (E):** the highest level of education or training completed by the individual, recorded as either *high school* (**high**) or *university degree* (**uni**).
- **Occupation (O):** recorded as an *employee* (**emp**) or a *self-employed* (**self**) worker.
- **Residence (R):** the size of the city the individual lives in, recorded as *small* (**small**) or *big* (**big**).
- **Travel (T):** the means of transport favoured by the individual, recorded as *car* (**car**), *train* (**train**) or *other* (**other**).

Travel is the *target* of the survey, the quantity of interest whose behaviour is under investigation.

### Visualizing a Bayesian network

We can represent the relationships between the variables in the survey data by a directed graph, where each node corresponds to a variable in the data and each edge represents a conditional dependency between a pair of variables.

In bnlearn, we can graphically represent the relationships between the variables in the survey data like this:

```
# empty graph
library(bnlearn)
```

```
## Warning: package 'bnlearn' was built under R version 3.4.4
##
## Attaching package: 'bnlearn'
## The following object is masked from 'package:BiocGenerics':
##
## score
## The following object is masked from 'package:stats':
##
## sigma
```

```
dag <- empty.graph(nodes = c("A", "S", "E", "O", "R", "T"))
arc.set <- matrix(c("A", "E",
                    "S", "E",
                    "E", "O",
                    "E", "R",
                    "O", "T",
                    "R", "T"),
                  byrow = TRUE, ncol = 2,
                  dimnames = list(NULL, c("from", "to")))
arcs(dag) <- arc.set
nodes(dag)
```

```
## [1] "A" "S" "E" "O" "R" "T"
```

```
arcs(dag)
```

```
## from to
## [1,] "A" "E"
## [2,] "S" "E"
## [3,] "E" "O"
## [4,] "E" "R"
## [5,] "O" "T"
## [6,] "R" "T"
```

### Plotting the DAG

In this section we discuss the ways we can visually display Bayesian networks. You can use either the simple **plot** function or the **graphviz.plot** function from the **Rgraphviz** package.

```
# plot dag with plot function
plot(dag)
```

```
# plot dag with graphviz.plot function. Default layout is dot
graphviz.plot(dag, layout = "dot")
```

```
# plot dag with graphviz.plot function. change layout to "fdp"
graphviz.plot(dag, layout = "fdp")
```

```
# plot dag with graphviz.plot function. change layout to "circo"
graphviz.plot(dag, layout = "circo")
```

### Highlighting specific nodes

If you want to change the colour of the nodes or the edges of your graph, you can do this easily by adding a **highlight** argument to the **graphviz.plot** function.

Let’s assume that we want to change the colour of all the nodes and edges of our DAG to *blue*.

```
hlight <- list(nodes = nodes(dag), arcs = arcs(dag),
               col = "blue", textCol = "blue")
pp <- graphviz.plot(dag, highlight = hlight)
```

The look of the arcs can be customised as follows, using the **edgeRenderInfo** function from **Rgraphviz**.

```
edgeRenderInfo(pp) <- list(col = c("S~E" = "black", "E~R" = "black"),
                           lwd = c("S~E" = 3, "E~R" = 3))
```

The attributes being modified (i.e., col for the colour and lwd for the line width) are again specified as the elements of a list. For each attribute, we specify a list containing the arcs we want to modify and the value to use for each of them. Arcs are identified by labels of the form parent~child; e.g., S → E is S~E.

Similarly, we can highlight nodes with nodeRenderInfo. We set their colour and the colour of the node labels to black and their background to grey.

```
nodeRenderInfo(pp) <- list(col = c("S" = "black", "E" = "black", "R" = "black"),
fill = c("E" = "grey"))
```

Once we have made all the desired modifications, we can plot the DAG again with the **renderGraph** function from **Rgraphviz**.

```
renderGraph(pp)
```

### The directed acyclic graph as a representation of joint probability

The DAG represents a factorization of the joint probability distribution into a product of local conditional probability distributions, one for each variable. In this section we show how to add custom probability distributions to a DAG, as well as how to estimate the parameters of the conditional probability distributions using maximum likelihood estimation or Bayesian estimation.

### Specifying the probability distributions on your own

Given the DAG, the joint probability distribution of the survey data variables factorizes as follows:

$$\Pr(A, S, E, O, R, T) = \Pr(A)\,\Pr(S)\,\Pr(E \mid A, S)\,\Pr(O \mid E)\,\Pr(R \mid E)\,\Pr(T \mid O, R)$$

```
A.lv <- c("young", "adult", "old")
S.lv <- c("M", "F")
E.lv <- c("high", "uni")
O.lv <- c("emp", "self")
R.lv <- c("small", "big")
T.lv <- c("car", "train", "other")

A.prob <- array(c(0.30, 0.50, 0.20), dim = 3, dimnames = list(A = A.lv))
S.prob <- array(c(0.60, 0.40), dim = 2, dimnames = list(S = S.lv))
E.prob <- array(c(0.75, 0.25, 0.72, 0.28, 0.88, 0.12,
                  0.64, 0.36, 0.70, 0.30, 0.90, 0.10),
                dim = c(2, 3, 2),
                dimnames = list(E = E.lv, A = A.lv, S = S.lv))
O.prob <- array(c(0.96, 0.04, 0.92, 0.08), dim = c(2, 2),
                dimnames = list(O = O.lv, E = E.lv))
R.prob <- array(c(0.25, 0.75, 0.20, 0.80), dim = c(2, 2),
                dimnames = list(R = R.lv, E = E.lv))
T.prob <- array(c(0.48, 0.42, 0.10, 0.56, 0.36, 0.08,
                  0.58, 0.24, 0.18, 0.70, 0.21, 0.09),
                dim = c(3, 2, 2),
                dimnames = list(T = T.lv, O = O.lv, R = R.lv))
cpt <- list(A = A.prob, S = S.prob, E = E.prob, O = O.prob, R = R.prob, T = T.prob)
```

```
# custom cpt table
cpt
```

```
## $A
## A
## young adult old
## 0.3 0.5 0.2
##
## $S
## S
## M F
## 0.6 0.4
##
## $E
## , , S = M
##
## A
## E young adult old
## high 0.75 0.72 0.88
## uni 0.25 0.28 0.12
##
## , , S = F
##
## A
## E young adult old
## high 0.64 0.7 0.9
## uni 0.36 0.3 0.1
##
##
## $O
## E
## O high uni
## emp 0.96 0.92
## self 0.04 0.08
##
## $R
## E
## R high uni
## small 0.25 0.2
## big 0.75 0.8
##
## $T
## , , R = small
##
## O
## T emp self
## car 0.48 0.56
## train 0.42 0.36
## other 0.10 0.08
##
## , , R = big
##
## O
## T emp self
## car 0.58 0.70
## train 0.24 0.21
## other 0.18 0.09
```

Now that we have defined both the DAG and the local distribution corresponding to each variable, we can combine them to form a fully-specified BN. We combine the DAG we stored in dag and the list containing the local distributions, which we called cpt, into an object of class **bn.fit** called bn.

```
# fit cpt table to network
bn <- custom.fit(dag, cpt)
```
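Once bn is built, we can query it directly. As a quick sanity check, here is a sketch using bnlearn's **cpquery** function; note that cpquery relies on Monte Carlo simulation, so the estimate varies slightly between runs:

```
# approximate Pr(T = "car" | E = "uni") by logic sampling
set.seed(1)
cpquery(bn, event = (T == "car"), evidence = (E == "uni"))
```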

## Estimating parameters of conditional probability tables

For the hypothetical survey described in this chapter, we have assumed that we know both the DAG and the parameters of the local distributions defining the BN. In this scenario, BNs are used as expert systems, because they formalise the knowledge possessed by one or more experts in the relevant fields. However, in most cases the parameters of the local distributions will be estimated (or learned) from an observed sample.

Let’s read the survey data:

```
survey <- read.table("data/survey.txt", header = TRUE)
head(survey)
```

```
## A R E O S T
## 1 adult big high emp F car
## 2 adult small uni emp M car
## 3 adult big uni emp F train
## 4 adult big high emp M car
## 5 adult big high emp M car
## 6 adult small high emp F train
```

In the case of this survey, and of discrete BNs in general, the parameters to estimate are the conditional probabilities in the local distributions. They can be estimated, for example, with the corresponding empirical frequencies in the data set, e.g.,

$\hat{Pr}(O = emp | E = high) = \frac{\hat{Pr}(O = emp, E = high)}{\hat{Pr}(E = high)}= \frac{\text{number of observations for which O = emp and E = high}}{\text{number of observations for which E = high}}$
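This empirical frequency can also be computed directly from the data frame; a quick sketch (assuming survey has been loaded as above):

```
# manual maximum likelihood estimate of Pr(O = "emp" | E = "high")
sum(survey$O == "emp" & survey$E == "high") / sum(survey$E == "high")
```

The result should match the corresponding entry of the fitted network shown below.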

This yields the classic frequentist and maximum likelihood estimates. In bnlearn, we can compute them with the **bn.fit** function. **bn.fit** complements the **custom.fit** function we used in the previous section; the latter constructs a BN using a set of custom parameters specified by the user, while the former estimates the parameters from the data.

```
bn.mle <- bn.fit(dag, data = survey, method = "mle")
bn.mle
```

```
##
## Bayesian network parameters
##
## Parameters of node A (multinomial distribution)
##
## Conditional probability table:
## adult old young
## 0.472 0.208 0.320
##
## Parameters of node S (multinomial distribution)
##
## Conditional probability table:
## F M
## 0.402 0.598
##
## Parameters of node E (multinomial distribution)
##
## Conditional probability table:
##
## , , S = F
##
## A
## E adult old young
## high 0.6391753 0.8461538 0.5384615
## uni 0.3608247 0.1538462 0.4615385
##
## , , S = M
##
## A
## E adult old young
## high 0.7194245 0.8923077 0.8105263
## uni 0.2805755 0.1076923 0.1894737
##
##
## Parameters of node O (multinomial distribution)
##
## Conditional probability table:
##
## E
## O high uni
## emp 0.98082192 0.92592593
## self 0.01917808 0.07407407
##
## Parameters of node R (multinomial distribution)
##
## Conditional probability table:
##
## E
## R high uni
## big 0.7178082 0.8666667
## small 0.2821918 0.1333333
##
## Parameters of node T (multinomial distribution)
##
## Conditional probability table:
##
## , , R = big
##
## O
## T emp self
## car 0.58469945 0.69230769
## other 0.19945355 0.15384615
## train 0.21584699 0.15384615
##
## , , R = small
##
## O
## T emp self
## car 0.54700855 0.75000000
## other 0.07692308 0.25000000
## train 0.37606838 0.00000000
```

Note that we assume we know the structure of the network, so dag is an input to the **bn.fit** function.
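Individual local distributions can be extracted from the fitted object; for example, a sketch that pulls out a single conditional probability table from bn.mle:

```
# extract the conditional probability table of node O
coef(bn.mle$O)
```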

As an alternative, we can also estimate the same conditional probabilities in a Bayesian setting, using their posterior distributions. In this case, the **method** argument of **bn.fit** must be set to **“bayes”**.

```
bn.bayes <- bn.fit(dag, data = survey, method = "bayes", iss = 10)
```

The estimated posterior probabilities are computed from a uniform prior over each conditional probability table. The iss optional argument, whose name stands for imaginary sample size (also known as equivalent sample size), determines how much weight is assigned to the prior distribution compared to the data when computing the posterior. The weight is specified as the size of an imaginary sample supporting the prior distribution.
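To see the effect of iss, we can refit with a much larger imaginary sample size and compare the estimates; a sketch (the variable name bn.bayes.flat is ours):

```
# a larger iss gives more weight to the uniform prior,
# shrinking the estimates toward equal probabilities
bn.bayes.flat <- bn.fit(dag, data = survey, method = "bayes", iss = 1000)
coef(bn.bayes$O)       # close to the maximum likelihood estimates
coef(bn.bayes.flat$O)  # noticeably closer to uniform
```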

### Fit the DAG to data and predict the value of a variable

```
# fit a network on the first 2000 rows, then predict node A in the test set
training <- bn.fit(model2network("[A][B][E][G][C|A:B][D|B][F|A:D:E:G]"),
                   gaussian.test[1:2000, ])
test <- gaussian.test[2001:nrow(gaussian.test), ]
predicted <- predict(training, node = "A", data = test, method = "bayes-lw")
head(predicted)
```

```
## [1] 3.3892418 0.5683240 0.6184728 1.6928813 1.0246984 1.0535917
```

**About the method bayes-lw:** the predicted values are computed by averaging likelihood weighting simulations performed using all the available nodes as evidence (with the obvious exception of the node whose values we are predicting). If the variable being predicted is discrete, the predicted level is the one with the highest conditional probability. If the variable is continuous, the predicted value is the expected value of the conditional distribution.
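The same predict call works for discrete nodes; a minimal sketch reusing the bn.mle network fitted from the survey data earlier (predicting in-sample here, purely for illustration):

```
# for a discrete node, the predicted value is the most probable level
pred <- predict(bn.mle, node = "T", data = survey, method = "bayes-lw")
head(pred)
```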

## Conditional independence in Bayesian networks

Using a DAG structure we can investigate whether a variable is

conditionally independent from another variable given a set of variables

from the DAG. If the variables depend directly on each other, there will

be a single arc connecting the nodes corresponding to those two

variables. If the dependence is indirect, there will be two or more arcs

passing through the nodes that mediate the association.

If **X** and **Y** are separated by **Z**, we say that **X** and **Y** are conditionally independent given **Z** and denote it with

$$\mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$$

Conditioning on **Z** is equivalent to fixing the values of its elements, so that they are known quantities.

**Definition (MAPS).** Let M be the dependence structure of the probability distribution P of data **D**, that is, the set of conditional independence relationships linking any triplet **X**, **Y**, **Z** of subsets of **D**. A graph G is a dependency map (or **D-map**) of M if there is a one-to-one correspondence between the random variables in **D** and the nodes **V** of G such that for all disjoint subsets **X**, **Y**, **Z** of **D** we have

$$\mathbf{X} \perp\!\!\!\perp_{P} \mathbf{Y} \mid \mathbf{Z} \implies \mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$$

Similarly, G is an independency map (or **I-map**) of M if

$$\mathbf{X} \perp\!\!\!\perp_{P} \mathbf{Y} \mid \mathbf{Z} \impliedby \mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$$

G is said to be a **perfect map** of M if it is both a D-map and an I-map, that is

$$\mathbf{X} \perp\!\!\!\perp_{P} \mathbf{Y} \mid \mathbf{Z} \iff \mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$$

and in this case G is said to be **faithful** or **isomorphic** to M.

**Definition.** A node V is a **collider**, or the centre of a **v-structure**, if two arcs converge on it, i.e., V has two parents; in a v-structure the two parents are not themselves connected by an arc.

You can find all the **V structures** of a DAG:

```
vstructs(dag)
```

```
## X Z Y
## [1,] "A" "E" "S"
## [2,] "O" "T" "R"
```

Note that conditioning on a collider induces dependence even though the parents aren’t directly connected.

**Definition (d-separation).** If G is a directed graph in which **X**, **Y** and **Z** are disjoint sets of vertices, then **X** and **Y** are **d-connected** by **Z** in G if and only if there exists an undirected path U between some vertex in **X** and some vertex in **Y** such that for every collider C on U, either C or a descendant of C is in **Z**, and no non-collider on U is in **Z**.

**X** and **Y** are **d-separated** by **Z** in G if and only if they are not d-connected by **Z** in G.

We assume that graphical separation ($\perp\!\!\!\perp_{G}$) implies probabilistic independence ($\perp\!\!\!\perp_{P}$) in a Bayesian network.

We can investigate whether two nodes in a **bn** object are d-separated using the **dsep** function. **dsep** takes three arguments, x, y and z, corresponding to **X**, **Y** and **Z**; the first two must be the names of the two nodes being tested for d-separation, while the latter is an optional d-separating set. So, for example,

```
dsep(dag, x = "S", y = "R")
```

```
## [1] FALSE
```

```
dsep(dag, x = "O", y = "R")
```

```
## [1] FALSE
```

```
dsep(dag, x = "S", y = "R", z = "E")
```

```
## [1] TRUE
```
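These queries also let us see the collider effect at T: conditioning on E blocks the path O ← E → R, but adding the collider T to the conditioning set re-opens the path O → T ← R. A sketch:

```
# E alone blocks O <- E -> R, so O and R become d-separated
dsep(dag, x = "O", y = "R", z = "E")
# conditioning on the collider T as well re-opens O -> T <- R,
# so O and R are d-connected again
dsep(dag, x = "O", y = "R", z = c("E", "T"))
```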

### Markov Property, Equivalence classes and CPDAGs

**Definition (Local Markov property).** Each node $X_i$ is conditionally independent of its non-descendants given its parents.

Compared to the previous decomposition, it highlights the fact that parents are not completely independent from their children in the BN; a trivial application of Bayes’ theorem to invert the direction of the conditioning shows how information on a child can change the distribution of the parent.

Second, assuming the DAG is an I-map also means that serial and divergent connections result in equivalent factorisations of the variables involved. It is easy to show that the serial connection $X_i \rightarrow X_j \rightarrow X_k$ and the divergent connection $X_i \leftarrow X_j \rightarrow X_k$ are equivalent: both encode $X_i \perp\!\!\!\perp X_k \mid X_j$. As a result, we can have BNs with different arc sets that encode the same conditional independence relationships and represent the same global distribution in different (but probabilistically equivalent) ways. Such DAGs are said to belong to the same **equivalence class**.

### Skeleton of a network, CPDAGs and equivalence classes

The skeleton of a network is the network with all arc directions removed. Here is the skeleton of the DAG for the survey dataset:

```
graphviz.plot(skeleton(dag))
```

**Theorem (Equivalence classes).** Two DAGs defined over the same set of variables are equivalent if and only if they have the same skeleton (i.e., the same underlying undirected graph) and the same v-structures.

```
data(learning.test)
learn.net1 <- empty.graph(names(learning.test))
learn.net2 <- empty.graph(c("A", "B", "C", "D", "E", "F"))
modelstring(learn.net1) <- "[A][C][F][B|A][D|A:C][E|B:F]"
arc.set2 <- matrix(c("B", "A",
                     "A", "D",
                     "C", "D",
                     "B", "E",
                     "F", "E"),
                   byrow = TRUE, ncol = 2,
                   dimnames = list(NULL, c("from", "to")))
arcs(learn.net2) <- arc.set2
graphviz.compare(learn.net1, learn.net2)
```

```
score(learn.net1, learning.test, type = "loglik")
```

```
## [1] -23832.13
```

```
score(learn.net2, learning.test, type = "loglik")
```

```
## [1] -23832.13
```

Setting type = "loglik" returns the log-likelihood of the data given the DAG and the maximum likelihood estimates of the parameters; the two equivalent DAGs receive exactly the same score.
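The equal scores are no accident: the two DAGs belong to the same equivalence class, which we can verify by checking that they map to the same CPDAG. A sketch using bnlearn's all.equal method for comparing network structures:

```
# equivalent DAGs collapse to the same CPDAG;
# should return TRUE for these two networks
all.equal(cpdag(learn.net1), cpdag(learn.net2))
```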

In other words, the only arcs whose directions are important are those that are part of one or more v-structures. The skeleton of a DAG and its v-structures identify the equivalence class the DAG belongs to, which is represented by the **completed partially directed acyclic graph (CPDAG)**. We can obtain it from a DAG with the **cpdag** function.

```
X <- paste("[X1][X3][X5][X6|X8][X2|X1][X7|X5][X4|X1:X2]",
           "[X8|X3:X7][X9|X2:X7][X10|X1:X9]", sep = "")
dag2 <- model2network(X)
par(mfrow = c(1, 2))
graphviz.plot(dag2)
graphviz.plot(cpdag(dag2))
```

### Moral Graphs

In the previous section we introduced an alternative graphical representation of the DAG underlying a BN: the CPDAG of the equivalence class the BN belongs to. Another graphical representation that can be derived from the DAG is the **moral graph**.

The moral graph is an undirected graph that is constructed as follows:

- connect the non-adjacent nodes in each v-structure with an undirected arc;
- ignore the direction of the other arcs, effectively replacing them with undirected arcs.

This transformation is called **moralisation** because it “marries” non-adjacent parents sharing a common child. In the case of the DAG dag2 from the previous section, we can create the moral graph with the **moral** function as follows:

```
graphviz.plot(moral(dag2))
```

Moralisation has several uses. First, it provides a simple way to transform a BN into the corresponding Markov network, a graphical model using undirected graphs instead of DAGs to represent dependencies.

In a Markov network, we say that $\mathbf{X} \perp\!\!\!\perp_{G} \mathbf{Y} \mid \mathbf{Z}$ if every path between **X** and **Y** contains some node $Z \in \mathbf{Z}$.

## Plotting Conditional Probability Distributions

Plotting the conditional probabilities associated with a conditional probability table or a query is also useful for diagnostic and exploratory purposes. Such plots can be difficult to read when a large number of conditioning variables is involved, but they nevertheless provide useful insights for most synthetic and real-world data sets.

As far as conditional probability tables are concerned, **bnlearn** provides functions to plot barcharts (**bn.fit.barchart**) and dot plots (**bn.fit.dotplot**) from bn.fit objects. Both functions are based on the lattice package. For example, let’s look at the conditional plot of $\Pr(T \mid R, O)$:

```
bn.fit.barchart(bn.mle$T, main = "Travel",
                xlab = "Pr(T | R,O)", ylab = "")
```

```
## Loading required namespace: lattice
```
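The dot-plot variant mentioned above is called the same way; a sketch:

```
# same conditional probabilities, shown as a lattice dot plot
bn.fit.dotplot(bn.mle$T, main = "Travel",
               xlab = "Pr(T | R,O)", ylab = "")
```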
