# Bei's Study Notes


### Discrete Markov Processes

Consider a system that at any time $t$ is in one of a set of $N$ distinct states: $S_1, S_2, \ldots, S_N$. The state at time $t$ is denoted $q_t$, $t = 1, 2, \ldots$

The first-order Markov model has:

$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \ldots) = P(q_{t+1} = S_j \mid q_t = S_i)$$

We further simplify the model - that is, regularize - by assuming that these transition probabilities are independent of time:

$$a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$$

satisfying

$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1$$

This can be seen as a stochastic automaton. The initial probabilities, $\pi_i$, give the probability that the first state in the sequence is $S_i$:

$$\pi_i \equiv P(q_1 = S_i)$$

satisfying

$$\sum_{i=1}^{N} \pi_i = 1$$

In an observable Markov model, the states are observable. At any time $t$ we know $q_t$, and as the system moves from one state to another, we get a sequence of states. Given a sequence $Q = \{q_1 q_2 \cdots q_T\}$, its probability is

$$P(Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}$$
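As a quick sketch, this probability can be computed by a single pass over the sequence. The two-state chain and its numbers below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical 2-state chain.
A = np.array([[0.7, 0.3],    # a_ij = P(q_{t+1} = S_j | q_t = S_i)
              [0.4, 0.6]])
pi = np.array([0.5, 0.5])    # pi_i = P(q_1 = S_i)

def sequence_prob(Q, A, pi):
    """P(Q | A, Pi) = pi_{q1} * prod_t a_{q_{t-1} q_t} (states 0-indexed)."""
    p = pi[Q[0]]
    for t in range(1, len(Q)):
        p *= A[Q[t - 1], Q[t]]
    return p

print(sequence_prob([0, 0, 1], A, pi))  # 0.5 * 0.7 * 0.3
```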

### Hidden Markov Model

In a hidden Markov model, the states are not observable, but when we visit a state, an observation is recorded that is a probabilistic function of the state. We assume a discrete observation in each state from the set $\{v_1, v_2, \ldots, v_M\}$:

$$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$$

called the emission probability. The states are not observed and can only be inferred from the observables.

In this type of Markov model, there are two sources of randomness: the transition probabilities $a_{ij}$ and the emission probabilities $b_j(m)$.

To summarize and formalize, an HMM has the following elements:

1. $N$: Number of states in the model: $S = \{S_1, S_2, \ldots, S_N\}$

2. $M$: Number of distinct observation symbols in the alphabet: $V = \{v_1, v_2, \ldots, v_M\}$

3. State transition probabilities: $A = [a_{ij}]$, where $a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$

4. Observation probabilities: $B = [b_j(m)]$, where $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$

5. Initial probabilities: $\Pi = [\pi_i]$, where $\pi_i \equiv P(q_1 = S_i)$

We can describe an HMM as $\lambda = (A, B, \Pi)$.

### Three basic problems of HMMs

1. (Evaluation Problem) Given a model $\lambda$, evaluate the probability of any given observation sequence $O = \{O_1 O_2 \cdots O_T\}$, namely $P(O \mid \lambda)$.
2. (State Sequence Problem) Given a model $\lambda$ and an observation sequence $O$, try to find the state sequence $Q = \{q_1 q_2 \cdots q_T\}$ with the highest probability, namely

$$Q^* = \arg\max_Q P(Q \mid O, \lambda)$$

3. (Learning Model Parameters) Given a training set of observation sequences $\mathcal{X} = \{O^k\}_k$, learn the model that maximizes the probability of generating $\mathcal{X}$, namely

$$\lambda^* = \arg\max_\lambda P(\mathcal{X} \mid \lambda)$$

#### Problem 1: Evaluation Problem

However, since the state sequence $Q$ is unknown, we need to condition on it first:

$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t), \qquad P(Q \mid \lambda) = \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}$$

therefore the joint probability is

$$P(O, Q \mid \lambda) = P(Q \mid \lambda)\, P(O \mid Q, \lambda) = \pi_{q_1} b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(O_t)$$

We can compute $P(O \mid \lambda)$ by marginalizing over the joint:

$$P(O \mid \lambda) = \sum_{\text{all possible } Q} P(O, Q \mid \lambda)$$
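A naive way to carry out this marginalization is to enumerate all $N^T$ state sequences explicitly. A minimal sketch with a hypothetical two-state, two-symbol model:

```python
import itertools
import numpy as np

# Hypothetical toy model: N = 2 states, M = 2 symbols.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # transitions a_ij
B  = np.array([[0.9, 0.1], [0.2, 0.8]])   # emissions b_j(m)
pi = np.array([0.6, 0.4])
O  = [0, 1, 0]                            # observed symbol indices

# P(O | lambda) = sum over all N^T state sequences Q of P(O, Q | lambda)
total = 0.0
for Q in itertools.product(range(2), repeat=len(O)):
    p = pi[Q[0]] * B[Q[0], O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t - 1], Q[t]] * B[Q[t], O[t]]
    total += p
print(total)
```

Even for this toy case the loop visits $2^3 = 8$ sequences; the count grows exponentially in $T$, which is exactly why the forward procedure below matters.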

But this is intractable: there are $N^T$ possible state sequences. Fortunately, there is an efficient procedure to calculate $P(O \mid \lambda)$, called the forward-backward procedure.

Define the forward variable as the probability of observing the partial sequence up to time $t$ and being in $S_i$ at time $t$:

$$\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$$

Then we can calculate it recursively. First,

$$\alpha_1(i) = \pi_i b_i(O_1)$$

Then, given $\alpha_t(i)$:

$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})$$

And eventually, the probability of an observation sequence is:

$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$

Computing $\alpha_t(i)$ is $O(N^2 T)$. We can similarly define a backward variable, the probability of the partial observation sequence after time $t$ given the state at time $t$:

$$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$

This can be computed as

$$\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
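The forward and backward recursions can be sketched in a few lines of NumPy; the toy model values below are hypothetical:

```python
import numpy as np

def forward(A, B, pi, O):
    """alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]    # recursion
    return alpha                                    # P(O|lambda) = alpha[-1].sum()

def backward(A, B, pi, O):
    """beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda)."""
    T, N = len(O), len(pi)
    beta = np.ones((T, N))                          # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])    # recursion, backwards in t
    return beta

# Hypothetical toy model.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0]
alpha = forward(A, B, pi, O)
beta  = backward(A, B, pi, O)
# Both give the same P(O | lambda): sum_i alpha_T(i) = sum_i pi_i b_i(O_1) beta_1(i)
print(alpha[-1].sum(), (pi * B[:, O[0]] * beta[0]).sum())
```

Each recursion touches every $(i, j)$ pair once per time step, giving the $O(N^2 T)$ cost noted above.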

#### Problem 2: State Sequence Problem

Define

$$\begin{aligned}
\gamma_t(i) &\equiv P(q_t = S_i \mid O, \lambda) \\
&= \frac{P(O \mid q_t = S_i, \lambda)\, P(q_t = S_i \mid \lambda)}{P(O \mid \lambda)} \\
&= \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}
\end{aligned}$$

The last line is because $q_t$ blocks the observables by d-separation: given $q_t$, the past observations $O_1 \cdots O_t$ and the future observations $O_{t+1} \cdots O_T$ are independent, so the numerator factors into $\alpha_t(i)\, \beta_t(i)$.

Given $\gamma_t(i)$, we can determine the individually most likely state at each time $t$:

$$q_t^* = \arg\max_i \gamma_t(i)$$
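A minimal sketch of computing $\gamma_t(i)$ and the individually most likely states, using a hypothetical toy model (the forward/backward passes are inlined so the snippet is self-contained):

```python
import numpy as np

A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # hypothetical toy model
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0]
T, N = len(O), len(pi)

# Forward and backward passes.
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])

# gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma.argmax(axis=1))   # individually most likely state at each t
```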

However, this cannot be used to find the most probable state *sequence*, because the states are not independent. For that we need the Viterbi algorithm.

Given state sequence $Q = q_1 q_2 \cdots q_T$ and observation sequence $O = O_1 \cdots O_T$, we define $\delta_t(i)$ as the probability of the highest-probability path at time $t$ that accounts for the first $t$ observations and ends in $S_i$:

$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 \cdots O_t \mid \lambda)$$

Then recursively calculate $\delta_t(i)$, keeping backpointers $\psi_t(j)$, as follows:

1. Initialization:
$$\delta_1(i) = \pi_i b_i(O_1), \qquad \psi_1(i) = 0$$
2. Recursion:
$$\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(O_{t+1}), \qquad \psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$$
3. Termination:
$$p^* = \max_i \delta_T(i), \qquad q_T^* = \arg\max_i \delta_T(i)$$

Once we have obtained $q_T^*$, we can trace back via $q_t^* = \psi_{t+1}(q_{t+1}^*)$ for $t = T-1, \ldots, 1$ and obtain the path.
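The Viterbi recursion and backtracking can be sketched as below (the toy model numbers are hypothetical):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Most probable state path for observation sequence O (symbol indices)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))               # delta_t(i): best-path probability
    psi   = np.zeros((T, N), dtype=int)    # psi_t(j): argmax backpointer
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A   # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Termination, then trace back: q_t* = psi_{t+1}(q_{t+1}*)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # hypothetical toy model
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
path, p_star = viterbi(A, B, pi, [0, 1, 0])
print(path, p_star)
```

In practice the products are usually computed in log space to avoid underflow on long sequences; max and argmax then work on sums of logs instead.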

#### Problem 3: Learning Model Parameters

We define $\xi_t(i, j)$ as the probability of being in $S_i$ at time $t$ and in $S_j$ at time $t+1$, given the whole observation sequence $O$ and the model $\lambda$:

$$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)$$

which can be computed as

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{k} \sum_{l} \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$$

subject to:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

Note: this can also be derived by viewing the model as a Bayesian network.

In unsupervised learning we have the EM algorithm; here we have the Baum-Welch algorithm, which is an EM procedure. In the E-step, we compute $\xi_t(i, j)$ and $\gamma_t(i)$ given the current model $\lambda = (A, B, \Pi)$; in the M-step, we recalculate $\lambda$ given $\xi_t(i, j)$ and $\gamma_t(i)$. These two steps are alternated until convergence, during which, it has been shown, $P(O \mid \lambda)$ never decreases.

The estimates are

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \hat{b}_j(m) = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(O_t = v_m)}{\sum_{t=1}^{T} \gamma_t(j)}, \qquad \hat{\pi}_i = \gamma_1(i)$$

subject to

$$\sum_{j=1}^{N} \hat{a}_{ij} = 1, \qquad \sum_{m=1}^{M} \hat{b}_j(m) = 1, \qquad \sum_{i=1}^{N} \hat{\pi}_i = 1$$
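A single Baum-Welch iteration (E-step followed by M-step) on one observation sequence might be sketched as below; the starting model is hypothetical:

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One EM iteration of Baum-Welch on a single sequence O (symbol indices)."""
    T, N = len(O), len(pi)
    # E-step: forward, backward, then gamma and xi.
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t+1]] * beta[t+1])[None, :]
        xi[t] /= xi[t].sum()
    # M-step: re-estimate A, B, Pi from the expected counts.
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for m in range(B.shape[1]):
        B_new[:, m] = gamma[np.array(O) == m].sum(axis=0) / gamma.sum(axis=0)
    pi_new = gamma[0]
    return A_new, B_new, pi_new

A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # hypothetical starting model
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
A1, B1, pi1 = baum_welch_step(A, B, pi, [0, 1, 0, 0, 1])
print(A1.sum(axis=1), B1.sum(axis=1), pi1.sum())   # each still normalized
```

Iterating this step until the change in $P(O \mid \lambda)$ is negligible gives the full algorithm; for a training set of several sequences, the expected counts are summed over sequences before the M-step.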

### Continuous Observations

If the inputs are continuous, one possibility is to discretize them through vector quantization and use the resulting codebook indices as the discrete observation symbols; alternatively, the emission probabilities can be modeled directly as parametric densities, e.g. Gaussians.