### Machine Learning Study Note - Hidden Markov Model

Last updated: 2017-10-10 15:39:14 PDT.

### Discrete Markov Processes

Consider a system that at any time is in one of a set of $N$ distinct states: $S_1, S_2, \dots, S_N$. The state at time $t$ is denoted $q_t$.

The first-order Markov model has:

$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \dots) = P(q_{t+1} = S_j \mid q_t = S_i)$$

We further simplify the model - that is, regularize - by assuming that these **transition probabilities** are independent of time:

$$a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i), \qquad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1$$

This can be seen as a **stochastic automaton**. The *initial probabilities*, $\pi_i$, give the probability that the first state in the sequence is $S_i$:

$$\pi_i \equiv P(q_1 = S_i), \qquad \sum_{i=1}^{N} \pi_i = 1$$

### Hidden Markov Model

In a **Hidden Markov Model**, the states are not observable, but when we visit a state, an observation is recorded that is a probabilistic function of the state. We assume a discrete observation in each state, drawn from the set $\{v_1, v_2, \dots, v_M\}$:

$$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$$

In this kind of Markov model, there are two sources of randomness: the transition probability $a_{ij}$ and the emission probability $b_j(m)$.

To summarize and formalize, an HMM has the following elements:

- $N$: number of states in the model, $S = \{S_1, S_2, \dots, S_N\}$
- $M$: number of distinct observation symbols in the *alphabet*, $V = \{v_1, v_2, \dots, v_M\}$
- State transition probabilities: $A = [a_{ij}]$, where $a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$
- Observation probabilities: $B = [b_j(m)]$, where $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$
- Initial probabilities: $\Pi = [\pi_i]$, where $\pi_i \equiv P(q_1 = S_i)$

We can describe an HMM as $\lambda = (A, B, \Pi)$.
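As a concrete illustration, the sketch below defines a toy $\lambda = (A, B, \Pi)$ with $N = 2$ states and $M = 3$ symbols (the numbers are made up) and samples a state/observation sequence from it:

```python
import numpy as np

# Toy HMM lambda = (A, B, Pi); all numbers are made up for illustration.
A  = np.array([[0.7, 0.3],        # a_ij = P(q_{t+1} = S_j | q_t = S_i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],   # b_j(m) = P(O_t = v_m | q_t = S_j)
               [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])         # pi_i = P(q_1 = S_i)

def sample(A, B, Pi, T, rng=np.random.default_rng(0)):
    """Generate a (states, observations) pair of length T from the HMM."""
    N, M = B.shape
    q = rng.choice(N, p=Pi)                # draw the initial state from Pi
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choice(M, p=B[q]))  # emit a symbol from state q
        q = rng.choice(N, p=A[q])          # transition to the next state
    return states, obs

states, obs = sample(A, B, Pi, T=5)
```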

### Three basic problems of HMMs

- (*Evaluation Problem*) Given a model $\lambda$, evaluate the probability of any given observation sequence $O = O_1 O_2 \cdots O_T$, namely $P(O \mid \lambda)$.
- (*State Sequence Problem*) Given a model $\lambda$ and an observation sequence $O$, find the state sequence $Q = q_1 q_2 \cdots q_T$ with the highest probability, namely $\arg\max_Q P(Q \mid O, \lambda)$.
- (*Learning Model Parameters*) Given a training set of observation sequences $\mathcal{X} = \{O^k\}_{k=1}^{K}$, learn the model that maximizes the probability of generating $\mathcal{X}$, namely $\arg\max_\lambda P(\mathcal{X} \mid \lambda)$.

#### Problem 1: Evaluation Problem

Given the state sequence $Q = q_1 q_2 \cdots q_T$, the probability of observing $O$ is

$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t)$$

However, since $Q$ is unknown, we need to account for it first; the joint probability is

$$P(O, Q \mid \lambda) = P(Q \mid \lambda)\, P(O \mid Q, \lambda) = \pi_{q_1} b_{q_1}(O_1)\, a_{q_1 q_2} b_{q_2}(O_2) \cdots a_{q_{T-1} q_T} b_{q_T}(O_T)$$

We can compute $P(O \mid \lambda)$ by marginalizing over the joint:

$$P(O \mid \lambda) = \sum_{\text{all possible } Q} P(O, Q \mid \lambda)$$
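For a tiny $T$, this marginalization can be carried out literally by enumerating all $N^T$ state sequences; a sketch with made-up toy parameters:

```python
import itertools
import numpy as np

# Brute-force P(O | lambda): sum P(O, Q | lambda) over all N^T sequences Q.
# Toy parameters (made up); feasible only for very small T.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])

def brute_force_likelihood(A, B, Pi, obs):
    N = len(Pi)
    total = 0.0
    for Q in itertools.product(range(N), repeat=len(obs)):
        p = Pi[Q[0]] * B[Q[0], obs[0]]               # pi_{q1} b_{q1}(O_1)
        for t in range(1, len(obs)):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]   # a_{q_{t-1} q_t} b_{q_t}(O_t)
        total += p
    return total

p = brute_force_likelihood(A, B, Pi, obs=[0, 2, 1])
```

As a sanity check, summing this likelihood over every possible observation sequence of a fixed length gives 1.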

But this is intractable: the sum runs over $N^T$ possible state sequences. Fortunately, there is an efficient procedure to calculate $P(O \mid \lambda)$, called the **forward-backward procedure**.

Define the *forward variable*

$$\alpha_t(i) \equiv P(O_1 \cdots O_t, q_t = S_i \mid \lambda)$$

which can be computed recursively:

$$\alpha_1(i) = \pi_i b_i(O_1), \qquad \alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$$

so that $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$. Computing $\alpha_t(i)$ is $O(N^2 T)$. We can similarly define a *backward variable*:

$$\beta_t(i) \equiv P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda)$$

This can be computed as

$$\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
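A minimal implementation of the two recursions, with made-up toy parameters:

```python
import numpy as np

# Forward-backward: alpha[t, i] = P(O_1..O_t, q_t = S_i | lambda),
# beta[t, i] = P(O_{t+1}..O_T | q_t = S_i, lambda). Toy parameters (made up).
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])

def forward(A, B, Pi, obs):
    T, N = len(obs), len(Pi)
    alpha = np.zeros((T, N))
    alpha[0] = Pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]  # recursion
    return alpha

def backward(A, B, Pi, obs):
    T, N = len(obs), len(Pi)
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                  # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])  # recursion
    return beta

obs = [0, 2, 1]
alpha = forward(A, B, Pi, obs)
beta = backward(A, B, Pi, obs)
likelihood = alpha[-1].sum()   # P(O | lambda) = sum_i alpha_T(i)
```

Note that $\sum_i \alpha_t(i)\,\beta_t(i)$ equals $P(O \mid \lambda)$ at every $t$, which makes a handy consistency check.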

#### Problem 2: State Sequence Problem

Define

$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{P(O, q_t = S_i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$

The last equality holds because $P(O, q_t = S_i \mid \lambda) = P(O_1 \cdots O_t, q_t = S_i \mid \lambda)\, P(O_{t+1} \cdots O_T \mid q_t = S_i, \lambda) = \alpha_t(i)\, \beta_t(i)$, since $q_t$ blocks the future observables from the past ones by d-separation.

Given $\gamma_t(i)$, we can determine the individually most likely state at each time, $q_t^* = \arg\max_i \gamma_t(i)$. However, this cannot be used to find the most probable *sequence*, for the states are not independent. For that we need the **Viterbi algorithm**.

Given state sequence $Q = q_1 q_2 \cdots q_T$ and observation sequence $O = O_1 O_2 \cdots O_T$, we define $\delta_t(i)$ as the probability of the highest-probability path at time $t$ that accounts for the first $t$ observations and ends in $S_i$:

$$\delta_t(i) \equiv \max_{q_1 \cdots q_{t-1}} P(q_1 \cdots q_{t-1},\, q_t = S_i,\, O_1 \cdots O_t \mid \lambda)$$

Then recursively calculate $\delta_t(i)$, recording the maximizing predecessor in $\psi_t(j)$, as follows:

$$\delta_1(i) = \pi_i b_i(O_1), \qquad \psi_1(i) = 0$$

$$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1}), \qquad \psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$$

Once we have obtained the $\delta_T(i)$, we start from $q_T^* = \arg\max_i \delta_T(i)$ and trace back via $q_t^* = \psi_{t+1}(q_{t+1}^*)$ to obtain the path.
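A sketch of the Viterbi recursion and backtracking, again with made-up toy parameters:

```python
import numpy as np

# Viterbi: delta[t, i] is the probability of the best path ending in S_i at
# time t; psi[t, i] stores the maximizing predecessor for backtracking.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])

def viterbi(A, B, Pi, obs):
    T, N = len(obs), len(Pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = Pi * B[:, obs[0]]                    # initialization
    for t in range(1, T):
        cand = delta[t-1][:, None] * A              # cand[i, j] = delta_{t-1}(i) a_ij
        psi[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) * B[:, obs[t]]  # recursion
    path = [int(delta[-1].argmax())]                # q_T^* = argmax_i delta_T(i)
    for t in range(T - 1, 0, -1):                   # backtracking
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

path, p_star = viterbi(A, B, Pi, obs=[0, 2, 1])  # → ([0, 1, 1], 0.00972)
```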

#### Problem 3: Learning Model Parameters

We define $\xi_t(i, j)$ as the probability of being in $S_i$ at time $t$ and in $S_j$ at time $t+1$, given the whole observation sequence $O$ and $\lambda$:

$$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)$$

which can be computed as

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{k}\sum_{l} \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$$

subject to:

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

Note: this can also be derived by viewing the model as a Bayesian network.

In unsupervised learning we have the EM algorithm; its instance here is the **Baum-Welch algorithm**. It is an EM procedure: in the E-step, we compute $\xi_t(i, j)$ and $\gamma_t(i)$ given the current model $\lambda$; in the M-step, we recalculate $\lambda$ given the $\xi_t(i, j)$ and $\gamma_t(i)$. These two steps are alternated until convergence, during which, it has been shown, $P(O \mid \lambda)$ never decreases.
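A single Baum-Welch iteration for one observation sequence might look like the sketch below (made-up toy parameters; a real implementation would iterate to convergence and use scaling or log-probabilities to avoid underflow):

```python
import numpy as np

# One EM (Baum-Welch) step for a single observation sequence.
# Toy parameters (made up for illustration).
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
Pi = np.array([0.6, 0.4])
obs = [0, 2, 1, 0, 0, 2]

def baum_welch_step(A, B, Pi, obs):
    T, N, M = len(obs), len(Pi), B.shape[1]
    # E-step: forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = Pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    evidence = alpha[-1].sum()                 # P(O | lambda)
    gamma = alpha * beta / evidence            # gamma_t(i)
    # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O | lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    # M-step: re-estimate lambda = (A, B, Pi).
    Pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros((N, M))
    for m in range(M):
        mask = np.array(obs) == m              # time steps where O_t = v_m
        B_new[:, m] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return A_new, B_new, Pi_new

A2, B2, Pi2 = baum_welch_step(A, B, Pi, obs)
```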

The estimations are

$$\hat{\pi}_i = \gamma_1(i), \qquad \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \hat{b}_j(m) = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbb{1}[O_t = v_m]}{\sum_{t=1}^{T} \gamma_t(j)}$$

subject to $\sum_i \hat{\pi}_i = 1$, $\sum_j \hat{a}_{ij} = 1$, and $\sum_m \hat{b}_j(m) = 1$.

### Continuous Observations

If the inputs are continuous, one possibility is to discretize them (e.g., by vector quantization) and reuse the discrete model above; alternatively, the emission probabilities $b_j$ can be modeled directly as continuous densities, for example Gaussians or mixtures of Gaussians.
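For example, assuming a single scalar Gaussian per state (all parameters made up), the forward recursion is unchanged except that the emission table lookup becomes a density evaluation:

```python
import numpy as np

# Sketch: replace the discrete emission table b_j(m) with a Gaussian density
# N(O_t; mu_j, sigma_j^2) per state. All parameters are made up.
mu    = np.array([0.0, 3.0])   # per-state emission means
sigma = np.array([1.0, 0.5])   # per-state emission standard deviations

def b(o):
    """Emission density vector: b_j(o) = N(o; mu_j, sigma_j^2)."""
    return np.exp(-0.5 * ((o - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Forward recursion with b(O_t) in place of B[:, obs[t]]:
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
Pi = np.array([0.6, 0.4])
obs = [0.1, 2.8, 3.2]          # now real-valued observations
alpha = Pi * b(obs[0])
for o in obs[1:]:
    alpha = (alpha @ A) * b(o)
density = alpha.sum()          # joint density of the observation sequence
```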