Machine Learning Study Note - Hidden Markov Model
Last updated: 2017-10-10 15:39:14 PDT.
Discrete Markov Processes
Consider a system that at any time is in one of a set of $N$ distinct states: $S_1, S_2, \dots, S_N$. The state at time $t$ is denoted $q_t$, for $t = 1, 2, \dots$.
The first-order Markov model has:
$$P(q_{t+1} = S_j \mid q_t = S_i, q_{t-1} = S_k, \dots) = P(q_{t+1} = S_j \mid q_t = S_i)$$
We further simplify the model - that is, regularize - by assuming that these transition probabilities are independent of time:
$$a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$$
satisfying
$$a_{ij} \ge 0 \quad \text{and} \quad \sum_{j=1}^{N} a_{ij} = 1$$
This can be seen as a stochastic automaton. The initial probabilities $\pi_i$ give the probability that the first state in the sequence is $S_i$:
$$\pi_i \equiv P(q_1 = S_i)$$
satisfying
$$\sum_{i=1}^{N} \pi_i = 1$$
In an observable Markov model, the states are observable. At any time $t$ we know $q_t$, and as the system moves from one state to another, we get a sequence of states. The probability of a given sequence $Q = \{q_1, q_2, \dots, q_T\}$ is
$$P(Q \mid A, \Pi) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) = \pi_{q_1} a_{q_1 q_2} \cdots a_{q_{T-1} q_T}$$
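As a sketch, the probability of a state sequence in an observable Markov model can be computed directly from $A$ and $\Pi$. The two-state chain below, with made-up numbers, is only for illustration:

```python
import numpy as np

# Hypothetical 2-state chain with made-up numbers (not from the text).
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
pi = np.array([0.5, 0.5])    # pi[i] = P(q_1 = S_i)

def sequence_probability(Q, A, pi):
    """P(Q | A, Pi) = pi_{q1} * a_{q1 q2} * ... * a_{q_{T-1} q_T}."""
    p = pi[Q[0]]
    for prev, cur in zip(Q[:-1], Q[1:]):
        p *= A[prev, cur]
    return p

print(sequence_probability([0, 0, 1], A, pi))  # 0.5 * 0.7 * 0.3 = 0.105
```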
Hidden Markov Model
In a Hidden Markov Model, the states are not observable, but when we visit a state, an observation is recorded that is a probabilistic function of the state. We assume a discrete observation in each state, drawn from the set $\{v_1, v_2, \dots, v_M\}$:
$$b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$$
called the emission probability. The states are not observed and can only be inferred from the observables.
In this kind of Markov model, there are two sources of randomness: the transition probabilities $a_{ij}$ and the emission probabilities $b_j(m)$.
To summarize and formalize, an HMM has the following elements:
- $N$: Number of states in the model: $S = \{S_1, S_2, \dots, S_N\}$
- $M$: Number of distinct observation symbols in the alphabet: $V = \{v_1, v_2, \dots, v_M\}$
- State transition probabilities: $A = [a_{ij}]$, where $a_{ij} \equiv P(q_{t+1} = S_j \mid q_t = S_i)$
- Observation probabilities: $B = [b_j(m)]$, where $b_j(m) \equiv P(O_t = v_m \mid q_t = S_j)$
- Initial probabilities: $\Pi = [\pi_i]$, where $\pi_i \equiv P(q_1 = S_i)$
We can describe an HMM compactly as $\lambda = (A, B, \Pi)$.
Three basic problems of HMMs
- (Evaluation Problem) Given a model $\lambda$, evaluate the probability of any given observation sequence $O = \{O_1, O_2, \dots, O_T\}$, namely $P(O \mid \lambda)$.
- (State Sequence Problem) Given a model $\lambda$ and an observation sequence $O$, find the state sequence $Q^*$ with the highest probability, namely $Q^* = \arg\max_Q P(Q \mid O, \lambda)$.
- (Learning Model Parameters) Given a training set of observation sequences $\mathcal{X} = \{O^k\}_k$, learn the model that maximizes the probability of generating $\mathcal{X}$, namely $\lambda^* = \arg\max_\lambda P(\mathcal{X} \mid \lambda)$.
Problem 1: Evaluation Problem
If the state sequence $Q = \{q_1, q_2, \dots, q_T\}$ were known, the probability of the observation sequence would simply be
$$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(O_t \mid q_t, \lambda) = \prod_{t=1}^{T} b_{q_t}(O_t)$$
However, since $Q$ is unknown, we need to infer it first; the joint probability is
$$P(O, Q \mid \lambda) = P(Q \mid \lambda)\, P(O \mid Q, \lambda) = \pi_{q_1} b_{q_1}(O_1) \prod_{t=2}^{T} a_{q_{t-1} q_t} b_{q_t}(O_t)$$
We can compute $P(O \mid \lambda)$ by marginalizing over the joint:
$$P(O \mid \lambda) = \sum_{\text{all possible } Q} P(O, Q \mid \lambda)$$
But this is intractable: there are $N^T$ possible state sequences. Fortunately, there is an efficient procedure to calculate $P(O \mid \lambda)$, called the forward-backward procedure.
Define the forward variable
$$\alpha_t(i) \equiv P(O_1 O_2 \cdots O_t, q_t = S_i \mid \lambda)$$
Then we can calculate it recursively. First,
$$\alpha_1(i) = \pi_i b_i(O_1)$$
Then, given $\alpha_t(i)$:
$$\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \right] b_j(O_{t+1})$$
And eventually, the probability of the observation sequence is:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$$
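The forward recursion above can be sketched in a few lines of NumPy; the 2-state, 2-symbol model and its numbers are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])   # transitions a_ij
B  = np.array([[0.9, 0.1], [0.2, 0.8]])   # emissions b_j(m)
pi = np.array([0.6, 0.4])

def forward(O, A, B, pi):
    """Return alpha, where alpha[t, i] corresponds to alpha_{t+1}(i)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # alpha_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]  # [sum_i alpha_t(i) a_ij] b_j(O_{t+1})
    return alpha

O = [0, 1, 0]
alpha = forward(O, A, B, pi)
print(alpha[-1].sum())   # P(O | lambda) = sum_i alpha_T(i)
```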
Computing $\alpha_t(i)$ is $O(N^2 T)$, compared to $O(T N^T)$ for the naive enumeration. We can similarly define a backward variable:
$$\beta_t(i) \equiv P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i, \lambda)$$
This can be computed recursively, going backward in time:
$$\beta_T(i) = 1, \qquad \beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$$
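A sketch of the backward recursion, on the same kind of invented 2-state, 2-symbol model; note that $P(O \mid \lambda)$ can also be recovered from the backward pass as $\sum_i \pi_i b_i(O_1) \beta_1(i)$:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])

def backward(O, A, B):
    """Return beta, where beta[t, i] corresponds to beta_{t+1}(i)."""
    T, N = len(O), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                   # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta

O = [0, 1, 0]
beta = backward(O, A, B)
print((pi * B[:, O[0]] * beta[0]).sum())   # also equals P(O | lambda)
```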
Problem 2: State Sequence Problem
Define
$$\gamma_t(i) \equiv P(q_t = S_i \mid O, \lambda) = \frac{P(O, q_t = S_i \mid \lambda)}{P(O \mid \lambda)} = \frac{\alpha_t(i)\, \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\, \beta_t(j)}$$
The last step holds because $q_t$ blocks the path between the past observables $O_1 \cdots O_t$ and the future observables $O_{t+1} \cdots O_T$ by d-separation, so $P(O, q_t = S_i \mid \lambda) = \alpha_t(i)\, \beta_t(i)$.
Given $\gamma_t(i)$, we can determine the individually most likely state at each time $t$, $q_t^* = \arg\max_i \gamma_t(i)$, and its posterior probability $\gamma_t(q_t^*)$.
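Combining the two passes gives the state posteriors $\gamma_t(i)$; a sketch with an invented 2-state, 2-symbol model:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0]
T, N = len(O), len(pi)

# Forward and backward passes, as defined earlier.
alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

# gamma_t(i) = alpha_t(i) beta_t(i) / sum_j alpha_t(j) beta_t(j)
gamma = alpha * beta
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma.argmax(axis=1))   # individually most likely state at each t
```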
However, this cannot be used to find the most probable state sequence, because the states are not independent. For that we need the Viterbi algorithm.
Given state sequence $Q = \{q_1, q_2, \dots, q_T\}$ and observation sequence $O = \{O_1, O_2, \dots, O_T\}$, we define $\delta_t(i)$ as the probability of the highest-probability path at time $t$ that accounts for the first $t$ observations and ends in $S_i$:
$$\delta_t(i) \equiv \max_{q_1 q_2 \cdots q_{t-1}} P(q_1 q_2 \cdots q_{t-1}, q_t = S_i, O_1 O_2 \cdots O_t \mid \lambda)$$
Then recursively calculate $\delta_t(i)$, keeping track of the maximizing predecessor in $\psi_t(j)$, as follows:
- Initialization: $\delta_1(i) = \pi_i b_i(O_1)$, $\psi_1(i) = 0$
- Recursion: $\delta_{t+1}(j) = \left[ \max_i \delta_t(i)\, a_{ij} \right] b_j(O_{t+1})$, $\psi_{t+1}(j) = \arg\max_i \delta_t(i)\, a_{ij}$
- Termination: $p^* = \max_i \delta_T(i)$, $q_T^* = \arg\max_i \delta_T(i)$
Once we have obtained $\psi_t(j)$, we can trace back and obtain the path: $q_t^* = \psi_{t+1}(q_{t+1}^*)$ for $t = T-1, \dots, 1$.
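The Viterbi recursion and backtracking can be sketched as follows; the model numbers are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM; all numbers are made up.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])

def viterbi(O, A, B, pi):
    """Return (most probable state path, its probability p*)."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                  # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] = delta_t(i) a_ij
        psi[t] = scores.argmax(axis=0)          # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    # Termination and backtracking.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

path, p_star = viterbi([0, 1, 0], A, B, pi)
print(path, p_star)   # path [0, 1, 0], p* = 0.046656 for these numbers
```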
Problem 3: Learning Model Parameters
We define $\xi_t(i, j)$ as the probability of being in $S_i$ at time $t$ and in $S_j$ at time $t+1$, given the whole observation sequence $O$ and $\lambda$:
$$\xi_t(i, j) \equiv P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)$$
which can be computed as
$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{\sum_{k=1}^{N} \sum_{l=1}^{N} \alpha_t(k)\, a_{kl}\, b_l(O_{t+1})\, \beta_{t+1}(l)}$$
subject to:
$$\sum_{j=1}^{N} \xi_t(i, j) = \gamma_t(i), \qquad \sum_{i=1}^{N} \sum_{j=1}^{N} \xi_t(i, j) = 1$$
Note: this can also be derived by viewing the model as a Bayesian network.
For unsupervised learning we have the EM algorithm; here the corresponding instance is the Baum-Welch algorithm, an EM procedure. In the E-step, we compute $\gamma_t(i)$ and $\xi_t(i, j)$ given the current model $\lambda = (A, B, \Pi)$; in the M-step, we recalculate $\lambda$ given $\gamma_t(i)$ and $\xi_t(i, j)$. These two steps are alternated until convergence, during which, it has been shown, $P(O \mid \lambda)$ never decreases.
The estimates are
$$\hat{\pi}_i = \gamma_1(i), \qquad \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad \hat{b}_j(m) = \frac{\sum_{t=1}^{T} \gamma_t(j)\, \mathbf{1}(O_t = v_m)}{\sum_{t=1}^{T} \gamma_t(j)}$$
subject to
$$\sum_{i=1}^{N} \hat{\pi}_i = 1, \qquad \sum_{j=1}^{N} \hat{a}_{ij} = 1, \qquad \sum_{m=1}^{M} \hat{b}_j(m) = 1$$
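A single Baum-Welch iteration on one observation sequence can be sketched as below (with multiple training sequences, the numerators and denominators are summed over sequences before dividing); the model and data are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-symbol HMM and sequence; all numbers are made up.
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
O  = [0, 1, 0, 0, 1]
T, N = len(O), len(pi)

# E-step: forward-backward, then gamma and xi.
alpha = np.zeros((T, N)); beta = np.zeros((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
beta[-1] = 1.0
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])

p_O = alpha[-1].sum()                        # P(O | lambda)
gamma = alpha * beta / p_O                   # gamma[t, i]
xi = np.zeros((T - 1, N, N))                 # xi[t, i, j]
for t in range(T - 1):
    xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / p_O

# M-step: re-estimate pi, A, B from the expected counts.
pi_new = gamma[0]
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
B_new = np.zeros_like(B)
for m in range(B.shape[1]):
    B_new[:, m] = gamma[np.array(O) == m].sum(axis=0)
B_new /= gamma.sum(axis=0)[:, None]
```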
If the inputs are continuous, one possibility is to discretize them first, e.g. by vector quantization with k-means, and use the discrete HMM above; another is to model the emission distribution directly as a parametric density, e.g. a Gaussian per state.