Bei's Study Notes

Machine Learning Study Node - Clustering
Last updated: 2017-10-10 15:39:14 PDT.

Back to Intro

In parameteric approaches, we assume the samples are drawn from the a same parametric distribution. This is rarely the case. Now we relax this assumption and assume the samples are from one of a number of distributions.

This approach is called semiparametric density estimation.

Mixture densitie

The mixture density is defined as

p(\mathbf x) = \sum_{i=1}^k p(\mathbf x|\mathcal G_i)P(\mathcal G_i)

where \mathcal G_i are the mixture components. They are also called clusters. p(\mathbf x|\mathcal G_i) are called component densities and P(\mathcal G_i) are called mixture proportions. The number of parameters k is a hyperparameter.

Within each cluster is a parameteric distribution. The samples are assumed to be iid.