Generative model
In probability and statistics, a generative model is a model for randomly generating observable data values, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes' rule.
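As a worked form of the last statement, Bayes' rule can be written out for a discrete label. The symbols x (observation) and y (label) are assumed here for illustration; the conditional is obtained by normalizing the joint distribution over all labels:

```latex
p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x, y)}{\sum_{y'} p(x, y')}
```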
Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with "representing and speedily is an good". The result is not proper English, but the output increasingly approximates it as the table is extended from word pairs to word triplets and beyond.
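The word-pair idea can be sketched in a few lines of Python. This is only an illustration, not Shannon's procedure in detail: the toy corpus and the function names `build_pair_table` and `generate` are assumptions made for the example, whereas Shannon's table was built from real English text.

```python
import random
from collections import defaultdict

def build_pair_table(words):
    """Map each word to the list of words observed directly after it."""
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated pairs are stored repeatedly, so sampling follows pair frequency.
        table[current].append(following)
    return table

def generate(table, start, length, rng):
    """Sample a word sequence by repeatedly drawing a successor of the last word."""
    sentence = [start]
    for _ in range(length - 1):
        successors = table.get(sentence[-1])
        if not successors:  # dead end: this word was never followed by another
            break
        sentence.append(rng.choice(successors))
    return sentence

# Hypothetical toy corpus, used only for illustration.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
table = build_pair_table(corpus)
print(" ".join(generate(table, "the", 8, random.Random(0))))
```

Moving from word pairs to word triplets would mean keying the table on the last two words instead of the last one.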
Generative models contrast with discriminative models in that a generative model is a full probabilistic model of all variables, whereas a discriminative model provides a model only for the target variable(s) conditional on the observed variables. Thus a generative model can be used, for example, to simulate (i.e. generate) values of any variable in the model, whereas a discriminative model allows only sampling of the target variables conditional on the observed quantities. Although discriminative models do not need to model the distribution of the observed variables, they cannot generally express more complex relationships between the observed and target variables, and they do not necessarily perform better than generative models at classification and regression tasks. In modern applications the two classes are seen as complementary or as different views of the same procedure.[1]
Examples of generative models include:
- Gaussian mixture model and other types of mixture model
- Hidden Markov model
- Probabilistic context-free grammar
- Naive Bayes
- Averaged one-dependence estimators
- Latent Dirichlet allocation
- Restricted Boltzmann machine
- Generative adversarial networks
If the observed data are truly sampled from the generative model, then fitting the parameters of the generative model to maximize the data likelihood is a common method. However, most statistical models are only approximations to the true distribution. If the model is applied to make inferences about a subset of variables conditional on known values of the others, it can be argued that the approximation makes more assumptions than are necessary to solve the problem at hand. In such cases, it can be more accurate to model the conditional density functions directly using a discriminative model (see above), although application-specific details will ultimately dictate which approach is most suitable in any particular case.
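A minimal sketch of maximum-likelihood fitting for a generative classifier follows. It assumes, purely for illustration, that each label has a one-dimensional Gaussian distribution over the observation; the training data and the function names are hypothetical. The fitted joint model is then turned into a conditional distribution with Bayes' rule.

```python
import math
from statistics import fmean

def fit_gaussian_classes(xs, ys):
    """Maximum-likelihood estimates of p(y) and of a Gaussian p(x | y) per label."""
    params = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        mean = fmean(values)
        # The maximum-likelihood variance divides by n, not n - 1.
        var = sum((v - mean) ** 2 for v in values) / len(values)
        params[label] = (len(values) / len(xs), mean, var)
    return params

def log_joint(x, label, params):
    """log p(x, y) = log p(y) + log p(x | y)."""
    prior, mean, var = params[label]
    return (math.log(prior)
            - 0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var))

def posterior(x, params):
    """p(y | x) via Bayes' rule: normalize the joint over all labels."""
    joint = {label: math.exp(log_joint(x, label, params)) for label in params}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}

# Hypothetical one-dimensional training data with two labels.
xs = [1.0, 1.2, 0.8, 3.0, 3.3, 2.7]
ys = [0, 0, 0, 1, 1, 1]
params = fit_gaussian_classes(xs, ys)
print(posterior(1.1, params))
```

If the Gaussian assumption is a poor match for the real data, the resulting posterior inherits that error, which is the concern raised above about generative models making more assumptions than the conditional task requires.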
Generative models in the context of machine learning
A generative algorithm models how the data was generated in order to categorize a signal. It asks: based on my generation assumptions, which category is most likely to have generated this signal? A discriminative algorithm does not care how the data was generated; it simply categorizes a given signal.
Suppose the input data is x and the label for x is y. A generative model learns the joint probability distribution p(x,y), while a discriminative model learns the conditional probability distribution p(y|x), read as "probability of y given x". Consider the following four data points (x,y): {(0,0), (0,0), (1,0), (1,1)}. For these data, p(x,y) is the following:
     | y=0 | y=1
x=0  | 1/2 | 0
x=1  | 1/4 | 1/4
while p(y|x) is the following:
     | y=0 | y=1
x=0  | 1   | 0
x=1  | 1/2 | 1/2
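Both tables can be reproduced from the four data points by counting. The following sketch uses exact fractions; the variable names are chosen for the example.

```python
from collections import Counter
from fractions import Fraction

data = [(0, 0), (0, 0), (1, 0), (1, 1)]  # the four (x, y) points from the text

counts = Counter(data)
n = len(data)

# Joint distribution p(x, y): relative frequency of each pair.
joint = {(x, y): Fraction(counts[(x, y)], n) for x in (0, 1) for y in (0, 1)}

# Marginal p(x), obtained by summing the joint over y.
marginal_x = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}

# Conditional p(y | x) = p(x, y) / p(x).
conditional = {(x, y): joint[(x, y)] / marginal_x[x] for (x, y) in joint}

for x in (0, 1):
    print(f"x={x}",
          "joint:", [str(joint[(x, y)]) for y in (0, 1)],
          "conditional:", [str(conditional[(x, y)]) for y in (0, 1)])
```

Each row of the joint table is divided by its row sum p(x) to give the corresponding row of the conditional table.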
Discriminative algorithms thus try to learn p(y|x) directly from the data and then use it to classify. Generative algorithms instead try to learn p(x,y), which can later be transformed into p(y|x) to classify the data. One advantage of generative algorithms is that p(x,y) can be used to generate new data similar to the existing data. Discriminative algorithms, on the other hand, often give better performance in classification tasks.
Generative: Hidden Markov Models, Naive Bayes, Latent Dirichlet Allocation, Probabilistic Context-Free Grammars
Discriminative: Logistic Regression, Support Vector Machines, Neural Networks, Conditional Random Fields, Maximum Entropy Markov Models
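The two uses of a learned joint distribution, generating new data and classifying, can be sketched with the joint table from the four-point example. The function names `sample_pairs` and `classify` are hypothetical and chosen for illustration.

```python
import random

# Joint distribution p(x, y) from the four-point example in the text.
joint = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.25, (1, 1): 0.25}

def sample_pairs(joint, k, rng):
    """Draw k new (x, y) pairs from the joint distribution."""
    pairs = list(joint)
    weights = [joint[p] for p in pairs]
    return rng.choices(pairs, weights=weights, k=k)

def classify(x, joint):
    """Pick the label y maximizing p(x, y); for fixed x this also maximizes p(y | x)."""
    labels = sorted({y for (_, y) in joint})
    # On a tie, max() returns the first candidate, i.e. the smallest label.
    return max(labels, key=lambda y: joint[(x, y)])

rng = random.Random(0)
print(sample_pairs(joint, 5, rng))  # new data resembling the original points
print(classify(0, joint))           # label chosen for x = 0
```

A purely discriminative model would support only the second function, since p(y|x) alone says nothing about how likely each x is.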
References
- [1] C. M. Bishop and J. Lasserre, "Generative or Discriminative? Getting the Best of Both Worlds". In Bayesian Statistics 8, J. M. Bernardo et al. (Eds.), Oxford University Press, pp. 3–23, 2007.
Sources
- Shannon, C.E. (1948) "A Mathematical Theory of Communication", Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, July, October, 1948