Scaled inverse chi-squared distribution

Parameters	$\nu > 0\,$ $\tau^2 > 0\,$
Support	$x \in (0, \infty)$
PDF	$\frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~ \frac{\exp\left[ \frac{-\nu \tau^2}{2 x}\right]}{x^{1+\nu/2}}$
CDF	$\Gamma\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right) \left/\Gamma\left(\frac{\nu}{2}\right)\right.$
Mean	$\frac{\nu \tau^2}{\nu-2}$ for $\nu >2\,$
Mode	$\frac{\nu \tau^2}{\nu+2}$
Variance	$\frac{2 \nu^2 \tau^4}{(\nu-2)^2 (\nu-4)}$ for $\nu >4\,$
Skewness	$\frac{4}{\nu-6}\sqrt{2(\nu-4)}$ for $\nu >6\,$
Ex. kurtosis	$\frac{12(5\nu-22)}{(\nu-6)(\nu-8)}$ for $\nu >8\,$
Entropy	$\frac{\nu}{2} \!+\!\ln\left(\frac{\tau^2\nu}{2}\Gamma\left(\frac{\nu}{2}\right)\right)$ $\!-\!\left(1\!+\!\frac{\nu}{2}\right)\psi\left(\frac{\nu}{2}\right)$
MGF	$\frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2\tau^2\nu t}\right)$
CF	$\frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-i\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2i\tau^2\nu t}\right)$

The scaled inverse chi-squared distribution is the distribution for x = 1/s², where s² is a sample mean of the squares of ν independent normal random variables that have mean 0 and inverse variance 1/σ² = τ². The distribution is therefore parametrised by the two quantities ν and τ², referred to as the number of chi-squared degrees of freedom and the scaling parameter, respectively.
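The construction above can be checked by simulation. The following is a minimal sketch, assuming only the Python standard library; the function name, the seed and the sample size are illustrative choices. It draws x = 1/s² repeatedly and compares the empirical mean with ντ²/(ν−2).

```python
import random
import statistics

def draw_inverse_mean_square(nu, tau2, rng):
    """Draw x = 1/s^2, where s^2 is the mean of the squares of nu
    independent N(0, 1/tau2) variables (nu must be a positive integer here)."""
    sigma = (1.0 / tau2) ** 0.5  # standard deviation; the variance is 1/tau2
    s2 = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(nu)) / nu
    return 1.0 / s2

rng = random.Random(0)          # fixed seed, chosen arbitrarily
nu, tau2 = 10, 2.0              # illustrative parameter values
samples = [draw_inverse_mean_square(nu, tau2, rng) for _ in range(20000)]

empirical_mean = statistics.fmean(samples)
theoretical_mean = nu * tau2 / (nu - 2)   # mean formula, valid for nu > 2
print(empirical_mean, theoretical_mean)
```

The two printed values should agree to within Monte Carlo error. The comparison is only meaningful for ν > 2, where the mean exists.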

This family of scaled inverse chi-squared distributions is closely related to two other distribution families, those of the inverse-chi-squared distribution and the inverse gamma distribution. Compared to the inverse-chi-squared distribution, the scaled distribution has an extra parameter τ², which scales the distribution horizontally and vertically, representing the inverse variance of the original underlying process. Also, the scaled inverse chi-squared distribution is presented as the distribution for the inverse of the mean of ν squared deviates, rather than the inverse of their sum. The two distributions thus have the relation that if

X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)

then

\frac{X}{\tau^2 \nu} \sim \mbox{inv-}\chi^2(\nu)

Compared to the inverse gamma distribution, the scaled inverse chi-squared distribution describes the same data distribution, but using a different parametrization, which may be more convenient in some circumstances. Specifically, if

X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)

then

X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right)
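The reparametrization can be illustrated numerically. The sketch below assumes the usual shape/scale form of the inverse gamma density, β^α x^(−α−1) exp(−β/x) / Γ(α); the function names are hypothetical. It checks that the two log-densities coincide at a test point.

```python
import math

def scaled_inv_chi2_logpdf(x, nu, tau2):
    """Log-density of Scale-inv-chi^2(nu, tau2) at x > 0."""
    half_nu = nu / 2.0
    return (half_nu * math.log(tau2 * half_nu) - math.lgamma(half_nu)
            - nu * tau2 / (2.0 * x) - (1.0 + half_nu) * math.log(x))

def inv_gamma_logpdf(x, alpha, beta):
    """Log-density of Inv-Gamma(alpha, beta) (shape alpha, scale beta) at x > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            - (alpha + 1.0) * math.log(x) - beta / x)

def to_inv_gamma(nu, tau2):
    """Map (nu, tau2) to the inverse gamma parameters (nu/2, nu*tau2/2)."""
    return nu / 2.0, nu * tau2 / 2.0

alpha, beta = to_inv_gamma(5.0, 2.0)
gap = abs(scaled_inv_chi2_logpdf(1.3, 5.0, 2.0) - inv_gamma_logpdf(1.3, alpha, beta))
print(alpha, beta, gap)   # gap should be at the level of rounding error
```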

Either form may be used to represent the maximum entropy distribution for a fixed first inverse moment $E(1/X)$ and first logarithmic moment $E(\ln X)$.

The scaled inverse chi-squared distribution also has a particular use in Bayesian statistics, somewhat unrelated to its use as a predictive distribution for x = 1/s². Specifically, the scaled inverse chi-squared distribution can be used as a conjugate prior for the variance parameter of a normal distribution. In this context the scaling parameter is denoted by σ₀² rather than by τ², and has a different interpretation. The application has more usually been presented using the inverse gamma distribution formulation; however, some authors, following in particular Gelman et al. (1995/2004), argue that the inverse chi-squared parametrisation is more intuitive.

Characterization

The probability density function of the scaled inverse chi-squared distribution extends over the domain $x>0$ and is

f(x; \nu, \tau^2)= \frac{(\tau^2\nu/2)^{\nu/2}}{\Gamma(\nu/2)}~ \frac{\exp\left[ \frac{-\nu \tau^2}{2 x}\right]}{x^{1+\nu/2}}

where $\nu$ is the degrees of freedom parameter and $\tau^2$ is the scale parameter. The cumulative distribution function is

F(x; \nu, \tau^2)= \Gamma\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right) \left/\Gamma\left(\frac{\nu}{2}\right)\right. = Q\left(\frac{\nu}{2},\frac{\tau^2\nu}{2x}\right)

where $\Gamma(a,x)$ is the upper incomplete gamma function, $\Gamma(x)$ is the gamma function and $Q(a,x)$ is the regularized upper incomplete gamma function. The characteristic function is

\varphi(t;\nu,\tau^2)= \frac{2}{\Gamma(\frac{\nu}{2})}\left(\frac{-i\tau^2\nu t}{2}\right)^{\!\!\frac{\nu}{4}}\!\!K_{\frac{\nu}{2}}\left(\sqrt{-2i\tau^2\nu t}\right) ,

where $K_{\frac{\nu}{2}}(z)$ is the modified Bessel function of the second kind.
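As a sanity check on the density and the cumulative distribution function, consider the special case ν = 2. There Γ(1, z) = exp(−z), so the CDF reduces to exp(−τ²/x). The sketch below uses only the standard library; the midpoint rule and the step count are arbitrary illustrative choices. It integrates the density numerically and compares the result with that closed form.

```python
import math

def scaled_inv_chi2_pdf(x, nu, tau2):
    """Density of Scale-inv-chi^2(nu, tau2) at x > 0, computed via logs."""
    half_nu = nu / 2.0
    log_pdf = (half_nu * math.log(tau2 * half_nu) - math.lgamma(half_nu)
               - nu * tau2 / (2.0 * x) - (1.0 + half_nu) * math.log(x))
    return math.exp(log_pdf)

def cdf_by_midpoint(x, nu, tau2, steps=20000):
    """Approximate F(x) by a midpoint-rule integral of the density over (0, x)."""
    h = x / steps
    return h * sum(scaled_inv_chi2_pdf((i + 0.5) * h, nu, tau2)
                   for i in range(steps))

x, tau2 = 3.0, 1.5                         # illustrative values
numeric = cdf_by_midpoint(x, 2, tau2)
closed_form = math.exp(-tau2 / x)          # CDF for nu = 2
print(numeric, closed_form)
```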

Differential equation

$\left\{2 x^2 f'(x)+f(x) \left(-\nu \tau ^2+\nu x+2 x\right)=0,f(1)=\frac{2^{-\nu /2} e^{-\frac{\nu \tau ^2}{2}} \left(\nu \tau ^2\right)^{\nu /2}}{\Gamma \left(\frac{\nu }{2}\right)}\right\}$

Parameter estimation

The maximum likelihood estimate of $\tau^2$ is

\tau^2 = n/\sum_{i=1}^n \frac{1}{x_i}.

The maximum likelihood estimate of $\frac{\nu}{2}$ can be found using Newton's method on:

\ln\left(\frac{\nu}{2}\right) - \psi\left(\frac{\nu}{2}\right) = \frac{1}{n}\sum_{i=1}^n \ln(x_i) - \ln(\tau^2) ,

where $\psi (x)$ is the digamma function. An initial estimate can be found by taking the formula for the mean and solving it for $\nu.$ Let $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ be the sample mean. Then an initial estimate for $\nu$ is given by:

\frac{\nu}{2} = \frac{\bar{x}}{\bar{x} - \tau^2}.
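The estimation procedure can be sketched in code as follows. This is an illustration under stated assumptions, not a reference implementation. The digamma and trigamma functions are approximated by a recurrence plus an asymptotic series, since the Python standard library does not provide them. The Newton iteration solves ln(ν/2) − ψ(ν/2) = (1/n) Σ ln(x_i) − ln(τ²), the stationarity condition of the log-likelihood in ν/2. The halving safeguard against non-positive iterates is an added assumption, and the data are assumed not to be all identical. All function names are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma for x > 0: shift x upward, then use an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 * inv - series

def trigamma(x):
    """Trigamma (derivative of digamma) for x > 0, same strategy."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42))
    return acc + inv + 0.5 * inv2 + series

def fit_scaled_inv_chi2(xs, tol=1e-10, max_iter=100):
    """Return (nu_hat, tau2_hat) by maximum likelihood."""
    n = len(xs)
    tau2 = n / sum(1.0 / x for x in xs)            # closed-form MLE of tau^2
    target = sum(math.log(x) for x in xs) / n - math.log(tau2)
    mean = sum(xs) / n
    a = mean / (mean - tau2)                       # initial estimate of nu/2
    for _ in range(max_iter):
        g = math.log(a) - digamma(a) - target      # function whose root is sought
        dg = 1.0 / a - trigamma(a)                 # its derivative (always negative)
        a_next = a - g / dg                        # Newton step
        if a_next <= 0.0:                          # safeguard (an assumption)
            a_next = a / 2.0
        converged = abs(a_next - a) <= tol * a
        a = a_next
        if converged:
            break
    return 2.0 * a, tau2

# Synthetic data: 1/Gamma(shape nu/2, scale 2/(nu*tau2)) is Scale-inv-chi^2(nu, tau2).
rng = random.Random(1)
true_nu, true_tau2 = 6.0, 2.0
data = [1.0 / rng.gammavariate(true_nu / 2.0, 2.0 / (true_nu * true_tau2))
        for _ in range(20000)]
nu_hat, tau2_hat = fit_scaled_inv_chi2(data)
print(nu_hat, tau2_hat)
```

With a sample of this size the estimates should land close to the generating values ν = 6 and τ² = 2.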

Bayesian estimation of the variance of a Normal distribution

The scaled inverse chi-squared distribution has a second important application, in the Bayesian estimation of the variance of a Normal distribution.

According to Bayes' theorem, the posterior probability distribution for quantities of interest is proportional to the product of a prior distribution for the quantities and a likelihood function:

p(\sigma^2|D,I) \propto p(\sigma^2|I) \; p(D|\sigma^2)

where D represents the data and I represents any initial information about σ² that we may already have.

The simplest scenario arises if the mean μ is already known; or, alternatively, if it is the conditional distribution of σ² that is sought, for a particular assumed value of μ.

Then the likelihood term L(σ²|D) = p(D|σ²) has the familiar form

\mathcal{L}(\sigma^2|D,\mu) = \frac{1}{\left(\sqrt{2\pi}\sigma\right)^n} \; \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right]

Combining this with the rescaling-invariant prior p(σ²|I) = 1/σ², which can be argued (e.g. following Jeffreys) to be the least informative possible prior for σ² in this problem, gives a combined posterior probability

p(\sigma^2|D, I, \mu) \propto \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right]

This form can be recognised as that of a scaled inverse chi-squared distribution, with parameters ν = n and τ² = s² = (1/n) Σ (x_i-μ)².
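For a concrete data set the posterior parameters are simple to compute. The sketch below uses a hypothetical function name and made-up data. It returns ν = n and τ² = s², and then evaluates the posterior mode and mean from the formulas ντ²/(ν+2) and ντ²/(ν−2).

```python
def posterior_known_mean(data, mu):
    """Posterior Scale-inv-chi^2(nu, tau2) for sigma^2 under the 1/sigma^2 prior,
    with the mean mu treated as known."""
    n = len(data)
    tau2 = sum((x - mu) ** 2 for x in data) / n   # s^2, mean squared deviation from mu
    return n, tau2

data = [1.0, 2.0, 3.0, 4.0, 5.0]      # illustrative observations
nu, tau2 = posterior_known_mean(data, mu=3.0)

posterior_mode = nu * tau2 / (nu + 2)
posterior_mean = nu * tau2 / (nu - 2)  # exists because nu > 2 here
print(nu, tau2, posterior_mode, posterior_mean)
```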

Gelman et al. remark that the re-appearance of this distribution, previously seen in a sampling context, may seem remarkable; but given the choice of prior the "result is not surprising".^[1]

In particular, the choice of a rescaling-invariant prior for σ² has the result that the probability distribution of the ratio σ² / s² has the same form (independent of the conditioning variable) when conditioned on s² as when conditioned on σ²:

p(\tfrac{\sigma^2}{s^2}|s^2) = p(\tfrac{\sigma^2}{s^2}|\sigma^2)

In the sampling-theory case, conditioned on σ², the probability distribution for (1/s²) is a scaled inverse chi-squared distribution; and so the probability distribution for σ² conditioned on s², given a scale-agnostic prior, is also a scaled inverse chi-squared distribution.

Use as an informative prior

If more is known about the possible values of σ², a distribution from the scaled inverse chi-squared family, such as Scale-inv-χ²(n₀, s₀²), can be a convenient form to represent a more informative prior for σ², as if from the result of n₀ previous observations (though n₀ need not necessarily be a whole number):

p(\sigma^2|I^\prime, \mu) \propto \frac{1}{\sigma^{n_0+2}} \; \exp \left[ -\frac{n_0 s_0^2}{2\sigma^2} \right]

Such a prior would lead to the posterior distribution

p(\sigma^2|D, I^\prime, \mu) \propto \frac{1}{\sigma^{n+n_0+2}} \; \exp \left[ -\frac{ns^2 + n_0 s_0^2}{2\sigma^2} \right]

which is itself a scaled inverse chi-squared distribution. The scaled inverse chi-squared distributions are thus a convenient conjugate prior family for σ² estimation.
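Matching the posterior above to the scaled inverse chi-squared form gives degrees of freedom n + n₀ and scale (n s² + n₀ s₀²)/(n + n₀). A minimal sketch of this conjugate update, with a hypothetical function name and illustrative numbers:

```python
def update_scaled_inv_chi2(n0, s0_sq, data, mu):
    """Combine a Scale-inv-chi^2(n0, s0_sq) prior on sigma^2 with normal data
    whose mean mu is known; returns the posterior (nu, tau2)."""
    n = len(data)
    sum_sq = sum((x - mu) ** 2 for x in data)      # equals n * s^2
    nu_post = n0 + n
    tau2_post = (n0 * s0_sq + sum_sq) / nu_post
    return nu_post, tau2_post

data = [1.0, 2.0, 3.0, 4.0, 5.0]                   # illustrative observations
informative = update_scaled_inv_chi2(3, 1.0, data, mu=3.0)
limiting = update_scaled_inv_chi2(0, 1.0, data, mu=3.0)
print(informative, limiting)
```

Setting n₀ = 0 recovers the parameters ν = n and τ² = s² obtained with the rescaling-invariant prior.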

Estimation of variance when mean is unknown

If the mean is not known, the most uninformative prior that can be taken for it is arguably the translation-invariant prior p(μ|I) ∝ const., which gives the following joint posterior distribution for μ and σ²,

\begin{align} p(\mu, \sigma^2 \mid D, I) & \propto \frac{1}{\sigma^{n+2}} \exp \left[ -\frac{\sum_i^n(x_i-\mu)^2}{2\sigma^2} \right] \\ & = \frac{1}{\sigma^{n+2}} \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \exp \left[ -\frac{n(\mu -\bar{x})^2}{2\sigma^2} \right] \end{align}

The marginal posterior distribution for σ² is obtained from the joint posterior distribution by integrating out over μ,

\begin{align} p(\sigma^2|D, I) \; \propto \; & \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \; \int_{-\infty}^{\infty} \exp \left[ -\frac{n(\mu -\bar{x})^2}{2\sigma^2} \right] d\mu\\ = \; & \frac{1}{\sigma^{n+2}} \; \exp \left[ -\frac{\sum_i^n(x_i-\bar{x})^2}{2\sigma^2} \right] \; \sqrt{2 \pi \sigma^2 / n} \\ \propto \; & (\sigma^2)^{-(n+1)/2} \; \exp \left[ -\frac{(n-1)s^2}{2\sigma^2} \right] \end{align}

This is again a scaled inverse chi-squared distribution, with parameters $\nu = n-1$ and $\tau^2 = s^2 = \sum (x_i - \bar{x})^2/(n-1)$.
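The marginal posterior can be computed and sampled with a few lines of code. The sketch below makes one assumption beyond the text: a chi-squared variate with ν degrees of freedom is drawn as a gamma variate with shape ν/2 and scale 2, so that ν s² divided by that variate follows Scale-inv-χ²(ν, s²). Function names, seed and data are illustrative.

```python
import random
import statistics

def posterior_unknown_mean(data):
    """Marginal posterior parameters (nu, tau2) for sigma^2 when mu is unknown."""
    n = len(data)
    return n - 1, statistics.variance(data)   # variance uses the n-1 denominator

def sample_sigma2(nu, tau2, rng):
    """One draw from Scale-inv-chi^2(nu, tau2), as nu*tau2 / chi^2(nu)."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)    # chi-squared with nu degrees of freedom
    return nu * tau2 / chi2

data = [1.0, 2.0, 3.0, 4.0, 5.0]              # illustrative observations
nu, s2 = posterior_unknown_mean(data)

rng = random.Random(2)                        # fixed seed, chosen arbitrarily
draws = [sample_sigma2(nu, s2, rng) for _ in range(1000)]
print(nu, s2, statistics.median(draws))
```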

Related distributions

If $X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)$ then $k X \sim \mbox{Scale-inv-}\chi^2(\nu, k \tau^2)\,$
If $X \sim \mbox{inv-}\chi^2(\nu) \,$ (Inverse-chi-squared distribution) then $X \sim \mbox{Scale-inv-}\chi^2(\nu, 1/\nu) \,$
If $X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)$ then $\frac{X}{\tau^2 \nu} \sim \mbox{inv-}\chi^2(\nu) \,$ (Inverse-chi-squared distribution)
If $X \sim \mbox{Scale-inv-}\chi^2(\nu, \tau^2)$ then $X \sim \textrm{Inv-Gamma}\left(\frac{\nu}{2}, \frac{\nu\tau^2}{2}\right)$ (Inverse-gamma distribution)
The scaled inverse chi-squared distribution is a special case of the type 5 Pearson distribution.

References

Gelman, A. et al. (1995), Bayesian Data Analysis, pp. 474–475; also pp. 47, 480

[1] Gelman et al. (1995), Bayesian Data Analysis (1st ed.), p. 68

This article is issued from Wikipedia - version of the 1/31/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.