5.2 Multivariate Normal (wk2)

5.2.1 Overview

Let the random vector $\pmb Z = (Z_1, \cdots, Z_p)'$, with $Z_i \overset{iid}{\sim} N(0, 1)$.

Then $\pmb X = (X_1, \cdots, X_k)' = A_{k \times p} \pmb Z + \pmb \mu_{k \times 1}$ follows a multivariate normal distribution.

Here, if $\text{rank}(A) = p \, (= k)$ and $AA' = \Sigma$, then $\pmb X \sim N_p(\pmb \mu, \Sigma)$.
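The construction above can be sketched numerically (a minimal sketch assuming NumPy; the values of $\pmb \mu$ and $\Sigma$ are hypothetical): generate iid standard normals, multiply by a factor $A$ with $AA' = \Sigma$, and shift by $\pmb \mu$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target parameters
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 2.0]])

# Any A with A A' = Sigma works; the Cholesky factor is a convenient choice
A = np.linalg.cholesky(Sigma)

# X = A Z + mu, with Z having iid N(0, 1) components
Z = rng.standard_normal((3, 100_000))
X = A @ Z + mu[:, None]

sample_mean = X.mean(axis=1)   # should approximate mu
sample_cov = np.cov(X)         # should approximate Sigma
```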


notation: $\pmb y \sim N_p(\pmb \mu, \Sigma)$

A random vector $\pmb y = (y_1, \cdots, y_p)'$ has a multivariate normal distribution if $\sum_{i=1}^p a_i y_i = \pmb a' \pmb y$ has a univariate normal distribution for every possible set of values of the elements in $\pmb a$.

pdf: $f(\pmb y) = \dfrac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp \left\{ -\dfrac{1}{2} (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \right\}$.


Ellipsoid: the path of $\pmb y$ values yielding a constant height for the density,
i.e., all $\pmb y$ s.t. $\{(\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) = c^2\}$.


Standard Normal Distribution: $\pmb z = (\Sigma^{1/2})^{-1} (\pmb y - \pmb \mu) \sim N_p(\pmb 0, I_p)$,
where $(\Sigma^{1/2})^{-1}$ satisfies $\Sigma^{-1} = (\Sigma^{1/2})^{-1} (\Sigma^{1/2})^{-1}$.


Properties of $\Sigma$: 1. symmetric matrix 2. positive definite matrix 3. $\text{Cov}(A \pmb y + \pmb b) = A \Sigma A'$.

※ if $A$ is symmetric and positive definite, then $A = CC'$, where $C$ is a lower triangular matrix. This is called the Cholesky decomposition of $A$.


  1. $E(\pmb X) = \pmb \mu, \; \text{Cov}(\pmb X) = AA' = \Sigma$
  2. $M_{\pmb X}(\pmb t) = \exp \left( \pmb t' \pmb \mu + \frac{1}{2} \pmb t' \Sigma \pmb t \right)$
  3. $\Sigma = AA'$ is non-singular $\iff \text{rank}(A) = p$
  4. $\Sigma = \text{Cov}(\pmb X)$ is symmetric, n.n.d.

For the above $\pmb X$, the following are equivalent (TFAE): 1. $\pmb X \sim N_p(\pmb \mu, \Sigma)$. 2. 3. 4. 5.

5.2.2 Spectral Decomposition

If $A$ is a symmetric matrix, then $A = E \Lambda E'$, where the $\lambda_i$ are eigenvalues ($\lambda_1 \ge \cdots \ge \lambda_p$) and the $\pmb e_i$ are eigenvectors ($E'E = I_p$). This is called the spectral decomposition of $A$.

$
\Lambda = \begin{bmatrix} \lambda_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_p \end{bmatrix}, \; \; \; \; \; E = \begin{bmatrix} \pmb e_1 & \cdots & \pmb e_p \end{bmatrix}
$

Then $\Sigma = E \Lambda E' = E \Lambda^{1/2} \Lambda^{1/2} E' = E \Lambda^{1/2} E' E \Lambda^{1/2} E' = \Sigma^{1/2} \Sigma^{1/2}$.
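A quick numerical check of the spectral decomposition and the square-root factorization (a sketch assuming NumPy; the matrix is a hypothetical example):

```python
import numpy as np

# Hypothetical symmetric positive definite matrix
Sigma = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# Spectral decomposition: Sigma = E Lambda E'
lam, E = np.linalg.eigh(Sigma)      # eigh: eigenvalues ascending, E orthonormal
Lam = np.diag(lam)
reconstructed = E @ Lam @ E.T       # equals Sigma

# Square root matrix: Sigma^{1/2} = E Lambda^{1/2} E'
Sigma_half = E @ np.diag(np.sqrt(lam)) @ E.T
squared = Sigma_half @ Sigma_half   # equals Sigma again
```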

Center & axes of the ellipsoids $\{(\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) = c^2\}$: * center: $\pmb \mu$ * axes: $\pm c \sqrt{\lambda_i} \, \pmb e_i$



Square root Matrix:

Let $A_{p \times p}$ be a symmetric non-negative definite matrix. The square root matrix of $A$ is defined as $A^{1/2} = E \Lambda^{1/2} E'$, where

$
\Lambda^{1/2} = \begin{bmatrix} \sqrt{\lambda_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sqrt{\lambda_p} \end{bmatrix}
$



Negative Square Root Matrix:

Let $A$ be of full rank with all of its $\lambda_i$ positive, in addition to symmetry. Then $A^{-1/2} = E \Lambda^{-1/2} E'$, where

$
\Lambda^{-1/2} = \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \frac{1}{\sqrt{\lambda_p}} \end{bmatrix}
$



Generalized Inverse:

Let $A$ be a non-negative definite matrix. If $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_p$, i.e., $A$ is not of full rank, then the Moore-Penrose generalized inverse of $A$ is given by

$
A^{-} = \frac{1}{\lambda_1} \pmb e_1 \pmb e_1' + \cdots + \frac{1}{\lambda_r} \pmb e_r \pmb e_r'
$

where

$
\Lambda^{-} = \begin{bmatrix} \frac{1}{\lambda_1} & & & \\ & \ddots & & \\ & & \frac{1}{\lambda_r} & \\ & & & \pmb 0 \end{bmatrix}
$
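The eigen-expansion of $A^-$ can be checked against `numpy.linalg.pinv` (a sketch assuming NumPy; the rank-1 matrix is a hypothetical example):

```python
import numpy as np

# Hypothetical n.n.d. matrix of rank 1 (eigenvalues 5 and 0)
A = np.array([[4.0, 2.0],
              [2.0, 1.0]])

# Eigen-based construction: sum of (1/lambda_i) e_i e_i' over positive eigenvalues
lam, E = np.linalg.eigh(A)
A_minus = sum((1.0 / l) * np.outer(e, e)
              for l, e in zip(lam, E.T) if l > 1e-10)

# Library Moore-Penrose inverse for comparison
A_pinv = np.linalg.pinv(A)
```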



Marginal Distribution:

$\pmb y \sim N_p(\pmb \mu, \Sigma) \; \Longrightarrow \; y_i \sim N(\mu_i, \sigma_{ii}), \; i = 1, \cdots, p$, but the converse does not hold ($\not\Longleftarrow$).

5.2.3 Properties of MVN

  1. linear combinations of the components of $\pmb y$ are normally distributed.
  2. any subset of the components of $\pmb y$ has an MVN distribution.
  3. the conditional distributions of the components of $\pmb y$ are MVN:

$\pmb y \sim N_p(\pmb \mu, \Sigma) \; \Longrightarrow \; \pmb a' \pmb y \sim N(\pmb a' \pmb \mu, \pmb a' \Sigma \pmb a)$

$\pmb y \sim N_p(\pmb \mu, \Sigma)$, and

$
A_{n \times p} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{bmatrix}
$

$\Longrightarrow \; A \pmb y \sim N_n(A \pmb \mu, A \Sigma A')$, i.e., the dimension changes.

If $\pmb y \sim N_p(\pmb \mu, \Sigma)$ and $\pmb d$ is a constant vector, then $\pmb y + \pmb d \sim N_p(\pmb \mu + \pmb d, \Sigma)$.
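The property $A \pmb y \sim N_n(A \pmb \mu, A \Sigma A')$ can be verified by simulation (a sketch assuming NumPy; $A$, $\pmb \mu$, and $\Sigma$ are hypothetical example values):

```python
import numpy as np

rng = np.random.default_rng(1)

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

# Hypothetical 2x3 matrix: maps the 3-dim MVN down to a 2-dim MVN
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  1.0, 2.0]])

y = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are draws
Ay = y @ A.T

emp_mean = Ay.mean(axis=0)
emp_cov = np.cov(Ay.T)
theo_mean = A @ mu            # A mu
theo_cov = A @ Sigma @ A.T    # A Sigma A'
```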

  1. If we partition $\pmb y$, $\pmb \mu$, $\Sigma$ as follows:

Let $\pmb y = \begin{bmatrix} \pmb y_1 \\ \pmb y_2 \end{bmatrix} \sim N_p(\pmb \mu, \Sigma)$ with

5.2.4 $\chi^2$ distribution

if $\pmb z \sim N_p(\pmb 0, I_p)$, then $\pmb z' \pmb z = \sum_{i=1}^p z_i^2 \sim \chi_p^2$.

if $\pmb y \sim N_p(\pmb \mu, \Sigma)$, then $(\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2$

the $N_p(\pmb \mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\left\{ \pmb y : (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \le \chi_p^2(\alpha) \right\}$, where $\chi_p^2(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi_p^2$ distribution.
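The $1 - \alpha$ coverage of the solid ellipsoid can be checked by simulation (a sketch assuming NumPy/SciPy; the parameter values are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
alpha = 0.05
c2 = stats.chi2.ppf(1 - alpha, df=2)  # chi^2_p(alpha): upper 100*alpha percentile

y = rng.multivariate_normal(mu, Sigma, size=100_000)
diff = y - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

coverage = np.mean(d2 <= c2)  # should be close to 1 - alpha = 0.95
```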

5.2.5 Linear Combination of Random Vectors

5.2.6 Multivariate Normal Likelihood

5.2.7 Sampling Distribution of $\bar{\pmb y}$, $S$

Let random vectors $\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)$.

$\bar{\pmb y} \sim N_p \left( \pmb \mu, \dfrac{1}{n} \Sigma \right)$

$(n-1)S$ follows a Wishart distribution, with df $= n - 1$. * $S$ is a random matrix, i.e., the Wishart is a distribution of random matrices.

$\bar{\pmb y} \perp S$.
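The sampling distribution $\bar{\pmb y} \sim N_p(\pmb \mu, \frac{1}{n} \Sigma)$ can be checked by repeated sampling (a sketch assuming NumPy; parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
n, reps = 25, 20_000

# reps independent samples of size n; average each one
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
ybars = samples.mean(axis=1)

emp_mean = ybars.mean(axis=0)  # should approximate mu
emp_cov = np.cov(ybars.T)      # should approximate Sigma / n
```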

5.2.7.0.1 Wishart Distribution

$\dfrac{\sum (x_i - \bar x)^2}{\sigma^2} = \dfrac{S^2}{\sigma^2 / (n-1)} \sim \chi_{n-1}^2$, i.e., $\sum (x_i - \bar x)^2 = (n-1)S^2 \sim \sigma^2 \cdot \chi_{n-1}^2$

For random vectors $\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)$,

$ \begin{align*} \sum_{i=1}^n(\pmb y_i - \pmb \mu)(\pmb y_i - \pmb \mu)' &\sim W_p (n, \Sigma) \\ \\ (n-1)S = \sum_{i=1}^n(\pmb y_i - \bar {\pmb y} )(\pmb y_i - \bar {\pmb y} )' &\sim W_p (n-1, \Sigma) \end{align*} $
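A simulation check of $(n-1)S \sim W_p(n-1, \Sigma)$ through its mean, $E[(n-1)S] = (n-1)\Sigma$ (a sketch assuming NumPy; parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
n, reps = 10, 20_000

# reps samples of size n; (n-1)S is the centered scatter matrix of each sample
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
centered = samples - samples.mean(axis=1, keepdims=True)
scatter_mean = np.einsum('rij,rik->jk', centered, centered) / reps

# Under W_p(n-1, Sigma), the mean scatter should be (n-1) * Sigma
expected = (n - 1) * Sigma
```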

if $A \sim W_p(n, \Sigma)$, $B \sim W_p(m, \Sigma)$, and $A \perp B$, then $A + B \sim W_p(n+m, \Sigma)$

if $A \sim W_p(n, \Sigma)$, then $CAC' \sim W_p(n, C \Sigma C')$

if $A \sim W_p(n-1, \Sigma)$, the density $f(A)$ involves the multivariate gamma function.



5.2.7.0.2 MV t-Distribution

※ univariate t-distribution: $t = \tfrac{U / \sigma}{\sqrt{V / \nu}} \sim t_{\nu}$, where $U \sim N(0, \sigma^2)$, $V \sim \chi_{\nu}^2$, and $U \perp V$.

Let $\pmb y = (y_1, \cdots, y_p)' \sim N_p(\pmb \mu, \Sigma)$, $V \sim \chi_{\nu}^2$, and $\pmb y \perp V$.

Consider the random vector $\pmb t = (t_1, \cdots, t_p)'$, with $t_i = \tfrac{(y_i - \mu_i)/\sigma_i}{\sqrt{V/\nu}}, \; i = 1, \cdots, p$ * Note that each $t_i \sim t_{\nu}$.

Here, the joint distribution of $\pmb t$ is called the MV t-distribution, with df $= \nu$ and matrix parameter $\Sigma$.

denote this distribution by



5.2.7.0.3 Dirichlet Distribution

※ The Dirichlet is the MV generalization of the Beta distribution.

Let $\pmb y \sim D_p(\nu_1, \cdots, \nu_{p+1})$ * parameters: $\{\nu_i, i = 1, \cdots, p+1\}$ * pdf: $f(\pmb y) = \dfrac{\Gamma\left( \sum_{i=1}^{p+1} \nu_i \right)}{\prod_{i=1}^{p+1} \Gamma(\nu_i)} \left( \prod_{i=1}^p y_i^{\nu_i - 1} \right) \left( 1 - \sum_{i=1}^p y_i \right)^{\nu_{p+1} - 1}$




5.2.7.0.4 CLT

Let $\pmb y_1, \cdots, \pmb y_n$ be iid with mean $\pmb \mu$ and finite covariance matrix $\Sigma$. Then

$ \begin{align*} \sqrt {n} (\bar {\pmb y} - \pmb \mu) &\overset {d} {\rightarrow} N_p (\pmb 0 , \Sigma) \\ n (\bar {\pmb y} - \pmb \mu)' S^{-1} (\bar {\pmb y} - \pmb \mu) &\overset {d} {\rightarrow} \chi_p^2 \end{align*} $
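A simulation of the first convergence, using a deliberately non-normal population (a sketch assuming NumPy; iid Exp(1) components are a hypothetical choice, giving $\pmb \mu = \pmb 1$ and $\Sigma = I_p$):

```python
import numpy as np

rng = np.random.default_rng(9)

# Non-normal parent population: iid Exp(1) components (mean 1, variance 1)
n, reps, p = 200, 10_000, 2
mu = np.ones(p)

x = rng.exponential(scale=1.0, size=(reps, n, p))
ybar = x.mean(axis=1)

# sqrt(n) (ybar - mu) should be approximately N_p(0, I_p) here
z = np.sqrt(n) * (ybar - mu)
emp_mean = z.mean(axis=0)   # near 0
emp_cov = np.cov(z.T)       # near I_p
```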




5.2.8 Assessing Normality

5.2.8.0.1 1. Univariate Marginal Distribution
5.2.8.0.1.1 a. Q-Q Plot

※ Sample quantiles vs. quantiles of the normal distribution

Let the order statistics, or sample quantiles, be $x_{(1)} \le \cdots \le x_{(n)}$.

the proportion of the sample below $x_{(j)}$ is approximated by $\tfrac{j - \tfrac{1}{2}}{n}$.

The quantiles $q_{(j)}$ for the standard normal are defined as

$ P(z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \dfrac{1}{\sqrt{2\pi}} \exp \left( -\dfrac{z^2}{2} \right) dz = \dfrac{j - \tfrac{1}{2}}{n} $

if the data arise from a normal population, then $\sigma q_{(j)} + \mu \approx x_{(j)}$.

Similarly, the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related.

Procedure:
1. get $x_{(1)} \le \cdots \le x_{(n)}$ from the original observations
2. calculate the probability values $\tfrac{j - 1/2}{n}, \; \; j = 1, \cdots, n$
3. calculate the standard normal quantiles $q_{(1)}, \cdots, q_{(n)}$
4. plot the pairs of observations $(q_{(1)}, x_{(1)}), \cdots, (q_{(n)}, x_{(n)})$
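The steps above can be sketched directly (assuming NumPy/SciPy; the sample is simulated normal data with hypothetical $\mu = 10$, $\sigma = 2$):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=2.0, size=200)  # hypothetical normal sample

# 1. order statistics
n = len(x)
x_sorted = np.sort(x)

# 2. probability values (j - 1/2)/n
probs = (np.arange(1, n + 1) - 0.5) / n

# 3. standard normal quantiles
q = stats.norm.ppf(probs)

# 4. the pairs (q_(j), x_(j)) should be nearly linear;
#    straightness summarized by the correlation coefficient
r = np.corrcoef(q, x_sorted)[0, 1]
```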


Checking the straightness of the Q-Q plot: * using the correlation coefficient * hypothesis testing: $H_0: \rho = 0$, $T = \dfrac{r \sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$



5.2.8.0.1.2 b. others
    1. Shapiro-Wilk Test:

Test based on the correlation coefficient between $x_{(j)}$ and $r_{(j)}$, where $r_{(j)}$ is a function of the expected values of standard normal order statistics and their covariances.


    1. Kolmogorov-Smirnov Test

Compare cdf’s:

If the data arise from a normal population, the differences are small.

$ T = \sup_x \left| F(x) - S(x) \right| $

where $F(x)$ is the hypothesized cdf and $S(x)$ is the empirical cdf.


    1. Skewness Test

skewness $\sqrt{b_1} = \dfrac{\sqrt{n} \sum_{i=1}^n (x_i - \bar x)^3}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{3/2}}$

When the population is normal, the skewness = 0.


    1. Kurtosis Test:

kurtosis $b_2 = \dfrac{n \sum_{i=1}^n (x_i - \bar x)^4}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{2}}$

When the population is normal, the kurtosis is 3.
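Both sample statistics can be computed directly (a sketch assuming NumPy; the data are simulated standard normal, so $\sqrt{b_1} \approx 0$ and $b_2 \approx 3$ are expected):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=5000)   # simulated normal sample

n = len(x)
dev = x - x.mean()

# skewness: sqrt(b1) = sqrt(n) * sum(dev^3) / (sum(dev^2))^(3/2)
sqrt_b1 = np.sqrt(n) * np.sum(dev**3) / np.sum(dev**2)**1.5

# kurtosis: b2 = n * sum(dev^4) / (sum(dev^2))^2
b2 = n * np.sum(dev**4) / np.sum(dev**2)**2
```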


    1. Lin and Mudholkar (1980):

$ Z = \tanh^{-1}(r) = \dfrac{1}{2} \ln \left( \dfrac{1+r}{1-r} \right) $

where $r$ is the sample correlation of the $n$ pairs $(x_i, q_i), \; \; i = 1, \cdots, n$, with $q_i = \left[ \tfrac{1}{n} \left( \sum_{j \ne i} x_j^2 - \tfrac{1}{n-1} \left( \sum_{j \ne i} x_j \right)^2 \right) \right]^{1/3}$.

if the data arise from a normal population, $Z \sim N(0, \tfrac{3}{n})$.

5.2.8.0.2 2. Bivariate Normality

※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.

    1. Scatter Plot

The contours of the bivariate normal density are ellipses, so the pattern of the scatter plot should be nearly elliptical.


    1. Squared Generalized Distances

$\pmb y \sim N_p (\pmb \mu, \Sigma) \; \; \; \Longrightarrow \; \; \; (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2$.

Thus, for bivariate cases, the squared generalized distances $d_j^2 = (\pmb x_j - \bar{\pmb x})' S^{-1} (\pmb x_j - \bar{\pmb x})$ behave approximately like $\chi_2^2$.


    1. Chi2 Plot (Gamma Plot)

$d_1^2, \cdots, d_n^2$ should behave like $\chi_2^2$ random variables.
1. order the squared distances $d_{(1)}^2 \le \cdots \le d_{(n)}^2$
2. calculate the probability values $\tfrac{j-1/2}{n}, \; j = 1, \cdots, n$
3. calculate the quantiles of the $\chi_2^2$ distribution $q_{(1)}, \cdots, q_{(n)}$, where $q_{(j)} = \chi_2^2 \left( \tfrac{j-1/2}{n} \right)$
4. plot the pairs $(q_{(j)}, d_{(j)}^2), \; \; j = 1, \cdots, n$

The plot should resemble a straight line through the origin having slope 1.
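The chi-square plot steps can be sketched as follows (assuming NumPy/SciPy; the bivariate sample and its parameters are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
x = rng.multivariate_normal(mu, Sigma, size=300)

# squared generalized distances using the sample mean and covariance
xbar = x.mean(axis=0)
S_inv = np.linalg.inv(np.cov(x.T))
diff = x - xbar
d2 = np.einsum('ij,jk,ik->i', diff, S_inv, diff)

# ordered d2 vs chi^2_2 quantiles at (j - 1/2)/n
n = len(d2)
d2_sorted = np.sort(d2)
q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=2)

# the plot should be a line through the origin with slope 1;
# least-squares slope through the origin as a summary
slope = np.sum(q * d2_sorted) / np.sum(q**2)
```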



5.2.8.0.3 3. Multivariate Normality

Practically, it is usually sufficient to investigate the univariate and bivariate distributions.

The chi-square plot is still useful. When the parent population is multivariate normal and both $n$ and $n - p$ are greater than 25 or 30, the squared generalized distances $d_{(1)}^2 \le \cdots \le d_{(n)}^2$ should behave like $\chi_p^2$ random variables.




5.2.9 Power Transformation

$
x^{\lambda} =
\begin{cases} \tfrac{1}{x}, & \lambda = -1 \text{ (reciprocal transformation)} \\ \tfrac{1}{\sqrt{x}}, & \lambda = -\tfrac{1}{2} \\ \ln(x), & \lambda = 0 \\ \sqrt{x}, & \lambda = \tfrac{1}{2} \\ x, & \lambda = 1 \text{ (no transformation)} \end{cases}
$

Examine Q-Q plot to see whether the normal assumption is satisfactory after power transformation.



5.2.9.0.1 Power Transformation

$
x^{(\lambda)} =
\begin{cases} \tfrac{x^\lambda - 1}{\lambda}, & \lambda \ne 0 \\ \ln(x), & \lambda = 0 \end{cases}
$

Here, find the $\lambda$ that maximizes

$
l(\lambda) = -\dfrac{n}{2} \ln \left[ \dfrac{1}{n} \sum_{j=1}^n \left( x_j^{(\lambda)} - \overline{x^{(\lambda)}} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^n \ln x_j
$

where $\overline{x^{(\lambda)}} = \dfrac{1}{n} \sum_{j=1}^n x_j^{(\lambda)}$

$x^{(\lambda)}$ gives the most feasible values for normality, but is not guaranteed to follow a normal distribution.
* The (Box-Cox) transformation usually improves the approximation to normality.
* Trial-and-error calculations may be necessary to find the $\lambda$ that maximizes $l(\lambda)$.
* Usually, change $\lambda$ from -1 to 1 with increment 0.1.
* Examine the Q-Q plot after the Box-Cox transformation.
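The grid search for $\hat \lambda$ can be sketched as follows (assuming NumPy; lognormal data are used so the maximizing $\lambda$ should land near 0; `boxcox_loglik` is a helper written for this sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
# Lognormal data: ln(x) is exactly normal, so lambda near 0 should win
x = rng.lognormal(mean=0.0, sigma=0.5, size=2000)

def boxcox_loglik(lam, x):
    """Profile log-likelihood l(lambda) for the Box-Cox transformation."""
    n = len(x)
    xt = np.log(x) if lam == 0 else (x**lam - 1) / lam
    # -n/2 * ln(mean squared deviation of transformed data) + (lam - 1) * sum(ln x)
    return -n / 2 * np.log(np.var(xt)) + (lam - 1) * np.sum(np.log(x))

# Grid search from -1 to 1 with increment 0.1, as suggested above
grid = np.round(np.arange(-1.0, 1.01, 0.1), 1)
lls = [boxcox_loglik(lam, x) for lam in grid]
best_lambda = grid[int(np.argmax(lls))]
```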

5.2.9.0.2 nqplot, contour plot, cqplot, and box-cox plot