5.2 Multivariate Normal (wk2)
5.2.1 Overview
Let the random vector $\pmb Z = (Z_1, \cdots, Z_p)'$, with $Z_i \overset{iid}{\sim} N(0, 1)$.
Then $\pmb X = (X_1, \cdots, X_k)' = A_{k \times p} \pmb Z + \pmb \mu_{k \times 1}$ follows a multivariate normal distribution.
Here, if $\operatorname{rank}(A) = p \; (\le k)$ and $AA' = \Sigma$, then $\pmb X \sim N_k(\pmb \mu, \Sigma)$.
Notation: $\pmb y \sim N_p(\pmb \mu, \Sigma)$.
The random vector $\pmb y' = [y_1, \cdots, y_p]$ has a multivariate normal distribution if $\sum_{i=1}^p a_i y_i = \pmb a' \pmb y$ has a univariate normal distribution for every possible set of values of the elements of $\pmb a$.
The pdf is $f(\pmb y) = \dfrac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\dfrac{1}{2} (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \right\}$.
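As a quick numerical check of this density (a minimal sketch, assuming numpy and scipy are available; the $\pmb \mu$, $\Sigma$, and evaluation point are illustrative):

```python
# Evaluate the MVN pdf from the formula above and compare with scipy's
# built-in implementation.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
y = np.array([0.5, 1.5])

p = len(mu)
diff = y - mu
quad = diff @ np.linalg.inv(Sigma) @ diff            # (y-mu)' Sigma^{-1} (y-mu)
dens = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

assert np.isclose(dens, multivariate_normal(mu, Sigma).pdf(y))
```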
Ellipsoid:
- the path of $\pmb y$ values yielding a constant height for the density,
i.e., all $\pmb y$ s.t. $\{ (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) = c^2 \}$.
Standard Normal Distribution:
- $\pmb z = (\Sigma^{1/2})^{-1} (\pmb y - \pmb \mu) \sim N_p(\pmb 0, I_p)$,
where $(\Sigma^{1/2})^{-1}$ satisfies $\Sigma^{-1} = (\Sigma^{1/2})^{-1} (\Sigma^{1/2})^{-1}$.
Properties of $\Sigma$: 1. symmetric matrix; 2. positive definite matrix; 3. $\operatorname{Cov}(A \pmb y + \pmb b) = A \Sigma A'$.
※ If $A$ is symmetric and positive definite, then $A = CC'$, where $C$ is a lower triangular matrix. This is called the Cholesky decomposition of $A$.
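The construction $\pmb X = A \pmb Z + \pmb \mu$ above can be implemented directly with the Cholesky factor as $A$ (a sketch, assuming numpy; $\pmb \mu$, $\Sigma$, and the seed are illustrative):

```python
# Generate X = A Z + mu with A the Cholesky factor of Sigma,
# so that Cov(X) = A A' = Sigma as in the construction above.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 1.0]])

A = np.linalg.cholesky(Sigma)          # lower-triangular C with C C' = Sigma
Z = rng.standard_normal((3, 10_000))   # iid N(0,1) entries
X = (A @ Z).T + mu                     # rows are draws of X ~ N_3(mu, Sigma)

print(np.cov(X, rowvar=False))         # should be close to Sigma
```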
- $E(\pmb X) = \pmb \mu$, $\operatorname{Cov}(\pmb X) = AA' = \Sigma$
- $M_{\pmb X}(\pmb t) = \exp\left( \pmb t' \pmb \mu + \frac{1}{2} \pmb t' \Sigma \pmb t \right)$
- $\Sigma = AA'$ is non-singular $\iff \operatorname{rank}(A) = p$
- $\Sigma = \operatorname{Cov}(\pmb X)$ is symmetric and n.n.d.

For $\pmb X$ as above, the following are equivalent (TFAE): 1. $\pmb X \sim N_p(\pmb \mu, \Sigma)$. 2. 3. 4. 5.
5.2.2 Spectral Decomposition
If $A$ is symmetric, then $A = E \Lambda E'$, where the $\lambda_i$ are the eigenvalues ($\lambda_1 \ge \cdots \ge \lambda_p$) and the $\pmb e_i$ are the corresponding eigenvectors ($E'E = I_p$). This is called the spectral decomposition of $A$.

$$\Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_p \end{bmatrix}, \qquad E = \begin{bmatrix} \pmb e_1 & \cdots & \pmb e_p \end{bmatrix}$$

Then $\Sigma = E \Lambda E' = E \Lambda^{1/2} \Lambda^{1/2} E' = E \Lambda^{1/2} E' E \Lambda^{1/2} E' = \Sigma^{1/2} \Sigma^{1/2}$.
Center & axes of the ellipsoids $\{ (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) = c^2 \}$:
- center: $\pmb \mu$
- axes: $\pm c \sqrt{\lambda_i} \, \pmb e_i$
Square Root Matrix:
Let $A_{p \times p}$ be symmetric and non-negative definite. The square root matrix of $A$ is defined as $A^{1/2} = E \Lambda^{1/2} E'$, where
$$\Lambda^{1/2} = \begin{bmatrix} \sqrt{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{\lambda_p} \end{bmatrix}$$
Negative Square Root Matrix:
Let $A$ be symmetric, of full rank, with all of its $\lambda_i$ positive. Then $A^{-1/2} = E \Lambda^{-1/2} E'$, where
$$\Lambda^{-1/2} = \begin{bmatrix} \frac{1}{\sqrt{\lambda_1}} & & 0 \\ & \ddots & \\ 0 & & \frac{1}{\sqrt{\lambda_p}} \end{bmatrix}$$
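Both square root matrices can be built from the spectral decomposition (a sketch, assuming numpy; the matrix $A$ is illustrative):

```python
# Build A^{1/2} and A^{-1/2} from A = E Lambda E' and verify the
# defining identities.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])             # symmetric positive definite

lam, E = np.linalg.eigh(A)             # eigenvalues (ascending) and eigenvectors
A_half = E @ np.diag(np.sqrt(lam)) @ E.T
A_neg_half = E @ np.diag(1 / np.sqrt(lam)) @ E.T

assert np.allclose(A_half @ A_half, A)                  # A^{1/2} A^{1/2} = A
assert np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A))
```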
Generalized Inverse:
Let $A$ be a non-negative definite matrix. If $\lambda_1 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_p$, i.e., $A$ is not of full rank, then the Moore-Penrose generalized inverse of $A$ is given by
$$A^{-} = \frac{1}{\lambda_1} \pmb e_1 \pmb e_1' + \cdots + \frac{1}{\lambda_r} \pmb e_r \pmb e_r'$$
where
$$\Lambda^{-} = \begin{bmatrix} \frac{1}{\lambda_1} & & & \\ & \ddots & & \\ & & \frac{1}{\lambda_r} & \\ & & & \pmb 0 \end{bmatrix}$$
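A sketch of this construction, assuming numpy (the rank-deficient matrix is illustrative); it can be checked against numpy's built-in pseudo-inverse:

```python
# Construct the Moore-Penrose inverse of a rank-deficient n.n.d. matrix
# from its nonzero eigenpairs and compare with np.linalg.pinv.
import numpy as np

B = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
A = B @ B.T                            # 3x3, n.n.d., rank 2 (not full rank)

lam, E = np.linalg.eigh(A)
keep = lam > 1e-10                     # the r positive eigenvalues
A_minus = (E[:, keep] / lam[keep]) @ E[:, keep].T   # sum of (1/lam_i) e_i e_i'

assert np.allclose(A_minus, np.linalg.pinv(A))
```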
Marginal Distribution:
$$\pmb y \sim N_p(\pmb \mu, \Sigma) \implies y_i \sim N(\mu_i, \sigma_{ii}), \; i = 1, \cdots, p,$$
but the converse does not hold.
5.2.3 Properties of MVN
- Linear combinations of the components of $\pmb y$ are normally distributed.
- Any subset of the components of $\pmb y$ has an MVN distribution.
- Conditional distributions of the components of $\pmb y$ are MVN:

$$\pmb y \sim N_p(\pmb \mu, \Sigma) \implies \pmb a' \pmb y \sim N(\pmb a' \pmb \mu, \pmb a' \Sigma \pmb a)$$

$$\pmb y \sim N_p(\pmb \mu, \Sigma), \;\; A_{n \times p} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{bmatrix} \implies A \pmb y \sim N_n(A \pmb \mu, A \Sigma A'),$$

i.e., the dimension changes ($p \to n$).

If $\pmb y \sim N_p(\pmb \mu, \Sigma)$ and $\pmb d$ is a constant vector, then $\pmb y + \pmb d \sim N_p(\pmb \mu + \pmb d, \Sigma)$.
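A simulation sketch of the linear-transformation and translation properties, assuming numpy (the matrices, vectors, and seed are illustrative):

```python
# Empirically, A y + d should have mean A mu + d and covariance A Sigma A'.
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.eye(3) * 0.7 + 0.3             # p.d.: diagonal 1.0, off-diagonal 0.3
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])          # maps R^3 -> R^2 (dimension changes)
d = np.array([1.0, -1.0])

y = rng.multivariate_normal(mu, Sigma, size=50_000)
x = y @ A.T + d                           # draws of A y + d

print(x.mean(axis=0), A @ mu + d)         # empirical vs. theoretical mean
print(np.cov(x, rowvar=False))            # compare with A Sigma A' below
print(A @ Sigma @ A.T)
```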
- If we partition $\pmb y$, $\pmb \mu$, $\Sigma$ as follows:

$$\pmb y = \begin{bmatrix} \pmb y_1 \\ \pmb y_2 \end{bmatrix} \sim N_p \left( \begin{bmatrix} \pmb \mu_1 \\ \pmb \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),$$

then each subvector is itself MVN, e.g., $\pmb y_1 \sim N(\pmb \mu_1, \Sigma_{11})$.
5.2.4 $\chi^2$ Distribution
If $\pmb z \sim N_p(\pmb 0, I_p)$, then $\pmb z' \pmb z = \sum_{i=1}^p z_i^2 \sim \chi_p^2$.
If $\pmb y \sim N_p(\pmb \mu, \Sigma)$, then $(\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2$.
The $N_p(\pmb \mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\left\{ \pmb y : (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \le \chi_p^2(\alpha) \right\}$, where $\chi_p^2(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi_p^2$ distribution.
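A simulation sketch of this $1 - \alpha$ coverage statement, assuming numpy and scipy ($p$, $\alpha$, $\Sigma$, and the seed are illustrative):

```python
# About 1 - alpha of MVN draws should fall inside the ellipsoid
# {(y - mu)' Sigma^{-1} (y - mu) <= chi2_p(alpha)}.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
p, alpha = 3, 0.05
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.5]])

y = rng.multivariate_normal(mu, Sigma, size=100_000)
d2 = np.einsum('ij,jk,ik->i', y - mu, np.linalg.inv(Sigma), y - mu)
cutoff = chi2.ppf(1 - alpha, df=p)     # upper (100*alpha)th percentile
print((d2 <= cutoff).mean())           # close to 0.95
```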
5.2.7 Sampling Distribution of $\bar{\pmb y}$, $S$
Let the random vectors $\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)$.
- $\bar{\pmb y} \sim N_p\left(\pmb \mu, \frac{1}{n} \Sigma\right)$
- $(n-1)S$ follows a Wishart distribution with df $= n - 1$. Note that $S$ is a random matrix; the Wishart is a distribution of random matrices.
- $\bar{\pmb y} \perp S$.
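A simulation sketch of the first result, assuming numpy: the covariance of replicated sample means should be close to $\frac{1}{n}\Sigma$ (the parameters and seed are illustrative):

```python
# Replicate the sample mean of n iid N_p(mu, Sigma) vectors many times
# and check that Cov(ybar) = Sigma / n.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 2.0]])
n, reps = 20, 20_000

ybars = np.array([rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
                  for _ in range(reps)])
print(ybars.mean(axis=0))                 # close to mu
print(np.cov(ybars, rowvar=False) * n)    # close to Sigma, i.e. Cov(ybar) = Sigma/n
```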
5.2.7.0.1 Wishart Distribution
※ Univariate analogue: $\dfrac{\sum (x_i - \bar x)^2}{\sigma^2} = \dfrac{(n-1)S^2}{\sigma^2} \sim \chi_{n-1}^2$, i.e., $\sum (x_i - \bar x)^2 = (n-1)S^2 \sim \sigma^2 \chi_{n-1}^2$.
For random vectors $\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)$,
$$\begin{align*} \sum_{i=1}^n (\pmb y_i - \pmb \mu)(\pmb y_i - \pmb \mu)' &\sim W_p(n, \Sigma) \\ (n-1)S = \sum_{i=1}^n (\pmb y_i - \bar{\pmb y})(\pmb y_i - \bar{\pmb y})' &\sim W_p(n-1, \Sigma) \end{align*}$$
- If $A \sim W_p(n, \Sigma)$, $B \sim W_p(m, \Sigma)$, and $A \perp B$, then $A + B \sim W_p(n + m, \Sigma)$.
- If $A \sim W_p(n, \Sigma)$, then $CAC' \sim W_p(n, C \Sigma C')$.
- If $A \sim W_p(n-1, \Sigma)$, its density $f(A)$ (defined for positive definite $A$) involves the multivariate gamma function $\Gamma_p$:
$$f(A) = \frac{|A|^{(n-p-2)/2} \exp\left\{ -\frac{1}{2} \operatorname{tr}(\Sigma^{-1} A) \right\}}{2^{p(n-1)/2} \, |\Sigma|^{(n-1)/2} \, \Gamma_p\!\left( \frac{n-1}{2} \right)}$$
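A sketch assuming scipy's `wishart` (parameters illustrative): draws of $W_p(n-1, \Sigma)$ have mean $(n-1)\Sigma$, matching the expectation of $(n-1)S$ from an MVN sample:

```python
# E[W_p(m, Sigma)] = m * Sigma; (n-1)S from an MVN sample is one
# realization of W_p(n-1, Sigma).
import numpy as np
from scipy.stats import wishart

rng = np.random.default_rng(4)
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
n = 10

W = wishart(df=n - 1, scale=Sigma).rvs(size=20_000, random_state=rng)
print(W.mean(axis=0))                    # close to (n-1) * Sigma

y = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
S = np.cov(y, rowvar=False)              # sample covariance matrix
print((n - 1) * S)                       # one draw from W_2(n-1, Sigma)
```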
5.2.7.0.2 MV t-Distribution
※ Univariate t-distribution: $t = \dfrac{U / \sigma}{\sqrt{V / \nu}} \sim t_{\nu}$, where $U \sim N(0, \sigma^2)$, $V \sim \chi_{\nu}^2$, and $U \perp V$.
Let $\pmb y = (y_1, \cdots, y_p)' \sim N_p(\pmb \mu, \Sigma)$, $V \sim \chi_{\nu}^2$, and $\pmb y \perp V$.
Define the random vector $\pmb t = (t_1, \cdots, t_p)'$ with $t_i = \dfrac{(y_i - \mu_i) / \sigma_i}{\sqrt{V / \nu}}, \; i = 1, \cdots, p$. Note that each $t_i \sim t_{\nu}$.
The joint distribution of $\pmb t$ is then called the MV t-distribution, with df $= \nu$ and matrix parameter $\Sigma$.
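A construction sketch following this definition, assuming numpy and scipy (df, $\pmb \mu$, $\Sigma$, and the seed are illustrative); each marginal $t_i \sim t_\nu$ has variance $\nu/(\nu - 2)$:

```python
# Build multivariate-t draws t_i = ((y_i - mu_i)/sigma_i) / sqrt(V/nu)
# exactly as defined above.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
nu, size = 5, 50_000
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
sigma = np.sqrt(np.diag(Sigma))          # sigma_i = sqrt(sigma_ii)

y = rng.multivariate_normal(mu, Sigma, size=size)
V = chi2.rvs(df=nu, size=size, random_state=rng)
t = (y - mu) / sigma / np.sqrt(V / nu)[:, None]   # each column t_i ~ t_nu

print(t.var(axis=0))                     # close to nu/(nu-2) = 5/3 for t_5
```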
5.2.8 Assessing Normality
5.2.8.0.1 1. Univariate Marginal Distribution
5.2.8.0.1.1 a. Q-Q Plot
※ Sample quantiles vs. quantiles of the normal distribution.
Let $x_{(1)} \le \cdots \le x_{(n)}$ be the order statistics, or sample quantiles.
The proportion of the sample below $x_{(j)}$ is approximated by $\dfrac{j - \frac{1}{2}}{n}$.
The quantiles $q_{(j)}$ of the standard normal are defined by
$$P(Z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{z^2}{2} \right) dz = \frac{j - \frac{1}{2}}{n}$$
If the data arise from a normal population, then $\sigma q_{(j)} + \mu \approx x_{(j)}$.
Hence the pairs $(q_{(j)}, x_{(j)})$ will be approximately linearly related.
Procedure:
1. Get the order statistics $x_{(1)} \le \cdots \le x_{(n)}$ from the original observations.
2. Calculate the probability values $\dfrac{j - 1/2}{n}, \;\; j = 1, \cdots, n$.
3. Calculate the standard normal quantiles $q_{(1)}, \cdots, q_{(n)}$.
4. Plot the pairs of observations $(q_{(1)}, x_{(1)}), \cdots, (q_{(n)}, x_{(n)})$.

Checking the straightness of the Q-Q plot (see the sketch below):
- using the correlation coefficient of the pairs
- hypothesis testing: $H_0: \rho = 0$, with $T = \dfrac{r \sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$ under $H_0$
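A sketch of this procedure, assuming numpy and scipy (simulated data; plotting is omitted, only the Q-Q pairs and the straightness correlation are computed):

```python
# Q-Q pairs and the straightness correlation for the procedure above.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
x = np.sort(rng.normal(loc=10, scale=2, size=50))   # step 1: x_(1) <= ... <= x_(n)
n = len(x)
probs = (np.arange(1, n + 1) - 0.5) / n             # step 2: (j - 1/2)/n
q = norm.ppf(probs)                                 # step 3: std. normal quantiles

r = np.corrcoef(q, x)[0, 1]                         # near 1 for normal data
T = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)          # compare with t_{n-2}
print(r, T)
```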
5.2.8.0.1.2 b. others
- Shapiro-Wilk Test:
A test based on the correlation coefficient between $x_{(j)}$ and $r_{(j)}$, where $r_{(j)}$ is a function of the expected values of the standard normal order statistics and their covariances.
- Kolmogorov-Smirnov Test
Compare cdf’s:
If the data arise from a normal population, the differences are small.
$$T = \sup_x |F(x) - S(x)|$$
where $F(x)$ is the hypothesized (normal) cdf and $S(x)$ is the empirical cdf.
- Skewness Test
Skewness: $\sqrt{b_1} = \dfrac{\sqrt{n} \sum_{i=1}^n (x_i - \bar x)^3}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{3/2}}$
When the population is normal, the skewness = 0.
- Kurtosis Test:
Kurtosis: $b_2 = \dfrac{n \sum_{i=1}^n (x_i - \bar x)^4}{\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{2}}$
When the population is normal, the kurtosis is 3.
- Lin and Mudholkar (1980):
$$Z = \tanh^{-1}(r) = \frac{1}{2} \ln\left( \frac{1+r}{1-r} \right)$$
where $r$ is the sample correlation of the $n$ pairs $(x_i, q_i), \; i = 1, \cdots, n$, with $q_i = \frac{1}{n} \left( \sum_{j \ne i} x_j^2 - \frac{1}{n-1} \left( \sum_{j \ne i} x_j \right)^2 \right)^{1/3}$.
If the data arise from a normal population, $Z \sim N\left(0, \frac{3}{n}\right)$.
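Off-the-shelf versions of several of these tests exist in scipy.stats; a sketch on simulated data (note that these are scipy's implementations, not necessarily identical to the variants above, and plugging estimated parameters into the K-S test is only an approximation):

```python
# Run scipy's built-in normality tests on one simulated sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(size=200)

print(stats.shapiro(x))                        # Shapiro-Wilk
print(stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1))))  # K-S vs fitted N
print(stats.skewtest(x))                       # test of zero skewness
print(stats.kurtosistest(x))                   # test of normal kurtosis
```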
5.2.8.0.2 2. Bivariate Normality
※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.
- Scatter Plot
The contours of the bivariate normal density are ellipses, so the pattern of the scatter plot should be nearly elliptical.
- Squared Generalized Distances
※ $\pmb y \sim N_p(\pmb \mu, \Sigma) \implies (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2$.
This means that, in the bivariate case, the squared generalized distances $d_j^2 = (\pmb x_j - \bar{\pmb x})' S^{-1} (\pmb x_j - \bar{\pmb x})$ should behave approximately like $\chi_2^2$.
- Chi2 Plot (Gamma Plot)
$d_1^2, \cdots, d_n^2$ should behave like $\chi_2^2$ random variables (see the sketch below).
1. Order the squared distances: $d_{(1)}^2 \le \cdots \le d_{(n)}^2$.
2. Calculate the probability values $\dfrac{j - 1/2}{n}, \; j = 1, \cdots, n$.
3. Calculate the quantiles of the $\chi_2^2$ distribution: $q_{(j)} = \chi_2^2\left( \dfrac{j - 1/2}{n} \right)$.
4. Plot the pairs $(q_{(j)}, d_{(j)}^2), \;\; j = 1, \cdots, n$.

The plot should resemble a straight line through the origin having slope 1.
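A sketch of the chi-square plot computations, assuming numpy and scipy (simulated bivariate data; plotting is omitted, a line fit stands in for visual inspection):

```python
# Compute the ordered squared generalized distances and chi2_2 quantiles.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
X = rng.multivariate_normal([0, 0], [[1.0, 0.4], [0.4, 1.0]], size=100)

xbar = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.sort(np.einsum('ij,jk,ik->i', X - xbar, S_inv, X - xbar))  # d_(j)^2

n = len(d2)
q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=2)    # chi2_2 quantiles

print(np.polyfit(q, d2, 1))            # slope near 1, intercept near 0
```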
5.2.8.0.3 3. Multivariate Normality
Practically, it is usually sufficient to investigate the univariate and bivariate distributions.
The chi-square plot is still useful. When the parent population is multivariate normal and both $n$ and $n - p$ are greater than 25 or 30, the ordered squared generalized distances $d_{(1)}^2 \le \cdots \le d_{(n)}^2$ should behave like $\chi_p^2$ random variables.
5.2.9 Power Transformation
$$x^{(\lambda)} = \begin{cases} \frac{1}{x}, & \lambda = -1 \text{ (reciprocal transformation)} \\ \frac{1}{\sqrt{x}}, & \lambda = -\frac{1}{2} \\ \ln(x), & \lambda = 0 \\ \sqrt{x}, & \lambda = \frac{1}{2} \\ x, & \lambda = 1 \text{ (no transformation)} \end{cases}$$
Examine Q-Q plot to see whether the normal assumption is satisfactory after power transformation.
5.2.9.0.1 Box-Cox Power Transformation
$$x^{(\lambda)} = \begin{cases} \frac{x^{\lambda} - 1}{\lambda}, & \lambda \ne 0 \\ \ln(x), & \lambda = 0 \end{cases}$$
Here, find the $\lambda$ that maximizes
$$\ell(\lambda) = -\frac{n}{2} \ln\left[ \frac{1}{n} \sum_{j=1}^n \left( x_j^{(\lambda)} - \overline{x^{(\lambda)}} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^n \ln x_j$$
where $\overline{x^{(\lambda)}} = \dfrac{1}{n} \sum_{j=1}^n x_j^{(\lambda)}$.
$x^{(\hat{\lambda})}$ gives the most plausible values for normality, but is not guaranteed to follow a normal distribution.
- The Box-Cox transformation usually improves the approximation to normality.
- Trial-and-error calculations may be necessary to find the $\lambda$ that maximizes $\ell(\lambda)$ (see the sketch below).
- Usually, vary $\lambda$ from $-1$ to $1$ with increment $0.1$.
- Examine the Q-Q plot after the Box-Cox transformation.
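A sketch using scipy's Box-Cox utilities (illustrative lognormal data): `stats.boxcox` maximizes $\ell(\lambda)$ numerically, while `stats.boxcox_llf` supports the grid search described above:

```python
# Find the MLE of lambda, then reproduce it with the -1 to 1 grid search.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.lognormal(mean=0.0, sigma=0.7, size=300)   # skewed positive data

x_transformed, lam_hat = stats.boxcox(x)           # MLE of lambda
print(lam_hat)                                     # near 0 (log) for lognormal data

# Grid search mirroring the notes: lambda from -1 to 1 with increment 0.1.
grid = np.arange(-1.0, 1.01, 0.1)
ll = [stats.boxcox_llf(l, x) for l in grid]
print(grid[np.argmax(ll)])
```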