5.2 Multivariate Normal (wk2)
5.2.1 Overview
let rvec (random vector) \(Z=(Z_1 , \cdots, Z_k)'\), \(Z_i \overset {iid}{\sim} N(0,1)\).
then \(X=(X_1 , \cdots, X_p)' = A_{p \times k} Z + \pmb \mu_{p \times 1}\) follows MVN.
here, if \(rank(A)=p \, (\le k)\) and \(AA' = \Sigma\), then \(X \sim N_p (\pmb \mu, \Sigma)\).
notation: \(\pmb y \sim N_p (\pmb \mu , \Sigma)\)
rvec \(\pmb y ' = [y_1 , \cdots , y_p ]\) has a Multivariate Normal Distribution if \(\sum_{i=1}^p a_i y_i = \pmb a' \pmb y\) has a Univariate Normal Distribution for every possible set of values of the elements in \(\pmb a\).
pdf: \(f(\pmb y) = \dfrac{1}{(2\pi)^{p/2} {\vert \Sigma \vert}^{1/2}} \exp \left\{ -\dfrac{1}{2} (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \right\}\).
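A minimal numerical check of this pdf (assuming `numpy` and `scipy`; the values of \(\pmb \mu\) and \(\Sigma\) are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch: evaluate the MVN density formula directly and compare
# with scipy's implementation.
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
y = np.array([0.3, 0.8])

p = len(mu)
diff = y - mu
quad = diff @ np.linalg.inv(Sigma) @ diff          # (y-mu)' Sigma^{-1} (y-mu)
dens = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

print(dens, multivariate_normal(mu, Sigma).pdf(y))  # the two should agree
```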
Ellipsoid:
- path of \(\pmb y\) values yielding a constant height for the density,
i.e., all \(\pmb y\) s.t. \(\{ (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu)=c^2 \}\).
Standard Normal Distribution:
- \(\pmb z = \left( {\Sigma^{1/2}}\right)^{-1} (\pmb y -\pmb \mu) \sim N_p (\pmb 0, I_p)\),
where \(\left( {\Sigma^{1/2}}\right)^{-1}\) satisfies \(\Sigma^{-1} = \left( {\Sigma^{1/2}}\right)^{-1} \left( {\Sigma^{1/2}}\right)^{-1}\).
Properties of \(\Sigma\): 1. symmetric matrix 2. positive definite matrix 3. \(Cov(A \pmb y + \pmb b) = A \Sigma A'\).
※ if \(A\) is symmetric and positive definite, then \(A=CC'\), where \(C\) is a lower triangular matrix. This is called the Cholesky Decomposition of \(A\).
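A short sketch tying the Cholesky factor to the construction above: with \(\Sigma = CC'\), \(X = \pmb \mu + CZ\) has covariance \(\Sigma\) (assuming `numpy`; \(\pmb \mu\), \(\Sigma\) illustrative):

```python
import numpy as np

# Sketch: use the Cholesky factor C (Sigma = CC') to generate
# X = mu + C Z with Z ~ N(0, I), per the construction above.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

C = np.linalg.cholesky(Sigma)        # lower triangular, Sigma = C @ C.T
Z = rng.standard_normal((2, 10000))  # iid N(0,1) entries
X = mu[:, None] + C @ Z              # columns are draws from N_2(mu, Sigma)

print(np.cov(X))                     # should approximate Sigma
```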
- \(E(X)=\mu, Cov(X)=AA' = \Sigma\)
- \(M_X (t) = \exp \left( t' \mu + \dfrac {1}{2} t' \Sigma t \right)\)
- \(\Sigma=AA'\) is a non-singular matrix \(\iff rank(A)=p\)
- \(\Sigma = Cov(X)\) is symmetric and n.n.d.
For the above \(X\), the following are equivalent (TFAE). 1. \(X \sim N_p (\pmb \mu, \Sigma)\). 2. 3. 4. 5.
5.2.2 Spectral Decomposition
if \(A\) is symmetric, then \(A=E \Lambda E'\), where the \(\lambda_i\) are eigenvalues (ev) (\(\lambda_1 \ge \cdots \ge \lambda_p\)) and the \(\pmb e_i\) are the corresponding eigenvectors (evec) (\(E'E = I_p\)). This is called the Spectral Decomposition of \(A\).
\[
\Lambda =
\begin{bmatrix} \lambda_1 & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \lambda_p \end{bmatrix}, \qquad
E = \begin{bmatrix} \pmb e_1 & \cdots & \pmb e_p \end{bmatrix}
\]
Then \(\Sigma = E \Lambda E' = E \Lambda^{1/2} \Lambda^{1/2} E' = E \Lambda^{1/2} E' E \Lambda^{1/2} E' = \Sigma^{1/2} \Sigma^{1/2}\).
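A quick check of \(\Sigma^{1/2} = E \Lambda^{1/2} E'\) via the spectral decomposition (assuming `numpy`; \(\Sigma\) illustrative; note `eigh` returns eigenvalues in ascending order):

```python
import numpy as np

# Sketch: build Sigma^{1/2} = E Lambda^{1/2} E' from the spectral
# decomposition and verify Sigma^{1/2} Sigma^{1/2} = Sigma.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

lam, E = np.linalg.eigh(Sigma)       # eigenvalues (ascending) and eigenvectors
Sigma_half = E @ np.diag(np.sqrt(lam)) @ E.T

print(np.allclose(Sigma_half @ Sigma_half, Sigma))  # True
```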
Center & axes of the ellipsoid \(\{ (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu)=c^2 \}\): * center: \(\pmb \mu\) * axes: \(\pm c \sqrt{\lambda_i} \, \pmb e_i\)
Square root Matrix:
let \(A_{p \times p}\) be a symmetric non-negative definite matrix. The square root matrix of \(A\) is defined as \(A^{1/2} = E \Lambda^{1/2} E'\), where
\[
\Lambda^{1/2} =
\begin{bmatrix} \sqrt{\lambda_1} & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \sqrt{\lambda_p} \end{bmatrix}
\]
Negative Square Root Matrix:
Let \(A\) be symmetric and of full rank, with all of its \(\lambda_i\) positive. Then \(A^{-1/2} = E \Lambda^{-1/2} E'\), where
\[
\Lambda^{-1/2} =
\begin{bmatrix} \dfrac{1}{\sqrt{\lambda_1}} & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \dfrac{1}{\sqrt{\lambda_p}} \end{bmatrix}
\]
Generalized Inverse:
let \(A\) be a non-negative definite matrix. If \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_{p}\), i.e., \(A\) is not of full rank, then the Moore-Penrose generalized inverse of \(A\) is given by
\[
A^{-} = E \Lambda^{-} E' = \dfrac{1}{\lambda_1} \pmb e_1 \pmb e_1 ' + \cdots + \dfrac{1}{\lambda_r} \pmb e_r \pmb e_r '
\]
where
\[
\Lambda^{-} =
\begin{bmatrix} \dfrac{1}{\lambda_1} & & & & & \pmb 0 \\ & \ddots & & & & \\ & & \dfrac{1}{\lambda_r} & & & \\ & & & 0 & & \\ & & & & \ddots & \\ \pmb 0 & & & & & 0 \end{bmatrix}
\]
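A sketch of \(A^{-} = E \Lambda^{-} E'\) on a rank-deficient matrix, compared against `numpy`'s built-in pseudoinverse (the matrix and tolerance are illustrative):

```python
import numpy as np

# A rank-deficient non-negative definite matrix (eigenvalues 2 and 0).
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

lam, E = np.linalg.eigh(A)                 # spectral decomposition A = E diag(lam) E'
tol = 1e-10
inv_lam = np.array([1.0 / l if l > tol else 0.0 for l in lam])
A_pinv = E @ np.diag(inv_lam) @ E.T        # E Lambda^- E'

print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
```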
Marginal Distribution:
\[
\begin{align*} \pmb y \sim N_p (\pmb \mu , \Sigma) &\Longrightarrow y_i \sim N(\mu_i, \sigma_{ii}), \; \; \; i= 1, \cdots, p \\ &\not \Longleftarrow \end{align*}
\]
5.2.3 Properties of MVN
- linear combinations of the components of \(\pmb y\) are normally distributed.
- any subset of the components of \(\pmb y\) has a MVN distribution.
- conditional distributions of the components of \(\pmb y\) are MVN:
\[
\pmb y \sim N_p(\pmb \mu, \Sigma) \; \Longrightarrow \; \pmb a ' \pmb y \sim N( \pmb a ' \pmb \mu , \pmb a ' \Sigma \pmb a )
\]
\[
\pmb y \sim N_p(\pmb \mu, \Sigma), \; \;
A_{n \times p} =
\begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{bmatrix}
\; \Longrightarrow \; A \pmb y \sim N_n(A \pmb \mu , A \Sigma A')
\]
i.e., the dimension changes (from \(p\) to \(n\)).
if \(\pmb y \sim N_p(\pmb \mu, \Sigma)\) and \(\pmb d\) is a constant vector (cvec), then \(\pmb y + \pmb d \sim N_p(\pmb \mu + \pmb d , \Sigma)\).
- If we partition \(\pmb y\), \(\pmb \mu\), \(\Sigma\) as
\[
\pmb y = \begin{bmatrix} \pmb y_1 \\ \pmb y_2 \end{bmatrix} \sim N_p(\pmb \mu, \Sigma), \qquad
\pmb \mu = \begin{bmatrix} \pmb \mu_1 \\ \pmb \mu_2 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},
\]
then \(\pmb y_1 \sim N(\pmb \mu_1, \Sigma_{11})\).
5.2.4 \(\chi^2\) Distribution
if \(\pmb z \sim N_p ( \pmb 0 , I_p )\), then \(\pmb z ' \pmb z = \sum_{i=1}^p z_i^2 \sim \chi_p^2\).
if \(\pmb y \sim N_p(\pmb \mu, \Sigma)\), then \((\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2\).
the \(N_p(\pmb \mu , \Sigma)\) distribution assigns probability \(1-\alpha\) to the solid ellipsoid \(\left \{ \pmb y : (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \le \chi_p^2 (\alpha) \right \}\), where \(\chi_p^2 (\alpha)\) denotes the upper \((100\alpha)\)th percentile of the \(\chi_p^2\) distribution.
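A Monte Carlo check of this coverage statement (assuming `numpy`/`scipy`; \(\pmb \mu\), \(\Sigma\), and \(\alpha\) illustrative):

```python
import numpy as np
from scipy.stats import chi2

# Sketch: check that (y - mu)' Sigma^{-1} (y - mu) ~ chi^2_p, i.e., the
# ellipsoid with c^2 = chi2_p(alpha) covers probability 1 - alpha.
rng = np.random.default_rng(1)
mu = np.array([0.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
p, n, alpha = 2, 100000, 0.05

Y = rng.multivariate_normal(mu, Sigma, size=n)
diff = Y - mu
d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

print(np.mean(d2 <= chi2.ppf(1 - alpha, df=p)))  # approx 0.95
```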
5.2.7 Sampling Distribution of \(\bar {\pmb y}, S\)
let rvec \(\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)\).
\(\bar {\pmb y} \sim N_p (\pmb \mu , \dfrac{1}{n} \Sigma)\)
\((n-1) S \sim\) Wishart distribution with \(df=n-1\). * \(S\) is a random matrix, i.e., the Wishart is a distribution of a random matrix.
\(\bar {\pmb y} \perp S\).
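A quick simulation of the first claim, \(\bar {\pmb y} \sim N_p (\pmb \mu , \tfrac{1}{n} \Sigma)\) (assuming `numpy`; parameters illustrative):

```python
import numpy as np

# Sketch: check empirically that ybar ~ N_p(mu, Sigma/n) by simulating
# many samples and inspecting the covariance of the sample means.
rng = np.random.default_rng(2)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 50, 20000

ybars = np.array([rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
                  for _ in range(reps)])

print(np.cov(ybars.T) * n)  # should approximate Sigma
```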
5.2.7.0.1 Wishart Distribution
※ \(\dfrac {\sum (x_i - \bar x )^2}{\sigma^2} = \dfrac {(n-1)S^2} {\sigma^2} \sim \chi_{n-1}^2\), i.e., \(\sum (x_i - \bar x )^2 = (n-1)S^2 \sim \sigma^2 \cdot \chi_{n-1}^2\)
for rvec \(\pmb y_1, \cdots, \pmb y_n \overset{iid}{\sim} N_p(\pmb \mu, \Sigma)\):
\[
\begin{align*} \sum_{i=1}^n(\pmb y_i - \pmb \mu)(\pmb y_i - \pmb \mu)' &\sim W_p (n, \Sigma) \\ (n-1)S = \sum_{i=1}^n(\pmb y_i - \bar {\pmb y} )(\pmb y_i - \bar {\pmb y} )' &\sim W_p (n-1, \Sigma) \end{align*}
\]
if \(A \sim W_p (n, \Sigma)\), \(B \sim W_p (m, \Sigma)\), and \(A \perp B\), then \(A+B \sim W_p (n+m, \Sigma)\)
if \(A \sim W_p (n, \Sigma)\), then \(CAC' \sim W_p (n, C \Sigma C')\)
if \(A \sim W_p (n-1, \Sigma)\), the density \(f(A)\) involves the multivariate gamma function.
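A sketch checking that the scatter matrix \((n-1)S\) averages to the \(W_p(n-1, \Sigma)\) mean, \((n-1)\Sigma\) (assuming `numpy`; parameters illustrative):

```python
import numpy as np

# Sketch: the scatter matrix (n-1)S as a Wishart draw; its average over
# replications should be close to (n-1) * Sigma, the W_p(n-1, Sigma) mean.
rng = np.random.default_rng(3)
mu = np.zeros(2)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 20, 5000

acc = np.zeros((2, 2))
for _ in range(reps):
    Y = rng.multivariate_normal(mu, Sigma, size=n)
    D = Y - Y.mean(axis=0)
    acc += D.T @ D                       # (n-1) * S for this sample

print(acc / reps, (n - 1) * Sigma)       # the two should be close
```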
5.2.7.0.2 MV t-Distribution
※ univariate t-distribution: \(t=\tfrac{U/\sigma}{\sqrt{V/\nu}} \sim t_{\nu}\), where \(U \sim N(0, \sigma^2)\), \(V \sim \chi_{\nu}^2\), and \(U \perp V\).
let \(\pmb y = (y_1, \cdots, y_p)' \sim N_p(\pmb \mu, \Sigma)\), \(V \sim \chi_{\nu}^2\), and \(\pmb y \perp V\).
assume rvec \(\pmb t = (t_1 , \cdots, t_p)'\), \(t_i = \tfrac {(y_i - \mu_i)/\sigma_i}{\sqrt{V/\nu}}, \; i=1, \cdots, p\). * Note that each \(t_i \sim t_{\nu}\).
Here, the joint distribution of \(\pmb t\) is called the MV t-distribution, with \(df=\nu\) and matrix parameter \(\Sigma\).
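A sketch of this construction (assuming `numpy`; \(\pmb \mu\), \(\Sigma\), \(\nu\) illustrative); each margin should show the \(t_\nu\) variance \(\nu/(\nu-2)\):

```python
import numpy as np

# Sketch of the construction above: divide each standardized component of
# a normal vector by a shared sqrt(V/nu), with V ~ chi^2_nu independent of
# y, giving a p-variate t vector with df = nu.
rng = np.random.default_rng(4)
mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
nu, n = 5, 100000

y = rng.multivariate_normal(mu, Sigma, size=n)   # y ~ N_p(mu, Sigma)
V = rng.chisquare(nu, size=n)                    # V ~ chi^2_nu, independent of y
sd = np.sqrt(np.diag(Sigma))
t = ((y - mu) / sd) / np.sqrt(V / nu)[:, None]   # t_i = ((y_i - mu_i)/sigma_i)/sqrt(V/nu)

print(t.var(axis=0))   # each margin ~ t_nu: variance near nu/(nu-2) = 5/3
```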
5.2.8 Assessing Normality
5.2.8.0.1 1. Univariate Marginal Distribution
5.2.8.0.1.1 a. Q-Q Plot
※ Sample quantile vs. quantile of N distribution
let the order statistics, or sample quantiles, be \(x_{(1)} \le \cdots \le x_{(n)}\).
the proportion of sample below \(x_{(j)}\) is approximated by \(\tfrac{j-\tfrac{1}{2}}{n}\).
the quantiles \(q_{(j)}\) for std. N are defined as
\[
P(Z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \dfrac{1}{\sqrt{2\pi}} \exp \left( - \dfrac{z^2}{2} \right) dz = \dfrac{j-\tfrac{1}{2}}{n}
\]
if the data arise from a normal population, then \(\sigma q_{(j)} + \mu \approx x_{(j)}\).
Hence the pairs \((q_{(j)}, x_{(j)})\) will be approximately linearly related.
Procedure: 1. get \(x_{(1)} \le \cdots \le x_{(n)}\) from the original observations 2. calculate the probability values \(\tfrac{j-1/2}{n}, \; \; j= 1, \cdots, n\) 3. calculate the standard normal quantiles \(q_{(1)}, \cdots, q_{(n)}\) 4. plot the pairs \((q_{(1)}, x_{(1)}), \cdots, (q_{(n)}, x_{(n)})\)
Checking the straightness of the Q-Q plot: * using the corr coef of the pairs * hypothesis testing: \(H_0: \rho=0\), \(T= \tfrac{r \sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}\)
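A sketch of the four steps plus the straightness check (assuming `numpy`/`scipy`; the sample is illustrative):

```python
import numpy as np
from scipy.stats import norm

# Sketch of the Q-Q procedure above: order the data, compute the
# probability levels (j - 1/2)/n, the standard normal quantiles, and
# the correlation coefficient used to judge straightness.
rng = np.random.default_rng(5)
x = rng.normal(loc=10, scale=2, size=50)   # illustrative sample

x_ord = np.sort(x)                                   # step 1: order statistics
probs = (np.arange(1, len(x) + 1) - 0.5) / len(x)    # step 2
q = norm.ppf(probs)                                  # step 3: standard normal quantiles
r = np.corrcoef(q, x_ord)[0, 1]                      # straightness via corr coef

print(r)  # close to 1 for normal data
# step 4: plot the pairs, e.g. plt.scatter(q, x_ord)
```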
5.2.8.0.1.2 b. Others
- Shapiro-Wilk Test:
Tests the correlation coefficient between \(x_{(j)}\) and \(r_{(j)}\), where \(r_{(j)}\) is a function of the expected values of standard normal order statistics and their covariances.
- Kolmogorov-Smirnov Test
Compare cdf’s:
If the data arise from a normal population, the differences are small.
\[
T = \sup_x \vert F(x) - S(x) \vert
\]
where \(F(x)\) is the normal cdf and \(S(x)\) is the empirical cdf.
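An illustrative use of `scipy`'s K-S test with estimated \(\mu, \sigma\) plugged in (a sketch; note that plugging in estimates affects the null distribution of \(T\)):

```python
import numpy as np
from scipy.stats import kstest

# Sketch: Kolmogorov-Smirnov statistic T = sup_x |F(x) - S(x)| against a
# normal cdf with the sample mean and sd as its parameters.
rng = np.random.default_rng(6)
x = rng.normal(size=100)

stat, pval = kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
print(stat, pval)
```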
- Skewness Test
skewness \(\sqrt{b_1} = \tfrac{\sqrt{n} \sum_{i=1}^n (x_i - \bar x)^3} {\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{\tfrac{3}{2}}}\)
When the population is normal, the skewness = 0.
- Kurtosis Test:
kurtosis \({b_2} = \tfrac{n \sum_{i=1}^n (x_i - \bar x)^4} {\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{2}}\)
When the population is normal, the kurtosis is 3.
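A direct implementation of \(\sqrt{b_1}\) and \(b_2\) as defined above (assuming `numpy`; the sample is illustrative):

```python
import numpy as np

# Sketch: the sample skewness sqrt(b1) and kurtosis b2 exactly as defined
# above; for normal data they should be near 0 and 3.
def sqrt_b1(x):
    d = x - x.mean()
    return np.sqrt(len(x)) * np.sum(d**3) / np.sum(d**2) ** 1.5

def b2(x):
    d = x - x.mean()
    return len(x) * np.sum(d**4) / np.sum(d**2) ** 2

rng = np.random.default_rng(7)
x = rng.normal(size=1000)
print(sqrt_b1(x), b2(x))   # approx 0 and approx 3
```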
- Lin and Mudholkar (1980):
\[
Z = \tanh^{-1}(r) = \dfrac{1}{2} \ln \left( \dfrac{1+r}{1-r} \right)
\]
where \(r\) is the sample \(corr\) of the \(n\) pairs \((x_i , q_i), \; \; i=1, \cdots, n\), with \(q_i = \tfrac {1}{n} \left( \sum_{i \not = j} x_j^2 - \tfrac{1}{n-1} \left( \sum_{i \not = j} x_j\right)^2 \right)^{\tfrac{1}{3}}\).
if the data arise from a normal population, \(Z \sim N(0, \tfrac 3 n)\).
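A sketch of this statistic, taking the \(1/n\) factor inside the cube root (an assumption; its placement in the formula above is ambiguous), assuming `numpy`:

```python
import numpy as np

# Sketch of the Lin-Mudholkar statistic: leave-one-out quantities q_i,
# their correlation with x_i, and Fisher's z-transform.
def lin_mudholkar_z(x):
    n = len(x)
    s, s2 = x.sum(), (x**2).sum()
    loo_s, loo_s2 = s - x, s2 - x**2          # leave-one-out sums for each i
    q = ((loo_s2 - loo_s**2 / (n - 1)) / n) ** (1 / 3)
    r = np.corrcoef(x, q)[0, 1]
    return np.arctanh(r)                      # Z = tanh^{-1}(r), approx N(0, 3/n)

rng = np.random.default_rng(8)
x = rng.normal(size=100)
print(lin_mudholkar_z(x))                     # near 0 for normal data
```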
5.2.8.0.2 2. Bivariate Normality
※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.
- Scatter Plot
the contours of the bivariate normal density are ellipses; the pattern of the scatter plot should be nearly elliptical.
- Squared Generalized Distances
※ \(\pmb y \sim N_p (\pmb \mu, \Sigma) \; \; \; \Longrightarrow \; \; \; (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2\).
That is, for the bivariate case, the squared generalized distances \(d_j^2 = (\pmb x_j - \bar {\pmb x})' S^{-1} (\pmb x_j - \bar {\pmb x})\) are approximately \(\chi_2^2\) distributed.
- \(\chi^2\) Plot (Gamma Plot)
\(d_1^2 , \cdots, d_n^2\) should behave like \(\chi_2^2\) rv's. 1. order the squared distances \(d_{(1)}^2 \le \cdots \le d_{(n)}^2\) 2. calculate the probability values \(\tfrac{j-1/2}{n}\), \(j=1,\cdots, n\) 3. calculate the quantiles \(q_{(1)}, \cdots, q_{(n)}\) of the \(\chi_2^2\) distribution 4. plot the pairs \((q_{(j)}, d_{(j)}^2 ), \; \; j=1, \cdots, n\), where \(q_{(j)} = \chi_2^2 \left( \tfrac{j-1/2}{n} \right)\)
The plot should resemble a straight line through the origin having slope 1.
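A sketch of the \(\chi^2\) plot computations (assuming `numpy`/`scipy`; data illustrative); drawing the plot itself is left to, e.g., `matplotlib`:

```python
import numpy as np
from scipy.stats import chi2

# Sketch of the chi-square (gamma) plot: ordered squared generalized
# distances d_(j)^2 vs chi^2_p quantiles at levels (j - 1/2)/n.
rng = np.random.default_rng(9)
X = rng.multivariate_normal([0, 0], [[2.0, 0.5], [0.5, 1.0]], size=200)
n, p = X.shape

D = X - X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X.T))
d2 = np.sort(np.einsum('ij,jk,ik->i', D, S_inv, D))     # ordered d_(j)^2
q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)     # chi^2_p quantiles

print(np.corrcoef(q, d2)[0, 1])   # near 1; plot (q, d2) for a slope-1 line
```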
5.2.8.0.3 3. Multivariate Normality
Practically, it is usually sufficient to investigate the univariate and bivariate distributions.
The chi-square plot is still useful. When the parent population is multivariate normal, and both \(n\) and \(n-p\) are greater than 25 or 30, the squared generalized distances \(d_{(1)}^2 \le \cdots \le d_{(n)}^2\) should behave like \(\chi_p^2\) random variables.
5.2.9 Power Transformation
\[
x^{\lambda}=
\begin{cases} \tfrac{1}{x}, & \lambda = -1 \; \; \text{(Reciprocal Transformation)} \\ \tfrac{1}{\sqrt{x}}, & \lambda = -\tfrac{1}{2} \\ \ln(x), & \lambda = 0 \\ \sqrt{x}, & \lambda = \tfrac{1}{2} \\ x, & \lambda = 1 \; \; \text{(No Transformation)} \end{cases}
\]
Examine Q-Q plot to see whether the normal assumption is satisfactory after power transformation.
5.2.9.0.1 Box-Cox Transformation
\[
x^{(\lambda)} =
\begin{cases} \tfrac{x^\lambda - 1}{\lambda}, & \lambda \not = 0 \\ \ln(x), & \lambda = 0 \end{cases}
\]
Here, find the \(\lambda\) that maximizes
\[
l(\lambda) = - \dfrac{n}{2} \ln \left[ \dfrac{1}{n} \sum_{j=1}^n \left( x_j^{(\lambda)} - \bar x^{(\lambda)} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^n \ln x_j
\]
where \(\bar x^{(\lambda)} = \tfrac{1}{n} \sum_{j=1}^n x_j^{(\lambda)}\).
\(x^{(\hat \lambda)}\) is the most plausible choice for approximating a normal distribution, but it is not guaranteed to follow one. * The Box-Cox transformation usually improves the approximation to normality. * Trial-and-error calculations may be necessary to find the \(\lambda\) that maximizes \(l(\lambda)\); see the sketch below. * Usually, vary \(\lambda\) from -1 to 1 with increment 0.1. * Examine the Q-Q plot after the Box-Cox transformation.
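A sketch of the grid search over \(\lambda\) from -1 to 1 in steps of 0.1, as suggested above (assuming `numpy`; the data are illustrative):

```python
import numpy as np

# Sketch: grid search for the lambda maximizing the Box-Cox criterion
# l(lambda) given above.
def box_cox(x, lam):
    return np.log(x) if lam == 0 else (x**lam - 1) / lam

def l_value(x, lam):
    n = len(x)
    xl = box_cox(x, lam)
    return -n / 2 * np.log(np.mean((xl - xl.mean())**2)) + (lam - 1) * np.log(x).sum()

rng = np.random.default_rng(10)
x = rng.lognormal(size=200)        # positive, right-skewed data
grid = np.round(np.arange(-1, 1.01, 0.1), 1)
best = max(grid, key=lambda lam: l_value(x, lam))
print(best)                        # near 0 for lognormal data (log transform)
```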