5.2 Multivariate Normal (wk2)

5.2.1 Overview

Let the random vector \(Z=(Z_1 , \cdots, Z_k)'\) with \(Z_i \overset {iid}{\sim} N(0,1)\).

Then \(X=(X_1 , \cdots, X_p)' = A_{p \times k} Z + \pmb \mu_{p \times 1}\) follows an MVN distribution.

Here, if \(rank(A)=p \; (\le k)\) and \(AA' = \Sigma\), then \(X \sim N_p (\pmb \mu, \Sigma)\).

notation: \(\pmb y \sim N_p (\pmb \mu , \Sigma)\)

A random vector \(\pmb y ' = [y_1 , \cdots , y_p ]\) has a multivariate normal distribution if \(\sum_{i=1}^p a_i y_i = \pmb a' \pmb y\) has a univariate normal distribution for every possible set of values for the elements of \(\pmb a\).

The pdf is \(f(\pmb y) = \dfrac{1}{(2\pi)^{p/2} {\vert \Sigma \vert}^{1/2}} \exp \left\{ -\dfrac{1}{2} (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \right\}\).

Ellipsoid: the path of \(\pmb y\) values yielding a constant height for the density, i.e., all \(\pmb y\) such that \((\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu)=c^2\).

Standard Normal Distribution: \(\pmb z = \left( {\Sigma^{1/2}}\right)^{-1} (\pmb y -\pmb \mu) \sim N_p (\pmb 0, I_p)\),
where \(\left( {\Sigma^{1/2}}\right)^{-1}\) satisfies \(\Sigma^{-1} = \left( {\Sigma^{1/2}}\right)^{-1} \left( {\Sigma^{1/2}}\right)^{-1}\).

Properties of \(\Sigma\): 1. symmetric matrix 2. positive definite matrix 3. \(Cov(A \pmb y + \pmb b) = A \Sigma A'\).

※ If \(A\) is symmetric and positive definite, then \(A=CC'\), where \(C\) is a lower triangular matrix. This is called the Cholesky decomposition of \(A\).
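A minimal sketch of how a Cholesky factor can be used to draw \(X = CZ + \pmb \mu\), assuming \(\Sigma\) is positive definite. The function names `cholesky` and `sample_mvn` and the values of `mu` and `sigma` are illustrative assumptions, not part of the notes.

```python
import math
import random

def cholesky(a):
    """Return lower-triangular C with C C' = a, for a symmetric positive definite a."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(c[i][k] * c[j][k] for k in range(j))
            if i == j:
                # math.sqrt raises ValueError when the pivot is negative
                c[i][j] = math.sqrt(a[i][i] - s)
            else:
                c[i][j] = (a[i][j] - s) / c[j][j]
    return c

def sample_mvn(mu, c, rng):
    """Draw one X = C Z + mu with Z_i iid N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(c[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mu))]

# Illustrative values (assumptions, not from the notes).
sigma = [[4.0, 2.0], [2.0, 3.0]]
mu = [1.0, -1.0]

C = cholesky(sigma)
rng = random.Random(0)
draws = [sample_mvn(mu, C, rng) for _ in range(20000)]
```

The sample mean and sample covariance of `draws` should be close to `mu` and `sigma`, which is one informal way to check that \(Cov(CZ + \pmb \mu) = CC' = \Sigma\).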

  1. \(E(X)=\mu, Cov(X)=AA' = \Sigma\)
  2. \(M_X (t) = \exp \left( t' \mu + \dfrac {1}{2} t' \Sigma t \right)\)
  3. \(\Sigma=AA'\) is a non-singular matrix \(\iff rank(A)=p\)
  4. \(\Sigma = Cov(X)\) is symmetric and non-negative definite (n.n.d.).

For \(X\) as above, the following are equivalent (TFAE). 1. \(X \sim N_p (0, \Sigma)\). 2. 3. 4. 5.

5.2.2 Spectral Decomposition

If \(A_{p \times p}\) is symmetric, then \(A=E \Lambda E'\), where \(\lambda_i\) are the eigenvalues (\(\lambda_1 \ge \cdots \ge \lambda_p\)) and \(\pmb e_i\) are the corresponding orthonormal eigenvectors (\(E'E = I_p\)). This is called the spectral decomposition of \(A\), with

\[\Lambda = \begin{bmatrix} \lambda_1 & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \lambda_p \end{bmatrix}, \; \; \; E = \begin{bmatrix} \pmb e_1 & \cdots & \pmb e_p \end{bmatrix}\]


Then \(\Sigma = E \Lambda E' = E \Lambda^{1/2} \Lambda^{1/2} E' = E \Lambda^{1/2} E' E \Lambda^{1/2} E' = \Sigma^{1/2} \Sigma^{1/2}\).

Center & axes of the ellipsoid \(\{ (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu)=c^2 \}\):

* center: \(\pmb \mu\)
* axes: \(\pm c \sqrt{\lambda_i} \, \pmb e_i\), where \((\lambda_i, \pmb e_i)\) are the eigenvalue-eigenvector pairs of \(\Sigma\)

Square root Matrix:

Let \(A_{p \times p}\) be a symmetric non-negative definite matrix. The square root matrix of \(A\) is defined as \(A^{1/2} = E \Lambda^{1/2} E'\), where

\[\Lambda^{1/2} = \begin{bmatrix} \sqrt{\lambda_1} & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \sqrt{\lambda_p } \end{bmatrix}\]


Negative Square Root Matrix:

Let \(A\) be symmetric and of full rank, with all of its \(\lambda_i\) positive. Then \(A^{-1/2} = E \Lambda^{-1/2} E'\), where

\[\Lambda^{-1/2} = \begin{bmatrix} \dfrac{1}{\sqrt{\lambda_1}} & & \pmb 0 \\ & \ddots & \\ \pmb 0 & & \dfrac{1}{\sqrt{\lambda_p }} \end{bmatrix}\]
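The square root and negative square root matrices can be sketched numerically as follows. To stay within the Python standard library, this sketch is restricted to symmetric \(2 \times 2\) matrices, where the eigen-decomposition has a closed form; the helper names (`eig_sym_2x2`, `sym_power`, `matmul`) and the matrix `sigma` are illustrative assumptions.

```python
import math

def eig_sym_2x2(m):
    """Eigenvalues (descending) and orthonormal eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    if b == 0.0:
        pairs = sorted([(a, (1.0, 0.0)), (d, (0.0, 1.0))], key=lambda t: -t[0])
        return [p[0] for p in pairs], [p[1] for p in pairs]
    mid, rad = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lams = [mid + rad, mid - rad]
    vecs = []
    for lam in lams:
        x, y = lam - d, b          # solves b*x + (d - lam)*y = 0
        norm = math.hypot(x, y)
        vecs.append((x / norm, y / norm))
    return lams, vecs

def sym_power(m, power):
    """E Lambda^power E', i.e. the sum of lambda_i^power * e_i e_i'."""
    lams, vecs = eig_sym_2x2(m)
    return [[sum(l ** power * e[i] * e[j] for l, e in zip(lams, vecs))
             for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

sigma = [[4.0, 2.0], [2.0, 3.0]]   # illustrative positive definite matrix
root = sym_power(sigma, 0.5)       # Sigma^{1/2}
inv_root = sym_power(sigma, -0.5)  # Sigma^{-1/2}, requires all eigenvalues > 0

print(matmul(root, root))          # should reproduce sigma up to rounding
print(matmul(inv_root, root))      # should be close to the identity
```

Multiplying a centered observation \(\pmb y - \pmb \mu\) by `inv_root` is the standardization \(\pmb z = \Sigma^{-1/2} (\pmb y - \pmb \mu)\) used earlier in this section.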


Generalized Inverse:

Let \(A\) be a symmetric non-negative definite matrix. If \(\lambda_1> \lambda_2 > \cdots > \lambda_r > 0 = \lambda_{r+1} = \cdots = \lambda_{p}\), i.e., \(A\) is not of full rank, then the Moore-Penrose generalized inverse of \(A\) is given by

\[A^{-} = E \Lambda^{-} E' = \dfrac{1}{\lambda_1} \pmb e_1 \pmb e_1 ' + \cdots + \dfrac{1}{\lambda_r} \pmb e_r \pmb e_r '\]

where

\[\Lambda^{-} = \begin{bmatrix} \dfrac{1}{\lambda_1} & & & & & \\ & \ddots & & & & \\ & & \dfrac{1}{\lambda_r } & & & \\ & & & 0 & & \\ & & & & \ddots & \\ & & & & & 0 \end{bmatrix}\]
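As a small worked example (the matrix is chosen here for illustration and is not from the notes), take a rank-one matrix with eigenvalues \(\lambda_1 = 2\), \(\lambda_2 = 0\) and \(\pmb e_1 = \tfrac{1}{\sqrt 2} (1, 1)'\). Only the term for the positive eigenvalue enters the generalized inverse:

\[A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \; \; \; A^{-} = \dfrac{1}{\lambda_1} \pmb e_1 \pmb e_1 ' = \dfrac{1}{2} \cdot \dfrac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{4} & \tfrac{1}{4} \end{bmatrix}\]

One can check the defining property directly:

\[A A^{-} A = \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = A\]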


Marginal Distribution:

\[\begin{align*} \pmb y \sim N_p (\pmb \mu , \Sigma) &\Longrightarrow y_i \sim N(\mu_i, \sigma_{ii}), \; \; \; i= 1, \cdots, p \\ &\not \Longleftarrow \end{align*}\]

That is, normal marginals do not by themselves imply joint normality.

5.2.3 Properties of MVN

  1. Linear combinations of the components of \(\pmb y\) are normally distributed.
  2. Any subset of the components of \(\pmb y\) has an MVN distribution.
  3. Conditional distributions of the components of \(\pmb y\) are MVN.

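\[\pmb y \sim N_p(\pmb \mu, \Sigma) \; \Longrightarrow \; \pmb a ' \pmb y \sim N(\pmb a ' \pmb \mu, \pmb a ' \Sigma \pmb a)\]

\[\pmb y \sim N_p(\pmb \mu, \Sigma), \; \; \; A_{n \times p} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{np} \end{bmatrix} \; \Longrightarrow \; A \pmb y \sim N_n(A \pmb \mu, A \Sigma A')\]

That is, the dimension changes from \(p\) to \(n\).

If \(\pmb y \sim N_p(\pmb \mu, \Sigma)\) and \(\pmb d\) is a constant vector, then \(\pmb y + \pmb d \sim N_p(\pmb \mu + \pmb d, \Sigma)\).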
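To make the linear-transformation rule concrete, the sketch below computes \(A \pmb \mu\), \(A \Sigma A'\), and \(\pmb a' \Sigma \pmb a\) for a small example. The numbers and the helper names `matmul` and `transpose` are illustrative assumptions, not values from the notes.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

# y ~ N_3(mu, sigma): illustrative parameters
mu = [[1.0], [2.0], [3.0]]
sigma = [[2.0, 1.0, 0.0],
         [1.0, 2.0, 1.0],
         [0.0, 1.0, 2.0]]

# A is 2 x 3, so A y is 2-dimensional (p = 3 becomes n = 2)
A = [[1.0, -1.0, 0.0],
     [0.0, 1.0, -1.0]]
mean_Ay = matmul(A, mu)                             # A mu
cov_Ay = matmul(matmul(A, sigma), transpose(A))     # A Sigma A'

# A single linear combination a'y with a = (1, 1, 1)'
a = [[1.0, 1.0, 1.0]]
mean_ay = matmul(a, mu)[0][0]                              # a' mu
var_ay = matmul(matmul(a, sigma), transpose(a))[0][0]      # a' Sigma a

print(mean_Ay, cov_Ay, mean_ay, var_ay)
```

In this example \(A \Sigma A'\) turns out to be diagonal, so the two differences \(y_1 - y_2\) and \(y_2 - y_3\) are uncorrelated, and hence independent under joint normality.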

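  1. If we partition \(\pmb y\), \(\pmb \mu\), and \(\Sigma\) as follows:

Let \(\pmb y = \begin{bmatrix} \pmb y_1 \\ \pmb y_2 \end{bmatrix} \sim N_p (\pmb \mu, \Sigma)\) with \(\pmb \mu = \begin{bmatrix} \pmb \mu_1 \\ \pmb \mu_2 \end{bmatrix}\) and \(\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}\).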
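For reference, the standard textbook result for such a partition is stated below. It is not recovered from the original notes and assumes \(\Sigma_{22}\) is non-singular, with \(\pmb y = (\pmb y_1 ', \pmb y_2 ')'\), \(\pmb \mu = (\pmb \mu_1 ', \pmb \mu_2 ')'\), and \(\Sigma\) partitioned conformably into blocks \(\Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}\):

\[\pmb y_1 \sim N (\pmb \mu_1 , \Sigma_{11}), \; \; \; \pmb y_1 \mid \pmb y_2 \sim N \left( \pmb \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (\pmb y_2 - \pmb \mu_2), \; \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)\]

The conditional mean is linear in \(\pmb y_2\), and the conditional covariance does not depend on the observed value of \(\pmb y_2\).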

5.2.4 \(\chi^2\) distribution

If \(\pmb z \sim N_p ( \pmb 0 , I_p )\), then \(\pmb z ' \pmb z = \sum_{i=1}^p z_i^2 \sim \chi_p^2\).

If \(\pmb y \sim N_p(\pmb \mu, \Sigma)\), then \((\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2\).

The \(N_p(\pmb \mu , \Sigma)\) distribution assigns probability \(1-\alpha\) to the solid ellipsoid \(\left \{ \pmb y : (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \le \chi_p^2 (\alpha) \right \}\), where \(\chi_p^2 (\alpha)\) denotes the upper \((100 \alpha)\)th percentile of the \(\chi_p^2\) distribution.
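The coverage statement can be checked by simulation. The sketch below works with the standardized vector \(\pmb z\) for \(p = 2\), where the upper percentile has the closed form \(\chi_2^2(\alpha) = -2 \ln \alpha\) because \(P(\chi_2^2 > c) = e^{-c/2}\). The seed, sample size, and variable names are illustrative assumptions.

```python
import math
import random

alpha = 0.05
# Upper 100*alpha-th percentile of chi-square with 2 df (closed form for p = 2).
cutoff = -2.0 * math.log(alpha)

rng = random.Random(1)
n_draws = 20000
inside = 0
for _ in range(n_draws):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # z'z equals (y - mu)' Sigma^{-1} (y - mu) for the standardized vector z
    if z1 * z1 + z2 * z2 <= cutoff:
        inside += 1

coverage = inside / n_draws
print(cutoff, coverage)   # coverage should be close to 1 - alpha
```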

5.2.5 Linear Combination of Random Vectors

5.2.6 Multivariate Normal Likelihood

5.2.7 Sampling Distribution of \(\bar {\pmb y}, S\)

Let the random vectors \(\pmb y_1, \cdots, \pmb y_n \overset {iid}{\sim} N_p(\pmb \mu, \Sigma)\).

\(\bar {\pmb y} \sim N_p (\pmb \mu , \dfrac{1}{n} \Sigma)\)

\((n-1) S \sim\) Wishart distribution, with \(df=n-1\)

* \(S\) is a random matrix, i.e., the Wishart distribution is a distribution of a random matrix.

\(\bar {\pmb y} \perp S\).

Wishart Distribution:

In the univariate case, \(\dfrac {\sum (x_i - \bar x )^2}{\sigma^2} = \dfrac {S^2} {\sigma^2 / (n-1)} \sim \chi_{n-1}^2\), i.e., \(\sum (x_i - \bar x )^2 = (n-1)S^2 \sim \sigma^2 \chi_{n-1}^2\).

For random vectors \(\pmb y_1, \cdots, \pmb y_n \overset {iid}{\sim} N_p(\pmb \mu, \Sigma)\),

\[\begin{align*} \sum_{i=1}^n(\pmb y_i - \pmb \mu)(\pmb y_i - \pmb \mu)' &\sim W_p (n, \Sigma) \\ \\ (n-1)S = \sum_{i=1}^n(\pmb y_i - \bar {\pmb y} )(\pmb y_i - \bar {\pmb y} )' &\sim W_p (n-1, \Sigma) \end{align*}\]

If \(A \sim W_p (n, \Sigma), B \sim W_p (m, \Sigma)\), and \(A \perp B\), then \(A+B \sim W_p (n+m, \Sigma)\).

If \(A \sim W_p (n, \Sigma)\), then \(CAC' \sim W_p (n, C \Sigma C')\).

If \(A \sim W_p (n-1, \Sigma)\), its pdf \(f(A)\) is written in terms of the gamma function.

MV t-Distribution:

※ Univariate t-distribution: \(t=\tfrac{U / \sigma}{\sqrt{V / \nu}} \sim t_{\nu}\), where \(U \sim N(0, \sigma^2), V \sim \chi_{\nu}^2\), and \(U \perp V\).

Let \(\pmb y = (y_1, \cdots, y_p)' \sim N_p(\pmb \mu, \Sigma)\), \(V \sim \chi_{\nu}^2\), and \(\pmb y \perp V\).

Define the random vector \(\pmb t = (t_1 , \cdots, t_p)'\) with \(t_i = \tfrac {(y_i - \mu_i) / \sigma_i}{\sqrt{V/\nu}}, \; i=1, \cdots, p\).

* Note that each \(t_i \sim t_{\nu}\).

Here, the joint distribution of \(\pmb t\) is called the MV t-distribution, with \(df=\nu\) and matrix parameter \(\Sigma\).

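Dirichlet Distribution:

※ The Dirichlet distribution is the multivariate generalization of the Beta distribution.

Let \(\pmb y \sim D_p(\nu_1 , \cdots, \nu_{p+1})\).

* parameters: \(\{\nu_i, i=1, \cdots, p+1\}\)
* pdf: \(f(\pmb y) = \dfrac{\Gamma \left( \sum_{i=1}^{p+1} \nu_i \right)}{\prod_{i=1}^{p+1} \Gamma (\nu_i)} \prod_{i=1}^p y_i^{\nu_i - 1} \left( 1 - \sum_{i=1}^p y_i \right)^{\nu_{p+1} - 1}\)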

CLT:

Let \(\pmb y_1 , \cdots, \pmb y_n\) be i.i.d. with mean \(\pmb \mu\) and finite covariance matrix \(\Sigma\). Then

\[\begin{align*} \sqrt {n} (\bar {\pmb y} - \pmb \mu) &\overset {d} {\rightarrow} N_p (\pmb 0 , \Sigma) \\ n (\bar {\pmb y} - \pmb \mu)' S^{-1} (\bar {\pmb y} - \pmb \mu) &\overset {d} {\rightarrow} \chi_p^2 \end{align*}\]

5.2.8 Assessing Normality

1. Univariate Marginal Distribution

a. Q-Q Plot

※ Sample quantile vs. quantile of the normal distribution

Let \(x_{(1)} \le \cdots \le x_{(n)}\) be the order statistics, i.e., the sample quantiles.

The proportion of the sample at or below \(x_{(j)}\) is approximated by \(\tfrac{j-\tfrac{1}{2}}{n}\).

The quantiles \(q_{(j)}\) of the standard normal distribution are defined by

\[P(z \le q_{(j)}) = \int_{-\infty}^{q_{(j)}} \dfrac{1}{\sqrt{2 \pi}} \exp \left( - \dfrac{z^2}{2} \right) dz = \dfrac{j-\tfrac{1}{2}}{n}\]

If the data arise from a normal population, then \(\sigma q_{(j)} + \mu \approx x_{(j)}\).

Consequently, the pairs \((q_{(j)}, x_{(j)})\) will be approximately linearly related.

Procedure:

1. Get \(x_{(1)} \le \cdots \le x_{(n)}\) from the original observations.
2. Calculate the probability values \(\tfrac{j-1/2}{n}, \; \; j= 1, \cdots, n\).
3. Calculate the standard normal quantiles \(q_{(1)}, \cdots, q_{(n)}\).
4. Plot the pairs of observations \((q_{(1)}, x_{(1)}), \cdots, (q_{(n)}, x_{(n)})\).
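The four steps above can be sketched with the standard library (Python 3.8+ for `statistics.NormalDist`). The function names `qq_pairs` and `qq_correlation` and the sample values are illustrative assumptions; the correlation of the pairs is included as one numerical summary of how straight the plot is.

```python
from statistics import NormalDist

def qq_pairs(xs):
    """Return the pairs (q_(j), x_(j)) used in a normal Q-Q plot."""
    n = len(xs)
    ordered = sorted(xs)                               # step 1: order statistics
    probs = [(j - 0.5) / n for j in range(1, n + 1)]   # step 2: (j - 1/2) / n
    std_normal = NormalDist()                          # N(0, 1)
    q = [std_normal.inv_cdf(p) for p in probs]         # step 3: normal quantiles
    return list(zip(q, ordered))                       # step 4: pairs to plot

def qq_correlation(pairs):
    """Sample correlation coefficient of the Q-Q pairs."""
    n = len(pairs)
    mq = sum(q for q, _ in pairs) / n
    mx = sum(x for _, x in pairs) / n
    sqx = sum((q - mq) * (x - mx) for q, x in pairs)
    sqq = sum((q - mq) ** 2 for q, _ in pairs)
    sxx = sum((x - mx) ** 2 for _, x in pairs)
    return sqx / (sqq * sxx) ** 0.5

# Illustrative data (an assumption, not from the notes).
sample = [4.9, 5.6, 6.1, 6.3, 6.8, 7.0, 7.4, 8.1, 9.0]
pairs = qq_pairs(sample)
r_q = qq_correlation(pairs)
print(pairs)
print(r_q)   # values near 1 indicate a nearly straight Q-Q plot
```

Passing `pairs` to any plotting tool as x-y points gives the Q-Q plot itself.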

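Checking the straightness of the Q-Q plot:

* using the correlation coefficient \(r\) of the pairs \((q_{(j)}, x_{(j)})\)
* hypothesis testing: \(H_0: \rho=0\), \(T= \dfrac{r \sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}\)

b. Others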
    1. Shapiro-Wilks Test:

Test of the correlation coefficient between \(x_{(j)}\) and \(r_{(j)}\), where \(r_{(j)}\) is a function of the expected values of the standard normal order statistics and their covariances.

    1. Kolmogorov-Smirnov Test

Compare the cdfs: if the data arise from a normal population, the differences are small.

\[T = \sup_x \vert F(x) - S(x) \vert\]

where \(F(x)\) is the hypothesized normal cdf and \(S(x)\) is the empirical cdf.
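A minimal sketch of the statistic \(T\), assuming the hypothesized distribution is supplied as a `statistics.NormalDist` object. Because the empirical cdf is a step function, the supremum is attained at a data point, just before or just after a jump. The function name `ks_statistic` and the data are illustrative assumptions.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(xs, dist):
    """sup_x |F(x) - S(x)| for the cdf F of `dist` and the empirical cdf S of xs."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = dist.cdf(x)
        # S jumps from (i - 1)/n to i/n at x, so check both sides of the jump
        d = max(d, abs(f - i / n), abs(f - (i - 1) / n))
    return d

# Illustrative data (an assumption, not from the notes).
sample = [4.9, 5.6, 6.1, 6.3, 6.8, 7.0, 7.4, 8.1, 9.0]

# Note: plugging in the sample mean and sd changes the null distribution of T
# (the Lilliefors variant), so standard K-S critical values no longer apply.
fitted = NormalDist(mean(sample), stdev(sample))
T = ks_statistic(sample, fitted)
print(T)
```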

    1. Skewness Test

skewness \(\sqrt{b_1} = \tfrac{\sqrt{n} \sum_{i=1}^n (x_i - \bar x)^3} {\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{\tfrac{3}{2}}}\)

When the population is normal, the skewness = 0.

    1. Kurtosis Test:

kurtosis \({b_2} = \tfrac{n \sum_{i=1}^n (x_i - \bar x)^4} {\left[ \sum_{i=1}^n (x_i - \bar x)^2 \right]^{2}}\)

When the population is normal, the kurtosis is 3.
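Both sample statistics are easy to compute directly. The sketch below uses \(\left[ \sum (x_i - \bar x)^2 \right]^{2}\) in the denominator of \(b_2\), which is the scaling under which a normal population gives kurtosis 3; the function names and the data are illustrative assumptions.

```python
import math

def skewness(xs):
    """sqrt(b_1) = sqrt(n) * sum (x - xbar)^3 / [sum (x - xbar)^2]^(3/2)."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs)
    s3 = sum((x - m) ** 3 for x in xs)
    return math.sqrt(n) * s3 / s2 ** 1.5

def kurtosis(xs):
    """b_2 = n * sum (x - xbar)^4 / [sum (x - xbar)^2]^2."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs)
    s4 = sum((x - m) ** 4 for x in xs)
    return n * s4 / s2 ** 2

data = [1.0, 2.0, 3.0, 4.0, 5.0]   # symmetric toy data, so the skewness is 0
print(skewness(data), kurtosis(data))
```

For this evenly spaced toy sample the kurtosis is well below 3, reflecting tails that are lighter than those of a normal distribution.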

    1. Lin and Mudholkar (1980):


\[Z = \tanh^{-1}(r) = \dfrac{1}{2} \ln \left( \dfrac{1+r}{1-r} \right)\]

where \(r\) is the sample correlation of the \(n\) pairs \((x_i , q_i), \; \; i=1, \cdots, n\), with \(q_i = \left[ \tfrac {1}{n} \left( \sum_{j \not = i} x_j^2 - \tfrac{1}{n-1} \left( \sum_{j \not = i} x_j\right)^2 \right) \right]^{\tfrac{1}{3}}\).
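The statistic defined above can be sketched as follows. Each \(q_i\) is a cube root of a leave-one-out sum of squares; since \(r\) is unchanged by rescaling the \(q_i\), the constant factor inside the cube root does not affect \(Z\). The function names and the data are illustrative assumptions.

```python
import math

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lin_mudholkar_z(xs):
    """Z = atanh(r), with r the correlation between x_i and q_i."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    qs = []
    for x in xs:
        s1 = total - x            # sum of x_j over j != i
        s2 = total_sq - x * x     # sum of x_j^2 over j != i
        inner = max((s2 - s1 * s1 / (n - 1)) / n, 0.0)   # guard against rounding
        qs.append(inner ** (1.0 / 3.0))
    r = correlation(xs, qs)
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]
print(lin_mudholkar_z(symmetric))   # essentially 0 for symmetric data
print(lin_mudholkar_z(skewed))      # negative: removing the large value shrinks q_i
```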

If the data arise from a normal population, \(Z\) is approximately \(N(0, \tfrac 3 n)\).

2. Bivariate Normality

※ If the data are generated from a multivariate normal, each bivariate distribution would be normal.

    1. Scatter Plot

the contours of bivariate normal density are ellipses. The pattern of the scatter plot must be near elliptical.

    1. Squared Generalized Distances

\(\pmb y \sim N_p (\pmb \mu, \Sigma) \; \; \; \Longrightarrow \; \; \; (\pmb y - \pmb \mu)' \Sigma^{-1} (\pmb y - \pmb \mu) \sim \chi_p^2\).

That is, for the bivariate case, the squared generalized distances \(d_j^2 = (\pmb x_j - \bar {\pmb x})' S^{-1} (\pmb x_j - \bar {\pmb x})\) are approximately \(\chi_2^2\) distributed.

    1. Chi2 Plot (Gamma Plot)

\(d_1^2 , \cdots, d_n^2\) should behave like \(\chi_2^2\) random variables.

1. Order the squared distances \(d_{(1)}^2 \le \cdots \le d_{(n)}^2\).
2. Calculate the probability values \(\tfrac{j-1/2}{n}\), \(j=1,\cdots, n\).
3. Calculate the quantiles \(q_{(1)}, \cdots, q_{(n)}\) of the \(\chi_2^2\) distribution, where \(q_{(j)}\) satisfies \(P(\chi_2^2 \le q_{(j)}) = \tfrac{j-1/2}{n}\).
4. Plot the pairs \((q_{(j)}, d_{(j)}^2 ), \; \; j=1, \cdots, n\).
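These steps can be sketched for bivariate data using only the standard library, because the \(\chi_2^2\) cdf is \(1 - e^{-x/2}\) and its quantile at probability \(u\) is therefore \(-2 \ln (1 - u)\). The function names and the data are illustrative assumptions.

```python
import math

def squared_distances(data):
    """d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar) for bivariate rows (x, y)."""
    n = len(data)
    m1 = sum(x for x, _ in data) / n
    m2 = sum(y for _, y in data) / n
    s11 = sum((x - m1) ** 2 for x, _ in data) / (n - 1)
    s22 = sum((y - m2) ** 2 for _, y in data) / (n - 1)
    s12 = sum((x - m1) * (y - m2) for x, y in data) / (n - 1)
    det = s11 * s22 - s12 ** 2
    # S^{-1} = (1 / det) * [[s22, -s12], [-s12, s11]]
    return [((x - m1) ** 2 * s22 - 2 * (x - m1) * (y - m2) * s12
             + (y - m2) ** 2 * s11) / det for x, y in data]

def chi2_plot_pairs(data):
    """Pairs (q_(j), d_(j)^2) for a chi-square plot with 2 degrees of freedom."""
    n = len(data)
    d2 = sorted(squared_distances(data))
    q = [-2.0 * math.log(1.0 - (j - 0.5) / n) for j in range(1, n + 1)]
    return list(zip(q, d2))

# Illustrative data (an assumption, not from the notes).
data = [(1.0, 2.1), (2.0, 1.9), (3.0, 3.5), (4.0, 3.8),
        (5.0, 5.2), (6.0, 4.9), (7.0, 7.4), (8.0, 7.1)]
pairs = chi2_plot_pairs(data)
print(pairs)
```

A useful sanity check is that the squared distances always sum to \((n-1)p\), here \(2(n-1)\), regardless of the data.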

The plot should resemble a straight line through the origin having slope 1.

3. Multivariate Normality

Practically, it is usually sufficient to investigate the univariate and bivariate distributions.

The chi-square plot is still useful. When the parent population is multivariate normal, and both \(n\) and \(n-p\) are greater than 25 or 30, the squared generalized distances \(d_{1}^2 , \cdots , d_{n}^2\) should behave like \(\chi_p^2\) random variables.

5.2.9 Power Transformation

\[x^{\lambda} = \begin{cases} \tfrac{1}{x}, & \lambda = -1 \; \text{(reciprocal transformation)} \\ \tfrac{1}{\sqrt{x}}, & \lambda = -\tfrac{1}{2} \\ \ln(x), & \lambda = 0 \\ \sqrt{x}, & \lambda = \tfrac{1}{2} \\ x, & \lambda = 1 \; \text{(no transformation)} \end{cases}\]


Examine the Q-Q plot to see whether the normal assumption is satisfactory after the power transformation.

Box-Cox Power Transformation:

\[x^{(\lambda)} = \begin{cases} \tfrac{x^\lambda - 1}{\lambda}, & \lambda \not = 0 \\ \ln(x), & \lambda = 0 \end{cases}\]


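Here, find the \(\lambda\) that maximizes

\[l(\lambda) = - \dfrac{n}{2} \ln \left[ \dfrac{1}{n} \sum_{j=1}^n \left( x_j^{(\lambda)} - \overline{x^{(\lambda)}} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^n \ln x_j\]

where \(\overline{x^{(\lambda)}} = \tfrac{1}{n} \sum_{j=1}^n x_j^{(\lambda)}\).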

With the maximizing \(\lambda\), the transformed values \(x^{(\lambda)}\) are the ones most consistent with a normal distribution, but they are not guaranteed to follow a normal distribution.

* The (Box-Cox) transformation usually improves the approximation to normality.
* Trial-and-error calculations may be necessary to find the \(\lambda\) that maximizes \(l(\lambda)\).
* Usually, vary \(\lambda\) from -1 to 1 in increments of 0.1.
* Examine the Q-Q plot after the Box-Cox transformation.

nqplot, contour plot, cqplot and box-cox plot
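The grid search over \(\lambda\) described above can be sketched as follows, assuming strictly positive data and the usual Box-Cox log-likelihood \(l(\lambda) = -\tfrac{n}{2} \ln \left[ \tfrac{1}{n} \sum_j (x_j^{(\lambda)} - \overline{x^{(\lambda)}})^2 \right] + (\lambda - 1) \sum_j \ln x_j\). The function names and the data are illustrative assumptions.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def log_likelihood(xs, lam):
    """l(lambda) for positive data xs."""
    n = len(xs)
    t = [box_cox(x, lam) for x in xs]
    m = sum(t) / n
    var = sum((v - m) ** 2 for v in t) / n      # divisor n, as in l(lambda)
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)

# Illustrative right-skewed data (an assumption, not from the notes).
xs = [0.8, 1.1, 1.5, 2.3, 3.1, 4.9, 7.4, 12.6]

# Grid from -1 to 1 in steps of 0.1
grid = [round(-1.0 + 0.1 * k, 1) for k in range(21)]
best_lambda = max(grid, key=lambda lam: log_likelihood(xs, lam))
transformed = [box_cox(x, best_lambda) for x in xs]
print(best_lambda)
```

After choosing `best_lambda`, the transformed values can be passed to a Q-Q plot routine to judge whether normality is now a reasonable approximation.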