5.1 Overview of mva (not ended)

Find relationships b/w \(\pmb x_p, \pmb y_q\), e.g., * Response variables (variable directed) * PCA * Factor Analysis * mv Regression * Cannonical Correlation Analysis * Experiment units (individual directed) * Discriminant Analysis * Cluster Analysis * MANOVA

Multivariate techniques tend to be exploratory. * i.e. not hypothesis testing type

Experimental units must be independent. Time series data are not appropriate for this course.




5.1.1 Notation

Variable \(y_1 , \cdots, y_p\)

One observation \(\pmb y ' = (y_1 , \cdots, y_p )\)

Data Matrix \(Y_{n \times p} = \begin{bmatrix} \pmb y ' = (y_1 , \cdots, y_p ) \\ \vdots \\ \pmb y ' = (y_1 , \cdots, y_p ) \end{bmatrix}\)

Random Samples: Suppose we intend to collect n sets of measurements on p variables, but not been observed yet. If $x_1 ‘, , x_n’ $ are independent observation from pdf \(f(\pmb x) = f(x_1, \cdots, x_p)\), then $x_1 ‘, , x_n’ $ are said to be rs from \(f(\pmb x)\).

rvec \(\pmb X = X_{p \times n} = \begin{bmatrix} \pmb x_1 \\ \vdots \\ \pmb x_p \end{bmatrix}\)

mean vector \(\pmb \mu = E (\pmb x) = \begin{bmatrix} \mu1 \\ \vdots \\ \mup \end{bmatrix}\)

Covariance Matrix \(\Sigma\)

Correlation Matrix \(\rho\), \(\rho_{ij} = \tfrac{\sigma_{ij}}{\sigma_{ii}\sigma_{jj}}\) * Correlation measures linear association. * Correlation is 0 if symmetric non-linear association exists.




5.1.2 Summary Statistics

  1. Sample Mean Vector \(\bar X=\) estimate of \(\pmb \mu\)
  2. Sample Covariance Matrix
  3. Sample Correlation Matrix




5.1.3 Statistical Inference on Correlation

$ H_0: = 0 $

test stat \(T=\dfrac {r \sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}\), where \(r=Corr(x,y)\) and \(x \sim N_2, y \sim N_2\) Notes: 1. Correlation measures a linear relationships 2. it is still difficult to get a CI for \(\rho\).



5.1.3.0.1 CI for \(\rho\)
  1. Fisher’s Method:

\(100(1-\alpha)%\) CI for \(\rho\) $=((r) )

  1. Ruben’s Method

let \(u=z_{\alpha/2}, r^\ast = \dfrac {r}{\sqrt{1-r^2}\)

$ \[\begin{align*} a &= 2n-3-u^2 b &= r^\ast \ast \sqrt{(2n-3)(2n-5)} c &= (2n-5-u^2} \ast (r^\ast)^2 - 2u^2 \end{align*}\] $

set \(ay^2 - 2by +c =0\), then root of this equation will be \(y_1, y_2 = \dfrac{b}{a} \pm \dfrac {\sqrt{b^2-ac}}{a}\).

이때 \(100(1-\alpha)%\) CI for \(\rho\) \(=\left[ \dfrac{y_1}{\sqrt{1+y_1^2}, \dfrac{y_2}{\sqrt{1+y_2^2}, \right]\) * 이때, input은 \(n, \alpha, r\), output은 \(\rho\)의 CI.

5.1.4 Standardization

5.1.5 Missing Value Treatment