5.7 Factor Analysis

Like PCA, FA is concerned with explaining the \(Cov\) and/or \(Corr\) structure among the measured variables, i.e., the variability in the variables.
Objectives:

1. Partition the \(p\) response variables into \(m\) subsets, each consisting of a group of variables that tend to be more highly related to one another than to the rest.
2. Create a new set of uncorrelated variables, called underlying factors or underlying characteristics.
3. Use the new variables in future analyses.
Warnings:

1. If the original variables are already uncorrelated, there is no reason to consider FA.
2. Subjective decisions are necessary to determine the number of factors and the method used to obtain the underlying factors.
3. FA solutions are not unique.
5.7.0.0.1 Orthogonal Factor Model

\(\pmb X \sim (\pmb \mu , \Sigma)\).

The common factors \(F_1, \cdots, F_m\) are linearly related to \(\pmb X\); the errors, or specific factors, are \(\epsilon_1, \cdots, \epsilon_p\).

\[
\pmb X_{p \times 1} - \pmb \mu_{p \times 1} = \pmb L_{p \times m} \pmb F_{m \times 1} + \pmb \epsilon_{p \times 1},
\qquad
X_i - \mu_i = l_{i1} F_1 + l_{i2} F_2 + \cdots + l_{im} F_m + \epsilon_i
\]

The loading \(l_{ij}\) is the loading of the \(i\)th variable on the \(j\)th factor; the matrix of factor loadings is \(\pmb L\).


In other words, the point is that each \(X_i - \mu_i\) can be written as a linear combination of the \(F_j\) plus \(\epsilon_i\).

However, there are too many unobserved quantities for the factor model to be verified directly. Instead, additional conditions are imposed on \(\pmb F\) and \(\pmb \epsilon\), and the implied \(Cov\) structure is checked.

\(E(\pmb F) = \pmb 0, Cov(\pmb F) = E(\pmb F \pmb F' ) = I_{m \times m}\)

\(E(\pmb \epsilon) = \pmb 0, \; Cov(\pmb \epsilon) = E(\pmb \epsilon \pmb \epsilon') = \pmb \Psi_{p \times p} = \begin{bmatrix}\Psi_1 & & 0 \\ & \ddots & \\ 0 & & \Psi_p \end{bmatrix}_{p \times p}\)

\(\pmb \epsilon\) and \(\pmb F\) are independent, so \(Cov(\pmb \epsilon, \pmb F) = E(\pmb \epsilon \pmb F') = \pmb 0_{p \times m}\).




Then:

\[\begin{align*} \Sigma = Cov(\pmb X) &= E \left[ (\pmb X - \pmb \mu) (\pmb X - \pmb \mu) ' \right] \\ &= E \left[ \pmb{LF (LF)' + \epsilon(LF)' + (LF) \epsilon' + \epsilon \epsilon'} \right] \\ &= \pmb {LE(FF')L' + E(\epsilon \epsilon')} \\ &= \pmb{LL' + \Psi} \end{align*}\]

Similarly,

\[\begin{align*} Cov (\pmb {X, F}) = E \left[ \pmb{(X-\mu)(F-0)'} \right] &= E \left[ \pmb {(X-\mu)F' }\right] \\ &= E \left[ \pmb {(LF + \epsilon)F'} \right] \\ &= \pmb { LE(FF') + E(\epsilon F') }\\ &= \pmb { L} \end{align*}\]
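These two identities can be checked by simulation. Below is a minimal NumPy sketch (the loading matrix, specific variances, and sample size are made-up illustrative values) that draws \(\pmb F\) and \(\pmb \epsilon\) under the stated assumptions and confirms that the sample moments approximate \(\pmb{LL}' + \pmb \Psi\) and \(\pmb L\).

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative loadings (p = 3 variables, m = 2 factors) and specific variances
L = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7]])
Psi = np.diag([0.18, 0.27, 0.47])
p, m = L.shape
n = 200_000

F = rng.standard_normal((n, m))                   # E(F) = 0, Cov(F) = I
eps = rng.standard_normal((n, p)) @ np.sqrt(Psi)  # E(eps) = 0, Cov(eps) = Psi
X = F @ L.T + eps                                 # X - mu = L F + eps (mu = 0 here)

print(np.round(np.cov(X, rowvar=False), 2))       # ~ L L' + Psi
print(np.round(L @ L.T + Psi, 2))
print(np.round(X.T @ F / n, 2))                   # ~ Cov(X, F) = L
```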




The total variation can therefore be decomposed as follows.


The communality \(h_i^2 = \sum_{j=1}^m l_{ij}^2\) is the contribution of the \(m\) common factors to \(Var(X_i)\), and

\(\pmb{LL}' = \begin{bmatrix}h_1^2 & & \sigma_{1p} \\ & \ddots & \\ \sigma_{p1} & & h_p^2 \end{bmatrix}_{p \times p}\).

The specific variance of \(X_i\) is \(\Psi_i\).

\[\begin{align*}
Var(X_i) &= \sum_{j=1}^m l_{ij}^2 + \Psi_i = h_i^2 + \Psi_i \\
Cov(X_i, X_k) &= \sum_{j=1}^m l_{ij} l_{kj} \\
Cov(X_i, F_j) &= l_{ij}
\end{align*}\]
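For instance, with illustrative loadings \(l_{11} = 0.9\), \(l_{12} = 0.3\) for a standardized variable (\(Var(X_1) = 1\)),

\[
h_1^2 = 0.9^2 + 0.3^2 = 0.90, \qquad \Psi_1 = Var(X_1) - h_1^2 = 1 - 0.90 = 0.10 .
\]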



Notes:

* When \(m=p\), any \(Cov\) matrix \(S\) can be reproduced exactly as \(\pmb{LL}'\), so \(\pmb \Psi\) is the zero matrix.
* When \(m < p\), FA is most useful: the FA model provides a “simple” explanation of the covariance in \(\pmb X\).
* When \(m \lll p\), most \(Cov\) matrices cannot be factored as \(\pmb{LL}'+\pmb \Psi\) (while maintaining basic statistical properties).

5.7.0.0.2 Uniqueness

The orthogonal factor model is not unique, because the loadings can be rotated (see Section 5.7.2).

5.7.1 Method of Estimation

Choosing the appropriate number of factors:

1. Similar to PCA: determine the number of factors using a scree plot or the eigenvalue \(\ge 1\) rule (see the sketch after this list).
2. Do not include trivial factors (factors to which only one variable is assigned).
3. Test the adequacy of the chosen number of factors (use the ML method and the LRT) for standardized variables.
4. Use AIC: choose the \(m\) that produces the minimum value of AIC.
5. Use SBC (Schwarz’s Bayesian Criterion).
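A minimal sketch of the first criterion, assuming a data matrix `X` whose rows are observations (the function name is illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def scree(X):
    """Scree plot of the eigenvalues of the sample correlation matrix,
    plus the eigenvalue >= 1 rule for suggesting the number of factors m."""
    R = np.corrcoef(X, rowvar=False)         # correlation matrix (standardized variables)
    eigvals = np.linalg.eigvalsh(R)[::-1]    # eigenvalues, largest first

    plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
    plt.axhline(1.0, linestyle="--")
    plt.xlabel("factor number")
    plt.ylabel("eigenvalue")
    plt.title("Scree plot")
    plt.show()

    return int(np.sum(eigvals >= 1.0))       # suggested m
```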



Notes:

* \(\Sigma = \pmb{LL}' + \pmb \Psi\).
* The objective is estimating \(\pmb L\).

5.7.1.0.1 1. Principal Component Method

The spectral decomposition gives \(\Sigma = \pmb P \pmb \Lambda \pmb P' = \sum_{i=1}^p \lambda_i \pmb e_i \pmb e_i'\). The eigenvalue–eigenvector pairs whose \(\lambda_i\) contribute little are dropped from the end, keeping only the leading \(m\); in that case \(\Sigma \approx \pmb {L_{p \times m} L_{m \times p}'}\), where \(\pmb L = \left[ \sqrt{\lambda_1}\,\pmb e_1 \;\; \cdots \;\; \sqrt{\lambda_m}\,\pmb e_m \right]\), i.e., \(l_{ij} = \sqrt{\lambda_j}\, e_{ij}\).

Here the specific variances \(\pmb \Psi\) can be obtained from the diagonal elements of \(\Sigma - \pmb {LL'}\), giving the approximation \(\Sigma \approx \pmb {LL' + \Psi}\).

\[
\Psi_i = \sigma_i^2 - \sum_{j=1}^m l_{ij}^2 = \sigma_i^2 - \sum_{j=1}^m \lambda_j e_{ij}^2
\]

This follows because, as shown above, \(l_{ij} = \sqrt{\lambda_j}\, e_{ij}\).



The importance of the \(j\)th factor:

\[\begin{align*} \dfrac {\lambda_j}{\sum_{i=1}^p \lambda_i} &= \dfrac {\sum_{i=1}^p l_{ij}^2} {\sum_{i=1}^p \sigma_i^2} \\ &= \dfrac {\sum_{i=1}^p l_{ij}^2} {p} \; \; \; \; \; \; \; \; \; \; \text{if} \; \; \Sigma=\pmb \rho \end{align*}\]

Here the communality is \(h_i^2 = \sum_{j=1}^m l_{ij}^2\).
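A minimal sketch of the principal component extraction, assuming a sample covariance (or correlation) matrix `S` and a chosen number of factors `m`; the function name is illustrative:

```python
import numpy as np

def pc_factor(S, m):
    """Principal component method: columns of L are sqrt(lambda_j) * e_j;
    Psi is taken from the diagonal of S - L L'."""
    eigvals, eigvecs = np.linalg.eigh(S)            # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])       # l_ij = sqrt(lambda_j) e_ij
    Psi = np.diag(np.diag(S - L @ L.T))             # specific variances
    communality = np.sum(L**2, axis=1)              # h_i^2
    importance = eigvals[:m] / eigvals.sum()        # contribution of each factor
    return L, Psi, communality, importance
```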

5.7.1.0.2 3. ML Method

An assumption is needed: \(\pmb X \sim N_p (\pmb \mu , \Sigma)\), where \(\Sigma = \pmb{LL}' + \pmb \Psi\).

The likelihood \(L(\pmb \mu, \Sigma)\) then depends on \(\pmb L\) and \(\pmb \Psi\), because \(\Sigma = \pmb{LL}' + \pmb \Psi\).

\(\hat{\pmb L}_{MLE}\) and \(\hat{\pmb \Psi}_{MLE}\) are found numerically.

The estimated communalities are \(\hat h_i^2 = \sum_{j=1}^m \hat l_{ij}^2\), \(i=1, \cdots, p\).

The importance of the \(j\)th factor is computed as before, \(\sum_{i=1}^p \hat l_{ij}^2 \big/ \sum_{i=1}^p \sigma_i^2\).
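In practice, one way to obtain \(\hat{\pmb L}_{MLE}\) and \(\hat{\pmb \Psi}_{MLE}\) numerically is scikit-learn's `FactorAnalysis`, which fits this model by maximum likelihood (no rotation is applied by default). A sketch assuming a data matrix `X` with observations in rows, standardized first if the correlation-matrix version is wanted:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def ml_factor(X, m):
    """ML factor analysis: returns estimated loadings (p x m),
    specific variances, and communalities."""
    fa = FactorAnalysis(n_components=m).fit(X)
    L_hat = fa.components_.T          # p x m loading matrix
    Psi_hat = fa.noise_variance_      # specific variances, length p
    communality = np.sum(L_hat**2, axis=1)
    return L_hat, Psi_hat, communality
```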

5.7.1.0.2.1 3.5. Test for the number of factors

\(H_0: \; \Sigma_{p \times p} = \pmb L_{p \times m} \pmb L'_{m \times p} + \pmb \Psi_{p \times p}\) vs. \(H_1:\) \(\Sigma_{p \times p}\) is any other positive definite matrix.

Assume \(\pmb X \sim N_p (\pmb \mu , \Sigma)\).

Under \(H_0\), \(\Sigma = \pmb{LL}' + \pmb \Psi\), so \(\hat \Sigma_{MLE} = \hat{\pmb L} \hat{\pmb L}' + \hat{\pmb \Psi}\).

Under \(H_1\), \(\hat \Sigma_{MLE} = S_n\), where \(S_n\) is the sample \(Cov\) matrix.

LRT for testing \(H_0\):

\[
-2\log \Lambda = n \log \left( \dfrac{|\hat \Sigma|}{|S_n|} \right) = n \log \left( \dfrac{|\hat{\pmb L} \hat{\pmb L}' + \hat{\pmb \Psi}|}{|S_n|} \right)
\]

5.7.1.0.2.2 3.7. Bartlett’s Approx.

Bartlett's correction replaces \(n\) by \(n - 1 - (2p + 4m + 5)/6\): reject \(H_0\) at level \(\alpha\) if

\[
\left( n - 1 - \dfrac{2p + 4m + 5}{6} \right) \log \left( \dfrac{|\hat{\pmb L} \hat{\pmb L}' + \hat{\pmb \Psi}|}{|S_n|} \right) > \chi^2_{[(p-m)^2 - p - m]/2}(\alpha).
\]
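A sketch of this test, assuming `L_hat` (p × m), `Psi_hat` (a vector of specific variances), and the sample covariance matrix `Sn` are already available, e.g. from the ML fit above; the helper name is illustrative:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_test(L_hat, Psi_hat, Sn, n, alpha=0.05):
    """Bartlett-corrected LRT of H0: Sigma = L L' + Psi with m factors."""
    p, m = L_hat.shape
    Sigma_hat = L_hat @ L_hat.T + np.diag(Psi_hat)

    stat = (n - 1 - (2 * p + 4 * m + 5) / 6) * np.log(
        np.linalg.det(Sigma_hat) / np.linalg.det(Sn)
    )
    df = ((p - m) ** 2 - p - m) / 2   # must be positive for the test to apply
    crit = chi2.ppf(1 - alpha, df)
    return stat, df, stat > crit      # reject H0 if stat > crit
```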

5.7.1.0.3 4. Minimum Residual Method

Let \(Cov(\pmb X) = \Sigma = \pmb{LL}' + \pmb \Psi\); the model has a form similar to the multivariate regression \(\pmb Y_{n \times m} = \pmb Z_{n \times (r+1)} \pmb \beta_{(r+1)\times m} + \pmb \epsilon_{n \times m}\).

Estimate the factor loadings so that the sum of squares of the off-diagonal residuals of \(S - \hat{\pmb L} \hat{\pmb L}'\) is minimized.

\(\hat{\pmb L}\) and \(\hat{\pmb \Psi}\) are again found numerically.

The estimated communalities are \(\hat h_i^2 = \sum_{j=1}^m \hat l_{ij}^2\), \(i=1, \cdots, p\).

The importance of the \(j\)th factor is computed as before.
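A minimal sketch of this idea with `scipy.optimize`, assuming a sample covariance or correlation matrix `S`; it only illustrates the objective (sum of squared off-diagonal residuals) and is not a production MinRes routine:

```python
import numpy as np
from scipy.optimize import minimize

def minres_factor(S, m):
    """Choose L (p x m) minimizing the sum of squared off-diagonal
    elements of S - L L'; Psi comes from the remaining diagonal."""
    p = S.shape[0]
    off = ~np.eye(p, dtype=bool)                      # off-diagonal mask

    def objective(l_flat):
        L = l_flat.reshape(p, m)
        resid = S - L @ L.T
        return np.sum(resid[off] ** 2)

    # start from the principal component solution
    eigvals, eigvecs = np.linalg.eigh(S)
    L0 = eigvecs[:, ::-1][:, :m] * np.sqrt(eigvals[::-1][:m])

    res = minimize(objective, L0.ravel(), method="L-BFGS-B")
    L_hat = res.x.reshape(p, m)
    Psi_hat = np.diag(S) - np.sum(L_hat**2, axis=1)   # specific variances
    return L_hat, Psi_hat
```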




5.7.2 Factor Rotation

All factor loadings obtained from the initial loadings by an orthogonal transformation have the same ability to reproduce the covariance matrix:

\[
\Sigma = \pmb{LL}' + \pmb \Psi = \pmb{LTT'L}' + \pmb \Psi = \pmb L^\ast \pmb L^{\ast \prime} + \pmb \Psi, \qquad \pmb L^\ast = \pmb{LT},
\]

where \(\pmb{TT}' = \pmb I\) must hold, by the characterization of rotations in linear algebra.

From matrix algebra, we know that an orthogonal transformation corresponds to a rigid rotation of the coordinate axes.

An orthogonal transformation of the factor loadings is called a factor rotation.

The communalities \(\hat h_i^2\) and the specific variances \(\hat \Psi_i\) are not changed, because \(\hat{\pmb L}^\ast \hat{\pmb L}^{\ast \prime} = \hat{\pmb L} \pmb T \pmb T' \hat{\pmb L}' = \hat{\pmb L} \hat{\pmb L}'\), and the diagonal elements of this matrix are the communalities.

Rationale: Since the original loadings \(L\) may not be easily interpretable, it is usual practice to rotate them until a “simpler structure” is achieved.

5.7.3 Varimax Criterion

Define \(\tilde l_{ij}^\ast = \dfrac {\hat l_{ij}^\ast}{\hat h_i}\) to be the rotated coefficients scaled by the (square roots of the) communalities. Then the Varimax procedure selects the orthogonal transformation \(\pmb T\) that makes

\[
V = \dfrac{1}{p} \sum_{j=1}^m \left[ \sum_{i=1}^p \tilde l_{ij}^{\ast 4} - \dfrac{1}{p} \left( \sum_{i=1}^p \tilde l_{ij}^{\ast 2} \right)^2 \right]
\]

as large as possible. This quantity can be viewed as a kind of variance.

In words, \(V \propto \sum_{j=1}^m Var\!\left( \tilde l_{j}^{\ast 2} \right)\), i.e., the sum over factors of the variances of the squared loadings for the \(j\)th factor.

Maximizing \(V\) corresponds to “spreading out” the squares of the loadings on each factor as much as possible.
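A compact sketch of a varimax rotation (Kaiser's SVD-based iteration, shown here without the communality scaling for brevity); `L` is a p × m loading matrix and the function name is illustrative:

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Find an orthogonal T (m x m) that maximizes the varimax criterion
    and return the rotated loadings L @ T together with T."""
    p, m = L.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        )
        T = u @ vt
        d_new = np.sum(s)
        if d_new < d_old * (1 + tol):  # stop when the criterion stops increasing
            break
        d_old = d_new
    return L @ T, T
```

Because \(\pmb T\) is orthogonal, the rotated loadings reproduce the same \(\pmb{LL}'\) and hence the same communalities, as noted above.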

5.7.3.0.1 Oblique Rotation
  • It may not be possible to rotate the axes so that one axis passes through each cluster of variables while keeping the axes orthogonal to one another.
  • Such a rotation can be achieved by multiplying \(\pmb L\) by a matrix \(\pmb Q\), where \(\pmb Q\) is not an orthogonal matrix.
  • Oblique rotations produce new factors that are no longer uncorrelated, which contradicts the initial FA assumptions \(\rightarrow\) not good!

5.7.4 Factor Scores

In