6.6 Generalized Least Squares
Consider a full rank parameterization
\[ Y = X\beta + \epsilon, \;\;\;\;\; E(\epsilon)=0, \;\;\; Cov(\epsilon) = \sigma^2 \Sigma > 0 \]
By the SVD (spectral decomposition) of \(\Sigma\),
\[\begin{align} \Sigma &= \Gamma ' \Lambda \Gamma = \Gamma ' \Lambda^{\tfrac{1}{2}} \Lambda^{\tfrac{1}{2}}\Gamma = \Gamma ' \Lambda^{\tfrac{1}{2}} \Gamma \Gamma' \Lambda^{\tfrac{1}{2}}\Gamma = \Sigma^{\tfrac{1}{2}} \Sigma^{\tfrac{1}{2}}, \qquad \Sigma^{\tfrac{1}{2}} \equiv \Gamma ' \Lambda^{\tfrac{1}{2}} \Gamma \\ Z &\equiv \Sigma^{-\tfrac{1}{2}} Y = \Sigma^{-\tfrac{1}{2}}(X \beta + \epsilon) = \Sigma^{-\tfrac{1}{2}}X \beta + \Sigma^{-\tfrac{1}{2}} \epsilon = W \beta + \epsilon^\ast \end{align}\]
where \(W \equiv \Sigma^{-\tfrac{1}{2}} X\) and \(Cov(\epsilon^\ast) = \sigma^2 I\), so ordinary least squares applies in the \(Z\) coordinates:
\[\begin{align} \hat \beta &= (W'W)^{-1} W' Z = (X' \Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y \\ E(\hat \beta) &= (X' \Sigma^{-1}X)^{-1} X'\Sigma^{-1} X \beta = \beta \\ Cov(\hat \beta) &= \sigma^2 (X' \Sigma^{-1}X)^{-1} \\ \hat \sigma^2 &= \dfrac{\Vert Z - \hat \mu_Z \Vert^2}{n-p} = \dfrac{(Y-\hat \mu)' \Sigma^{-1} (Y-\hat \mu)}{n-p} \end{align}\]
The projection matrix is \(\Sigma^{-\tfrac{1}{2}} X (X' \Sigma^{-1}X)^{-1} X' \Sigma^{-\tfrac{1}{2}}\), which is symmetric, and hence it is an orthogonal projection.
Now all computations have been done in the \(Z\) coordinates, so in particular \(\hat \mu_Z = W \hat \beta\) estimates \(\mu_Z = \Sigma^{-\tfrac{1}{2}} \mu\).
Since linear combinations of Gauss-Markov estimates are Gauss-Markov, it follows immediately that \(\hat \mu = \Sigma^{\tfrac{1}{2}} \hat \mu_Z = X \hat \beta\) is the Gauss-Markov estimate of \(\mu\).
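The whitening argument above is easy to check numerically. Below is a minimal sketch (the variable names, sample size, and AR(1)-style choice of \(\Sigma\) are illustrative, not from the text): we form \(\Sigma^{-\tfrac{1}{2}}\) from the spectral decomposition, run OLS of \(Z = \Sigma^{-\tfrac{1}{2}}Y\) on \(W = \Sigma^{-\tfrac{1}{2}}X\), and confirm the result matches the direct GLS formula.

```python
# Minimal numerical sketch of GLS via whitening (illustrative names and data).
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])

# A positive definite Sigma (AR(1)-type correlation, chosen only for illustration)
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Simulate Y with Cov(eps) = Sigma
eps = rng.multivariate_normal(np.zeros(n), Sigma)
Y = X @ beta + eps

# Sigma^{-1/2} from the spectral decomposition: Sigma = G diag(lam) G'
# (G here plays the role of Gamma' in the text; its columns are eigenvectors)
lam, G = np.linalg.eigh(Sigma)
Sigma_inv_half = G @ np.diag(lam ** -0.5) @ G.T

# OLS in the whitened coordinates
Z, W = Sigma_inv_half @ Y, Sigma_inv_half @ X
beta_whitened = np.linalg.lstsq(W, Z, rcond=None)[0]

# Direct GLS formula (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)

print(np.allclose(beta_whitened, beta_gls))   # True
```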
6.6.1 A direct solution via inner products
We can approach the problem of finding the generalized least squares estimator in a different way, by viewing \(\Sigma\) as determining an inner product.
We do this by returning to first principles, carefully defining means and covariances in a general inner product space.
Let \(x, \; y \in \mathbb{R}^n\) and let \((x,y) = x'y\) be the usual inner product.
Choose a basis \(\{e_1 , \cdots, e_n \}\), the usual coordinate vectors. Then a random vector \(x\) has coordinates \((e_i, x) = x_i\).
- Definition 1.
\(E(x)=\mu= \begin{pmatrix} \mu_i \end{pmatrix}\) where \(\mu_i = E\big( (e_i , \; x) \big)\). For any \(a \in \mathbb{R}^n\),
\[ E\big( (a, x) \big) =
E\Big( \big( \textstyle\sum_{i=1}^n a_i e_i, \; x \big) \Big) =
E\Big( \textstyle\sum_{i=1}^n a_i (e_i, \; x) \Big) =
\textstyle\sum_{i=1}^n a_i \mu_i =
(a, \; \mu) \]
Thus, another characterization of \(\mu\) is: \(\mu\) is the unique vector that satisfies \(E\Big( (a, x) \Big) = (a, \; \mu)\) for all \(a \in \mathbb{R}^n\).
Now, turn to the covariance, using the same set-up as above. If \(E(x_i^2)<\infty\) for all \(i\), then \(Cov(x_i , x_j) = E\big( (x_i - \mu_i) (x_j - \mu_j) \big) = \sigma_{ij} = \sigma_{ji}\) exists for all \(i,j\), and defines \(\Sigma = (\sigma_{ij})\).
For any \(a, b \in \mathbb{R}^n\),
\[ Cov\big( (a, x), \; (b, x) \big) =
Cov\Big( \textstyle\sum_{i=1}^n a_i x_i, \; \textstyle\sum_{j=1}^n b_j x_j \Big) =
\textstyle\sum_{i=1}^n \sum_{j=1}^n a_i b_j Cov(x_i, \; x_j) =
\textstyle\sum_{i=1}^n \sum_{j=1}^n a_i b_j \sigma_{ij}
= (a, \; \Sigma b) \]
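As a concrete illustration, here is a small Monte Carlo sketch of the identity \(Cov\big( (a, x), \; (b, x) \big) = (a, \; \Sigma b)\); the particular \(\Sigma\), \(\mu\), \(a\), \(b\), and sample size are arbitrary choices for this check.

```python
# Monte Carlo sketch of Cov((a,x),(b,x)) = (a, Sigma b) under the usual inner product.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # an arbitrary positive definite covariance
mu = rng.normal(size=n)
a, b = rng.normal(size=n), rng.normal(size=n)

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are draws of x
u, v = x @ a, x @ b                                     # (a, x) and (b, x) for each draw

empirical = np.cov(u, v)[0, 1]
exact = a @ Sigma @ b
print(empirical, exact)    # close, up to Monte Carlo error
```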
- Definition 2
Assume \(E\Big( (a,x)^2 \Big) < \infty\) for all \(a \in V\). The unique non-negative definite linear transformation \(\Sigma: V \rightarrow V\) that satisfies \(Cov\Big( (a,x), \; (b,x) \Big) = (a, \; \Sigma b)\) for all \(a, b \in V\) is called the covariance of \(x\) and is denoted \(Cov(x)\).
- Theorem 1
Let \(Y \in V\) with inner product \((\cdot, \; \cdot)\) and \(Cov(Y)=\Sigma\). Define another inner product \([\cdot, \; \cdot ]\) on \(V\) by \([x,y] = (x, \; Ay)\) for some positive definite \(A\). Then the covariance of \(Y\) in the inner product space \((V, \; [\cdot, \; \cdot])\) is \(\Sigma A\).
- Note 1: This shows that if \(Cov(X)\) exists in one inner product, it exists in all inner products.
If \(Cov(X)=\Sigma\) in \((V, \; (\cdot, \; \cdot))\) with \(\Sigma > 0\), then in the inner product \([x,y] = (x, \; \Sigma^{-1}y)\) the covariance is \(\Sigma \Sigma^{-1} = I\).
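To see why Theorem 1 (and hence this note) holds, here is a one-line sketch using the symmetry of \(A\) in \((\cdot, \; \cdot)\): for any \(a, b \in V\),
\[ Cov\big( [a,Y], \; [b,Y] \big) = Cov\big( (Aa, Y), \; (Ab, Y) \big) = (Aa, \; \Sigma A b) = (a, \; A \Sigma A b) = [a, \; \Sigma A b], \]
so \(\Sigma A\) is the covariance of \(Y\) in \((V, \; [\cdot, \; \cdot])\); taking \(A = \Sigma^{-1}\) gives the identity covariance of Note 1.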
- Theorem 2
Suppose \(Cov(X) = \Sigma\) in \((V, \; (\cdot, \; \cdot))\). If \(\Sigma_1\) is symmetric on \((V, \; (\cdot, \; \cdot))\), and \(Cov \Big( (a,x) \Big) = (a, \; \Sigma_1 a)\) for all \(a \in V\), then \(\Sigma_1 = \Sigma\). This implies that the covariance is unique.
Consider \(\mathbb{R}^n\) with the inner product \([x,y] = (x, \; \Sigma^{-1}y)\), where \(E(Y)=\mu \in \mathcal{E}\) and \(Cov(Y) = \sigma^2 \Sigma\).
Let \(P_\Sigma\) be the projection on \(\mathcal{E}\) in this inner product space, and let \(Q_\Sigma = I - P_\Sigma\), so \(y = P_{\Sigma} y + Q_{\Sigma} y\).
- Theorem 3
With \([x,y] = (x, \; \Sigma^{-1}y)\), \(P_\Sigma = X(X'\Sigma^{-1} X )^{-1} X' \Sigma^{-1}\) is an orthogonal projection.
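A quick numerical sketch of Theorem 3 (the matrices below are arbitrary illustrations): \(P_\Sigma\) is idempotent and is self-adjoint with respect to \([x,y] = (x, \; \Sigma^{-1}y)\), which is equivalent to \(\Sigma^{-1}P_\Sigma\) being symmetric, even though \(P_\Sigma\) itself is generally not symmetric.

```python
# Numerical check that P_Sigma is an orthogonal projection in the [.,.] inner product.
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)        # arbitrary positive definite Sigma
Si = np.linalg.inv(Sigma)

P = X @ np.linalg.solve(X.T @ Si @ X, X.T @ Si)   # P_Sigma = X (X' Si X)^{-1} X' Si

print(np.allclose(P @ P, P))            # idempotent: True
print(np.allclose(Si @ P, (Si @ P).T))  # self-adjoint in [.,.]: True
print(np.allclose(P, P.T))              # not symmetric in the usual sense: typically False
```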
- Theorem 4
Let \(\hat \beta = (X'X)^{-1}X'Y\) be the OLS estimate and \(\tilde \beta = (X'\Sigma^{-1}X)^{-1} X' \Sigma^{-1}Y\) the GLS estimate. Then
\[ \hat \beta = \tilde \beta \;\;\;\;\; \iff \;\;\;\;\; \mathcal{C}(\Sigma^{-1}X) = \mathcal{C}(X) \]
- Corollary 1
\(\mathcal{C}(\Sigma^{-1}X) = \mathcal{C}(X)= \mathcal{C}(\Sigma X)\)
So \(\Sigma\) need not be inverted to apply the theory.
To use this equivalence theorem (due to W. Kruskal), we usually characterize the \(\Sigma\)’s for a given \(X\) for which \(\hat \beta = \tilde \beta\).
If \(X\) is completely arbitrary, then only \(\Sigma = \sigma^2 I\) works.
- Intra-class correlation model:
Let \(J_n \in \mathcal{C}(X)\), where \(J_n\) is the column of \(1\)'s. Then any \(\Sigma\) of the form
\[ \Sigma = \sigma^2 (1-\rho)I + \sigma^2 \rho J_n J_n ' \]
with \(-\dfrac{1}{n-1} < \rho < 1\) will work.
To apply the theorem, we write
\[ \Sigma X = \sigma^2 (1-\rho)X + \sigma^2 \rho J_n J_n ' X \]
so for \(i>1\), the \(i\)-th column of \(\Sigma X\) is
\[ ( \Sigma X )_i = \sigma^2 (1-\rho)X_i + \sigma^2 \rho J_n a_i \]
with \(a_i = J_n ' X_i\).
Thus, the i-th column of \(\Sigma X\) is a linear combination of the i-th column of \(X\) and the column of \(1\)’s.
For the first column of \(\Sigma X\), we compute \(a_1 = J_n ' J_n = n\) and \(\Big ( \Sigma X \Big)_1 = \sigma^2 (1- \rho) J_n + n \sigma^2 \rho J_n = \sigma^2 \Big ( 1 + \rho(n-1) \Big )J_n\), so \(\mathcal{C}(\Sigma X) = \mathcal{C}(X)\) as required, provided that \(1+\rho(n-1) \not = 0\), which holds since \(\rho > -\dfrac{1}{n-1}\).
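The intra-class correlation example can also be checked numerically. In this sketch (the dimensions and the particular \(X\), \(Y\), \(\rho\) are illustrative) OLS and GLS coincide exactly when the column of ones is in \(\mathcal{C}(X)\), and generally differ when it is not.

```python
# Intra-class correlation example: with J_n in C(X), OLS equals GLS for every Y.
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, rho = 30, 2.0, 0.4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # J_n is in C(X)
Y = rng.normal(size=n)

J = np.ones((n, 1))
Sigma = sigma2 * (1 - rho) * np.eye(n) + sigma2 * rho * (J @ J.T)
Si = np.linalg.inv(Sigma)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ Y)
print(np.allclose(beta_ols, beta_gls))    # True

# If J_n is not in C(X), the two estimates generally differ:
X2 = rng.normal(size=(n, 3))
print(np.allclose(np.linalg.solve(X2.T @ X2, X2.T @ Y),
                  np.linalg.solve(X2.T @ Si @ X2, X2.T @ Si @ Y)))   # typically False
```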