6.6 Generalized Least Squares

Consider a full rank parameterization

\[
Y = X\beta + \epsilon, \qquad E(\epsilon)=0, \quad Cov(\epsilon) = \sigma^2 \Sigma, \quad \Sigma > 0.
\]

By the spectral decomposition (SVD) of \(\Sigma\), with \(\Gamma\) orthogonal and \(\Lambda\) diagonal with positive diagonal entries,

\[
\begin{aligned}
\Sigma &= \Gamma ' \Lambda \Gamma = \Gamma ' \Lambda^{\tfrac{1}{2}} \Lambda^{\tfrac{1}{2}}\Gamma = \Gamma ' \Lambda^{\tfrac{1}{2}} \Gamma \, \Gamma ' \Lambda^{\tfrac{1}{2}}\Gamma = \Sigma^{\tfrac{1}{2}} \Sigma^{\tfrac{1}{2}} \\
Z &\equiv \Sigma^{-\tfrac{1}{2}} Y = \Sigma^{-\tfrac{1}{2}}(X \beta + \epsilon) = \Sigma^{-\tfrac{1}{2}}X \beta + \Sigma^{-\tfrac{1}{2}} \epsilon = W \beta + \epsilon^\ast
\end{aligned}
\]

Since \(Cov(\epsilon^\ast) = \Sigma^{-\tfrac{1}{2}} (\sigma^2 \Sigma) \Sigma^{-\tfrac{1}{2}} = \sigma^2 I\), the transformed model satisfies the Gauss-Markov assumptions, and ordinary least squares in the \(Z\) coordinates gives

\[
\begin{aligned}
\hat \beta &= (W'W)^{-1} W' Z = (X' \Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y \\
E(\hat \beta) &= (X' \Sigma^{-1}X)^{-1} X'\Sigma^{-1} X \beta = \beta \\
Cov(\hat \beta) &= \sigma^2 (X' \Sigma^{-1}X)^{-1} \\
\hat \sigma^2 &= \dfrac{\Vert Z - \hat \mu_Z \Vert^2}{n-p} = \dfrac{(Y-\hat \mu)' \Sigma^{-1} (Y-\hat \mu)}{n-p}
\end{aligned}
\]

The projection matrix in the \(Z\) coordinates is \(\Sigma^{-\tfrac{1}{2}} X (X' \Sigma^{-1}X)^{-1}X' \Sigma^{-\tfrac{1}{2}}\), which is symmetric, and hence is an orthogonal projection.

Now all computations have been done in the \(Z\) coordinates, so in particular \(W \hat\beta\) estimates \(\mu_Z = \Sigma^{-\tfrac{1}{2}} \mu\).

Since linear combinations of Gauss-Markov estimates are Gauss-Markov, it follows immediately that \(\hat \mu_Z = \Sigma^{-\tfrac{1}{2}} \hat \mu\).
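
As a quick numerical check, the following numpy sketch (the sample size, design matrix, and AR(1)-style \(\Sigma\) are invented for illustration) verifies that the GLS formulas above coincide with ordinary least squares applied to the whitened data \(Z = \Sigma^{-\tfrac{1}{2}}Y\):

```python
import numpy as np

# Illustrative sizes and design (not from the text).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 2.0, -0.5])
sigma2 = 4.0

# An arbitrary positive definite Sigma (AR(1)-style correlation, purely for illustration).
rho = 0.6
Sigma = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# Symmetric square root of Sigma from its spectral decomposition
# (np.linalg.eigh returns eigenvectors as columns, so Sigma = G diag(lam) G').
lam, G = np.linalg.eigh(Sigma)
Sigma_half = G @ np.diag(np.sqrt(lam)) @ G.T
Sigma_inv_half = G @ np.diag(1 / np.sqrt(lam)) @ G.T

# Simulate Y = X beta + eps with Cov(eps) = sigma^2 * Sigma.
Y = X @ beta + np.sqrt(sigma2) * (Sigma_half @ rng.normal(size=n))

# GLS directly: beta_hat = (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y.
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)

# OLS on the whitened data Z = Sigma^{-1/2} Y with W = Sigma^{-1/2} X.
Z, W = Sigma_inv_half @ Y, Sigma_inv_half @ X
beta_white = np.linalg.solve(W.T @ W, W.T @ Z)

print(np.allclose(beta_gls, beta_white))   # True: both routes give the same estimate

# Variance estimate: (Y - mu_hat)' Sigma^{-1} (Y - mu_hat) / (n - p).
resid = Y - X @ beta_gls
sigma2_hat = resid @ Sigma_inv @ resid / (n - p)
print(sigma2_hat)
```

In practice one would solve the relevant linear systems rather than form \(\Sigma^{-1}\) explicitly; the inverse is kept here only to mirror the formulas above.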

6.6.1 A direct solution via inner products

We can approach the problem of determining the Generalized Least Squares estimators in a different way, by viewing \(\Sigma\) as determining an inner product.

We do this by returning to first principles, carefully defining means and covariances in a general inner product space.

Let \(x, \; y \in \mathbb{R}^n\) and let \((x,y) = x'y\) be the usual inner product.

Choose a basis \(\{e_1 , \cdots, e_n \}\), the usual coordinate vectors. Then a random vector \(x\) has coordinates \((e_i, x) = x_i\).




  • Definition 1.

\(E(x)=\mu= \begin{pmatrix} \mu_i \end{pmatrix}\) where \(\mu_i = E\big( (e_i , \; x) \big)\). For any \(a \in \mathbb{R}^n\),

\[
E\big( (a, \; x) \big)
= E\Big( \big( \sum_{i=1}^n a_i e_i, \; x \big) \Big)
= E\Big( \sum_{i=1}^n a_i (e_i, \; x) \Big)
= \sum_{i=1}^n a_i \mu_i
= (a, \; \mu)
\]

Thus, another characterization of \(\mu\) is: \(\mu\) is the unique vector that satisfies \(E\Big( (a, x) \Big) = (a, \; \mu)\) for all \(a \in \mathbb{R}^n\).
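
A tiny Monte Carlo sketch of this characterization (the mean vector and distribution are chosen arbitrarily for illustration): the sample average of \((a, x)\) should be close to \((a, \mu)\) for any fixed \(a\).

```python
import numpy as np

# Draw many copies of x with E(x) = mu and compare the average of (a, x) with (a, mu).
rng = np.random.default_rng(1)
n, reps = 5, 100_000
mu = np.arange(1.0, n + 1)                # an arbitrary mean vector
x = mu + rng.normal(size=(reps, n))       # rows are independent draws with E(x) = mu
a = rng.normal(size=n)                    # an arbitrary fixed vector

print((x @ a).mean(), a @ mu)             # close, up to Monte Carlo error
```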

Now, turn to covariances, using the same set-up as above. If \(E(x_i^2)<\infty\), then \(Cov(x_i , x_j) = E\big[ (x_i - \mu_i) (x_j - \mu_j) \big] = \sigma_{ij} = \sigma_{ji}\) exists for all \(i,j\), and defines \(\Sigma = (\sigma_{ij})\).

For any \(a, b \in \mathbb{R}^n\),

\[
Cov\big( (a, \; x), \; (b, \; x) \big)
= Cov\Big( \sum_{i=1}^n a_i x_i, \; \sum_{j=1}^n b_j x_j \Big)
= \sum_{i=1}^n \sum_{j=1}^n a_i b_j \, Cov(x_i, \; x_j)
= \sum_{i=1}^n \sum_{j=1}^n a_i b_j \, \sigma_{ij}
= (a, \; \Sigma b)
\]


  • Definition 2

Assume \(E\Big( (a,x)^2 \Big) < \infty\). The unique non-negative definite linear transformation \(\Sigma: V \rightarrow V\) that satisfies \(Cov\Big( (a,x), \; (b,x) \Big) = (a, \; \Sigma b)\) for all \(a, b \in V\) is called the covariance of \(x\) and is denoted \(Cov(x)\).
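
The following small simulation (with an arbitrary positive definite \(\Sigma\) chosen for illustration) checks Definition 2 empirically: the sample covariance of the two scalar random variables \((a,x)\) and \((b,x)\) matches the bilinear form \((a, \Sigma b)\).

```python
import numpy as np

# Draw many copies of x with Cov(x) = Sigma and compare the sample covariance of
# the scalar random variables (a, x) and (b, x) with the bilinear form (a, Sigma b).
rng = np.random.default_rng(2)
n, reps = 4, 200_000
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)           # an arbitrary positive definite covariance
L = np.linalg.cholesky(Sigma)
x = rng.normal(size=(reps, n)) @ L.T      # rows are draws with Cov(x) = L L' = Sigma

a, b = rng.normal(size=n), rng.normal(size=n)
sample_cov = np.cov(x @ a, x @ b)[0, 1]
print(sample_cov, a @ Sigma @ b)          # close, up to Monte Carlo error
```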


  • Theorem 1

Let \(Y \in V\) with inner product \((\cdot, \; \cdot)\) and \(Cov(Y)=\Sigma\). Define another inner product \([\cdot, \; \cdot ]\) on \(V\) by \([x,y] = (x, \; Ay)\) for some positive definite \(A\). Then the covariance of \(Y\) in the inner product space \(\big( V, \; [\cdot, \; \cdot] \big)\) is \(\Sigma A\).
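
A short verification sketch, using only Definition 2 and the symmetry of \(A\):

\[
\begin{aligned}
Cov\big( [a,Y], \; [b,Y] \big)
&= Cov\big( (Aa,Y), \; (Ab,Y) \big) && \text{since } [a,y] = (a, \; Ay) = (Aa, \; y) \\
&= (Aa, \; \Sigma A b) && \text{by Definition 2 in } \big( V, \; (\cdot, \; \cdot) \big) \\
&= (a, \; A \Sigma A b) = [a, \; \Sigma A b],
\end{aligned}
\]

so \(\Sigma A\) represents the covariance of \(Y\) in \(\big( V, \; [\cdot, \; \cdot] \big)\).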


  • Note 1: This shows that if \(Cov(X)\) exists in one inner product, it exists in all inner products.

If \(Cov(X)=\Sigma\) in \(\big( V, \; (\cdot, \; \cdot) \big)\) with \(\Sigma > 0\), then in the inner product \([x,y] = (x, \; \Sigma^{-1}y)\) the covariance is \(\Sigma \Sigma^{-1} = I\).


  • Theorem 2

Suppose \(Cov(X) = \Sigma\) in \(\big( V, \; (\cdot, \; \cdot) \big)\). If \(\Sigma_1\) is symmetric on \(\big( V, \; (\cdot, \; \cdot) \big)\), and \(Cov \Big( (a,x) \Big) = (a, \; \Sigma_1 a)\) for all \(a \in V\), then \(\Sigma_1 = \Sigma\). This implies that the covariance is unique.

Consider the inner product space \(\big( \mathbb{R}^n, \; [\cdot, \; \cdot] \big)\), where \([x,y] = (x, \; \Sigma^{-1}y)\), with \(E(Y)=\mu \in \mathcal{E}\) and \(Cov(Y) = \sigma^2 \Sigma\) in the usual inner product.

Let \(P_\Sigma\) be the projection on \(\mathcal{E}\) in this inner product space, and let \(Q_\Sigma = I - P_\Sigma\), so \(y = P_{\Sigma} y + Q_{\Sigma} y\).


  • Theorem 3

With \([x,y] = (x, \; \Sigma^{-1}y)\), \(P_\Sigma = X(X'\Sigma^{-1} X )^{-1} X' \Sigma^{-1}\) is an orthogonal projection onto \(\mathcal{E} = \mathcal{C}(X)\).
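
A quick numerical sketch of Theorem 3 (the dimensions, \(X\), and \(\Sigma\) below are arbitrary illustrations): \(P_\Sigma\) is idempotent and self-adjoint with respect to \([\cdot, \; \cdot]\), i.e. \([x, P_\Sigma y] = [P_\Sigma x, y]\), which in matrix terms is \(\Sigma^{-1}P_\Sigma = P_\Sigma ' \Sigma^{-1}\).

```python
import numpy as np

# Arbitrary X and positive definite Sigma, purely for illustration.
rng = np.random.default_rng(3)
n, p = 8, 3
X = rng.normal(size=(n, p))
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)
Sigma_inv = np.linalg.inv(Sigma)

# P_Sigma = X (X' Sigma^{-1} X)^{-1} X' Sigma^{-1}
P = X @ np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv)

print(np.allclose(P @ P, P))                        # idempotent
print(np.allclose(Sigma_inv @ P, P.T @ Sigma_inv))  # self-adjoint in [x, y] = x' Sigma^{-1} y
print(np.allclose(P @ X, X))                        # leaves C(X) fixed
```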


  • Theorem 4

Let the OLS estimate be \(\hat \beta = (X'X)^{-1}X'Y\) and the GLS estimate be \(\tilde \beta = (X'\Sigma^{-1}X)^{-1} X' \Sigma^{-1}Y\). Then

\[
\hat \beta = \tilde \beta \qquad \iff \qquad \mathcal{C}(\Sigma^{-1}X) = \mathcal{C}(X)
\]


  • Corollary 1

\(\mathcal{C}(\Sigma^{-1}X) = \mathcal{C}(X)= \mathcal{C}(\Sigma X)\)

So \(\Sigma\) need not be inverted to apply the theory.

To use this equivalence theorem (due to W. Kruskal), we usually characterize the \(\Sigma\)’s for a given \(X\) for which \(\hat \beta = \tilde \beta\).

If \(X\) is completely arbitrary, then only \(\Sigma = \sigma^2 I\) works.


  • Intra-class correlation model:

Let \(J_n\), the column of \(1\)'s, be in \(\mathcal{C}(X)\). Then any \(\Sigma\) of the form

\[
\Sigma = \sigma^2 (1-\rho)I + \sigma^2 \rho \, J_n J_n '
\]

with \(-\dfrac{1}{n-1} < \rho < 1\) will work.

To apply the theorem, we write

\[
\Sigma X = \sigma^2 (1-\rho)X + \sigma^2 \rho \, J_n J_n ' X
\]

so for \(i>1\), the \(i\)-th column of \(\Sigma X\) is

\[
\big( \Sigma X \big)_i = \sigma^2 (1-\rho)X_i + \sigma^2 \rho \, J_n a_i
\]

with \(a_i = J_n ' X_i\), a scalar.

Thus, the \(i\)-th column of \(\Sigma X\) is a linear combination of the \(i\)-th column of \(X\) and the column of \(1\)'s.

For the first column of \(\Sigma X\), with \(X_1 = J_n\), we compute \(a_1 = J_n ' J_n = n\) and \(\Big ( \Sigma X \Big)_1 = \sigma^2 (1- \rho) J_n + n \sigma^2 \rho J_n = \sigma^2 \Big ( 1 + \rho(n-1) \Big )J_n\), so \(\mathcal{C}(\Sigma X) = \mathcal{C}(X)\) as required, provided that \(1+\rho(n-1) \neq 0\), which holds since \(\rho > -\dfrac{1}{n-1}\).
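
The claim can be checked numerically; in the sketch below (the sample size, design, \(\rho\), and \(\sigma^2\) are invented for illustration) we verify both \(\mathcal{C}(\Sigma X) = \mathcal{C}(X)\) and \(\hat \beta = \tilde \beta\) for the intra-class correlation \(\Sigma\).

```python
import numpy as np

# Intra-class correlation Sigma with J_n in C(X) (first column of X is the 1's column).
rng = np.random.default_rng(4)
n, p, rho, sigma2 = 20, 3, 0.4, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
J = np.ones(n)
Sigma = sigma2 * ((1 - rho) * np.eye(n) + rho * np.outer(J, J))

# Kruskal's condition C(Sigma X) = C(X), checked through ranks.
rank = np.linalg.matrix_rank
print(rank(np.hstack([X, Sigma @ X])) == rank(X) == rank(Sigma @ X))  # True

# Consequently OLS and GLS coincide for any response Y.
Y = rng.normal(size=n)
Sigma_inv = np.linalg.inv(Sigma)
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(np.allclose(beta_ols, beta_gls))                                # True
```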