6.3 Estimation

Consider the linear model below, where \(x_i '\) is the \(i\)-th row vector of \(X\), \(E(\epsilon)=0\), and \(Cov(\epsilon)=\sigma^2 I\) (i.e., \(\Sigma = I\) in the notation of the GLS section, 6.3.7).

\[ Y_{n \times 1} = X_{n \times p} \beta_{p \times 1} + \epsilon_{n \times 1} = \begin{pmatrix} x_1 ' \beta \\ \vdots \\ x_n ' \beta \end{pmatrix} + \epsilon \]



6.3.1 Identifiability and Estimability

6.3.1.1 Identifiable

Identifiability is the property that, given an unlimited number of observations from the model, the true values of the model's underlying parameters can be recovered.

A general linear model is a parameterization

\[\begin{align} E(Y) &= f(\beta) \\ &= E(X\beta + \epsilon)\\ &= X\beta + E(\epsilon) \\ &= X\beta + 0 \\ &= X\beta \end{align}\]

The parameter \(\beta\) is identifiable if for any \(\beta_1\) and \(\beta_2\), \(f(\beta_1) = f(\beta_2)\) implies \(\beta_1 = \beta_2\). If \(\beta\) is identifiable, we say that the parameterization \(f(\beta)\) is identifiable. Moreover, a vector-valued function \(g(\beta)\) is identifiable if \(f (\beta_1) = f(\beta_2)\) implies \(g (\beta_1) = g(\beta_2)\).

For regression models for which \(r(X) = p\), the parameters are identifiable: \(X'X\) is nonsingular, so if \(X\beta_1 = X\beta_2\), then

\[ \beta_1 = (X'X)^{-1} X'X \beta_1 = (X'X)^{-1} X'X \beta_2 = \beta_2 \]

A function \(g(\beta)\) is identifiable \(\iff\) \(g(\beta)\) is a function of \(f(\beta)\).
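
As a quick numerical illustration (the design matrices, seed, and dimensions below are made-up examples, not from the text), a full-rank \(X\) lets \(\beta\) be recovered from \(X\beta\), while a rank-deficient \(X\) admits distinct \(\beta\)'s with the same mean vector:

```python
# A minimal sketch: identifiability holds when r(X) = p and fails when X is rank deficient.
import numpy as np

rng = np.random.default_rng(0)

# Full-rank case: beta can be recovered from E(Y) = X beta.
X = rng.normal(size=(10, 3))                      # r(X) = 3 = p
beta = np.array([1.0, -2.0, 0.5])
EY = X @ beta
beta_recovered = np.linalg.solve(X.T @ X, X.T @ EY)
print(np.allclose(beta_recovered, beta))          # True: beta is identifiable

# Rank-deficient case: distinct betas give the same mean vector.
X_def = np.column_stack([X[:, 0], X[:, 1], X[:, 0] + X[:, 1]])   # r(X_def) = 2 < 3
beta1 = np.array([1.0, 1.0, 0.0])
beta2 = np.array([0.0, 0.0, 1.0])
print(np.allclose(X_def @ beta1, X_def @ beta2))  # True: beta is not identifiable
```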




6.3.1.2 Estimable

The results of the last section suggest that, in the less-than-full-rank case, some linear combinations of \(\beta\) will not be estimable.

The linear parametric function \(c'\beta\) is an estimable function if there exists a vector \(a \in \mathbb{R}^n\) such that \(E(a ' Y ) = c ' \beta\) for all \(\beta\).

A vector-valued linear function of \(\beta\), \(\Lambda ' \beta\) is estimable if \(\Lambda ' \beta = P ' X \beta\) for some matrix P; In other words, \(\Lambda ' \beta\) is estimable if \(\Lambda = X ' P \in \mathcal{C}(X')\).

Clearly, if \(\Lambda ' \beta\) is estimable, it is identifiable and therefore it is a reasonable thing to estimate.

  • estimable \(\rightarrow\) identifiable

For estimable functions \(\Lambda' \beta = P ' X \beta\), although \(P\) need not be unique, its perpendicular projection (columnwise) onto \(\mathcal{C}(X)\) is unique:
let \(P_1 , \; P_2\) be matrices with \(\Lambda ' = P_1 ' X = P_2 ' X\), then

\[ MP_1 = X(X'X)^{-}X'P_1 = X(X'X)^{-}\Lambda = X(X'X)^{-}X'P_2 = MP_2 \]
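
The uniqueness of \(MP\) can be checked numerically. In this sketch (made-up \(X\), \(P_1\), \(P_2\)), two different \(P\) matrices with \(P_1 ' X = P_2 ' X\) project columnwise to the same matrix:

```python
# A quick numerical check: P is not unique, but M P is.
import numpy as np

rng = np.random.default_rng(9)
n, p, k = 12, 3, 2
X = rng.normal(size=(n, p))
M = X @ np.linalg.pinv(X.T @ X) @ X.T            # perpendicular projection onto C(X)

P1 = rng.normal(size=(n, k))
P2 = P1 + (np.eye(n) - M) @ rng.normal(size=(n, k))  # differs only by columns in C(X)^perp

print(np.allclose(P1.T @ X, P2.T @ X))   # True: both give the same Lambda'
print(np.allclose(P1, P2))               # False: P is not unique
print(np.allclose(M @ P1, M @ P2))       # True: M P is unique
```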




  • Example 2.1.4 and 2.1.5

An estimate \(f(Y)\) of \(g(\beta)\) is unbiased if \(E[f(Y)] = g(\beta)\) for all \(\beta\).

If \(f (Y) = a_0 + a' Y\) for some scalar \(a_0\) and vector \(a\), then \(f(Y)\) is a linear estimate of \(\Lambda ' \beta\).

A linear estimate \(a_0 + a ' Y\) is unbiased for \(\Lambda ' \beta\) \(\iff\) \(a_0 = 0\) and \(a ' X = \Lambda'\), i.e., \(\Lambda = X ' a \in \mathcal{C}(X')\).

\(\Lambda ' \beta\) is estimable \(\iff\) there exists \(\rho\) such that \(E(\rho ' Y ) = \Lambda ' \beta\) for any \(\beta\).
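
Estimability of \(\lambda ' \beta\) can be checked by testing whether \(\lambda \in \mathcal{C}(X')\). One way to do this (a sketch with a hypothetical helper `is_estimable` and a one-way ANOVA style design, not from the text) is to compare ranks:

```python
# lambda' beta is estimable iff lambda lies in C(X'), i.e. appending lambda
# to the rows of X does not increase the rank.
import numpy as np

def is_estimable(X, lam, tol=1e-10):
    """Return True if lam is in C(X'), i.e. lam' beta is estimable."""
    r_X = np.linalg.matrix_rank(X, tol=tol)
    r_aug = np.linalg.matrix_rank(np.vstack([X, lam.reshape(1, -1)]), tol=tol)
    return r_aug == r_X

# One-way ANOVA style design: columns are (intercept, group 1, group 2).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)              # r(X) = 2 < 3

print(is_estimable(X, np.array([0.0, 1.0, -1.0])))  # True: the contrast alpha1 - alpha2
print(is_estimable(X, np.array([0.0, 1.0,  0.0])))  # False: alpha1 alone
```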







6.3.2 Estimation: Least Squares

Estimating \(E(Y)\) amounts to taking the vector in \(\mathcal{C}(X)\) closest to \(Y\):

\[\begin{alignat}{2} E(Y) &= X\beta \; &&\in \; \mathcal{C}(X)\\ \\ \hat \beta &= \arg\min_\beta \left\{ (Y-X \beta) ' (Y-X \beta) \right\} \\ &= \arg\min_\beta \left\{ \Vert Y-X \beta \Vert^2 \right\} \tag{Least Squares Estimate of beta} \end{alignat}\]

For any least squares estimate \(\hat \beta\), the LSE of \(\Lambda ' \beta\) is \(\Lambda ' \hat \beta\), i.e., \(\widehat{\Lambda ' \beta}_{LSE} = \Lambda ' \hat \beta\).



  • Theorem 2.2.1

Let \(M\) be the perpendicular projection operator onto \(\mathcal{C}(X)\). Then

\(\hat \beta\) is an LSE of \(\beta\) \(\iff\) \(X \hat \beta = M Y\)



  • Corollary 2.2.2

\(\hat \beta = (X'X)^{-}X' Y\) is an LSE of \(\beta\), since \(X (X'X)^{-}X' Y = MY\).
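
A minimal numerical check (made-up \(X\) and \(Y\)) that \((X'X)^{-}X'Y\) satisfies the characterization of Theorem 2.2.1:

```python
# One LSE is (X'X)^- X'Y, and it satisfies X beta_hat = MY.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
Y = rng.normal(size=20)

XtX_ginv = np.linalg.pinv(X.T @ X)        # a generalized inverse (X'X)^-
beta_hat = XtX_ginv @ X.T @ Y             # an LSE of beta
M = X @ XtX_ginv @ X.T                    # perpendicular projection onto C(X)

print(np.allclose(X @ beta_hat, M @ Y))   # True: X beta_hat = MY
```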



  • Corollary 2.2.3

The unique LSE of \(\rho ' X \beta\) is \(\rho ' M Y\).

※ Note: the unique LSE of \(\Lambda ' \beta\) is \(\Lambda ' \hat \beta = P' M Y\).



  • Theorem 2.2.4

The LSE of \(\Lambda ' \beta\) is unique if and only if \(\Lambda ' \beta\) is estimable. For the "if" direction, write \(\Lambda = X'P\); then any two LSEs satisfy \(X \hat \beta_1 = X \hat \beta_2 = MY\), so \(\Lambda ' \hat \beta_1 = P' X \hat \beta_1 = P' M Y = P' X \hat \beta_2 = \Lambda ' \hat \beta_2\).

※ Note: When \(\beta\) is not identifiable, we need side conditions imposed on the parameters to estimate nonidentifiable parameters.

※ Note: With \(r = r (X) < p\) (overparameterized model), we need \(p - r\) individual side conditions to identify and estimate the parameters.



  • Proposition 2.2.5

If \(\Lambda = X ' \rho\), then \(E(\rho ' MY) = \Lambda ' \beta\).

Decompose \(Y\) into fitted values and residuals:

\[\begin{alignat}{2} Y &= X \hat \beta &&+ Y - X \hat \beta \\ &= MY &&+ (I-M)Y \\ &= \hat Y &&+ e \end{alignat}\]

where \[\begin{align} \hat Y &\in \mathcal{C}(X) \tag{fitted values of Y} \\ e &\in \mathcal{C}(X)^{\perp} \tag{residuals} \end{align}\]
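
A short check (made-up data) that the decomposition is exact and the two pieces are orthogonal:

```python
# Y = MY + (I - M)Y: fitted values lie in C(X), residuals in C(X)^perp.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 3))
Y = rng.normal(size=15)

M = X @ np.linalg.pinv(X.T @ X) @ X.T
Y_hat = M @ Y                    # fitted values
e = (np.eye(15) - M) @ Y         # residuals

print(np.allclose(Y_hat + e, Y))       # True: the decomposition is exact
print(np.isclose(Y_hat @ e, 0.0))      # True: Y_hat is orthogonal to e
print(np.allclose(X.T @ e, 0.0))       # True: e is in C(X)^perp
```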



  • Theorem 2.2.6

Let \(r (X) = r\) and \(Cov(\epsilon) = \sigma^2 I\). Then the MSE below is an unbiased estimate of \(\sigma^2\); its denominator, \(rank(I-M) = n-r\), is the degrees of freedom for error.

\[ \hat \sigma^2 =\dfrac{Y'(I-M)Y}{rank(I-M)} =\dfrac{Y'(I-M)Y}{n-r} \tag{MSE} \]
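
A simulation sketch (made-up design, true \(\sigma^2 = 2\)) suggesting the unbiasedness of the MSE; the average over replications should be close to the true \(\sigma^2\):

```python
# Average MSE = Y'(I-M)Y / (n - r) over many replications, compared with sigma^2.
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 30, 4, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T
r = np.linalg.matrix_rank(X)

mses = []
for _ in range(5000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = Y - M @ Y
    mses.append(resid @ resid / (n - r))

print(np.mean(mses))   # close to sigma2 = 2.0
```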







6.3.3 Estimation: Best Linear Unbiased

  • Definition 2.3.1

\(a'Y\) is a Best Linear Unbiased Estimate (BLUE) of \(\lambda ' \beta\) if \(a ' Y\) is unbiased, i.e., \(E(a ' Y) = \lambda ' \beta\), and if for any other linear unbiased estimate \(b ' Y\) of \(\lambda ' \beta\), \(Var(a ' Y) \le Var(b'Y)\).



  • Theorem 2.3.2: Gauss-Markov thm

Consider \(Y = X \beta + \epsilon\) with \(E(\epsilon) = 0\), \(Cov(\epsilon) = \sigma^2 I\). Let \(\lambda ' \beta\) be estimable.

Then the LSE of \(\lambda ' \beta\) is the BLUE of \(\lambda ' \beta\).
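
A simulation sketch of the Gauss-Markov comparison (made-up design; the competing estimate is constructed as \(b = M\rho + (I-M)c\), which is one way to build another linear unbiased estimate of the same \(\lambda ' \beta\)):

```python
# Every linear unbiased estimate of lambda' beta has the form b'Y with X'b = lambda;
# the LSE (M rho)'Y has the smallest variance among them.
import numpy as np

rng = np.random.default_rng(10)
n, p, sigma = 15, 3, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T

rho = rng.normal(size=n)
lam = X.T @ rho                                       # an estimable lambda' beta
b = M @ rho + (np.eye(n) - M) @ rng.normal(size=n)    # another unbiased coefficient vector

blue_vals, other_vals = [], []
for _ in range(20000):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    blue_vals.append(rho @ M @ Y)
    other_vals.append(b @ Y)

print(np.mean(blue_vals), np.mean(other_vals), lam @ beta)  # both means near lambda' beta
print(np.var(blue_vals) <= np.var(other_vals))              # True (up to simulation noise)
```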



  • Corollary 2.3.3

Let \(\sigma^2 > 0\). Then there exists a unique BLUE for any estimable function \(\lambda ' \beta\).







6.3.4 Estimation: Maximum Likelihood

Assume that \(Y \sim N_n(X\beta , \; \sigma^2 I_n)\). Then the Maximum Likelihood Estimates (MLEs) of \(\beta\) and \(\sigma^2\) are obtained by maximizing the log of the likelihood so that

\[\begin{align} \left( \hat \beta , \; \hat \sigma^2 \right) &= \text{ MLE of } \left( \beta , \; \sigma^2 \right) \\ &= \arg\max_{\left( \beta , \; \sigma^2 \right)} \left\{ -\dfrac{n}{2}\log(2 \pi) - \dfrac{1}{2} \log \left[ (\sigma^2 )^n\right] - \dfrac{(Y-X\beta)'(Y-X\beta)}{2\sigma^2} \right\} \end{align}\]

which gives

\[\begin{align} \hat \beta &= \text{ LSE of } \beta \\ \hat \sigma^2 &= \dfrac{1}{n} \, Y'(I-M)Y \end{align}\]
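
A short sketch (made-up data) contrasting the MLE of \(\sigma^2\), which divides the SSE by \(n\), with the unbiased MSE, which divides by \(n - r\):

```python
# Under normality the MLE of beta equals the LSE, while the MLE of sigma^2
# divides the SSE by n instead of n - r.
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

M = X @ np.linalg.pinv(X.T @ X) @ X.T
sse = Y @ (np.eye(n) - M) @ Y
r = np.linalg.matrix_rank(X)

sigma2_mle = sse / n          # MLE of sigma^2 (biased downward)
sigma2_mse = sse / (n - r)    # unbiased estimate (MSE)
print(sigma2_mle, sigma2_mse)
```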







6.3.5 Estimation: Minimum Variance Unbiased

Assume that \(Y = X \beta + \epsilon\) with \(\epsilon \sim N_n(0, \; \sigma^2 I_n)\).

A vector-valued sufficient statistic \(T(Y)\) is said to be complete if \(E \left \{h[T(Y)] \right\} = 0\) for all \(\beta, \sigma^2\) implies that \(Pr[h(T(Y)) = 0] = 1\).

If \(T(Y)\) is a complete sufficient statistic, then \(f(T(Y))\) is a Minimum Variance Unbiased Estimate (MVUE) of \(E \Big [ f (T(Y)) \Big ]\).



  • Theorem 2.5.3

Let \(\theta = (\theta_1 , \cdots, \theta_s)'\) and let \(Y\) be a random vector with pdf as below. Then \(T(Y) = \Big( T_1(Y), \cdots, T_s(Y) \Big)'\) is a complete sufficient statistic, provided that neither \(\theta\) nor \(T(Y)\) satisfies any linear constraints.

\[ f(Y) = c(\theta) \exp \left[ \sum_{i=1}^s \theta_i T_i (Y) \right] h(Y) \]



  • Theorem 2.5.4

Whenever \(\epsilon \sim N(0, \; \sigma^2 I)\), the MSE is the MVUE of \(\sigma^2\), and \(\rho ' M Y\) is the MVUE of \(\rho ' X \beta\), i.e., \(\hat { \rho ' X \beta }_{MVUE} = \rho ' M Y\).







6.3.6 Sampling Distributions of Estimates

Assume that \(Y = X \beta + \epsilon\) with \(\epsilon \sim N_n(0, \; \sigma^2 I_n)\), so that \(Y \sim N_n(X \beta, \; \sigma^2 I_n)\). Then

\[\begin{align} \Lambda ' \hat \beta = P' M Y &\sim N\big(\Lambda ' \beta , \; \sigma^2 P'MP \big) \\ &\sim N\big(\Lambda ' \beta , \; \sigma^2 \Lambda ' (X'X)^{-} \Lambda \big) \qquad \because \; M = X(X'X)^- X' \\ \hat Y = MY &\sim N\big(X\beta, \; \sigma^2 M \big) \\ \hat \beta = (X'X)^{-1} X'Y &\sim N\big(\beta , \; \sigma^2 (X'X)^{-1} \big) \qquad (\text{if } X \text{ is of full rank}) \end{align}\]



Do Exercise 2.1. Show that

\[ \dfrac{Y' (I-M) Y}{\sigma^2} \sim \chi^2 \Bigg( r(I-M), \; \dfrac{\beta'X'(I-M)X\beta}{2\sigma^2} \Bigg) \]
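
A simulation sketch for Exercise 2.1 (made-up design): since \((I-M)X = 0\), the noncentrality parameter vanishes, so \(Y'(I-M)Y/\sigma^2\) should match the moments of a central \(\chi^2\big(n - r(X)\big)\):

```python
# The sample mean and variance of Y'(I-M)Y / sigma^2 should be close to
# df and 2*df, the moments of a central chi-square with df = n - r(X).
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma2 = 20, 3, 1.5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T
df = n - np.linalg.matrix_rank(X)

stats = []
for _ in range(10000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    stats.append(Y @ (np.eye(n) - M) @ Y / sigma2)

print(np.mean(stats), df)          # sample mean ~ df
print(np.var(stats), 2 * df)       # sample variance ~ 2 * df
```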







6.3.7 Generalized Least Squares(GLS)

Assume that for some known positive definite \(\Sigma\),

\[ \begin{align} Y &= X \beta + \epsilon, & E(\epsilon)&=0, & Cov(\epsilon) &= \sigma^2 \Sigma \tag{1} \\ \Sigma^{-\tfrac{1}{2}}Y &= \Sigma^{-\tfrac{1}{2}} X \beta + \Sigma^{-\tfrac{1}{2}} \epsilon, & E(\Sigma^{-\tfrac{1}{2}} \epsilon)&=0, & Cov(\Sigma^{-\tfrac{1}{2}} \epsilon) &= \sigma^2 I \tag{2} \\ Y_\ast &= X_\ast \beta + \epsilon_\ast, & E( \epsilon_\ast)&=0, & Cov( \epsilon_\ast) &= \sigma^2 I \end{align} \]

Here \(\Sigma^{-\tfrac{1}{2}}\) is the symmetric square root obtained from the spectral decomposition of \(\Sigma\), which exists because \(\Sigma\) is positive definite.

\[ \begin{align} \hat \beta_{GLS} &= \arg\min_\beta \, (Y_\ast - X_\ast \beta)'(Y_\ast - X_\ast \beta) \\ &= \arg\min_\beta \, \Vert Y_\ast - X_\ast \beta \Vert^2 \\ &= \arg\min_\beta \, (Y - X \beta)' \Sigma^{-1} (Y - X \beta) \tag{Generalized LSE (GLSE) of β} \end{align} \]
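A minimal sketch of GLS by whitening (the AR(1)-style \(\Sigma\) below is a made-up example of a known positive definite covariance): transform by \(\Sigma^{-1/2}\), run ordinary least squares, and compare with the direct formula \((X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}Y\):

```python
# GLS by whitening: OLS on (Sigma^{-1/2} X, Sigma^{-1/2} Y) equals the direct GLS formula.
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])

# A known positive definite Sigma (AR(1)-style correlation, an assumed example).
Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)
Y = X @ beta + L @ rng.normal(size=n)          # Cov(eps) = Sigma (sigma^2 = 1)

# Whitening with the symmetric square root of Sigma^{-1}.
w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T
X_star, Y_star = Sigma_inv_half @ X, Sigma_inv_half @ Y

beta_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
# Equivalent direct form: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y (X has full rank here).
Sigma_inv = np.linalg.inv(Sigma)
beta_gls_direct = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(np.allclose(beta_gls, beta_gls_direct))  # True
```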

  • Theorem 2.7.1
  1. \(\lambda ' \beta\) is estimable in model (1) \(\iff\) \(\lambda ' \beta\) is estimable in model (2).
  2. \(\hat \beta\) is a GLSE of \(\beta\) \(\iff\) \(X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1}Y = X \hat \beta\), which is the Normal Equation of GLS.
  • For any estimable function, there exists a unique GLSE.
  1. The GLSE of an estimable \(\lambda' \beta\) is the BLUE of \(\lambda' \beta\).
  2. Let \(\epsilon \sim N(0, \; \sigma^2 \Sigma)\). Then the GLSE of an estimable \(\lambda ' \beta\) is the MVUE.
  3. Let \(\epsilon \sim N(0, \; \sigma^2 \Sigma)\). Then \(\hat \beta_{GLS} = \hat \beta_{MLE}\).




The Normal Equation of GLS can be rewritten as

\[\begin{align} X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1}Y &= X \hat \beta \\ AY &= X \hat \beta, \qquad \text{where } A \equiv X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1} \end{align}\]

\(A\) is a projection operator onto \(\mathcal{C}(X)\).

\(Cov(X \hat \beta_{GLS}) = \sigma^2 \, X(X' \Sigma^{-1} X)^{-}X'\). Let \(\lambda ' \beta\) be estimable. Then \(Var(\lambda ' \hat \beta_{GLS}) = \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda\).

  • Note: \((I-A)Y\) is the residual vector of the GLSE.

\[\begin{align} SSE_{GLS} &= (Y_\ast - \hat Y_\ast)' (Y_\ast - \hat Y_\ast) \\ &= (Y - X \hat \beta_{GLS})' \Sigma^{-1}(Y - X \hat \beta_{GLS}) \\ &= Y'(I-A)' \Sigma^{-1}(I-A)Y \\ \\ MSE_{GLS} &= \hat \sigma^2 = \dfrac{SSE_{GLS}}{n-r(X)} \\ \\ \dfrac{\lambda' \big(\hat \beta_{GLS} - \beta \big)}{\sqrt{\hat \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda}} &\sim t\Big( n-r(X) \Big) \end{align}\]

The quantity under the square root in the denominator is \(Var(\lambda ' \hat \beta_{GLS}) = \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda\) with \(\hat \sigma^2\) in place of \(\sigma^2\).
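
A numerical sketch (made-up \(\Sigma\) and data) showing that \(SSE_{GLS}\) computed on the whitened scale matches \(Y'(I-A)'\Sigma^{-1}(I-A)Y\) on the original scale:

```python
# SSE_GLS from the whitened model equals the quadratic form in the GLS residuals.
import numpy as np

rng = np.random.default_rng(7)
n, p = 25, 3
X = rng.normal(size=(n, p))
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # assumed known
Y = X @ np.array([1.0, 2.0, -0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

Sigma_inv = np.linalg.inv(Sigma)
A = X @ np.linalg.pinv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv   # projection onto C(X)
resid = (np.eye(n) - A) @ Y                                     # GLS residual vector

sse_gls = resid @ Sigma_inv @ resid
mse_gls = sse_gls / (n - np.linalg.matrix_rank(X))

# Whitened-scale check: same SSE from the transformed model Y_* = X_* beta + eps_*.
w, V = np.linalg.eigh(Sigma)
S_inv_half = V @ np.diag(w ** -0.5) @ V.T
X_star, Y_star = S_inv_half @ X, S_inv_half @ Y
beta_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
print(np.isclose(sse_gls, np.sum((Y_star - X_star @ beta_gls) ** 2)))  # True
print(mse_gls)
```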

Let \(\Sigma\) be nonsingular and \(\mathcal{C}(\Sigma X) \subset \mathcal{C}(X)\). Then least squares estimates are BLUEs.

  • Note: for diagonal \(\Sigma\), GLS is referred to as Weighted Least Squares (WLS).



  • Exercise 2.5.

Show that \(A\) is the perpendicular projection operator onto \(\mathcal{C}(X)\) when the inner product between two vectors \(\pmb x\) and \(\pmb y\) is defined as \((\pmb x, \pmb y)_\Sigma \equiv \pmb x' \Sigma^{-1} \pmb y\).
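
A numerical check (made-up \(X\) and \(\Sigma\)) consistent with the claim of Exercise 2.5: \(A\) is idempotent, self-adjoint with respect to \((\pmb x, \pmb y)_\Sigma = \pmb x' \Sigma^{-1} \pmb y\), and leaves \(\mathcal{C}(X)\) fixed:

```python
# A = X (X' Sigma^{-1} X)^- X' Sigma^{-1} is idempotent, self-adjoint in the
# Sigma^{-1} inner product (A' Sigma^{-1} = Sigma^{-1} A), and satisfies A X = X.
import numpy as np

rng = np.random.default_rng(8)
n, p = 12, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)                    # a positive definite Sigma
Sigma_inv = np.linalg.inv(Sigma)

A = X @ np.linalg.pinv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv

print(np.allclose(A @ A, A))                       # idempotent
print(np.allclose(A.T @ Sigma_inv, Sigma_inv @ A)) # self-adjoint in (., .)_Sigma
print(np.allclose(A @ X, X))                       # projects onto C(X)
```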