6.3 Estimation

Consider the linear model below, where \(x_i '\) is the \(i\)-th row vector of \(X\), \(E(\epsilon)=0\), and \(Cov(\epsilon)=\sigma^2 I\) (i.e., \(\Sigma = I\) in the notation of the GLS section, 6.3.7).

\[ Y_{n \times 1} = X_{n \times p} \beta_{p \times 1} + \epsilon_{n \times 1} = \begin{pmatrix} x_1 ' \beta \\ \vdots \\ x_n ' \beta \end{pmatrix} + \epsilon \]



6.3.1 Identifiability and Estimability

6.3.1.1 Identifiable

Identifiability is the property that, given an unlimited number of observations from the model, the true values of the model's underlying parameters can be recovered.

A general linear model is a parameterization

\[\begin{align} E(Y) &= f(\beta) \\ &= E(X\beta + \epsilon)\\ &= X\beta + E(\epsilon) \\ &= X\beta + 0 \\ &= X\beta \end{align}\]

The parameter \(\beta\) is identifiable if for any \(\beta_1\) and \(\beta_2\), \(f(\beta_1) = f(\beta_2)\) implies \(\beta_1 = \beta_2\). If \(\beta\) is identifiable, we say that the parameterization \(f(\beta)\) is identifiable. Moreover, a vector-valued function \(g(\beta)\) is identifiable if \(f (\beta_1) = f(\beta_2)\) implies \(g (\beta_1) = g(\beta_2)\).

For regression models for which \(r(X) = p\), the parameters are identifiable: \(X'X\) is nonsingular, so if \(X\beta_1 = X\beta_2\), then

\[ \beta_1 = (X'X)^{-1} X'X \beta_1 = (X'X)^{-1} X'X \beta_2 = \beta_2 \]

A function \(g(\beta)\) is identifiable \(\iff\) \(g(\beta)\) is a function of \(f(\beta)\).
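
As a quick numerical illustration (the design matrices, seed, and dimensions below are made-up examples, not from the text), a full-rank \(X\) lets \(\beta\) be recovered from \(X\beta\), while a rank-deficient \(X\) admits distinct \(\beta\)'s with the same mean vector:

```python
# A minimal sketch: identifiability holds when r(X) = p and fails when X is rank deficient.
import numpy as np

rng = np.random.default_rng(0)

# Full-rank case: beta can be recovered from E(Y) = X beta.
X = rng.normal(size=(10, 3))                      # r(X) = 3 = p
beta = np.array([1.0, -2.0, 0.5])
EY = X @ beta
beta_recovered = np.linalg.solve(X.T @ X, X.T @ EY)
print(np.allclose(beta_recovered, beta))          # True: beta is identifiable

# Rank-deficient case: distinct betas give the same mean vector.
X_def = np.column_stack([X[:, 0], X[:, 1], X[:, 0] + X[:, 1]])   # r(X_def) = 2 < 3
beta1 = np.array([1.0, 1.0, 0.0])
beta2 = np.array([0.0, 0.0, 1.0])
print(np.allclose(X_def @ beta1, X_def @ beta2))  # True: beta is not identifiable
```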




6.3.1.2 Estimable

The results of the last section suggest that, in the less-than-full-rank case, some linear combinations of \(\beta\) will not be estimable.

The linear parametric function \(c'\beta\) is an estimable function if there exists a vector \(a \in \mathbb{R}^n\) such that \(E(a ' Y ) = c ' \beta\) for all \(\beta\).

A vector-valued linear function of \(\beta\), \(\Lambda ' \beta\) is estimable if \(\Lambda ' \beta = P ' X \beta\) for some matrix P; In other words, \(\Lambda ' \beta\) is estimable if \(\Lambda = X ' P \in \mathcal{C}(X')\).

Clearly, if \(\Lambda ' \beta\) is estimable, it is identifiable and therefore it is a reasonable thing to estimate.

  • estimable \(\rightarrow\) identifiable

For estimable functions \(\Lambda' \beta = P ' X \beta\), although \(P\) need not be unique, its perpendicular projection (columnwise) onto \(\mathcal{C}(X)\) is unique:
let \(P_1 , \; P_2\) be matrices with \(\Lambda ' = P_1 ' X = P_2 ' X\), then

\[ MP_1 = X(X'X)^{-}X'P_1 = X(X'X)^{-}\Lambda = X(X'X)^{-}X'P_2 = MP_2 \]
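
The uniqueness of \(MP\) can be checked numerically. In this sketch (made-up \(X\), \(P_1\), \(P_2\)), two different \(P\) matrices with \(P_1 ' X = P_2 ' X\) project columnwise to the same matrix:

```python
# A quick numerical check: P is not unique, but M P is.
import numpy as np

rng = np.random.default_rng(9)
n, p, k = 12, 3, 2
X = rng.normal(size=(n, p))
M = X @ np.linalg.pinv(X.T @ X) @ X.T            # perpendicular projection onto C(X)

P1 = rng.normal(size=(n, k))
P2 = P1 + (np.eye(n) - M) @ rng.normal(size=(n, k))  # differs only by columns in C(X)^perp

print(np.allclose(P1.T @ X, P2.T @ X))   # True: both give the same Lambda'
print(np.allclose(P1, P2))               # False: P is not unique
print(np.allclose(M @ P1, M @ P2))       # True: M P is unique
```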




  • Example 2.1.4 and 2.1.5

An estimate \(f(Y)\) of \(g(\beta)\) is unbiased if \(E[f(Y)] = g(\beta)\) for all \(\beta\).

If \(f (Y) = a_0 + a' Y\) for some scalar \(a_0\) and vector \(a\), then \(f(Y)\) is a linear estimate of \(\Lambda ' \beta\).

A linear estimate \(a_0 + a ' Y\) is unbiased for \(\Lambda ' \beta\) \(\iff\) \(a_0 = 0\) and \(a ' X = \Lambda'\), i.e., \(\Lambda = X ' a \in \mathcal{C}(X')\).

\(\Lambda ' \beta\) is estimable \(\iff\) there exists \(\rho\) such that \(E(\rho ' Y ) = \Lambda ' \beta\) for any \(\beta\).
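
Estimability of \(\lambda ' \beta\) can be checked by testing whether \(\lambda \in \mathcal{C}(X')\). One way to do this (a sketch with a hypothetical helper `is_estimable` and a one-way ANOVA style design, not from the text) is to compare ranks:

```python
# lambda' beta is estimable iff lambda lies in C(X'), i.e. appending lambda
# to the rows of X does not increase the rank.
import numpy as np

def is_estimable(X, lam, tol=1e-10):
    """Return True if lam is in C(X'), i.e. lam' beta is estimable."""
    r_X = np.linalg.matrix_rank(X, tol=tol)
    r_aug = np.linalg.matrix_rank(np.vstack([X, lam.reshape(1, -1)]), tol=tol)
    return r_aug == r_X

# One-way ANOVA style design: columns are (intercept, group 1, group 2).
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)              # r(X) = 2 < 3

print(is_estimable(X, np.array([0.0, 1.0, -1.0])))  # True: the contrast alpha1 - alpha2
print(is_estimable(X, np.array([0.0, 1.0,  0.0])))  # False: alpha1 alone
```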







6.3.2 Estimation: Least Squares

Estimating \(E(Y)\) amounts to taking the vector in \(\mathcal{C}(X)\) closest to \(Y\):

\[\begin{alignat}{2} E(Y) &= X\beta \; &&\in \; \mathcal{C}(X)\\ \\ \hat \beta &= \arg\min_\beta \left\{ (Y-X \beta) ' (Y-X \beta) \right\} \\ &= \arg\min_\beta \left\{ \Vert Y-X \beta \Vert^2 \right\} \tag{Least Squares Estimate of beta} \end{alignat}\]

For any least squares estimate \(\hat \beta\), the LSE of \(\Lambda ' \beta\) is \(\Lambda ' \hat \beta\), i.e., \(\widehat{\Lambda ' \beta}_{LSE} = \Lambda ' \hat \beta\).



  • Theorem 2.2.1

Let \(M\) be the perpendicular projection operator onto \(\mathcal{C}(X)\). Then

\(\hat \beta\) is an LSE of \(\beta\) \(\iff\) \(X \hat \beta = M Y\)



  • Corollary 2.2.2

\(\hat \beta = (X'X)^{-}X' Y\) is an LSE of \(\beta\), since \(X (X'X)^{-}X' Y = MY\).
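
A minimal numerical check (made-up \(X\) and \(Y\)) that \((X'X)^{-}X'Y\) satisfies the characterization of Theorem 2.2.1:

```python
# One LSE is (X'X)^- X'Y, and it satisfies X beta_hat = MY.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
Y = rng.normal(size=20)

XtX_ginv = np.linalg.pinv(X.T @ X)        # a generalized inverse (X'X)^-
beta_hat = XtX_ginv @ X.T @ Y             # an LSE of beta
M = X @ XtX_ginv @ X.T                    # perpendicular projection onto C(X)

print(np.allclose(X @ beta_hat, M @ Y))   # True: X beta_hat = MY
```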



  • Corollary 2.2.3

The unique LSE of \(\rho ' X \beta\) is \(\rho ' M Y\).

※ Note: the unique LSE of \(\Lambda ' \beta\) is \(\Lambda ' \hat \beta = P' M Y\).



  • Theorem 2.2.4

The LSE of \(\Lambda ' \beta\) is unique if and only if \(\Lambda ' \beta\) is estimable. For the "if" direction, write \(\Lambda = X'P\); then any two LSEs satisfy \(X \hat \beta_1 = X \hat \beta_2 = MY\), so \(\Lambda ' \hat \beta_1 = P' X \hat \beta_1 = P' M Y = P' X \hat \beta_2 = \Lambda ' \hat \beta_2\).

※ Note: When \(\beta\) is not identifiable, we need side conditions imposed on the parameters to estimate nonidentifiable parameters.

※ Note: With \(r = r (X) < p\) (overparameterized model), we need \(p - r\) individual side conditions to identify and estimate the parameters.



  • Proposition 2.2.5

If \(\Lambda = X ' \rho\), then \(E(\rho ' MY) = \Lambda ' \beta\).

Decompose \(Y\) into fitted values and residuals:

\[\begin{alignat}{2} Y &= X \hat \beta &&+ Y - X \hat \beta \\ &= MY &&+ (I-M)Y \\ &= \hat Y &&+ e \end{alignat}\]

where \[\begin{align} \hat Y &\in \mathcal{C}(X) \tag{fitted values of Y} \\ e &\in \mathcal{C}(X)^{\perp} \tag{residuals} \end{align}\]
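
A short check (made-up data) that the decomposition is exact and the two pieces are orthogonal:

```python
# Y = MY + (I - M)Y: fitted values lie in C(X), residuals in C(X)^perp.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(15, 3))
Y = rng.normal(size=15)

M = X @ np.linalg.pinv(X.T @ X) @ X.T
Y_hat = M @ Y                    # fitted values
e = (np.eye(15) - M) @ Y         # residuals

print(np.allclose(Y_hat + e, Y))       # True: the decomposition is exact
print(np.isclose(Y_hat @ e, 0.0))      # True: Y_hat is orthogonal to e
print(np.allclose(X.T @ e, 0.0))       # True: e is in C(X)^perp
```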



  • Theorem 2.2.6

Let \(r (X) = r\) and \(Cov(\epsilon) = \sigma^2 I\). Then the MSE below is an unbiased estimate of \(\sigma^2\); its denominator, \(rank(I-M) = n-r\), is the degrees of freedom for error.

\[ \hat \sigma^2 =\dfrac{Y'(I-M)Y}{rank(I-M)} =\dfrac{Y'(I-M)Y}{n-r} \tag{MSE} \]
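
A simulation sketch (made-up design, true \(\sigma^2 = 2\)) suggesting the unbiasedness of the MSE; the average over replications should be close to the true \(\sigma^2\):

```python
# Average MSE = Y'(I-M)Y / (n - r) over many replications, compared with sigma^2.
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma2 = 30, 4, 2.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T
r = np.linalg.matrix_rank(X)

mses = []
for _ in range(5000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    resid = Y - M @ Y
    mses.append(resid @ resid / (n - r))

print(np.mean(mses))   # close to sigma2 = 2.0
```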







6.3.3 Estimation: Best Linear Unbiased

  • Definition 2.3.1

\(a'Y\) is a Best Linear Unbiased Estimate (BLUE) of \(\lambda ' \beta\) if \(a ' Y\) is unbiased, i.e., \(E(a ' Y) = \lambda ' \beta\), and if for any other linear unbiased estimate \(b ' Y\) of \(\lambda ' \beta\), \(Var(a ' Y) \le Var(b'Y)\).



  • Theorem 2.3.2: Gauss-Markov thm

Consider \(Y = X \beta + \epsilon\) with \(E(\epsilon) = 0\), \(Cov(\epsilon) = \sigma^2 I\). Let \(\lambda ' \beta\) be estimable.

Then the LSE of \(\lambda ' \beta\) is the BLUE of \(\lambda ' \beta\).
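
A simulation sketch of the Gauss-Markov comparison (made-up design; the competing estimate is constructed as \(b = M\rho + (I-M)c\), which is one way to build another linear unbiased estimate of the same \(\lambda ' \beta\)):

```python
# Every linear unbiased estimate of lambda' beta has the form b'Y with X'b = lambda;
# the LSE (M rho)'Y has the smallest variance among them.
import numpy as np

rng = np.random.default_rng(10)
n, p, sigma = 15, 3, 1.0
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T

rho = rng.normal(size=n)
lam = X.T @ rho                                       # an estimable lambda' beta
b = M @ rho + (np.eye(n) - M) @ rng.normal(size=n)    # another unbiased coefficient vector

blue_vals, other_vals = [], []
for _ in range(20000):
    Y = X @ beta + rng.normal(scale=sigma, size=n)
    blue_vals.append(rho @ M @ Y)
    other_vals.append(b @ Y)

print(np.mean(blue_vals), np.mean(other_vals), lam @ beta)  # both means near lambda' beta
print(np.var(blue_vals) <= np.var(other_vals))              # True (up to simulation noise)
```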



  • Corollary 2.3.3

Let \(\sigma^2 > 0\). Then there exists a unique BLUE for any estimable function \(\lambda ' \beta\).







6.3.4 Estimation: Maximum Likelihood

Assume that \(Y \sim N_n(X\beta , \; \sigma^2 I_n)\). Then the Maximum Likelihood Estimates (MLEs) of \(\beta\) and \(\sigma^2\) are obtained by maximizing the log of the likelihood so that

\[\begin{align} \left( \hat \beta , \; \hat \sigma^2 \right) &= \text{ MLE of } \left( \beta , \; \sigma^2 \right) \\ &= \arg\max_{\left( \beta , \; \sigma^2 \right)} \left\{ -\dfrac{n}{2}\log(2 \pi) - \dfrac{1}{2} \log \left[ (\sigma^2 )^n\right] - \dfrac{(Y-X\beta)'(Y-X\beta)}{2\sigma^2} \right\} \end{align}\]

which gives

\[\begin{align} \hat \beta &= \text{ LSE of } \beta \\ \hat \sigma^2 &= \dfrac{1}{n} \, Y'(I-M)Y \end{align}\]
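
A short sketch (made-up data) contrasting the MLE of \(\sigma^2\), which divides the SSE by \(n\), with the unbiased MSE, which divides by \(n - r\):

```python
# Under normality the MLE of beta equals the LSE, while the MLE of sigma^2
# divides the SSE by n instead of n - r.
import numpy as np

rng = np.random.default_rng(4)
n, p = 25, 3
X = rng.normal(size=(n, p))
Y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=n)

M = X @ np.linalg.pinv(X.T @ X) @ X.T
sse = Y @ (np.eye(n) - M) @ Y
r = np.linalg.matrix_rank(X)

sigma2_mle = sse / n          # MLE of sigma^2 (biased downward)
sigma2_mse = sse / (n - r)    # unbiased estimate (MSE)
print(sigma2_mle, sigma2_mse)
```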







6.3.5 Estimation: Minimum Variance Unbiased

Assume that \(Y = X \beta + \epsilon\) with \(\epsilon \sim N_n(0, \; \sigma^2 I_n)\).

A vector-valued sufficient statistic \(T(Y)\) is said to be complete if \(E \left \{h[T(Y)] \right\} = 0\) for all \(\beta, \sigma^2\) implies that \(Pr[h(T(Y)) = 0] = 1\).

If \(T(Y)\) is a complete sufficient statistic, then \(f(T(Y))\) is a Minimum Variance Unbiased Estimate (MVUE) of \(E \Big [ f (T(Y)) \Big ]\).



  • Theorem 2.5.3

Let \(\theta = (\theta_1 , \cdots, \theta_s)'\) and let \(Y\) be a random vector with pdf as below. Then \(T(Y) = \Big( T_1(Y), \cdots, T_s(Y) \Big)'\) is a complete sufficient statistic, provided that neither \(\theta\) nor \(T(Y)\) satisfies any linear constraints.

\[ f(Y) = c(\theta) \exp \left[ \sum_{i=1}^s \theta_i T_i (Y) \right] h(Y) \]



  • Theorem 2.5.4

Whenever \(\epsilon \sim N(0, \; \sigma^2 I)\), the MSE is the MVUE of \(\sigma^2\), and \(\rho ' M Y\) is the MVUE of \(\rho ' X \beta\), i.e., \(\hat { \rho ' X \beta }_{MVUE} = \rho ' M Y\).







6.3.6 Sampling Distributions of Estimates

Assume that \(Y = X \beta + \epsilon\) with \(\epsilon \sim N_n(0, \; \sigma^2 I_n)\), so that \(Y \sim N_n(X \beta, \; \sigma^2 I_n)\). Then

\[\begin{align} \Lambda ' \hat \beta = P' M Y &\sim N\big(\Lambda ' \beta , \; \sigma^2 P'MP \big) \\ &\sim N\big(\Lambda ' \beta , \; \sigma^2 \Lambda ' (X'X)^{-} \Lambda \big) \qquad \because \; M = X(X'X)^- X' \\ \hat Y = MY &\sim N\big(X\beta, \; \sigma^2 M \big) \\ \hat \beta = (X'X)^{-1} X'Y &\sim N\big(\beta , \; \sigma^2 (X'X)^{-1} \big) \qquad (\text{if } X \text{ is of full rank}) \end{align}\]



Do Exercise 2.1. Show that

\[ \dfrac{Y' (I-M) Y}{\sigma^2} \sim \chi^2 \Bigg( r(I-M), \; \dfrac{\beta'X'(I-M)X\beta}{2\sigma^2} \Bigg) \]
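
A simulation sketch for Exercise 2.1 (made-up design): since \((I-M)X = 0\), the noncentrality parameter vanishes, so \(Y'(I-M)Y/\sigma^2\) should match the moments of a central \(\chi^2\big(n - r(X)\big)\):

```python
# The sample mean and variance of Y'(I-M)Y / sigma^2 should be close to
# df and 2*df, the moments of a central chi-square with df = n - r(X).
import numpy as np

rng = np.random.default_rng(5)
n, p, sigma2 = 20, 3, 1.5
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
M = X @ np.linalg.pinv(X.T @ X) @ X.T
df = n - np.linalg.matrix_rank(X)

stats = []
for _ in range(10000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    stats.append(Y @ (np.eye(n) - M) @ Y / sigma2)

print(np.mean(stats), df)          # sample mean ~ df
print(np.var(stats), 2 * df)       # sample variance ~ 2 * df
```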







6.3.7 Generalized Least Squares(GLS)

Assume that for some known positive definite \(\Sigma\),

\[ \begin{align} Y &= X \beta + \epsilon, & E(\epsilon)&=0, & Cov(\epsilon) &= \sigma^2 \Sigma \tag{1} \\ \Sigma^{-\tfrac{1}{2}}Y &= \Sigma^{-\tfrac{1}{2}} X \beta + \Sigma^{-\tfrac{1}{2}} \epsilon, & E(\Sigma^{-\tfrac{1}{2}} \epsilon)&=0, & Cov(\Sigma^{-\tfrac{1}{2}} \epsilon) &= \sigma^2 I \tag{2} \\ Y_\ast &= X_\ast \beta + \epsilon_\ast, & E( \epsilon_\ast)&=0, & Cov( \epsilon_\ast) &= \sigma^2 I \end{align} \]

Here \(\Sigma^{-\tfrac{1}{2}}\) is the symmetric square root obtained from the spectral decomposition of \(\Sigma\), which exists because \(\Sigma\) is positive definite.

\[ \begin{align} \hat \beta_{GLS} &= \arg\min_\beta \, (Y_\ast - X_\ast \beta)'(Y_\ast - X_\ast \beta) \\ &= \arg\min_\beta \, \Vert Y_\ast - X_\ast \beta \Vert^2 \\ &= \arg\min_\beta \, (Y - X \beta)' \Sigma^{-1} (Y - X \beta) \tag{Generalized LSE (GLSE) of β} \end{align} \]
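A minimal sketch of GLS by whitening (the AR(1)-style \(\Sigma\) below is a made-up example of a known positive definite covariance): transform by \(\Sigma^{-1/2}\), run ordinary least squares, and compare with the direct formula \((X'\Sigma^{-1}X)^{-}X'\Sigma^{-1}Y\):

```python
# GLS by whitening: OLS on (Sigma^{-1/2} X, Sigma^{-1/2} Y) equals the direct GLS formula.
import numpy as np

rng = np.random.default_rng(6)
n, p = 30, 3
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.0, 0.5])

# A known positive definite Sigma (AR(1)-style correlation, an assumed example).
Sigma = 0.6 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)
Y = X @ beta + L @ rng.normal(size=n)          # Cov(eps) = Sigma (sigma^2 = 1)

# Whitening with the symmetric square root of Sigma^{-1}.
w, V = np.linalg.eigh(Sigma)
Sigma_inv_half = V @ np.diag(w ** -0.5) @ V.T
X_star, Y_star = Sigma_inv_half @ X, Sigma_inv_half @ Y

beta_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
# Equivalent direct form: (X' Sigma^{-1} X)^{-1} X' Sigma^{-1} Y (X has full rank here).
Sigma_inv = np.linalg.inv(Sigma)
beta_gls_direct = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(np.allclose(beta_gls, beta_gls_direct))  # True
```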

  • Theorem 2.7.1
  1. \(\lambda ' \beta\) is estimable in model (1) \(\iff\) \(\lambda ' \beta\) is estimable in model (2).
  2. \(\hat \beta\) is a GLSE of \(\beta\) \(\iff\) \(X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1}Y = X \hat \beta\), which is the Normal Equation of GLS.
  • For any estimable function, there exists a unique GLSE.
  1. The GLSE of an estimable \(\lambda' \beta\) is the BLUE of \(\lambda' \beta\).
  2. Let \(\epsilon \sim N(0, \; \sigma^2 \Sigma)\). Then the GLSE of an estimable \(\lambda ' \beta\) is the MVUE.
  3. Let \(\epsilon \sim N(0, \; \sigma^2 \Sigma)\). Then \(\hat \beta_{GLS} = \hat \beta_{MLE}\).




The Normal Equation of GLS can be rewritten as

\[\begin{align} X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1}Y &= X \hat \beta \\ AY &= X \hat \beta, \qquad \text{where } A \equiv X(X' \Sigma^{-1} X)^{-}X' \Sigma^{-1} \end{align}\]

\(A\) is a projection operator onto \(\mathcal{C}(X)\).

\(Cov(X \hat \beta_{GLS}) = \sigma^2 \, X(X' \Sigma^{-1} X)^{-}X'\). Let \(\lambda ' \beta\) be estimable. Then \(Var(\lambda ' \hat \beta_{GLS}) = \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda\).

  • Note: \((I-A)Y\) is the residual vector of the GLSE.

\[\begin{align} SSE_{GLS} &= (Y_\ast - \hat Y_\ast)' (Y_\ast - \hat Y_\ast) \\ &= (Y - X \hat \beta_{GLS})' \Sigma^{-1}(Y - X \hat \beta_{GLS}) \\ &= Y'(I-A)' \Sigma^{-1}(I-A)Y \\ \\ MSE_{GLS} &= \hat \sigma^2 = \dfrac{SSE_{GLS}}{n-r(X)} \\ \\ \dfrac{\lambda' \big(\hat \beta_{GLS} - \beta \big)}{\sqrt{\hat \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda}} &\sim t\Big( n-r(X) \Big) \end{align}\]

The quantity under the square root in the denominator is \(Var(\lambda ' \hat \beta_{GLS}) = \sigma^2 \, \lambda ' (X' \Sigma^{-1} X)^- \lambda\) with \(\hat \sigma^2\) in place of \(\sigma^2\).
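
A numerical sketch (made-up \(\Sigma\) and data) showing that \(SSE_{GLS}\) computed on the whitened scale matches \(Y'(I-A)'\Sigma^{-1}(I-A)Y\) on the original scale:

```python
# SSE_GLS from the whitened model equals the quadratic form in the GLS residuals.
import numpy as np

rng = np.random.default_rng(7)
n, p = 25, 3
X = rng.normal(size=(n, p))
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))  # assumed known
Y = X @ np.array([1.0, 2.0, -0.5]) + np.linalg.cholesky(Sigma) @ rng.normal(size=n)

Sigma_inv = np.linalg.inv(Sigma)
A = X @ np.linalg.pinv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv   # projection onto C(X)
resid = (np.eye(n) - A) @ Y                                     # GLS residual vector

sse_gls = resid @ Sigma_inv @ resid
mse_gls = sse_gls / (n - np.linalg.matrix_rank(X))

# Whitened-scale check: same SSE from the transformed model Y_* = X_* beta + eps_*.
w, V = np.linalg.eigh(Sigma)
S_inv_half = V @ np.diag(w ** -0.5) @ V.T
X_star, Y_star = S_inv_half @ X, S_inv_half @ Y
beta_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ Y_star)
print(np.isclose(sse_gls, np.sum((Y_star - X_star @ beta_gls) ** 2)))  # True
print(mse_gls)
```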

Let \(\Sigma\) be nonsingular and \(\mathcal{C}(\Sigma X) \subset \mathcal{C}(X)\). Then least squares estimates are BLUEs.

  • Note: for diagonal \(\Sigma\), GLS is referred to as Weighted Least Squares (WLS).



  • Exercise 2.5.

Show that \(A\) is the perpendicular projection operator onto \(\mathcal{C}(X)\) when the inner product between two vectors \(\pmb x\) and \(\pmb y\) is defined as \((\pmb x, \pmb y)_\Sigma \equiv \pmb x' \Sigma^{-1} \pmb y\).
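
A numerical check (made-up \(X\) and \(\Sigma\)) consistent with the claim of Exercise 2.5: \(A\) is idempotent, self-adjoint with respect to \((\pmb x, \pmb y)_\Sigma = \pmb x' \Sigma^{-1} \pmb y\), and leaves \(\mathcal{C}(X)\) fixed:

```python
# A = X (X' Sigma^{-1} X)^- X' Sigma^{-1} is idempotent, self-adjoint in the
# Sigma^{-1} inner product (A' Sigma^{-1} = Sigma^{-1} A), and satisfies A X = X.
import numpy as np

rng = np.random.default_rng(8)
n, p = 12, 3
X = rng.normal(size=(n, p))
B = rng.normal(size=(n, n))
Sigma = B @ B.T + n * np.eye(n)                    # a positive definite Sigma
Sigma_inv = np.linalg.inv(Sigma)

A = X @ np.linalg.pinv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv

print(np.allclose(A @ A, A))                       # idempotent
print(np.allclose(A.T @ Sigma_inv, Sigma_inv @ A)) # self-adjoint in (., .)_Sigma
print(np.allclose(A @ X, X))                       # projects onto C(X)
```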