
Setup

Let $X_1, \dots, X_n$ be a sample from a distribution indexed by an unknown parameter $\theta \in \Theta$. Write $P_\theta$ for the probability measure under parameter $\theta$ and $E_\theta$ for the corresponding expectation.

An estimator $T(X_1, \dots, X_n)$ is a measurable function of the sample. It is an unbiased estimator of $\theta$ when

E_\theta[T] = \theta \quad \text{for every } \theta \in \Theta. \tag{1}

Here $T$ denotes the estimator, $\theta$ denotes the unknown parameter, and the expectation $E_\theta$ is taken under the law $P_\theta$.
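Definition (1) can be checked exactly on a small model. The sketch below is illustrative only: it assumes an i.i.d. Bernoulli($p$) sample of size 4, which is not part of the text above, and the helper names (`expectation`, `sample_mean`, `sample_max`) are made up for the example. It enumerates every outcome with exact rational arithmetic and reports the bias $E_p[T] - p$ for two estimators of $p$.

```python
from fractions import Fraction
from itertools import product


def expectation(estimator, p, n):
    """Exact E_p[estimator(X_1, ..., X_n)] for i.i.d. Bernoulli(p) draws."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        prob = p**k * (1 - p) ** (n - k)  # P_p(X = xs)
        total += prob * estimator(xs)
    return total


def sample_mean(xs):
    # Unbiased for p: its expectation equals p for every p.
    return Fraction(sum(xs), len(xs))


def sample_max(xs):
    # A measurable function of the sample, but biased for p.
    return max(xs)


n = 4
grid = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]
mean_bias = [expectation(sample_mean, p, n) - p for p in grid]
max_bias = [expectation(sample_max, p, n) - p for p in grid]
print("bias of sample mean:", mean_bias)
print("bias of sample max: ", max_bias)
```

The point of checking several values of $p$ is that (1) must hold for every parameter value, not just one.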

A statistic $S = S(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ when the conditional distribution of the sample given $S$ does not depend on $\theta$. Equivalently:

P_\theta(X_1, \dots, X_n \mid S = s) = P(X_1, \dots, X_n \mid S = s) \quad \text{for all } \theta, s. \tag{2}

Here $S$ denotes the sufficient statistic, $s$ ranges over its support, and the independence of the right-hand side from $\theta$ is the defining property.
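To make (2) concrete, here is a minimal sketch under an assumed model that does not appear in the text: an i.i.d. Bernoulli($p$) sample with $S$ equal to the sample sum. The function name `conditional_law` is hypothetical. It computes the conditional law of the whole sample given $S = s$ for two different values of $p$ and compares them.

```python
from fractions import Fraction
from itertools import product
from math import comb


def conditional_law(p, n, s):
    """Exact P_p(X = xs | S = s) for i.i.d. Bernoulli(p), with S = sum of the sample."""
    joint = {}
    for xs in product((0, 1), repeat=n):
        if sum(xs) == s:
            joint[xs] = p**s * (1 - p) ** (n - s)  # P_p(X = xs), and S = s holds
    p_s = sum(joint.values())  # P_p(S = s)
    return {xs: prob / p_s for xs, prob in joint.items()}


n, s = 4, 2
law_a = conditional_law(Fraction(1, 5), n, s)
law_b = conditional_law(Fraction(7, 10), n, s)

# Sufficiency: the two conditional laws coincide although p differs.
print(law_a == law_b)
# In this model each arrangement with sum s has conditional probability 1 / C(n, s).
print(set(law_a.values()) == {Fraction(1, comb(n, s))})
```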

The theorem

Rao-Blackwell. Let $T$ be an unbiased estimator of $\theta$ with finite variance, and let $S$ be a sufficient statistic for $\theta$. Define

T^* := E[T \mid S]. \tag{3}

Here $T^*$ denotes the Rao-Blackwellised estimator obtained by conditioning $T$ on $S$. Because $S$ is sufficient, the conditional expectation in (3) does not depend on $\theta$, so $T^*$ can be computed from the sample alone. Then $T^*$ is an unbiased estimator of $\theta$, and

\mathrm{Var}_\theta(T^*) \leq \mathrm{Var}_\theta(T) \quad \text{for every } \theta, \tag{4}

with equality iff $T$ is already a function of $S$ (up to a $P_\theta$-null set).

Proof sketch

Two ingredients suffice.

Unbiasedness. By the tower property of conditional expectation,

E_\theta[T^*] = E_\theta[E[T \mid S]] = E_\theta[T] = \theta. \tag{5}

Here the inner expectation $E[T \mid S]$ is the conditional expectation of $T$ given $S$, and the outer expectation is taken under $P_\theta$.

Variance. Apply the variance decomposition (the law of total variance) with $G = \sigma(S)$:

\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(E[T \mid S]). \tag{6}

Here $\sigma(S)$ denotes the σ-algebra generated by $S$, $\mathrm{Var}(T \mid S)$ denotes the conditional variance of $T$ given $S$, and the two right-hand terms are both non-negative.

Substituting $T^* = E[T \mid S]$ from (3):

\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(T^*). \tag{7}

Since $E_\theta[\mathrm{Var}(T \mid S)] \geq 0$, we conclude $\mathrm{Var}_\theta(T^*) \leq \mathrm{Var}_\theta(T)$, with equality iff $\mathrm{Var}(T \mid S) = 0$ almost surely, i.e. iff $T$ is determined by $S$.
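The decomposition in (6) and (7) can be verified exactly on a finite model. The following sketch rests on assumptions not made in the text: an i.i.d. Bernoulli($p$) sample, the crude estimator $T = X_1$, and $S$ equal to the sample sum. The function name `decomposition` is hypothetical. It returns the three terms of (7) as exact fractions.

```python
from fractions import Fraction
from itertools import product


def decomposition(p, n):
    """Return (Var(T), E[Var(T | S)], Var(T*)) for T = X_1 and S = sum of the sample."""
    outcomes = [
        (xs, p ** sum(xs) * (1 - p) ** (n - sum(xs)))
        for xs in product((0, 1), repeat=n)
    ]
    mean_t = sum(prob * xs[0] for xs, prob in outcomes)
    var_t = sum(prob * (xs[0] - mean_t) ** 2 for xs, prob in outcomes)

    # Group the values of T by the value of S.
    by_s = {}
    for xs, prob in outcomes:
        by_s.setdefault(sum(xs), []).append((xs[0], prob))

    expected_cond_var = Fraction(0)
    var_t_star = Fraction(0)
    for items in by_s.values():
        p_s = sum(prob for _, prob in items)                 # P(S = s)
        cond_mean = sum(t * prob for t, prob in items) / p_s  # T* on {S = s}
        cond_var = sum((t - cond_mean) ** 2 * prob for t, prob in items) / p_s
        expected_cond_var += p_s * cond_var
        # E[T*] = E[T] by the tower property, so mean_t is the right centre.
        var_t_star += p_s * (cond_mean - mean_t) ** 2
    return var_t, expected_cond_var, var_t_star


var_t, expected_cond_var, var_t_star = decomposition(Fraction(1, 3), 3)
print(var_t, expected_cond_var, var_t_star)
```

In this model $T^* = S/n$, so the variance drops from $p(1-p)$ to $p(1-p)/n$, and the gap is exactly the expected conditional variance.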

The same conclusion extends to any loss $L(t, \theta)$ that is convex in $t$, by the conditional form of Jensen's inequality:

E_\theta\!\left[L(T^*, \theta)\right] \leq E_\theta\!\left[L(T, \theta)\right]. \tag{8}

Here $L$ denotes a loss function that is convex in its first argument, and the variance case in (4) is the special case $L(t, \theta) = (t - \theta)^2$.
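The step behind (8) can be spelled out in two lines. This is a sketch that assumes $E_\theta|L(T, \theta)| < \infty$, an integrability condition the text leaves implicit:

```latex
\begin{aligned}
L(T^*, \theta) &= L\big(E[T \mid S], \theta\big)
  \leq E\big[L(T, \theta) \mid S\big]
  && \text{(conditional Jensen, } L(\cdot, \theta) \text{ convex)} \\
E_\theta\big[L(T^*, \theta)\big] &\leq E_\theta\big[E[L(T, \theta) \mid S]\big]
  = E_\theta\big[L(T, \theta)\big]
  && \text{(tower property)}
\end{aligned}
```

The first line holds almost surely for each fixed $\theta$; taking expectations under $P_\theta$ gives the second.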

Worked example

Let $X_1, \dots, X_n$ be i.i.d. Poisson with mean $\lambda$, and let $\tau(\lambda) = e^{-\lambda} = P_\lambda(X_1 = 0)$.

A crude unbiased estimator of $\tau(\lambda)$ uses only the first observation:

T = \mathbf{1}\{X_1 = 0\}. \tag{9}

Here $\mathbf{1}\{\cdot\}$ denotes the indicator. Then $E_\lambda[T] = P_\lambda(X_1 = 0) = e^{-\lambda} = \tau(\lambda)$, so $T$ is unbiased for $\tau(\lambda)$.

The sample sum $S = X_1 + \dots + X_n$ is sufficient for $\lambda$. Given $S = s$, the conditional distribution of $(X_1, \dots, X_n)$ is $\mathrm{Multinomial}(s; 1/n, \dots, 1/n)$, which does not depend on $\lambda$. In particular, $X_1$ given $S = s$ is $\mathrm{Binomial}(s, 1/n)$, so $P(X_1 = 0 \mid S = s) = (1 - 1/n)^s$. Therefore:

T^* = E[T \mid S] = P(X_1 = 0 \mid S) = \left(1 - \tfrac{1}{n}\right)^S. \tag{10}

Here $T^*$ is the Rao-Blackwellised estimator. By (4), $\mathrm{Var}_\lambda(T^*) \leq \mathrm{Var}_\lambda(T)$, with strict inequality for $n > 1$ since $T$ is not a function of $S$ alone.
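The size of the improvement in this example can be computed numerically. The sketch below relies on the standard fact that the sum of $n$ i.i.d. Poisson($\lambda$) variables is Poisson($n\lambda$), and evaluates the moments of $T^*$ by a truncated sum over the values of $S$. The function names, the cutoff of 200 terms, and the parameter values $\lambda = 1.5$, $n = 10$ are illustrative assumptions, not part of the text.

```python
import math


def poisson_pmf(k, mu):
    """P(S = k) for S ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def crude_moments(lam):
    """Mean and variance of T = 1{X_1 = 0}, a Bernoulli(exp(-lam)) variable."""
    q = math.exp(-lam)
    return q, q * (1 - q)


def rao_blackwell_moments(lam, n, cutoff=200):
    """Mean and variance of T* = (1 - 1/n)**S with S ~ Poisson(n * lam).

    The sum over s is truncated at `cutoff`, which is ample when n * lam
    is far below the cutoff.
    """
    mu = n * lam
    mean = 0.0
    second = 0.0
    for s in range(cutoff):
        weight = poisson_pmf(s, mu)
        value = (1 - 1 / n) ** s
        mean += weight * value
        second += weight * value * value
    return mean, second - mean**2


lam, n = 1.5, 10
mean_crude, var_crude = crude_moments(lam)
mean_star, var_star = rao_blackwell_moments(lam, n)
print("means:    ", mean_crude, mean_star)
print("variances:", var_crude, var_star)
```

Both means agree with $e^{-\lambda}$, as unbiasedness requires. Applying the Poisson probability generating function to $S$ gives the closed form $\mathrm{Var}_\lambda(T^*) = e^{-2\lambda}(e^{\lambda/n} - 1)$, which the numerical value should match and which tends to 0 as $n$ grows, whereas $\mathrm{Var}_\lambda(T) = e^{-\lambda}(1 - e^{-\lambda})$ does not depend on $n$.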

Why this matters

The recipe is constructive: take any unbiased estimator $T$, project it onto the σ-algebra generated by a sufficient statistic $S$ by conditioning, and you get $T^*$ with no greater variance. When $S$ is also complete, the Lehmann-Scheffé theorem (1950) strengthens this: every unbiased estimator Rao-Blackwellises to the same function of $S$ up to almost-sure equality, and that function is the unique uniformly minimum-variance unbiased estimator, with variance no larger than that of any other unbiased estimator at every $\theta$.

References

C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bulletin of the Calcutta Mathematical Society, 37:81-91, 1945.

D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Mathematical Statistics, 18(1):105-110, 1947.

E. L. Lehmann and H. Scheffé, "Completeness, similar regions, and unbiased estimation — Part I," Sankhyā, 10:305-340, 1950.