Setup

Let $X_1, \dots, X_n$ be a sample from a distribution indexed by an unknown parameter $\theta \in \Theta$. Write $P_\theta$ for the probability measure under parameter $\theta$ and $E_\theta$ for the corresponding expectation.

An estimator $\htmlClass{maisight-sym-T}{T}(X_1, \dots, X_n)$ is a measurable function of the sample. It is an unbiased estimator of $\htmlClass{maisight-sym-theta}{\theta}$ when

E_{\htmlClass{maisight-sym-theta}{\theta}}[\htmlClass{maisight-sym-T}{T}] = \htmlClass{maisight-sym-theta}{\theta} \quad \text{for every } \htmlClass{maisight-sym-theta}{\theta} \in \Theta, \tag{1}

where $\htmlClass{maisight-sym-T}{T}$ denotes the estimator, $\htmlClass{maisight-sym-theta}{\theta}$ denotes the unknown parameter, and the expectation $E_{\htmlClass{maisight-sym-theta}{\theta}}$ is taken under the law $P_{\htmlClass{maisight-sym-theta}{\theta}}$.

A statistic $S = S(X_1, \dots, X_n)$ is a sufficient statistic for $\theta$ when the conditional distribution of the sample given $S$ does not depend on $\theta$. In symbols:

P_\theta(X_1, \dots, X_n \mid S = s) = P(X_1, \dots, X_n \mid S = s) \quad \text{for all } \theta, s. \tag{2}

Here $S$ denotes the sufficient statistic and $s$ ranges over its support; the defining property is that the right-hand side does not depend on $\theta$.
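Definition (2) can be checked by brute force on a small model. The sketch below assumes an i.i.d. Bernoulli($\theta$) sample with $S = X_1 + \dots + X_n$, a model chosen here for illustration and not taken from the text; the function name `conditional_given_sum` is likewise hypothetical. It computes the conditional law of the sample given $S$ exactly, for two different values of $\theta$, and compares the results.

```python
from fractions import Fraction
from itertools import product


def conditional_given_sum(theta, n):
    """Return {s: {x: P_theta(X = x | S = s)}} for n i.i.d. Bernoulli(theta) draws."""
    # Joint pmf of every 0/1 sample of length n.
    joint = {}
    for x in product((0, 1), repeat=n):
        k = sum(x)
        joint[x] = theta**k * (1 - theta) ** (n - k)

    # Marginal pmf of the sum S.
    marginal = {}
    for x, p in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0) + p

    # Conditional pmf of the sample given S = s.
    cond = {}
    for x, p in joint.items():
        cond.setdefault(sum(x), {})[x] = p / marginal[sum(x)]
    return cond


# Two different parameter values (exact rational arithmetic).
cond_a = conditional_given_sum(Fraction(1, 5), n=3)
cond_b = conditional_given_sum(Fraction(7, 10), n=3)

print(cond_a == cond_b)        # True: the conditional law is the same for both
print(cond_a[1][(1, 0, 0)])    # 1/3: each arrangement with one success is equally likely
```

In this model the conditional law given $S = s$ is uniform over the arrangements with $s$ ones, whatever the value of $\theta$.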

The theorem

Rao-Blackwell. Let $T$ be an unbiased estimator of $\theta$ with finite variance, and let $S$ be a sufficient statistic for $\theta$. Define
$\htmlClass{maisight-sym-Tstar}{T^*} := E[\htmlClass{maisight-sym-T}{T} \mid \htmlClass{maisight-sym-S}{S}], \tag{3}$
where $\htmlClass{maisight-sym-Tstar}{T^*}$ denotes the Rao-Blackwellised estimator obtained by conditioning $\htmlClass{maisight-sym-T}{T}$ on $\htmlClass{maisight-sym-S}{S}$. Because $\htmlClass{maisight-sym-S}{S}$ is sufficient, this conditional expectation does not depend on $\htmlClass{maisight-sym-theta}{\theta}$, so $\htmlClass{maisight-sym-Tstar}{T^*}$ can be computed from the sample alone. Then $\htmlClass{maisight-sym-Tstar}{T^*}$ is an unbiased estimator of $\htmlClass{maisight-sym-theta}{\theta}$, and
$\mathrm{Var}_{\htmlClass{maisight-sym-theta}{\theta}}(\htmlClass{maisight-sym-Tstar}{T^*}) \leq \mathrm{Var}_{\htmlClass{maisight-sym-theta}{\theta}}(\htmlClass{maisight-sym-T}{T}) \quad \text{for every } \htmlClass{maisight-sym-theta}{\theta}, \tag{4}$
with equality iff $\htmlClass{maisight-sym-T}{T}$ is already a function of $\htmlClass{maisight-sym-S}{S}$ (up to a $P_{\htmlClass{maisight-sym-theta}{\theta}}$-null set).

Proof sketch

Two ingredients suffice.

Unbiasedness. By the tower property of conditional expectation,

E_\theta[T^*] = E_\theta[E[T \mid S]] = E_\theta[T] = \theta, \tag{5}

where the inner expectation $E[T \mid S]$ is the conditional expectation of $T$ given $S$, and the outer expectation is taken under $P_\theta$.

Variance. Apply the law of total variance, conditioning on the σ-algebra $\sigma(S)$ generated by $S$:

\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(E[T \mid S]). \tag{6}

Here $\mathrm{Var}(T \mid S)$ denotes the conditional variance of $T$ given $S$, and both terms on the right-hand side are non-negative.

Substituting $T^* = E[T \mid S]$ from $(3)$:

\mathrm{Var}_\theta(T) = E_\theta[\mathrm{Var}(T \mid S)] + \mathrm{Var}_\theta(T^*). \tag{7}

Since $E_\theta[\mathrm{Var}(T \mid S)] \geq 0$, we conclude $\mathrm{Var}_\theta(T^*) \leq \mathrm{Var}_\theta(T)$, with equality iff $\mathrm{Var}(T \mid S) = 0$ almost surely, i.e. iff $T$ coincides almost surely with a function of $S$.
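The decomposition $(7)$ can be verified exactly on a small model. The sketch below assumes an i.i.d. Bernoulli($\theta$) sample with $T = X_1$ and $S = X_1 + \dots + X_n$; this model and the function name `decomposition` are illustrative choices, not part of the text. It enumerates all samples and computes the three terms with rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def decomposition(theta, n):
    """Exact Var(T), E[Var(T | S)] and Var(T*) for T = X_1, S = X_1 + ... + X_n."""
    p_s, e_t, e_t2 = {}, {}, {}
    for x in product((0, 1), repeat=n):
        s, t = sum(x), x[0]
        p = theta**s * (1 - theta) ** (n - s)   # joint pmf of the sample
        p_s[s] = p_s.get(s, 0) + p              # P(S = s)
        e_t[s] = e_t.get(s, 0) + p * t          # E[T ; S = s]
        e_t2[s] = e_t2.get(s, 0) + p * t * t    # E[T^2 ; S = s]

    t_star = {s: e_t[s] / p_s[s] for s in p_s}                     # E[T | S = s]
    cond_var = {s: e_t2[s] / p_s[s] - t_star[s] ** 2 for s in p_s}  # Var(T | S = s)

    mean_t = sum(e_t.values())
    var_t = sum(e_t2.values()) - mean_t**2
    mean_cond_var = sum(p_s[s] * cond_var[s] for s in p_s)
    mean_t_star = sum(p_s[s] * t_star[s] for s in p_s)
    var_t_star = sum(p_s[s] * t_star[s] ** 2 for s in p_s) - mean_t_star**2
    return var_t, mean_cond_var, var_t_star


var_t, mean_cond_var, var_t_star = decomposition(Fraction(3, 10), n=4)
print(var_t, mean_cond_var, var_t_star)      # 21/100 63/400 21/400
print(var_t == mean_cond_var + var_t_star)   # True
```

Here $T^* = S/n$ is the sample mean, and the variance drops from $\theta(1-\theta)$ to $\theta(1-\theta)/n$.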

The same conclusion extends to any loss $L(t, \theta)$ that is convex in its first argument $t$, by the conditional form of Jensen's inequality:

E_\theta\!\left[L(T^*, \theta)\right] \leq E_\theta\!\left[L(T, \theta)\right], \tag{8}

where the variance case in $(4)$ is the special case $L(t, \theta) = (t - \theta)^2$, since the mean squared error of an unbiased estimator equals its variance.
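The Jensen step behind $(8)$ can be written out in two lines, assuming $L(\cdot, \theta)$ is convex in its first argument and the expectations involved are finite. Conditioning on $S$ and applying Jensen's inequality to the conditional expectation gives

$L(T^*, \theta) = L(E[T \mid S], \theta) \leq E[L(T, \theta) \mid S] \quad \text{almost surely},$

and taking $E_\theta$ of both sides, with the tower property on the right, yields

$E_\theta[L(T^*, \theta)] \leq E_\theta[E[L(T, \theta) \mid S]] = E_\theta[L(T, \theta)].$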

Worked example

Let $X_1, \dots, X_n$ be i.i.d. Poisson with mean $\lambda$, and let $\tau(\lambda) = e^{-\lambda} = P_\lambda(X_1 = 0)$.

A simple unbiased estimator of $\tau(\lambda)$ is

T = \mathbf{1}\{X_1 = 0\}, \tag{9}

where $\mathbf{1}\{\cdot\}$ denotes the indicator. Then $E_\lambda[T] = P_\lambda(X_1 = 0) = e^{-\lambda} = \tau(\lambda)$, so $T$ is unbiased for $\tau(\lambda)$.

The sample sum $S = X_1 + \dots + X_n$ is sufficient for $\lambda$. Given $S = s$, the conditional distribution of $(X_1, \dots, X_n)$ is $\mathrm{Multinomial}(s; 1/n, \dots, 1/n)$, which does not depend on $\lambda$. In particular, $X_1$ given $S = s$ is $\mathrm{Binomial}(s, 1/n)$, so

T^* = E[T \mid S] = P(X_1 = 0 \mid S) = \left(1 - \tfrac{1}{n}\right)^S, \tag{10}

where $T^*$ is the Rao-Blackwellised estimator. By $(4)$, applied with $\tau(\lambda)$ in place of $\theta$, $\mathrm{Var}_\lambda(T^*) \leq \mathrm{Var}_\lambda(T)$, with strict inequality for $n > 1$ since $T$ is then not a function of $S$ alone.
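To see the size of the improvement, the moments of $T^*$ can be computed numerically. The sketch below relies on the standard fact that $S \sim \mathrm{Poisson}(n\lambda)$ and sums the series up to a truncation point `k_max`; the function names, the values $\lambda = 1.5$, $n = 5$, and the truncation point are all illustrative assumptions. The closed form used for comparison, $\mathrm{Var}_\lambda(T^*) = e^{-2\lambda}(e^{\lambda/n} - 1)$, follows from the Poisson probability generating function and is not stated in the text.

```python
import math


def poisson_pmf(k, mu):
    """P(K = k) for K ~ Poisson(mu), mu > 0, computed in log space for stability."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def rao_blackwell_moments(lam, n, k_max=400):
    """Mean and variance of T* = (1 - 1/n)^S with S ~ Poisson(n * lam), by truncated summation."""
    a = 1.0 - 1.0 / n
    mu = n * lam
    m1 = sum(a**k * poisson_pmf(k, mu) for k in range(k_max))        # E[T*]
    m2 = sum(a ** (2 * k) * poisson_pmf(k, mu) for k in range(k_max))  # E[T*^2]
    return m1, m2 - m1**2


lam, n = 1.5, 5                                     # illustrative values
mean_star, var_star = rao_blackwell_moments(lam, n)

tau = math.exp(-lam)                                # target e^{-lambda}
var_t = tau * (1.0 - tau)                           # variance of the indicator T
var_star_closed = math.exp(-2 * lam) * (math.exp(lam / n) - 1.0)

print(f"E[T*]    = {mean_star:.6f}  (target {tau:.6f})")
print(f"Var(T)   = {var_t:.6f}")
print(f"Var(T*)  = {var_star:.6f}  (closed form {var_star_closed:.6f})")
```

For these values the variance of $T^*$ is roughly a tenth of the variance of $T$, and the gap widens as $n$ grows.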

Why this matters

The recipe is constructive: take any unbiased estimator $T$, condition on a sufficient statistic $S$, and the resulting $T^*$ has variance no greater than that of $T$. If $S$ is also complete, the Lehmann-Scheffé theorem (1950) shows that $T^*$ is the uniformly minimum-variance unbiased estimator: every unbiased estimator with finite variance Rao-Blackwellises to the same $T^*$ up to almost-sure equality, and no unbiased estimator has smaller variance at any $\theta$.
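The uniqueness claim can be illustrated on a small model where the sufficient statistic is known to be complete. The sketch below assumes an i.i.d. Bernoulli($\theta$) sample with $S = X_1 + \dots + X_n$ and two different unbiased estimators of $\theta$, namely $X_1$ and $(X_1 + 2X_2)/3$; the model, the estimators, and the function name `rao_blackwellise` are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import product


def rao_blackwellise(estimator, theta, n):
    """Return {s: E[estimator(X) | S = s]} for n i.i.d. Bernoulli(theta) draws, S = sum(X)."""
    num, den = {}, {}
    for x in product((0, 1), repeat=n):
        s = sum(x)
        p = theta**s * (1 - theta) ** (n - s)   # joint pmf of the sample
        num[s] = num.get(s, 0) + p * estimator(x)
        den[s] = den.get(s, 0) + p
    return {s: num[s] / den[s] for s in den}


theta, n = Fraction(2, 5), 3

# Two distinct unbiased estimators of theta.
first = rao_blackwellise(lambda x: Fraction(x[0]), theta, n)
weighted = rao_blackwellise(lambda x: Fraction(x[0] + 2 * x[1], 3), theta, n)

print(first == weighted)                                # True
print(all(first[s] == Fraction(s, n) for s in first))   # True: both collapse to S / n
```

Both starting points lead to the sample mean $S/n$, as expected when $S$ is complete and sufficient.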

References

C. R. Rao, "Information and accuracy attainable in the estimation of statistical parameters," Bulletin of the Calcutta Mathematical Society, 37:81-91, 1945.

D. Blackwell, "Conditional expectation and unbiased sequential estimation," Annals of Mathematical Statistics, 18(1):105-110, 1947.

E. L. Lehmann and H. Scheffé, "Completeness, similar regions, and unbiased estimation — Part I," Sankhyā, 10:305-340, 1950.