We say a distribution $$H$$ over $$\bits^n$$ has density $$\delta$$ if $$\Pr_H(x)\le 1/(\delta 2^{n})$$ for all $$x\in\bits^n$$. (The set of distributions with density $$\delta$$ turns out to be the convex hull of the distributions that are uniform over a subset $$H'\subseteq\bits^n$$ with $$|H'|\ge \delta 2^n$$.)
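One direction of the parenthetical claim is easy to check numerically: any mixture of "flat" distributions, each uniform over a set of size at least $$\delta 2^n$$, again has density $$\delta$$. A minimal sketch (the parameters `n`, `delta`, and the particular mixture are arbitrary illustrative choices):

```python
import random

n, delta = 4, 0.25
N = 2 ** n
cap = 1 / (delta * N)  # a density-delta distribution puts at most this mass on any point

# Mix a few flat distributions, each uniform over a set of size >= delta * 2^n,
# and check that the mixture itself has density delta.
random.seed(0)
sets = [random.sample(range(N), random.randint(int(delta * N), N)) for _ in range(3)]
weights = [0.5, 0.3, 0.2]

# Point mass the mixture places on each input x in {0,1}^n.
p = [sum(w / len(s) for w, s in zip(weights, sets) if x in s) for x in range(N)]

assert abs(sum(p) - 1) < 1e-9             # p is a probability distribution
assert all(px <= cap + 1e-9 for px in p)  # and it has density delta
```

Each flat distribution puts mass $$1/|H'|\le 1/(\delta 2^n)$$ on every point, and this bound survives averaging.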

A priori, if a function is average-case hard, it could be because every circuit fails to compute $$f$$ on a different set of inputs. The hardcore lemma rules out this possibility: if a function is average-case hard, then all circuits fail essentially on the same set of inputs.

Theorem: Let $$f\from \bits^n\to\sbits$$ be a mildly average-case hard function, i.e., $$\E_{U_n} f\cdot C\le 1-\delta$$ for every circuit $$C\from \bits^n\to \sbits$$ with $${\mathrm{size}}(C)\le S$$.

Then, there exists a distribution $$H$$ with density $$\Omega(\delta)$$ such that $$\E_{H} f\cdot C\le \e$$ for every circuit $$C\from \bits^n\to \sbits$$ with $${\mathrm{size}}(C)\le S':= S\cdot O(\e^2/\log(1/\delta))$$.

Proof:

The proof is by linear programming duality and uses that the set of functions with small circuit complexity is approximately convex, in the sense that a distribution over circuits can be approximated by a single circuit (of slightly larger size).

We will prove the contrapositive of the theorem statement: Suppose that for every density-$$\delta$$ distribution, there is an $$S'$$-sized circuit that computes $$f$$ with advantage $$\e$$. Then there exists a circuit of size $$S=S'\cdot O(\log(1/\delta)/\e^2)$$ that computes $$f$$ on a $$1-O(\delta)$$ fraction of the inputs.

Linear programming duality¹ allows us to exchange the quantifiers in our assumption if we pass to distributions over circuits: there exists a distribution $$D$$ over $$S'$$-sized circuits such that for all density-$$\delta$$ distributions $$H$$, the distribution $$D$$ “computes” $$f$$ with advantage $$\e$$. Since the density-$$\delta$$ distributions are the convex hull of the uniform distributions over sets of size at least $$\delta 2^n$$, it suffices to take the minimum over such sets, $\min_{H\subseteq \bits^n,~|H|\ge \delta 2^{n}} \E_{x\sim H}\, f(x) \cdot \E_{C \sim D} C(x) \ge \e.$ What does this condition mean? Let us say that $$D$$ “computes” $$f$$ on an input $$x\in\bits^n$$ if $$f(x)\cdot \E_{C\sim D}C(x)\ge \e$$. On how many inputs can $$D$$ fail to compute $$f$$? The condition above implies that $$D$$ can fail on at most $$\delta 2^n$$ inputs. (Otherwise, the uniform distribution over the failure set would itself be a density-$$\delta$$ distribution violating the condition.) Thus $$D$$ computes $$f$$ on all but a $$\delta$$ fraction of the inputs.
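Spelling out this counting step in symbols:

```latex
F \;=\; \bigl\{x\in\bits^n \;:\; f(x)\cdot \E_{C\sim D}C(x)<\e\bigr\}.
\quad\text{If } |F|\ge \delta 2^{n}, \text{ then the uniform distribution over } F
\text{ has density } \delta, \text{ yet } \E_{x\sim F}\, f(x)\cdot\E_{C\sim D}C(x) < \e.
```

This contradicts the displayed minimum, so $$|F|<\delta 2^n$$.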

How can we go from a distribution over circuits to a single circuit? (Note that $$D$$ could in principle be supported on exponentially many circuits.) The idea is to sample circuits $$C_1,\ldots,C_r$$ from the distribution and use the majority of their outputs as an estimate of $$f(x)$$. By the Chernoff bound, every input $$x\in\bits^n$$ with $$f(x)\cdot\E_{C\sim D}C(x)\ge \e$$ satisfies $\Pr_{C_1,\ldots,C_r}\Big\{{\mathrm{maj}}(C_1(x),\ldots,C_r(x))\neq f(x)\Big\} \le 2^{-\Omega(\e^2 r)}.$ Hence, for $$r=O(\log(1/\delta)/\e^2)$$, the failure probability due to sampling is at most $$\delta$$ for each such input, so the expected fraction of these inputs on which the majority errs is at most $$\delta$$. In particular, there exists a choice of $$C_1,\ldots,C_r$$ such that $${\mathrm{maj}}(C_1,\ldots,C_r)$$ computes $$f$$ on all but a $$\delta$$ fraction of the inputs on which $$D$$ computes $$f$$. Therefore, $${\mathrm{maj}}(C_1,\ldots,C_r)$$ is the desired circuit (of size $$S=S'\cdot O(r)$$).
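The amplification step can be sketched numerically. As a simplifying assumption for illustration, model each sampled circuit's vote on a fixed good input (with $$f(x)=+1$$) as an independent $$\pm 1$$ coin with bias $$\e$$; the parameters `eps`, `delta`, and the constant `8` in the choice of `r` are illustrative:

```python
import math
import random

random.seed(1)
eps, delta = 0.1, 0.05
r = math.ceil(8 * math.log(1 / delta) / eps**2)  # r = O(log(1/delta)/eps^2)

def maj_fails(eps, r, trials=500):
    """Empirical probability that the majority of r independent +/-1 votes,
    each +1 with probability (1+eps)/2, disagrees with f(x) = +1."""
    fails = 0
    for _ in range(trials):
        votes = sum(1 if random.random() < (1 + eps) / 2 else -1 for _ in range(r))
        if votes <= 0:  # majority vote is not +1
            fails += 1
    return fails / trials

# Chernoff: failure probability 2^{-Omega(eps^2 r)} <= delta for this choice of r.
assert maj_fails(eps, r) <= delta
```

This only tests the per-input failure probability; the step from "small expected failure fraction" to "some fixed choice of $$C_1,\ldots,C_r$$" is the averaging argument above.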

1. Formally, we define a payoff matrix $$M_{H,C}=\E_{H} f\cdot C$$ with rows indexed by distributions $$H$$ over $$\bits^n$$ of density $$\delta$$ and columns indexed by circuits $$C\from\bits^n\to\sbits$$ with $${\mathrm{size}}(C)\le S'$$. Our assumption implies that $$\min_{p}\max_C (p^T M)_C\ge \e$$ (where $$p$$ is a distribution over density-$$\delta$$ distributions, which is again a density-$$\delta$$ distribution). Hence, by linear programming duality, we also have $$\min_{H} (Mq)_H\ge \e$$ for some distribution $$q$$ over $$S'$$-sized circuits $$C$$.
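The minimax exchange in the footnote can be illustrated on a toy payoff matrix (the $$2\times 2$$ entries below are arbitrary, standing in for $$M_{H,C}$$): a brute-force grid search over mixed strategies shows that $$\min_p\max_C (p^T M)_C$$ and $$\max_q\min_H (Mq)_H$$ coincide, which is exactly what lets us trade one order of quantifiers for the other.

```python
# Minimax on a toy 2x2 payoff matrix M[H][C], via grid search over mixed strategies.
M = [[0.2, 0.9],
     [0.8, 0.1]]

def row_value(t):
    # Row player mixes rows with weights (t, 1-t); column player best-responds.
    return max(t * M[0][j] + (1 - t) * M[1][j] for j in range(2))

def col_value(s):
    # Column player mixes columns with weights (s, 1-s); row player best-responds.
    return min(s * M[i][0] + (1 - s) * M[i][1] for i in range(2))

grid = [k / 10000 for k in range(10001)]
minimax = min(row_value(t) for t in grid)  # min_p max_C (p^T M)_C
maximin = max(col_value(s) for s in grid)  # max_q min_H (M q)_H

assert abs(minimax - maximin) < 1e-3  # LP duality: the two game values coincide
```

In the proof, the row player picks a density-$$\delta$$ distribution $$H$$ and the column player picks a circuit; duality turns "for every $$H$$ there is a good circuit" into "there is a good distribution $$q$$ over circuits".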