Analytically evaluating a limiting probability (June 2007 Ponder This)

In this post, I’ll describe my solution to June 2007’s Ponder This. I had felt that my solution was kind of nifty, and different from the one that was published. It had actually taken me several days to work the whole thing out.

Here’s part (2), the tougher part, of the problem: Values for a random variable are generated independently and uniformly over \left[0,1\right). By accumulating these values, your job is to reach a sum between n+x and n+1, where n is a positive integer, and 0\,\textless\,x\,\textless\,1. You can ask for successive values of this variate as many times as you wish, but you must add each such value to the running total you maintain. If the accumulating sum exceeds n+1, you loose; if you manage to get a sum between n+x and n+1, you win. What is the probability of winning as n\to\infty?

Earlier that year, I had finished studying Knuth’s Selected Papers on Discrete Mathematics. That is a precious collection for anyone interested in discrete structures, combinatorics, asymptotics, random structures etc. I had especially enjoyed one of the papers, “The first cycles in an evolving graph” (with Flajolet & Pittel as co-authors), not only for the neat results but also for the techniques used to get asymptotic values/bounds of certain integrals. That paper helped introduce me to, and drive home the importance of, the fascinating field I later came to know as Analytic combinatorics (also this). That paper analyzes the evolution of a random graph, where edges are successively added to a set of n vertices(under different random graph models). Among other results, it shows that expected length of the first cycle to appear is proportional to n^{1/6} and the size of the component having this cycle is \sim\sqrt{n} — quite fascinating stuff. At any rate, with these analytic techniques still fresh in my mind, I attacked the problem by trying to find a bound on the probability integral. Details follow (with a I-to-we switch).

To begin with, we use the classical Central Limit Theorem to replace the sum of m independently and uniformly distributed variates (each of which had mean 1\over{2} and variance 1\over{12}) with a Normal variate with mean m\over{2} and variance m\over{12}.

Now, the probability of “hitting” (i.e., achieving a sum lying within the interval) \left[n+x,n+1\right) in exactly m “moves” is \int_{n-1+x}^{n+x}P_a(m,y) P_b(y)\,dy, where P_a(m,y)\,dy is the probability of hitting \left[y, y+dy\right) in exactly m-1 moves and P_b(y) is the probability of jumping from \left[y, y+dy\right) to the desired interval \left[n+x, n+1\right) in the mth-move.

When y\in\left[n-1+x, n\right), P_b(y)=y-(n-1+x); and when y\in\left[n, n+x\right), P_b(y)=1-x. P_a(m,y) is the probability density function of the aforementioned normal variate; so

\displaystyle P_a(m,y) = e^{-{{6(y-{m\over{}2})^2}\over m}}\sqrt{6\over{m\pi}}.

Thus the probability of success in precisely m moves is

\displaystyle P(m)=\int_{n-1+x}^{n} P_a(m,y) (y-(n-1+x))\,dy + \int_n^{n+x} P_a(m,y) (1-x)\,dy

which simplifies to

\displaystyle (1-x)\int_{n-1+x}^{n+x} P_a(m,y)\,dy + \int_{n-1+x}^{n} P_a(m,y) (y-n)\,dy.

Since we’re interested in a success in any number of moves, we must evaulate \sum_{m=n+1}^{\infty}P(m), and finish by taking the desired limit n\to\infty.

To evalutate the sum over m, we replace the summation by integration (see caveat below) and substitute the variable of integration from m to 2n + r\sqrt{n} (resulting in an extra factor of \sqrt{n}). So our final integral turs out to:

\displaystyle \int\limits_{-\sqrt{n}}^{\infty} \sqrt{n}\Big((1 - x)\int\limits_{-(1-x)}^{x}P_a(2n+r\sqrt{n},n+t)\,dt + \int\limits_{-(1-x)}^{0}P_a(2n+r\sqrt{n},n+t)t\,dt\Big)\,dr

Expanding the integrand (of the first integral) in powers of 1\over n, the integrand becomes:

\displaystyle \sqrt{3\over{4\pi}}e^{-3r^2/4}(1-x^2) + O(e^{-3r^2/4}r^3n^{-1/2})

When we integrate over r: [-\sqrt{n},\infty) and take limit n\to\infty, all terms other than the first go away. As for the first term, it yields 1-x^2 since

\displaystyle \int_{-\sqrt{n}}^{\infty}\sqrt{3\over{4\pi}}e^{-3r^2/4}\,dr = {1\over{2}}(1+\mbox{erf}({1\over{2}}\sqrt{3n}))\to 1\quad \mbox{as}\,n\to\infty

(where \mbox{erf}() is the Error function).

Whew! That was heavy going. Let us catch our breath. Okay. So the answer to the original problem turns out to be 1-x^2. As a sanity check, it’s value turns out to be 0 at x=1 and 1 at x=0. Good.

Caveat: The above derivation is not rigorous (in fact, it is somewhat sloppy) for at least the following reasons:

  • Convergence of many integrals/summations is implictly assumed
  • The error term has not been calculated when changing a summation to an integration. Euler Summation Formula (also TAOCP should have been used to bound the error terms.
  • The Central Limit approximation has been used without being demonstrated that it is being applied only close to the peak of the distribution, and not far into the tails (unless justified)

A question: If you read the publisher solution on Ponder This, it says, “Values between 0 and 1 are equally likely but larger values are more likely to cause the sum to exceed n+x. So as n\to\infty the differential probability that x_k=t is 2t\,dt.” Can the reader explain the reasoning behind this differential probability?