Derivation of the Poisson Distribution

There are various ways to derive the Poisson distribution. Here we provide a derivation based on two high-level requirements that we would like an arrival process to have, namely, that the numbers of arrivals in separate time intervals are independent, and that the distributional properties of the arrival process are constant over time. In other words, if you believe that a certain arrival process is such that the number of arrivals in the past minute does not provide you any information about what the next minute will bring, and that there is no reason to assume that the next minute will look statistically different from the previous minute, then necessarily the number of arrivals in an interval is Poisson distributed.

The Assumptions

Consider an arrival process \(\{N(s,t], 0\leq s \leq t\}\) such that \(N(s,t]\) is the number of arrivals that arrive during an interval \((s,t]\). We consider the following two properties quite natural to assume/require:

  1. The number of arrivals in disjoint intervals are independent, i.e., the random variables \(N(s,t]\) and \(N(u,v]\) are independent if \(s\leq t < u \leq v\).
  2. The arrival process \(\{N(s,t]\}\) is time-homogeneous, i.e., \(\P\{N(u,u+s] = j\} = \P\{N(0,s] = j\}\) for all \(u, s\geq 0\) and \(j=0,1,2,\ldots\).

Do these two properties suffice to prove that \(N(0,t]\) is Poisson distributed? Let's see how far we can get.



Define, for \(t \geq 0\) and \(n = 0,1,2,\ldots\),

\[f_n(t) = \P\{N(0,t] = n\}\]


Splitting \((0,t]\) into \((0,s]\) and \((s,t]\) and conditioning on the number of arrivals in \((0,s]\) gives, for any \(s\in[0,t]\),

\[\begin{split}f_n(t) &= \P\{N(0,t] = n\} \\ &= \sum_{j=0}^n \P\{N(0,s] = j, N(s,t] = n-j\} \\ &= \sum_{j=0}^n \P\{N(0,s] = j\} \P\{N(s,t] = n-j\} \text{ by independence}\\ &= \sum_{j=0}^n \P\{N(0,s] = j\} \P\{N(0,t-s] = n-j\} \text{ by time homogeneity}\\ &= \sum_{j=0}^n f_j(s) f_{n-j}(t-s) \text{ by the definition of } f_n\end{split}\]

Thus, we seek a function \(f_n\) that solves

(1)\[f_n(t) = \sum_{j=0}^n f_j(s) f_{n-j}(t-s)\]

for all \(s\in[0,t]\) and \(n=0,1,2,\ldots\). Of course \(\{f_n(t), t\geq 0, n=0,1,2,\ldots\}\) should satisfy some further simple and natural properties:

\[\sum_{n=0}^\infty f_n(t) = 1, \text{ for all } t\geq 0,\]

as the \(f_n(t)\) should correspond to probabilities. Moreover, no arrival can occur in an interval of zero length, hence

\[f_0(0) = 1.\]
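As a quick sanity check, the following sketch (variable and function names are mine) verifies numerically that the pmf \(f_n(t) = e^{-t} t^n/n!\), the solution found below, satisfies both the convolution equation (1) and the normalization constraint.

```python
import math

def f(n, t):
    # Candidate solution: unit-rate Poisson pmf, f_n(t) = e^{-t} t^n / n!
    return math.exp(-t) * t**n / math.factorial(n)

# Check the convolution equation (1): f_n(t) = sum_j f_j(s) f_{n-j}(t-s)
t = 2.5
for s in (0.0, 0.7, 1.3, 2.5):
    for n in range(8):
        lhs = f(n, t)
        rhs = sum(f(j, s) * f(n - j, t - s) for j in range(n + 1))
        assert abs(lhs - rhs) < 1e-12

# Check normalization: sum_n f_n(t) = 1 (the truncated tail is negligible)
assert abs(sum(f(n, t) for n in range(60)) - 1.0) < 1e-12
```

Note that the check also passes at the boundary values \(s=0\) and \(s=t\), since \(f_0(0)=1\) and \(f_n(0)=0\) for \(n\geq 1\).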

Solution method 1

One solution of (1) can be found by exploiting the analogy with the binomial theorem:

(2)\[t^n = (s + (t-s) )^n = \sum_{j=0}^n {n \choose j} s^j (t-s)^{n-j}\]

Dividing both sides by \(n!\) leads to

\[\frac{t^n}{n!} = \sum_{j=0}^n \frac{s^j}{j!}\frac{(t-s)^{n-j}}{(n-j)!}\]

Clearly, then, we can take \(f_n(t) \propto t^n/n!\). Since \(\sum_{n=0}^\infty t^n/n! = e^t\), normalizing so that the \(f_n(t)\) sum to 1 fixes \(f_n\) to be

\[\P\{N(0,t] = n \} = f_n(t) = e^{-t} \frac{t^n}{n!}\]

Rescaling the time axis enables us to consider arrival rates other than 1, so that in general

\[\P\{N(0,t] = n \} = f_n(t) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}\]
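To connect this formula to an actual arrival process, here is a small simulation sketch (function and variable names are mine): it generates arrivals with exponential interarrival times, as derived in solution method 2 below, and compares the empirical distribution of \(N(0,t]\) against the Poisson formula.

```python
import math
import random

def count_arrivals(rate, t, rng):
    # Generate exponential(rate) interarrival times and count
    # how many arrivals fall in the interval (0, t].
    n, clock = 0, 0.0
    while True:
        clock += rng.expovariate(rate)
        if clock > t:
            return n
        n += 1

rng = random.Random(42)
lam, t, trials = 2.0, 1.5, 200_000
counts = [count_arrivals(lam, t, rng) for _ in range(trials)]

for n in range(6):
    empirical = counts.count(n) / trials
    predicted = math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)
    # Monte Carlo error with 200k trials is roughly 1e-3 per bin.
    assert abs(empirical - predicted) < 0.01
```

The tolerance of 0.01 is generous relative to the Monte Carlo standard error, so the check is robust to the choice of seed.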

It remains to prove that the solution of (1) is unique, thereby proving that any arrival process satisfying our above assumptions has Poisson-distributed arrival counts.

Solution method 2

Considering \(n=0\) in (1) leads to

\[f_0(t) = f_0(t-s)f_0(s).\]

In particular we can take \(s = t/n\), so that

\[f_0(t) = f_0(t-t/n) f_0(t/n) = f_0(t-2t/n) (f_0(t/n))^2\]

and so on. Completing the recursion leads to

\[f_0(t) = \left(f_0\left(\frac t n\right)\right)^n.\]

This must hold for any \(n\). Moreover, as \(f_0(0) = 1\) and assuming that \(f_0(x) = 1 - \lambda x + o(x)\) for small \(x\), we find that

\[f_0(t) = (1 - \lambda \frac t n + o(\frac t n ))^n.\]

Taking the limit \(n\to \infty\), we see that

\[f_0(t) = e^{-\lambda t}\]

for some \(\lambda > 0\).
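The limit step can be illustrated numerically; this minimal sketch drops the \(o(t/n)\) term and watches \((1 - \lambda t/n)^n\) approach \(e^{-\lambda t}\) as \(n\) grows.

```python
import math

lam, t = 2.0, 1.5
exact = math.exp(-lam * t)
for n in (10, 100, 1000, 100000):
    # Compounding the first-order approximation f_0(t/n) ~ 1 - lam*t/n
    approx = (1 - lam * t / n) ** n
    print(n, approx, abs(approx - exact))

# The approximation error shrinks roughly like 1/n.
assert abs((1 - lam * t / 100000) ** 100000 - exact) < 1e-4
```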

Interestingly, writing \(T\) for the time of the first arrival, this implies that

\[\P\{T > t\} = \P\{N(0,t] = 0\} = f_0(t) = e^{-\lambda t},\]

i.e., the interarrival times are exponentially distributed.

Next, take \(n=1\). Then (1) becomes

\[\begin{split}f_1(t) &= f_0(t-s)f_1(s) + f_1(t-s)f_0(s) \\ &= e^{-\lambda(t-s)} f_1(s) + f_1(t-s)e^{-\lambda s}.\end{split}\]

Now define \(g(t) = f_1(t) e^{\lambda t}\); multiplying the above by \(e^{\lambda t}\) turns it into

\[g(t) = g(s) + g(t-s).\]

This is Cauchy's functional equation; under mild regularity conditions (e.g., continuity of \(g\)), its only solutions have the form \(g(t) = \alpha t\) for some \(\alpha\). From this we conclude that

\[f_1(t) = e^{-\lambda t} g(t) = \alpha t e^{-\lambda t}.\]

To be continued for \(n \geq 2\).
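That \(f_1(t) = \alpha t e^{-\lambda t}\) indeed solves the \(n=1\) case of (1) is easy to verify numerically. In this sketch \(\alpha\) is chosen arbitrarily, since the \(n=1\) equation alone does not determine it.

```python
import math

lam, alpha = 2.0, 0.7  # alpha is arbitrary here; (1) with n=1 holds for any alpha

def f0(t):
    return math.exp(-lam * t)

def f1(t):
    return alpha * t * math.exp(-lam * t)

t = 1.8
for s in (0.0, 0.4, 0.9, 1.8):
    # n=1 case of (1): f_1(t) = f_0(t-s) f_1(s) + f_1(t-s) f_0(s)
    lhs = f1(t)
    rhs = f0(t - s) * f1(s) + f1(t - s) * f0(s)
    assert abs(lhs - rhs) < 1e-12
```

Algebraically this is immediate: both terms on the right share the factor \(\alpha e^{-\lambda t}\), and \(s + (t-s) = t\).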

Open Issues

  1. In method 1: why are we allowed to assume that (1) has a unique solution? Are further boundary conditions required? Where is a proof that (1) has a unique solution?
  2. In method 2: why is the assumption that \(f_0(x) = 1 - \lambda x + o(x)\) for small \(x\) valid? What properties of \(f_0\) (or, in general, of \(f_n\)) should we require to make this assumption valid?