The Statistician’s Guide to the Galaxy: Revisiting the Drake Equation

Drake Equation, adapted from SETI Institute.

As Douglas Adams once quipped, “Space is big. I mean, you may think it’s a long way down the road to the chemist’s, but that’s just peanuts to space.” Indeed, our Milky Way alone spans somewhere between 100,000 and 200,000 light years, depending on how its edge is measured, and contains at least 100 billion planets.

Which makes one wonder: Could any of these planets support intelligent life? And if so, how many contactable alien civilisations are out there?

In 1961, radio astronomer Frank Drake pondered the same questions, and formulated the famous Drake Equation in response.

The equation multiplies several factors believed to be instrumental to the development of technological civilisations. But it has a weakness: while some factors, like R* (the rate of star formation) and fp (the fraction of stars with planets), can be estimated from observation, others, like fl (the fraction of habitable planets on which life arises) and fi (the fraction of those on which intelligence evolves), are largely a matter of conjecture. After all, how can we empirically estimate the probability of life arising on Earth-like planets (abiogenesis), let alone intelligent life, with a sample size of one success*?
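To see why the conjectural factors matter so much, here is a minimal Python sketch of the equation in its standard form, N = R* × fp × ne × fl × fi × fc × L. The function name and every input value below are placeholders chosen purely to show the arithmetic; none of them is a measurement or a published estimate.

```python
def drake_equation(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Number of contactable civilisations: N = R* * fp * ne * fl * fi * fc * L."""
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime


# Placeholder inputs (assumptions for illustration only, not real estimates).
inputs = dict(
    r_star=1.0,       # stars formed per year
    f_p=0.5,          # fraction of stars with planets
    n_e=2.0,          # habitable planets per planetary system
    f_l=0.5,          # fraction of those planets where life arises
    f_i=0.1,          # fraction of those where intelligence evolves
    f_c=0.1,          # fraction of those that become detectable
    lifetime=10_000,  # years a civilisation stays detectable
)

n_optimistic = drake_equation(**inputs)

# Change only the guess for f_l and leave everything else untouched.
n_pessimistic = drake_equation(**{**inputs, "f_l": 1e-6})

print(f"f_l = 0.5  -> N = {n_optimistic:g}")
print(f"f_l = 1e-6 -> N = {n_pessimistic:g}")
```

Because the factors are simply multiplied, the answer moves in direct proportion to whichever guess we change. With nothing to pin fl down, N can swing across many orders of magnitude.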

And that’s where statistics comes in. Specifically, one of its most popular inference methods: Bayesian analysis. How does it work?

Let’s say that we want to find the odds of life arising on a given planet. First, we’ll create a model incorporating terms of interest that constrain our estimate: the minimum time life needs to emerge after a world’s formation, the latest time at which life can still emerge, and so on.

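Then, we’ll collate existing information about abiogenesis on Earth: it happened around 3.8 billion years ago (Bya) on a planet that is 4.5 billion years (Gyr) old and orbits a main-sequence star. We use this evidence to make a likelihood distribution for abiogenesis, which describes how probable these observations are under each candidate set of values for our terms.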

Next, we’ll construct several best-guess probability distributions based on our assumptions about abiogenesis, using various values for our terms of interest. These are known as prior distributions, and represent alternate scenarios we wish to simulate.

We combine our likelihood distribution with each of our prior distributions, et voilà! We get a posterior distribution for each prior, and therefore for each scenario. Posteriors are powerful: they overcome the pitfalls of relying wholly on either shaky beliefs or scant evidence.
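The recipe above can be sketched in a few lines of Python with NumPy. This is a deliberately simplified toy, not the model from any published study: it assumes abiogenesis is a random process with a constant rate per billion years, treats the 0.7 Gyr gap between Earth's formation and its earliest life as the only evidence, and ignores the minimum and maximum emergence times mentioned earlier. The grid bounds, the two priors and the "rate above 1 per Gyr" threshold are all arbitrary choices made for illustration.

```python
import numpy as np

# Evidence from the text: life appeared ~3.8 Bya on a 4.5 Gyr old planet.
window_gyr = 4.5 - 3.8

# Candidate abiogenesis rates (events per Gyr) on a log-spaced grid.
# The bounds 1e-3 and 1e3 are arbitrary illustration choices.
rate = np.logspace(-3, 3, 2001)
widths = np.gradient(rate)  # approximate cell widths for numerical integration

# Likelihood: chance that at least one abiogenesis event occurs within the
# window, assuming events arrive at a constant average rate (Poisson process).
likelihood = 1.0 - np.exp(-rate * window_gyr)

# Two alternative priors, i.e. two scenarios we wish to compare.
priors = {
    "uniform in rate": np.ones_like(rate),
    "log-uniform in rate": 1.0 / rate,
}


def posterior(prior):
    """Posterior density: likelihood times prior, normalised on the grid."""
    unnormalised = likelihood * prior
    return unnormalised / np.sum(unnormalised * widths)


results = {}
for name, prior in priors.items():
    mass = posterior(prior) * widths
    # Posterior probability that life arises "quickly" (rate above 1 per Gyr).
    results[name] = float(np.sum(mass[rate > 1.0]))
    print(f"{name}: P(rate > 1 per Gyr | evidence) = {results[name]:.3f}")
```

Both scenarios see exactly the same evidence, yet they end up with different posterior probabilities, because a single data point is too weak to wash out the choice of prior.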

The above is precisely what researchers David Spiegel and Edwin Turner did in 2012. The approach was further refined in 2020, when astronomer David Kipping sought to estimate, in a single analysis, the odds of both life arising and intelligence emerging. He chose to use the maximally uninformative Jeffreys prior, placing greater weight on evidence (such as fossil records) and producing a single posterior distribution. The results indicated 3:2 betting odds of intelligence being rare in our galaxy, but also 9:1 odds of life being common.
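Betting odds can be easier to read as probabilities. The short Python sketch below does the conversion for the two figures quoted above; the helper name odds_to_probability is made up for this example.

```python
from fractions import Fraction


def odds_to_probability(in_favour, against):
    """Convert betting odds of in_favour:against into a probability."""
    return Fraction(in_favour, in_favour + against)


p_intelligence_rare = odds_to_probability(3, 2)  # 3:2 odds
p_life_common = odds_to_probability(9, 1)        # 9:1 odds

print(f"Intelligence is rare: {float(p_intelligence_rare):.0%}")  # 60%
print(f"Life is common: {float(p_life_common):.0%}")              # 90%
```

So 3:2 odds amount to only a 60% chance that intelligence is rare, which is far from a confident verdict, while 9:1 odds correspond to a 90% chance that life is common.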

Sounds promising, doesn’t it? However, as the researchers themselves acknowledge, Bayesian analysis can only do so much, for all the scientific rigour it lends. These figures are still infused with uncertainty, and that won’t change until we get a second or third data point*; in other words, until we find extraterrestrial life.

On the bright side, if and when we do, we can be sure that Bayesian analysis will be there to help us put a number on fl and fi, once and for all.

*Earth is the only example we know of!