“If causation is not correlation, what is it?”
How many times have you repeated the mantra “correlation does not mean causation”? I have, plenty of times, and for good reason. Any attempt to interpret or explain a machine learning model should keep in mind that two variables occurring together does not mean that one causes the other. There could be a spurious correlation in the available sample of data, or a third, unobserved variable that causes both.
How many times have you asked yourself “then what is causation?” This question hasn’t received much attention from the statistical community, which is unfortunate because it has left science without a language to talk about cause and effect. And this is how Judea Pearl starts his fantastic book, The Book of Why – The New Science of Cause and Effect.
One of the most prominent researchers in the history of AI, Pearl is best known for developing Bayesian Networks in the 1980s, still widely used today. They express relationships between variables in terms of probability distributions, and allow inferences from observed data to other variables. The model combines data with a graphical representation, in the form of a graph of variables. A connection between two variables expresses that they are not independent; that is, the arrows are just a description of the conditional probability tables you have, and don’t necessarily carry causal meaning.
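To make this concrete, here is a minimal sketch of a Bayesian network in plain Python, with made-up probabilities: Rain → WetGrass ← Sprinkler. The arrows only encode conditional probability tables; inference by enumeration then answers questions like “how likely is rain, given the grass is wet?”

```python
# A toy Bayesian network with invented probabilities:
#   Rain -> WetGrass <- Sprinkler
# The arrows describe conditional probability tables, not causal claims.

P_rain = 0.2
P_sprinkler = 0.3
# P(wet | rain, sprinkler): a conditional probability table
P_wet = {
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def p_joint(rain, sprinkler, wet):
    """Joint probability from the network's factorization."""
    p = P_rain if rain else 1 - P_rain
    p *= P_sprinkler if sprinkler else 1 - P_sprinkler
    pw = P_wet[(rain, sprinkler)]
    return p * (pw if wet else 1 - pw)

# Inference by enumeration: P(rain | grass observed wet)
num = sum(p_joint(True, s, True) for s in (True, False))
den = sum(p_joint(r, s, True) for r in (True, False) for s in (True, False))
print(round(num / den, 3))  # higher than the prior P_rain = 0.2
```

Observing wet grass raises the probability of rain above its prior, which is exactly the kind of associational (rung-one) inference these networks support.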
Introducing the do-calculus
Using the terms of the book, Bayesian Networks sit on the first rung of the ladder of causation, because they only involve associations, i.e. changes in probabilities given that certain variables are observed (conditional probabilities). To express causation, we need something more than probabilities. And there comes Pearl’s creation: the do-calculus. What is the probability of X given that we make Y take some value? This is called an intervention, and it means that we manually set the value of a variable. It is different from observing the variable in the data, because in the latter case other factors could be influencing the values observed. In other words, an intervention lets us ask about the effect of one variable on another, all else equal.
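A small simulation can show the gap between observing and doing. In this sketch (a made-up structural model, not from the book) a confounder Z drives both X and Y, so the observational average E[Y | X=1] differs from the interventional average E[Y | do(X=1)], where we override X’s mechanism by hand:

```python
import random

random.seed(0)

# Toy structural model with a confounder:
#   Z ~ Bernoulli(0.5);  X copies Z 90% of the time;  Y = X + Z + noise
def sample(do_x=None):
    z = random.random() < 0.5
    if do_x is None:
        x = z if random.random() < 0.9 else not z   # X's natural mechanism
    else:
        x = do_x                                    # do(X): cut the Z -> X arrow
    y = int(x) + int(z) + random.gauss(0, 0.1)
    return x, y

# Observational: conditioning on X=1 also tells us Z is probably 1
obs = [y for x, y in (sample() for _ in range(100_000)) if x]
# Interventional: setting X=1 leaves Z at its 50/50 base rate
do = [y for _, y in (sample(do_x=True) for _ in range(100_000))]

obs_mean = sum(obs) / len(obs)
do_mean = sum(do) / len(do)
print(round(obs_mean, 2), round(do_mean, 2))
```

The observed conditional mean is inflated by the back-door path through Z, while the do-intervention isolates X’s own effect, i.e. “all else equal”.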
Pearl summarizes the fundamentals of the do-calculus in the book, touching the formulas only superficially yet managing to give a rigorous description. He also presents other techniques for interventions, some of which had been used in the social sciences independently of the causal revolution being started in Pearl’s labs. In that sense, the book’s main contribution is to present the full picture, in which these other techniques reduce to special cases of the do-calculus.
Given you’ve been rejected, if you know you’re not ugly, chances increase that you’re a jerk
A different story is his criticism that researchers have been controlling for virtually every covariate they could, in order to avoid confounders that would bias their analysis. This approach is wrong, he shows, when the variable in question is a collider between the variables being analysed. A variable Z is a collider of X and Y if both X and Y are causes of Z. If you control for all covariates without caring about the underlying causal structure, you can open a path of information that interferes with your analysis, one that would have remained closed otherwise. Paraphrasing one of the funniest examples in the book: possible partners are rejected (Z) because they are jerks (X) and/or because they are ugly (Y). Given that you’ve been rejected, if you know you’re not ugly, the chances increase that you’re a jerk. Until your potential partner makes a decision, ugliness and jerkiness don’t affect each other. This is how conditioning on a variable can break conditional independencies.
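This “explaining away” effect is easy to reproduce with a simulation (my own sketch of the book’s example, with invented 50/50 rates and a deliberately simple rejection rule: rejected if jerk or ugly):

```python
import random

random.seed(1)

# Jerkiness and ugliness are independent coin flips; rejection is a
# collider caused by both.
people = [(random.random() < 0.5, random.random() < 0.5)
          for _ in range(100_000)]  # (jerk, ugly)
rejected = [(j, u) for j, u in people if j or u]

p_jerk = sum(j for j, u in people) / len(people)
p_jerk_given_rejected = sum(j for j, u in rejected) / len(rejected)

# Conditioning on the collider AND learning "not ugly" explains away
# the rejection: being a jerk becomes the only remaining cause.
not_ugly = [(j, u) for j, u in rejected if not u]
p_jerk_given_rejected_not_ugly = sum(j for j, u in not_ugly) / len(not_ugly)

print(p_jerk, p_jerk_given_rejected, p_jerk_given_rejected_not_ugly)
```

Unconditionally, jerkiness sits near its 50% base rate; among the rejected it rises; and among the rejected-but-not-ugly it is certain, even though the two traits were generated independently. That is the dependence that conditioning on a collider manufactures.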
The collider example is one of many examples of causal reasoning in the book. You can see it is a very natural way of thinking. In fact, it is more human-like than probabilistic reasoning, with which our brains struggle even after a course in Advanced Statistics. For this reason, and because asking “what if…” questions is a crucial part of learning and intelligence, causal reasoning is considered a necessary step towards Artificial General Intelligence.
A reflection on prediction and explanation
The new era of AI is dominated by prediction. We have achieved great predictive performance in certain domains because we have lots of data and good algorithms that speak the language of statistics, and that language is correlations. Correlations are good for prediction, because the algorithm can absorb as much information as possible to get closer to its objective function.
Explanation, on the other hand, is relegated to a post-processing step of a predictive algorithm. This makes sense, since we can only write meaningful interpretations of models that solve a task accurately: the larger the prediction error, the more misleading the conclusions. At the same time, this limits the scope of interpretability to the conclusions that can be extracted from a predictive model based on correlations in the data. We won’t be able to draw cause-and-effect conclusions unless we use algorithms that ask the why questions.