Perturbation Analysis and Machine Learning – So, in the previous article, we saw how much progress has been made by formalizing the concepts of causality and chance in mathematics, in particular with logic, probability, and statistics. Another revolution brings these ideas together and takes probability and statistics to their limits. How informative an experiment may be in determining the strength of a cause depends on many parameters, such as whether there are control experiments to discard or rule out certain causes, as we have seen before.
But it also depends on how good the measurements are, how well placed an observer is to take them both at the input and at the output of a system, and how distinguishable individual events are. We have seen how central correlation is to statistics, but also how limited it can be. One way to make the most out of statistics is to perform systematic perturbations, according to a probabilistic calculus introduced by people such as Judea Pearl and his colleagues, to find possible causes that can then be ruled out after further inspection.
These ideas are today at the forefront of the practice of probabilistic causal discovery. Similar to the experiments with time series that we have mentioned in previous segments, we can ask whether two time series coming from different observations are causally connected to each other.
Now, traditional statistics would typically suggest that two time series, let’s call them x and z, are causally connected because they are correlated. But there are several cases that a simple correlation test cannot distinguish. A first possibility is that the time series simply show similar behavior but are not causally connected at all, despite the result of a correlation test.
Another possibility is that they are causally connected, but correlation does not tell us whether it is x affecting z or the other way around. And one other possibility is that both have some common cause upstream in the causal path, so that x and z are not directly causally connected but are connected through a third cause y that is concealed from the observer. So how can we test all or some of these hypotheses?
One way to test these hypotheses is to perturb one time series and see how the perturbation spreads to the other. Let’s perturb a data point in the time series z. Let’s say that we multiply the data point in position 5 by -2. We can see that nothing happened to the values of time series x; it looks exactly the same as before. So when we perturb the values of the time series z, at least for this data point, x remains the same.
This suggests that there is no causal influence of z on x. However, if the perturbation is applied to a value of x, z changes and follows the direction of the new value, suggesting that the perturbation to x has a causal influence on z. Behind the scenes, I can tell you that z is actually the moving average of x. That means that each value of z is calculated from two values of x, so z is a function of x.
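The perturbation experiment just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling step of 0.5, the series length of 20, and the helper names `generate_x`, `moving_average`, `run`, and `changed_positions` are all hypothetical choices, and the moving average is taken over two consecutive values of x, as described above.

```python
import math

def generate_x(n=20, step=0.5):
    # Assumed sampling of sin: x[i] = sin(step * i). Step and length are illustrative.
    return [math.sin(step * i) for i in range(n)]

def moving_average(x):
    # Each z value is the mean of two consecutive x values.
    return [(x[i] + x[i + 1]) / 2 for i in range(len(x) - 1)]

def run(perturb=None, position=5, factor=-2.0):
    """Run the toy system once, optionally perturbing one data point of x or of z."""
    x = generate_x()
    if perturb == "x":
        x[position] *= factor      # intervene on x before z is computed from it
    z = moving_average(x)
    if perturb == "z":
        z[position] *= factor      # intervene on z; x has no way of "seeing" this
    return x, z

def changed_positions(before, after):
    # Indices at which the two series differ.
    return [i for i, (a, b) in enumerate(zip(before, after)) if a != b]

x0, z0 = run()                      # unperturbed baseline
x_pz, z_pz = run(perturb="z")       # perturb z at position 5
x_px, z_px = run(perturb="x")       # perturb x at position 5

x_changed_by_z = changed_positions(x0, x_pz)
z_changed_by_x = changed_positions(z0, z_px)

print("x positions changed after perturbing z:", x_changed_by_z)
print("z positions changed after perturbing x:", z_changed_by_x)
```

Perturbing z leaves every value of x untouched, while perturbing x at position 5 changes the two values of z that are computed from it (positions 4 and 5 in this indexing).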
The results of these perturbations produce evidence in favor of a causal relationship between these processes, evidence we would need if we did not already know that they were related by the function I just described. They suggest that it is x that causally precedes z. So we can say that this single perturbation suggests the following causal relationship: x causes z. It is important to keep in mind that performing interventions is not always simple or possible.
Think of drug development as a way to perturb a biological system. Some drugs may have harmful consequences, and interventions in the form of prescriptions are highly regulated, so one cannot easily run experiments with diseases on humans. But assuming that one can perform such an intervention, in cases where the intervention does not lead to a change, it produces evidence against a causal connection between the events.
Moreover, the underlying argument is that after an intervention, correlation may or may not occur, and thus one still relies on classical statistics to make the final call. So while perturbation analysis does help to rule out some cases, it still inherits the pitfalls of statistics and correlation analysis, for the simple reason that we have not yet changed the tools. We only perform more experiments on the data.
Notice that, as we said before, there are three possible types of causal relationships, which can be represented in what is known as a directed acyclic graph. That is a graph whose arrows imply a cause-and-effect relationship between events, but which has no loops, because a loop would make a cause its own cause, or an effect its own cause, something that is not allowed because it would break causality.
But we will cover all this graph jargon in the next module. In these graphs, nodes are events, and two events are linked if there is a direct cause-and-effect relationship between them. In the first case, labeled A in orange, the event X is the cause of event Y and Y is the cause of event Z, so X is said to be an indirect cause of Z. In general we are of course more interested in direct causes, because almost anything can be an indirect cause of anything else.
In the second case, B, an event Y is a direct cause of both Z and X. And finally, in case C, the event Y has two causes: X and Z. With an intervention calculus, such as the one applied to the time series before, one may rule out some cases but not all of them at the same time. But more importantly, perturbation analysis offers the means to start constructing models that explain a system and the data it produces, rather than describing it simply in terms of correlations.
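To make the three cases concrete, here is a minimal sketch that encodes each of them as a small directed graph and checks the no-loops condition. The dictionary representation (each event mapped to the events it directly causes) and the helper names `is_acyclic`, `direct_causes`, and `all_causes` are assumptions made for illustration only.

```python
# Each graph maps an event to the list of events it directly causes.
case_a = {"X": ["Y"], "Y": ["Z"], "Z": []}   # X -> Y -> Z
case_b = {"Y": ["X", "Z"], "X": [], "Z": []} # Y causes both X and Z
case_c = {"X": ["Y"], "Z": ["Y"], "Y": []}   # X and Z both cause Y

def is_acyclic(graph):
    """Depth-first search; returns False if some event is reachable from itself."""
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {node: UNSEEN for node in graph}

    def visit(node):
        if state[node] == IN_PROGRESS:   # we came back to an event on the current path
            return False
        if state[node] == DONE:
            return True
        state[node] = IN_PROGRESS
        for effect in graph[node]:
            if not visit(effect):
                return False
        state[node] = DONE
        return True

    return all(visit(node) for node in graph)

def direct_causes(graph, event):
    # Events with an arrow pointing straight at `event`.
    return sorted(cause for cause, effects in graph.items() if event in effects)

def all_causes(graph, event):
    # Direct and indirect causes: every event with a directed path to `event`.
    found, frontier = set(), [event]
    while frontier:
        node = frontier.pop()
        for cause in direct_causes(graph, node):
            if cause not in found:
                found.add(cause)
                frontier.append(cause)
    return sorted(found)

loop = {"X": ["Y"], "Y": ["X"]}  # not allowed: X would be its own cause

print(is_acyclic(case_a), is_acyclic(case_b), is_acyclic(case_c), is_acyclic(loop))
print(direct_causes(case_a, "Z"), all_causes(case_a, "Z"))
```

In case A, Y is the only direct cause of Z, while X appears only among the indirect causes, which matches the distinction drawn above.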
In our approach to causality, we incorporate these ideas of an interventionist calculus and perturbation analysis in what we call algorithmic information dynamics, and we replace traditional probability theory, classical statistics, and correlation with a fully model-driven approach that is fed by data but is not only data-driven.
The idea is to systematically produce the most likely generating model for the observed behavior of a system. In the case of our two time series, the time series X is produced by the mathematical function sin(x), and so sin(x) is the generating mechanism of time series X. On the other hand, the generating mechanism of Z is the moving average of f(x), where f(x) is sin(x), and clearly the moving average of sin(x) depends on the time series X, which is sin(x).
But it is not the other way around. So how could we have guessed these two functions, which statistics alone cannot systematically attempt to find? In other words, we need some sort of method to infer the function behind the data. Having found the moving average and then sin(x), we could not only have established the actual causal relationship, but we could also reproduce the observations and any number of other data points in the future with perfect accuracy.
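As a toy illustration of what inferring the function behind the data could look like, the sketch below searches a tiny, hand-picked set of candidate mechanisms and keeps the one that reproduces the observations best. This is a deliberately naive stand-in and not the algorithmic information dynamics method itself; the candidate sets, the sampling step of 0.5, and all names are assumptions.

```python
import math

STEP = 0.5  # assumed sampling step of the observations

# "Observed" data, generated here so the example is self-contained.
observed_x = [math.sin(STEP * i) for i in range(20)]
observed_z = [(observed_x[i] + observed_x[i + 1]) / 2 for i in range(19)]

# Hypothetical candidate generating functions for x.
candidates_for_x = {
    "sin": math.sin,
    "cos": math.cos,
    "identity": lambda t: t,
}

# Hypothetical candidate mechanisms producing z from the series x.
candidates_for_z = {
    "copy": lambda x: x[:-1],
    "moving_average": lambda x: [(x[i] + x[i + 1]) / 2 for i in range(len(x) - 1)],
    "difference": lambda x: [x[i + 1] - x[i] for i in range(len(x) - 1)],
}

def series_error(predicted, data):
    # Worst-case absolute deviation between a model's output and the data.
    return max(abs(p - d) for p, d in zip(predicted, data))

def error_for_x(name):
    model = candidates_for_x[name]
    return series_error([model(STEP * i) for i in range(len(observed_x))], observed_x)

def error_for_z(name):
    return series_error(candidates_for_z[name](observed_x), observed_z)

best_x = min(candidates_for_x, key=error_for_x)
best_z = min(candidates_for_z, key=error_for_z)

# With the mechanisms in hand, future data points can be generated, not just described.
next_x = candidates_for_x[best_x](STEP * len(observed_x))
next_z = candidates_for_z[best_z](observed_x + [next_x])[-1]

print(best_x, best_z, next_x, next_z)
```

The search only works here because the true mechanisms happen to be in the candidate sets; the harder question raised above is how to explore the space of possible generating models systematically.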
The perturbation analysis and graphical approach to causality are the basis of what are called Bayesian networks, which involve the calculation of conditional probability distributions of the form: “What happens to some variable representing a possible event if this other variable, representing another event, is observed or perturbed?” Here these ideas of an interventionist calculus are fundamental.
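The difference between a variable being observed and being perturbed can be shown on a very small Bayesian network. The sketch below uses the common-cause structure of case B (Y causes both X and Z) with binary events; the probability values are made up for illustration, and the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Illustrative (made-up) probabilities for the network Y -> X, Y -> Z.
P_Y = {1: F(1, 2), 0: F(1, 2)}                # P(Y=y)
P_X1_GIVEN_Y = {1: F(9, 10), 0: F(1, 10)}     # P(X=1 | Y=y)
P_Z1_GIVEN_Y = {1: F(4, 5), 0: F(1, 5)}       # P(Z=1 | Y=y)

def bernoulli(p_one, value):
    # Probability of `value` (0 or 1) when the probability of 1 is p_one.
    return p_one if value == 1 else 1 - p_one

def joint(x, y, z, do_x=None):
    """P(X=x, Y=y, Z=z). If do_x is set, X is forced to that value,
    which cuts the arrow from Y to X."""
    if do_x is None:
        p_x = bernoulli(P_X1_GIVEN_Y[y], x)
    else:
        p_x = F(1) if x == do_x else F(0)
    return P_Y[y] * p_x * bernoulli(P_Z1_GIVEN_Y[y], z)

def prob_z1_given_x1(do_x=None):
    # P(Z=1 | X=1), computed by enumerating the joint distribution.
    numerator = sum(joint(1, y, 1, do_x) for y in (0, 1))
    denominator = sum(joint(1, y, z, do_x) for y, z in product((0, 1), repeat=2))
    return numerator / denominator

observed = prob_z1_given_x1()            # X=1 is merely observed
intervened = prob_z1_given_x1(do_x=1)    # X is set to 1 by an intervention

print("P(Z=1 | X=1 observed):", observed)
print("P(Z=1 | X=1 forced):  ", intervened)
```

Observing X=1 raises the probability of Z=1 to 37/50, because it is evidence about the hidden common cause Y, whereas forcing X=1 leaves it at 1/2, which is the signature of X having no causal influence on Z.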