OK, so my talk is called “Please Stop Doing ‘Explainable’ Machine Learning.” All right, now why am I talking about that? It’s because bad stuff is happening now. There are bad bail and parole decisions being made because people typed the wrong number into a black box model.
And it’s letting dangerous people go free, and it’s keeping people in prison who don’t deserve to be there. We’re doing a lot of bad medical diagnosis and screening because, for example, the algorithms depend on factors that are not supposed to be there, like words within an X-ray image.
They’re depending on those, rather than the actual X-ray. We are making bad loans and credit decisions based on faulty information. And I claim that explainable machine learning actually perpetuates this problem.
OK, so why is that? Well, I’m going to make a distinction between explainable machine learning, and interpretable machine learning. Explainable machine learning is when you use a black box, and you explain it afterwards. So that’s a post-hoc analysis. Whereas interpretable machine learning is when you use a model that is not black box in the first place.
Now, interpretable machine learning is not simply the CART algorithm. If you think we haven’t done anything since 1984, you would be wrong. Now, variable importance plots, or partial dependence plots, are not interpretable machine learning. A lot of people say that all they need is their partial dependence plots, but that’s explainable and not interpretable.
Can you imagine someone bringing out some kind of partial dependence plot in a courtroom that determines someone’s fate? So I want to clarify that the context of this talk is high stakes decisions, and cases when you actually need to troubleshoot your model.
Those are the two main times that you really need a model that’s either explainable or interpretable: when you actually care about the decision you’re making, or when you’re trying to troubleshoot your model. This is not for things like computational advertising, and Google Search, and stuff like that.
OK, so I want to make two arguments on why we should stop working on explainable machine learning. Now the first one is that the accuracy-interpretability trade-off is, as far as I can tell, a myth. So I kept seeing this plot all over the place. This particular version of the plot is from the DARPA explainable AI BAA.
Now, I don’t know why they keep producing this plot. Because there is no application that corresponds to this plot. What application is this? And what are the axes on this plot? And are these points for like different machine learning models? Which models do they try? And then is this for a static data set where they tried a whole bunch of different machine learning methods? This plot– it’s mathiness. It doesn’t mean anything.
Now if this plot is for a static data set, I would argue that there’s no high stakes decision, or case where you’re troubleshooting your model, where you have a static data set.
Data science, or knowledge discovery as it used to be called, is an iterative process. This is the KDD process from 1996. It’s identical to the process in the CCC big data white paper, where it was rediscovered.
But you’re supposed to process the data, do data mining, machine learning, and then interpret or evaluate your model. And then go back and change the processing and improve the whole system. And so I’m arguing that in fact, if you have a better understanding, and you used a model that was interpretable, that would actually lead to overall better performance, not worse.
So we saw this firsthand when I was working on maintaining the New York City power grid with machine learning. And we tried a whole bunch of different machine learning methods on the data we had, and it amounted to, at most, a 1% change in performance between the different algorithms. But then when we started using interpretable models, we could actually figure out what was wrong with our data, and what was wrong with our model, and go back and change things.
And make huge leaps in performance after we started doing that. So I claim that this figure is nonexistent, and it’s a myth. OK, so I’ll make an even stronger argument, which is that even if you did have a static data set, and you tried a whole bunch of different machine learning algorithms on it, then on many data science problems, with good data representations, all machine learning methods perform approximately the same.
And so when I say good data representations, I mean data that naturally comes with a good representation. So I’m excluding things like computer vision, some time series problems, some genomics problems, and some natural language processing problems.
And that’s why machine learning kind of focuses on those problems, because the data representation needs to be constructed. On the rest of the problems, all the algorithms perform approximately the same. So this figure, as far as I can tell for most data science problems, doesn’t exist.
Now, I will say that I’ve worked on a lot of data science problems, and I’ve always been able to produce an interpretable model that is just as accurate as the best black box that we could find. And I’ve worked on a lot of medical problems and criminological problems. I’ve worked on meeting analysis. And lately, we’ve been working on computer vision.
And even there, we can actually get somewhere with some level of interpretability. It’s not the same as in the other problems, but you don’t truly need a complete black box. OK, so my second argument is that explainable machine learning is fundamentally flawed because it forces you to rely on two models instead of one.
You have the original black box, and then you have the model for the explanation. Now, those models necessarily disagree with each other. Because if they didn’t, then you could throw out the black box altogether, and just use the explanation model. So they have to disagree with each other.
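To make the two-model setup concrete, here is a minimal sketch of how one might measure how often an explanation model agrees with the black box it is meant to explain. Everything in it is an assumption for illustration and not from the talk: `black_box` is a made-up nonlinear rule standing in for an opaque credit model, `explanation_model` is a made-up one-feature surrogate, and the applicants are random points.

```python
import random


def black_box(applicant):
    # Hypothetical stand-in for an opaque model: a nonlinear rule on two
    # features (both scaled to [0, 1]). Returns 1 for approve, 0 for deny.
    income, history = applicant
    score = 0.6 * income + 0.4 * history + 0.3 * income * history
    return 1 if score > 0.5 else 0


def explanation_model(applicant):
    # Hypothetical post hoc surrogate: a single readable threshold on income.
    income, _history = applicant
    return 1 if income > 0.5 else 0


def fidelity(model_a, model_b, points):
    # Fraction of points on which the two models give the same prediction.
    agree = sum(model_a(p) == model_b(p) for p in points)
    return agree / len(points)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
applicants = [(rng.random(), rng.random()) for _ in range(10_000)]

fid = fidelity(black_box, explanation_model, applicants)
disagreements = [p for p in applicants if black_box(p) != explanation_model(p)]

print(f"surrogate agrees with black box on {fid:.1%} of applicants")
print(f"explanation is wrong for {len(disagreements)} applicants")
```

Every applicant in `disagreements` is a case where the explanation describes a decision the black box did not actually make.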
So let’s say that you have an explanation model that’s right 80% of the time. Now, usually people say that’s pretty good. But the fact is that model is wrong 20% of the time. So what it means is that the explanations you’re getting fairly often are wrong, which means you can’t trust the explanations, which means you can’t trust the black box.
And you are back to where you started. Also, the explanations are sometimes not consistent. So for instance, the black box could say it denied my credit because I didn’t have a very long credit history.
But maybe I could point to somebody over there who had exactly the same length of credit history as me, and that guy did get the loan, or something like that. So the explanations can be inconsistent. And sometimes they don’t actually make any sense at all.
So for instance, the computer vision people really like saliency. Saliency is where you can figure out what part of the image the neural network is looking at. But the problem with saliency is that if the image is misclassified, you can see where the network is looking, but it doesn’t tell you anything about what the network is actually doing.
So you realize you’d never understood what it was doing in the first place. Now, I will say that if you’re like most scientists, your reward function is designed to favor explainable machine learning.
But I believe that, at least for the next few years, the world would be better off if you didn’t work on it. So, why is that? Well, it’s because people love their black boxes. Companies can make money off black boxes. You can’t make money if you give away the model. If the model is completely interpretable,
you can’t make any money off of it. Black boxes are magical, and they uncover things. They uncover patterns you didn’t know about. But that’s also true for interpretable machine learning. And in fact, the patterns are much clearer.
So if you have a choice between an interpretable model and a black box, a lot of people would choose the black box. But I claim that that makes no sense. Black box models are much easier to train than interpretable models. There’s no question about that, because interpretable models need constraints to make them interpretable.
And then also, black boxes are an excuse not to do any feature engineering, because you don’t need to. The machine learning algorithms should supposedly do it for you. But I claim that for any high stakes decision, you should really be very carefully looking at what your features are, and what they mean.
Now, people think that the black boxes are more accurate. But they often assume that without testing the hypothesis. They very often don’t even try to find an interpretable model that’s accurate. Now, what’s going on right now in the education of machine learning is that we don’t even teach interpretable machine learning anymore.
Most machine learning classes do not even teach CART. So people only know their black boxes. Now it is much easier as an academic to write a paper introducing an explainability method than an interpretable modeling method. Because explainability methods often involve just taking derivatives, whereas interpretable modeling requires solving a hard optimization problem with interpretability constraints.
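The contrast can be written schematically. The notation below is an assumption for illustration and is not from the talk: $f$ is a model, $\ell$ a loss function, $\mathcal{F}$ a model class, $\mathrm{complexity}(f)$ some measure of model size such as the number of leaves or terms, and $C$ a budget on it.

```latex
% Post hoc explanation (gradient-based): differentiate an already trained
% black box f with respect to each input feature x_j.
s_j(x) = \left| \frac{\partial f(x)}{\partial x_j} \right|

% Interpretable modeling: fit the model itself under an explicit
% interpretability constraint.
\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i), y_i\bigr)
\quad \text{subject to} \quad \mathrm{complexity}(f) \le C
```

The first line is a single derivative of a model that already exists. The second is a search over models, and when the constraint is discrete, as with sparse trees or integer-valued scoring systems, that search is typically a hard combinatorial problem.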
Now, when the world writes mostly papers about explainable AI, it actually encourages black boxes. Because people think that you can just make a black box, and somebody can explain it. So you might as well just make the black box.
Now my bottom line is that I think statisticians can fix this problem. It’s not going to happen in machine learning, not the way things are going. But applied statistics has always cared about interpretability.
So if the field of statistics wants to be leaders of data science, I claim that it should solve problems like this.