This will be a fairly technical post, but it shouldn’t be too bad.
I’m currently working at a job where we do a ton of machine learning. Machine learning is basically doing statistics with a lot of computers. There’s really no magic there, despite warnings from people with a lot of money and brainpower.
Machine learning is really a set of tricks that we’ve learned to solve a particularly hard kind of problem. The problem can be summarized in three steps:
- Go get a bunch of data.
- Look at the data really, really hard.
- Try to figure out what the data is telling you about how the universe works.
This is pretty much a summary of every scientific discipline ever.
Note that the “try to figure out” part is often expressed in different ways, the other two common forms being: “make a prediction that is accurate” or “decide what we should do differently to get different results.” Both of these are just restatements of the problem of figuring out what the data tells you about the universe.
Now, there are a number of ways people go wrong when they do this kind of thing. I won’t bore you with all the tiny details of how to even begin to understand what you are looking at and how to make sense of it.
One common problem is called “overfitting”. The way it manifests itself is you propose a theory about the data that explains the data really, really well. In fact, remarkably well. But then, when you try out the idea in reality, it is horrible.
In order to understand this, imagine a scatterplot of data points. You want to predict what value you should get depending on where you are along the X-axis. What you could do is just draw lines connecting all the points together. This graph will accurately predict every value you’ve seen in the data, but in the vast majority of cases it will not be a very good predictor of how reality behaves.
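To make the scatterplot picture concrete, here is a minimal Python sketch using NumPy. Everything in it is made up for illustration: the "reality" is assumed to be a straight line plus random noise, and the names `true_process`, `connect_the_dots`, and `straight_line` are hypothetical. It compares the connect-the-dots graph against a plain straight-line fit, first on the points we have seen and then on fresh points from the same process.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_process(x):
    # Assumed "reality" for this sketch: a simple straight line.
    return 2.0 * x + 1.0

# The data we have seen: 50 noisy measurements.
x_seen = np.linspace(0.0, 10.0, 50)
y_seen = true_process(x_seen) + rng.normal(0.0, 3.0, size=x_seen.size)

# Fresh measurements from the same process, never used for fitting.
x_new = rng.uniform(0.0, 10.0, size=1000)
y_new = true_process(x_new) + rng.normal(0.0, 3.0, size=x_new.size)

def connect_the_dots(x):
    # Draw straight segments between neighboring seen points.
    return np.interp(x, x_seen, y_seen)

# A much simpler theory: one straight line fit by least squares.
slope, intercept = np.polyfit(x_seen, y_seen, deg=1)

def straight_line(x):
    return slope * x + intercept

def mse(predicted, actual):
    # Mean squared error: smaller means better predictions.
    return float(np.mean((predicted - actual) ** 2))

dots_seen = mse(connect_the_dots(x_seen), y_seen)
line_seen = mse(straight_line(x_seen), y_seen)
dots_new = mse(connect_the_dots(x_new), y_new)
line_new = mse(straight_line(x_new), y_new)

print("error on seen data:  dots =", dots_seen, " line =", line_seen)
print("error on new data:   dots =", dots_new, " line =", line_new)
```

On the seen data, connect-the-dots has essentially no error, because it passes through every point by construction. On the new data, it typically does worse than the boring straight line, because it has memorized the noise along with the signal.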
In physics, we sometimes do the same things. We have common patterns we follow to try and avoid this. One of the patterns is “Don’t look at the data before making your theories.” That is, try to make your theories out of previous theories and new assumptions. This is like trying to hit a bullseye wearing a blindfold. The problem with this method is that it is very inefficient, and there are only so many ideas we can come up with. However, when we do find that needle in the haystack, the theory that does a good job at matching the data, then we think we’re pretty close to reality. The best part is we know how that theory was put together, and we can think about it and reason about it.
This is what Newton did. Or really, it would’ve been what he did except he was familiar with the data, and he was looking for a reason why things moved the way they did. So really, it didn’t work that way in practice. And it never does. Theoretical physicists do look at data. They get inspired by it.
The issue is when you look at the data, and you see shapes, you propose math that explains those shapes, and then you try to figure out what it all means. The truth is that there are a lot of shapes that will fit that data. Some of them are worse than others. And you really have no way of knowing that the shape that fits best is really the shape that represents reality. True, the more points of data you have, the more certain you are about that shape being the right shape, but you can never reach a point where you can say “This is the only shape that works well.” It gets even worse when you consider the fact that the data you have collected is not and never can be 100% accurate.
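Here is a rough sketch of the "many shapes" problem, again in Python with NumPy. The hidden process (`x ** 1.5`), the noise level, and the three candidate shapes (polynomials of degree 1, 2, and 3) are all assumptions chosen for illustration. The idea is to fit several shapes to the same noisy data, then compare how much they disagree inside the range we measured versus far outside it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy observations of a hidden process (assumed here to be x ** 1.5).
x_obs = np.linspace(1.0, 5.0, 30)
y_obs = x_obs ** 1.5 + rng.normal(0.0, 0.3, size=x_obs.size)

# Three candidate "shapes": a line, a parabola, and a cubic.
candidates = {
    degree: np.polyfit(x_obs, y_obs, deg=degree) for degree in (1, 2, 3)
}

def fit_error(coeffs):
    # Mean squared error of one shape on the observed data.
    return float(np.mean((np.polyval(coeffs, x_obs) - y_obs) ** 2))

errors = {degree: fit_error(c) for degree, c in candidates.items()}

# Ask every shape for a prediction inside and far outside the observed range.
x_near, x_far = 3.0, 15.0
near = [float(np.polyval(c, x_near)) for c in candidates.values()]
far = [float(np.polyval(c, x_far)) for c in candidates.values()]

spread_near = max(near) - min(near)
spread_far = max(far) - min(far)

print("fit error by degree:", errors)
print("disagreement at x = 3: ", spread_near)
print("disagreement at x = 15:", spread_far)
```

The more flexible shapes always fit the observed points at least as well as the simpler ones, yet none of the three is the process that actually generated the data. Where we have measurements the shapes roughly agree; away from the measurements they tell very different stories.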
What does this have to do with conservatism?
Conservatism is one of those “inside out” philosophies on par with Newtonian Mechanics. It is a collection of ideas, “shapes” if you will, about how the world works. Philosophers and logicians have argued about these ideas for a very long time. They’ve been around for such a long time that they aren’t new anymore.
Granted, sometimes the shapes fit the data really well, and sometimes they don’t. There are other shapes you can find that fit the data better than conservatism. That isn’t really the problem we’re trying to solve, though. Focusing on what fits the data best gets you shapes that fit the data well but don’t have much power in understanding what is really going on.
The other type of philosophy when it comes to these sorts of things is the “outside in” kind. In these philosophies, you look really, really hard at the data, find a really good shape that fits, and then declare that to be the ultimate truth. Then from that newly discovered ultimate truth, you make predictions and take courses of action. This seems very reasonable, but as I said earlier, it has the fundamental flaw of overfitting.
The way this philosophy pops up is in comments like, “There are poor people. We have to do something!” or “The rich make a lot of money! We have to do something!”
And if that something is aimed at changing the metric, and you make proposals based on conclusions drawn solely from the data, you’re going to get some really bad ideas. For instance, we could kill all the rich people and the poor people, and that would certainly eliminate the problem of poverty and wealth disparity. Hopefully, something is telling you that this proposal is fundamentally wrong. But don’t you also feel like there is something wrong with the idea of taxing and giving money away?
I can’t tell you the number of times I’ve personally seen data-driven thinking lead people astray. When I worked at Amazon, we were almost cult-like in our devotion to data, but we knew about overfitting and we tried to avoid it. This was years ago, but we used to run A/B testing on various changes to the website. We developed ideas about why people clicked on some things and not other things. Some of our ideas were pretty fantastic, and entire teams were formed to pursue them. In the end, we discovered that the data seemed to be telling us people click on new things but they really don’t like things to change very much. That’s why Amazon.com doesn’t really look very much different than it did a decade ago.
In your own life, be very careful about making decisions based on data. Be careful about overfitting. Be careful about reading too much into the data you see. Remember that the shape you think fits the data isn’t necessarily true even if it is the best shape.
If you really want to understand something, you have to develop theories in an almost clean-room environment free from data. Once those ideas are developed, then you can test them to see if they pan out. But be careful about reading the data too closely, since it can lead you astray.
Also, this is why I don’t like string theory. It’s a plain and simple case of overfitting. They literally pick and choose which versions of string theory they like based on how well those versions fit the data.
And this is why I don’t like political decisions being made with graphs in the background.