Episode 109: How to Measure Anything and Make Better Decisions


[00:00:00] Dr Genevieve Hayes: Hello and welcome to Value Driven Data Science, where data professionals become strategic experts. I'm your host, Dr. Genevieve Hayes, and today I'm joined by Douglas Hubbard. Doug is the founder and president of Hubbard Decision Research and the creator of Applied Information Economics. He has over 35 years experience in management consulting, focusing on the application of quantitative methods to decision making.
[00:00:30] He's also the author of How to Measure Anything: Finding the Value of Intangibles in Business and The Failure of Risk Management: Why It's Broken and How to Fix It. In this episode, we'll explore how reducing decision-making uncertainty can create more business value than predictive modeling, especially when data is scarce.
[00:00:54] Doug, welcome to the show.
[00:00:55] Douglas Hubbard: Yeah, thanks for having me.
[00:00:57] Dr Genevieve Hayes: Data scientists exist within organizations for one primary reason: to help stakeholders make better decisions. For routine decisions, this is typically achieved through machine learning automation, because when the data volume is high, machine learning excels. But routine decisions, although necessary, are not typically what make or break an organization.
[00:01:21] The decisions that truly matter for an organization's success are the high-stakes decisions that are much less frequent in nature, where volume is so low that machine learning falls apart. In such situations, data scientists often find themselves without the tools to help, leaving executive stakeholders to rely on qualitative advice for support or, even worse, gut instinct.
[00:01:46] Yet these are also the situations where data scientists have the potential to add the greatest value if they know how. Now, as a data scientist with a background in both actuarial science and statistics, this is something I've experienced firsthand in my own work. When machine learning has failed, I've found myself falling back on actuarial and statistical techniques better suited to sparse data.
[00:02:11] All the while, I worried that I was somehow doing data science wrong. So when I came across your book, How to Measure Anything, Doug, it was somewhat of a revelation to me. Not so much because it taught me something entirely new, but because it confirmed for me that I was doing data science right. It was as if data science had suddenly stopped gaslighting me.
[00:02:35] The reality is there are some data science techniques, such as machine learning, that are better suited to supporting high-volume, low-stakes decision making, while others, such as actuarial and statistical techniques, are better suited to supporting low-volume, high-stakes decision making. And that's okay.
[00:02:57] Now, the framework you developed for combining scientific and mathematical techniques from disciplines such as economics, finance, and statistics to support business decision making is known as Applied Information Economics, or AIE. What does that look like in practice?
[00:03:17] Douglas Hubbard: Basically we start with a process. If somebody says, I've got a big decision to make, and I'm not sure how to make it, and I'm not sure how to measure what goes into it, to inform that decision first off, we have to clarify what the decision is.
[00:03:31] What's the decision that you're making? We wanna model that decision, and we wanna model our uncertainty about it. That model could come in the form of, let's say, a big business case, a cost-benefit analysis, maybe a big net present value calculation, or an ROI, or something like this.
[00:03:55] A lot of the variables in it will be uncertain. You won't know them exactly. What we wanna do is capture our current uncertainty about each of those variables. Now, this is actually before you measure them. When we talk about measuring, we mean reducing uncertainty quantitatively based on observations, so that implies you had a prior state of uncertainty.
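As a minimal sketch of what capturing current uncertainty in a business case can look like, the simulation below assumes each variable is described by a 90% confidence interval and treated as normally distributed. The variable names, the interval values, and the simple net-benefit formula are all hypothetical placeholders, not figures from the episode.

```python
import random
import statistics

def normal_from_90ci(lower, upper, rng):
    """Draw from a normal distribution whose 90% interval is (lower, upper)."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / 3.29  # a 90% interval spans about 3.29 standard deviations
    return rng.gauss(mean, sd)

def simulate_net_benefit(n_trials=10_000, seed=0):
    """Monte Carlo over a toy business case with three uncertain inputs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # All three intervals are hypothetical, for illustration only.
        annual_savings = normal_from_90ci(200_000, 600_000, rng)
        years_of_benefit = normal_from_90ci(3, 7, rng)
        upfront_cost = normal_from_90ci(1_000_000, 2_000_000, rng)
        results.append(annual_savings * years_of_benefit - upfront_cost)
    return results

outcomes = simulate_net_benefit()
chance_of_loss = sum(x < 0 for x in outcomes) / len(outcomes)
print(f"mean net benefit: {statistics.mean(outcomes):,.0f}")
print(f"chance of a loss: {chance_of_loss:.1%}")
```

The output is a distribution of outcomes rather than a single number, which is the prior state of uncertainty that any later measurement would reduce.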
[00:04:16] So we model our current state of uncertainty. We say, here's how much I think I know about these things now, and then I can compute the value of additional information. Now, this is where data science, I think sometimes gets interesting. I always thought that name was odd. What is the other kind of science?
[00:04:36] Exactly. So yeah, of course science is data science, right? But data scientists don't often come up with a situation where they say, I've analyzed something and I have some existing uncertainty on these questions here, and now I'm going to conduct an original controlled experiment or a survey or a measurement of the outside world, or count the number of cars going through this intersection, et cetera.
[00:05:02] They don't think in terms of original empirical work like a typical scientist would, like you might see from a psychologist or healthcare researcher, somebody like that. They tend to start with: here's the data that I have, and what inferences can I make with the data that I have?
[00:05:18] Now, as you said, when you have a large volume of data, often that suffices. You can do all sorts of things when you have large volumes of data, but when you're making a big, risky decision and you don't have that data on every variable, you're going to have uncertainties. And of course, when you have data science going on and you're making decisions, those decisions are all based on forecasts, and even with large volumes of data,
[00:05:43] you're not gonna make perfect forecasts. You still have uncertainties about forecasts. You would know that as an actuary: you can have lots of data on mortality tables for individuals of my age and background, et cetera, but still have uncertainty about whether or not
[00:06:00] my family would file a claim this year or something like this. That's uncertainty. So the big questions are those one-off, big kinds of uncertain bets. Should I implement this brand new system? Should I spend tens of millions of dollars re-engineering the entire organization with AI?
[00:06:17] Should I test this new drug compound? It's gonna cost me hundreds of millions of dollars to go through phase three trials. Should I change this government policy that's going to affect all sorts of lives, positive and negative?
[00:06:31] Those are all big, consequential decisions, and even if you had a lot of historical data, you're still making forecasts about the outcomes of that particular decision, and there will be uncertainties. So you model your current state of uncertainty and then you can compute the value of additional information.
[00:06:49] That means that you might have to go out and conduct random sampling of species in the forest, or surveys of people on how much time they spend in traffic. Those are original data. I don't know why data science has really evolved into this non-empirical method. They just assume that the only questions they can answer are the ones they already have giant databases for.
[00:07:13] You have to combine these things. Now here's a few things I would add to how this process works. So I mentioned you define decisions. You quantify your current uncertainty about those decisions, all the uncertainties of the variables that you have. You compute the value of additional information. Then you measure what matters most:
[00:07:32] those high-information-value measurements, especially the ones that are low-hanging fruit. And then you can optimize the decision. That's the whole approach. That's what it looks like when you implement it for individual large decision problems. We base it on a few other factors, though. When we say that we quantify our current state of uncertainty, that is also based on a lot of historical data.
[00:07:55] It's based on the historical data showing that people can be trained to quantify uncertainty, and we can actually measure their performance at quantifying their uncertainty probabilistically. So it's just like calibrating an instrument. The same thing occurs when you calibrate a person. It's really no different. The math isn't any different. A way to calibrate a person, just like you'd calibrate a scale, for example, is to measure something that you either already know the answer to
[00:08:24] or something that you will very shortly know the answer to. So I have a one-kilogram standard weight. I put it on a scale, and I wanna see how close the reading comes to one kilogram. We have whole government institutions dedicated to problems just like that. In the case of humans putting probabilities on things, we can ask them trivia questions or ask 'em to forecast things that we will unambiguously know the answer to in a short period of time.
[00:08:49] So that's where you can measure their performance. So you've got lots of data about the performance of experts quantifying their own uncertainty. So that's a level up, that's metadata about the measurement method itself.
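One simple way to measure that performance, sketched here under the assumption that each answer is recorded as a stated confidence plus whether it turned out to be right, is to compare stated confidence with the observed hit rate. The function name and the sample answers are hypothetical.

```python
from collections import defaultdict

def calibration_table(answers):
    """Map each stated confidence to (observed hit rate, number of answers)."""
    buckets = defaultdict(list)
    for stated_confidence, was_correct in answers:
        buckets[round(stated_confidence, 1)].append(was_correct)
    return {c: (sum(hits) / len(hits), len(hits)) for c, hits in sorted(buckets.items())}

# Hypothetical trivia results: (stated confidence, whether the answer was right).
answers = (
    [(0.9, True)] * 6 + [(0.9, False)] * 4
    + [(0.7, True)] * 6 + [(0.7, False)] * 4
    + [(0.5, True)] * 5 + [(0.5, False)] * 5
)

table = calibration_table(answers)
for stated, (hit_rate, n) in table.items():
    print(f"stated {stated:.0%}  observed {hit_rate:.0%}  (n={n})")
```

In this made-up data the person is right only 60% of the time when claiming 90% confidence, which is the overconfidence pattern calibration training is meant to correct. A well-calibrated person's observed hit rates would sit close to the stated ones.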
[00:09:03] The other side of data that we get here is, we've done this a lot for a lot of different kinds of decisions with lots of variables in the decision model: dozens or sometimes a few hundred variables in a decision model. And we compute the value of information for every one of them. There are formulas for the value of information that come from game theory and decision theory.
[00:09:23] They've been around for a long time. Some of them get a little bit more elaborate, but in their essence, they boil down to the chance of being wrong times the cost of being wrong. So when we do that, we tend to find out that the highest information value variables are things they would not have otherwise measured. We call this the measurement inversion.
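The "chance of being wrong times the cost of being wrong" idea can be sketched as an expected opportunity loss calculation over simulated outcomes. This is an illustrative simplification for a single approve-or-reject decision, not the full set of formulas referred to above, and the simulated net benefits are a hypothetical stand-in for the output of a real decision model.

```python
import random

def expected_opportunity_loss(outcomes):
    """Return (chance of being wrong, average cost when wrong, expected opportunity loss)."""
    mean_outcome = sum(outcomes) / len(outcomes)
    approve = mean_outcome > 0  # default choice under current uncertainty
    # Approving is wrong when the outcome is negative; rejecting is wrong when it is positive.
    losses = [max(0.0, -x) if approve else max(0.0, x) for x in outcomes]
    wrong = [loss for loss in losses if loss > 0]
    chance_wrong = len(wrong) / len(outcomes)
    cost_if_wrong = sum(wrong) / len(wrong) if wrong else 0.0
    return chance_wrong, cost_if_wrong, sum(losses) / len(losses)

rng = random.Random(1)
# Hypothetical simulated net benefits of approving a project.
simulated = [rng.gauss(500_000, 850_000) for _ in range(20_000)]

chance_wrong, cost_if_wrong, eol = expected_opportunity_loss(simulated)
print(f"chance of being wrong: {chance_wrong:.1%}")
print(f"average cost if wrong: {cost_if_wrong:,.0f}")
print(f"expected opportunity loss: {eol:,.0f}")
```

The expected opportunity loss is an upper bound on what perfect information about the decision would be worth, so it gives a ceiling on how much a measurement effort could justify spending.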
[00:09:43] We've observed this, I think, in every industry we've ever done consulting in. It's almost like everybody's systematically measuring all the wrong stuff. I don't know how it doesn't affect the GDP. Organizations making major decisions are spending more time measuring things that are statistically less likely to actually improve a decision, while ignoring the most uncertain things that would have the biggest sway on the decision.
[00:10:07] So that's a major problem. What's interesting, though, is what the math tells us about the solution to that. Obviously one solution is computing information values, but the very variables that have high information values,
[00:10:21] unsurprisingly, are also highly uncertain, in addition to the decision being sensitive to that variable. So the decision is sensitive to that variable, and the variable is highly uncertain. Highly uncertain variables are actually easier to reduce uncertainty on,
[00:10:37] because they're uncertain. It's the neat little convenient aspect of all of this: the highest information value variables are actually also easier to reduce uncertainty on because they're so uncertain. I paraphrase the math behind this as: if you know almost nothing, almost anything will tell you something.
[00:10:57] That's what the math actually means about this which is contrary, I think, to a lot of intuition. A lot of people might think that if I have a lot of uncertainty, I'm gonna need a lot of data. That's the way they might think about that.
[00:11:09] And mathematically speaking, just the opposite is true. You get the biggest uncertainty reductions on a highly uncertain variable on the first few observations. The size of the uncertainty reduction after the first few observations tends to get less and less. And then if you're trying to do something let's say you're trying to estimate the mean of a population based on a sample, once you're past about 30 samples, you have to quadruple the size of the sample for every additional 50% reduction in uncertainty.
[00:11:41] The first few were a huge reduction in uncertainty. Even very small samples. A sample of five or something like this can be a huge reduction in uncertainty if you had a lot of uncertainty before.
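The quadrupling rule follows from the standard error of a sample mean shrinking in proportion to one over the square root of the sample size. The short sketch below assumes independent random samples and a stable population standard deviation; the helper name and the baseline of 30 are illustrative choices.

```python
import math

def relative_uncertainty(n, baseline_n=30):
    """Standard error of a sample mean at size n, relative to its value at baseline_n."""
    return math.sqrt(baseline_n / n)

for n in (30, 120, 480, 1920):
    print(f"n={n:5d}  relative uncertainty={relative_uncertainty(n):.3f}")
```

Each fourfold increase in sample size halves the remaining uncertainty, so the cost of each further halving grows quickly while the first handful of observations does most of the work.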
[00:11:51] Dr Genevieve Hayes: Before you were saying, why is data science so focused on the applications where basically we're swimming in data? I think part of that happened because of the evolution of data science. What is currently referred to as data science largely evolved in Silicon Valley, where you had the big tech companies where they had absurd amounts of data.
[00:12:14] So the techniques that naturally evolved were the ones that were suited to massive data. Whereas I've never worked for a company as big as Google. The organizations I've worked for had far less data.
[00:12:30] So many of these techniques started falling apart, and that's what got me to start looking at other techniques, drawing on techniques from my own actuarial and statistical background and discovering the value of those techniques. And I think that's missed by a lot of data scientists.
[00:12:49] Douglas Hubbard: Yeah. I think data science says we can solve your problem if it's of this type, if it has these kinds of uncertainty issues. And the fact is that a lot of business problems don't fall neatly into "it's entirely based on data science" or "it's entirely based on original observations."
[00:13:09] It's a decision under uncertainty and it can use some of both. You can certainly use machine learning to inform specific variables that you're uncertain about, and when you have other variables, you can run a survey, run a controlled experiment, or do a regression analysis on a bunch of other data that you might have.
[00:13:29] There are even ways to better quantify the judgments of experts, even beyond just calibration, that outperform the unaided intuition of the expert. There's a particular method we use quite a lot. I don't know if you came across this in one of the books, but it's called the lens method.
[00:13:44] Dr Genevieve Hayes: Oh yes.
[00:13:45] Douglas Hubbard: It's been around since the 1950s or so.
[00:13:47] Dr Genevieve Hayes: Yeah. I love that. I have bookmarks in my book at that page.
[00:13:52] Douglas Hubbard: Yeah. The lens method really just starts with this: you have some human experts, but they're highly inconsistent. There have been decades of studies on how inconsistent we are. We'll just make a different judgment about the same thing because it was a different day and it was after I had my coffee, or whatever it is. Inconsistency is so high that if I can just smooth out inconsistency, there's a measurable improvement in estimates and judgment.
[00:14:17] So what I could do is I could take something where you've got a large number of relatively homogenous judgements, that can be parameterized. Maybe you are doing something like you're trying to estimate the durations of projects, and let's suppose there's eight or 10 key factors that you consider when you're trying to estimate the duration of a project.
[00:14:40] And you're given these eight or 10 factors about the project and you make a subjective judgment over and over again on a list of a hundred of them. But unknown to you, hidden in that list are some duplicate pairs. We've done a lot of this, by the way.
[00:14:54] It's duplicate pair analysis. Number eight is identical to number 91 or something like this. By the time you get down to 91, you forgot you already answered that exact combination of quantities and those parameters, and you probably put down a slightly different answer. We find that, of the total variation
[00:15:15] you observe in all your judgments across all the different data, you can explain about 20% of it just as personal inconsistency: they just would've given a different answer. So we build this model that tries to predict what the human would say.
[00:15:31] It's based on no historical data at all. It's just a list of the human judgments. And so the model tries to predict what the human would say. And then you have a tournament comparing the human against the model of the human, where you can see real world outcomes and measure the performance of each method.
[00:15:47] And this has been done many times. There's been a lot of research on this. It turns out the model of the human consistently beats the human.
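A toy version of that tournament can be sketched with synthetic data. The example assumes the expert's judgments are a weighted sum of the project factors plus random noise, and fits the lens model with ordinary least squares via NumPy. Every number here (weights, noise levels, project count) is made up for illustration, and a real lens model would be fitted to actual expert judgments.

```python
import numpy as np

rng = np.random.default_rng(0)
n_projects, n_factors = 100, 8

# Hypothetical project factors and the weights that drive real outcomes.
factors = rng.normal(size=(n_projects, n_factors))
true_weights = rng.normal(size=n_factors)

# The expert weighs the factors slightly wrongly (bias) and is noisy (inconsistency).
expert_weights = true_weights + rng.normal(scale=0.3, size=n_factors)
actual = factors @ true_weights + rng.normal(scale=0.5, size=n_projects)
expert = factors @ expert_weights + rng.normal(scale=1.0, size=n_projects)

# Lens model: regress the expert's own judgments on the factors (no outcome data used).
X = np.column_stack([np.ones(n_projects), factors])
coef, *_ = np.linalg.lstsq(X, expert, rcond=None)
lens = X @ coef

def rmse(estimate):
    """Root-mean-square error of an estimate against the actual outcomes."""
    return float(np.sqrt(np.mean((estimate - actual) ** 2)))

print(f"expert error:     {rmse(expert):.2f}")
print(f"lens model error: {rmse(lens):.2f}")
```

The fitted model inherits the expert's systematic weighting errors but drops most of the random noise, which is why it ends up closer to the actual outcomes in this simulation.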
[00:15:54] Dr Genevieve Hayes: Is that because there's less variability?
[00:15:56] Douglas Hubbard: Yeah, it's just less random. It has other biases. If a human applies the wrong emphasis to some variables and not others, or systematically ignores some things or overweights some things in their judgment, the lens model will pick that up too.
[00:16:11] It'll have those same biases, but the lens model won't have any inconsistency. There won't be any noise in it. And that effect is large enough that simply removing the noise, even though there are still other biases, improves the judgments.
[00:16:27] Dr Genevieve Hayes: What I took away from looking at the lens model part of it also looked like it was creating an ensemble of experts, and you'd get all the benefits associated with ensemble modeling from it.
[00:16:40] Douglas Hubbard: Yeah, so when we do lens models you could, and we have done lens models on single individuals, but often we're using groups of individuals. There's also, by the way, decades of research on, how to aggregate the estimates of multiple individuals and some methods of aggregation clearly outperform other methods of aggregation.
[00:16:59] So what's interesting is that probably the most popular form of aggregation is the worst one.
[00:17:06] Dr Genevieve Hayes: It's just a simple average?
[00:17:08] Douglas Hubbard: No, that's actually a relatively decent one.
[00:17:10] The most popular one is just get everybody in the room and build a consensus and they all agree on something.
[00:17:15] Dr Genevieve Hayes: Yeah.
[00:17:16] Douglas Hubbard: Yeah, 'cause the most overconfident person in the room is probably gonna drive that discussion or something like that.
[00:17:21] What we observe in the empirical data is that if you get a bunch of people to make judgments about forecasts that way, their forecasts are worse than the others. In other words, averaging is slightly better. Averaging, of course, also offsets some of the inconsistency, because there's gonna be some random inconsistencies, and of course averaging multiple estimates
[00:17:39] takes care of that to a certain degree. You can compute mathematically what that reduction is. The lens model reduces it even further. But even averaging, which is one of the algorithmic methods, is just the simplest algorithmic method for combining multiple expert estimates.
[00:17:56] That's not the best algorithm. The best algorithms are ones that take into account the past performance of individuals in the group and even how well correlated they are to each other. That's the less obvious one. But when you think about it, if you and I were perfectly correlated,
[00:18:13] I don't add any value to the forecast. Here's an interesting little tidbit. I remember this from the analysis we did. We actually constructed it based on the 2,000 or so people that we have in our database right now who went through our calibration training, where we trained people to be good at quantifying their own uncertainty probabilistically.
[00:18:32] We did a series of experiments where, among other things, we created a couple of million virtual teams of people. So these would be people from different organizations, at different times, that we grouped together in a quote-unquote team who all happened to be answering the same question.
[00:18:50] It would be a trivia question, let's say, and it was true or false. And they would say how confident they were in their answer. And then we would unfold it so that if they said that they're 80% confident the answer was false, they're basically saying there's a 20% chance it's true. So we turned 'em all into probability of being true.
[00:19:10] And when you do that with three people, you don't just average the three individuals. It turns out you have to take into account the base rate of the whole portfolio. In this case, we set our base rate such that there's about a 50/50 chance of getting a true,
[00:19:27] 'cause they were about equally distributed between true and false correct answers. So they answer each of these, and with that 50% base rate we then take their answers. If three people all said that something was 70% likely, and they did it independently, we don't just average them and get another 70%.
[00:19:45] The chance that it's true is more like 81%.
[00:19:49] Because they're independent of each other, and apparently their knowledge is at least somewhat complementary. That's what we observe in the data. For that particular question, where three people all said something was 70% likely, we had about 2,400 of those out of the 2 million.
[00:20:06] So there was plenty of data on these things. Most of these combinations had a few hundred examples. There'd be rare examples where one person said it was 10% likely and two other people said it was 90%. That would be a little more rare than other combinations where they directionally agreed. But what's interesting is that you actually end up with more information in the aggregated answer.
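To see why the combined probability can exceed every individual estimate, here is a rough sketch that pools probabilities in log-odds space relative to a base rate. The `weight` parameter is a hypothetical knob added for illustration: a weight of 1 treats the estimates as fully independent and perfectly calibrated, and the value 0.57 was back-solved here only to land near the 81% figure quoted above. It is not the method used in the study described.

```python
import math

def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def pool_probabilities(probs, base_rate=0.5, weight=1.0):
    """Combine probabilities by adding each one's evidence in log-odds space."""
    prior = logit(base_rate)
    pooled = prior + weight * sum(logit(p) - prior for p in probs)
    return 1 / (1 + math.exp(-pooled))

three_at_70 = [0.7, 0.7, 0.7]
simple_average = sum(three_at_70) / len(three_at_70)
independent_pool = pool_probabilities(three_at_70)
damped_pool = pool_probabilities(three_at_70, weight=0.57)

print(f"simple average:         {simple_average:.3f}")
print(f"fully independent pool: {independent_pool:.3f}")
print(f"damped pool:            {damped_pool:.3f}")
```

Under full independence the pooled probability comes out near 93%, so an observed figure of about 81% sits between plain averaging and the idealized case, consistent with knowledge that is only partly complementary.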
[00:20:30] Dr Genevieve Hayes: A lot of this is reminding me of the probability theory that I studied back when I was an undergraduate. I can see how this all works. So yeah, it's a fascinating idea. How to Measure Anything was written before the big AI wave. One thing that struck me while reading it was that the AIE approach is something that, firstly, would be much harder to automate using AI than machine learning would, but at the same time, it's something that I can see AI supporting very well and
[00:21:06] basically turbocharging. What are your thoughts on this?
[00:21:10] Douglas Hubbard: Yeah, we thought that right away. As soon as the first versions of, say, ChatGPT became available to the public, we started doing experiments right away. We were experimenting with AI on how well calibrated it would be in forecasting future events. We have a bunch of data on that. The answer is, it can be about as good as a human at forecasting future events.
[00:21:31] And maybe the generations after the first ones that we tested would be even better. We can still test those. One way that AI can actually improve the judgments of individual humans is that AI has its own lens model lever. You can actually change
[00:21:48] the inconsistency in an AI. It's called temperature. Temperature is this parameter that goes from zero to two, and the default is one. If you look at different levels of inconsistency and compare them apples to apples, people behave like their inconsistency is equal to a temperature of about 0.8 or 0.9.
[00:22:08] So they're a little less inconsistent than the default of ai, like a chat bot, right? But you could set the temperature of a chat bot to zero. And it'll just give the same answer every time. Now, if you did something like that and then the AI was actually one of the members in the group that's being aggregated, does that actually improve the estimates of the group?
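As a toy illustration of what a temperature setting does, the sketch below samples from a list of scores using a temperature-scaled softmax. It is a simplified stand-in for how language models pick their next token, not a call to any real chatbot API, and the scores are hypothetical.

```python
import math
import random
from collections import Counter

def sample_with_temperature(scores, temperature, rng):
    """Pick an index from raw scores; lower temperature means less randomness."""
    if temperature == 0:
        # Zero temperature: always return the highest-scoring option.
        return max(range(len(scores)), key=scores.__getitem__)
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract peak for numerical stability
    return rng.choices(range(len(scores)), weights=weights)[0]

rng = random.Random(0)
scores = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate answers

for temperature in (0, 1.0, 2.0):
    picks = Counter(sample_with_temperature(scores, temperature, rng) for _ in range(1000))
    print(f"temperature {temperature}: {dict(sorted(picks.items()))}")
```

At zero the same answer comes back every time, while higher temperatures spread the picks across the alternatives, which is the adjustable inconsistency being described.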
[00:22:34] It's like another highly consistent member of your team, and maybe even different AIs are better at this. We've been building AI personas for the purpose of supervised fine-tuning. One of our team members built several AI personas in order to ask questions about my books, and the AI that we were training would try to answer them. For the questions those personas generated, we would compare the answers to the actual excerpts from the book, and that would be a here's-the-right-answer, here's-what-the-AI-said kind of thing. But AI personas, a lot of people have been doing this sort of stuff.
[00:23:13] So you can have one persona that is more of the skeptic, a persona that's more of a creative explorer type. You can have a persona that gets down to brass tacks and looks at the business problem, et cetera, a very practical person. One that thinks like a child. Whatever you like, I can make a persona that replicates that.
[00:23:34] And would it make sense to have a group that you're going to aggregate to estimate something made up of different deliberate personas? Because at least with AI, you can create the persona. I don't really have a lot of control over the personas of people other than who I select to be part of the group in the first place.
[00:23:52] That's maybe the only control I have. But with the ai you can get very specific about that and I wonder if that would be even better. Having lots of individual AI personas participate in a Delphi round of combining multiple experts or something.
[00:24:06] Dr Genevieve Hayes: So for a data scientist who wants to start applying AIE within their organization, where should they begin?
[00:24:13] Douglas Hubbard: First off, we often recommend starting with the biggest, hardest problems. It's a pilot project, in effect. For a lot of our clients, it's a really big organization. It could be a giant corporation or government agency, and they've asked the same question: how do I start to apply this stuff?
[00:24:30] I say let's do this first. Pick the hardest problem that you have, a big, complex, consequential decision, and we're gonna do a full risk-return analysis with those steps that I described earlier on that problem. When we do that, we often find solutions that they wouldn't have thought of. We measure things that they didn't know how to measure.
[00:24:49] And then that pilot becomes a demonstration piece for the rest of the organization. So even if they don't do anything else with it, they have a very useful deliverable for that consulting project. Not only is it telling 'em whether or not to do the project, but if they do it, maybe there's some risk mitigations they didn't already think of.
[00:25:08] And if they don't do it, we can define the conditions under which it would be viable. For example, it's entirely possible that we should just start building all the office buildings with solar power panels over all the roofs. We should convert all the windows to those transparent-type solar collectors where some of the IR energy is actually redirected to the edges of the glass panel and picked up there.
[00:25:35] Could be very expensive. I could start doing that, but if I waited a year or two, maybe those technologies would get better. Actually, they do. They get better on a log-linear basis in capital cost per watt of capacity, et cetera. I could do it now and start saving money, but if I waited a year or waited two years, I would save even more.
[00:25:56] Obviously, I don't defer indefinitely and never save the money, but that's one thing that we could start doing. Data scientists should start with that one big problem. And do that kind of analysis. And even if it's not currently viable, like maybe it's not currently viable to put all these solar panel windows all over the building, but what are the conditions under which it would be.
[00:26:20] That's the next question. If it's not viable now, should I start forecasting at this point in time the trend in capital cost per watt of capacity for these various technologies, and trends in the otherwise cost of electricity? Maybe it's going up in my area. So if it's not viable right now, are there conditions under which it will be? And then I start tracking those in advance.
[00:26:44] That's actually a really important finding here. A lot of times when people implement a new technology, it's usually pretty far after it was already pretty obvious that they should have implemented it.
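The wait-or-install trade-off can be sketched as a small net present value comparison. This assumes capital cost falls by a fixed percentage each year (a log-linear decline), savings only accrue after installation, and the benefit horizon is fixed. All parameter values are hypothetical and chosen only to make the trade-off visible.

```python
def npv_if_installed_in(year, capital_cost_now=1_000_000, annual_cost_decline=0.15,
                        annual_savings=120_000, horizon_years=20, discount_rate=0.05):
    """Net present value of installing at the end of `year` (0 means install now)."""
    # Capital cost declines by a constant fraction per year of waiting.
    cost = capital_cost_now * (1 - annual_cost_decline) ** year
    pv_cost = cost / (1 + discount_rate) ** year
    # Savings start the year after installation and stop at the fixed horizon.
    pv_savings = sum(annual_savings / (1 + discount_rate) ** y
                     for y in range(year + 1, horizon_years + 1))
    return pv_savings - pv_cost

for year in range(0, 8):
    print(f"install in year {year}: NPV = {npv_if_installed_in(year):,.0f}")

best_year = max(range(0, 21), key=npv_if_installed_in)
print(f"best year to install under these assumptions: {best_year}")
```

Waiting pays only while the yearly drop in capital cost outweighs the savings forgone, so the curve rises, peaks, and then falls. Tracking the real cost trend and electricity prices is what tells you where that peak actually is.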
[00:26:54] Dr Genevieve Hayes: Yes. That's the key thing. When will this become viable?
[00:26:59] Douglas Hubbard: So it's never just a yes or no. It's: if it's not viable now, when will it be, and under what conditions? If it is viable right now, are there things that you should do to mitigate risk further, increase ROI further, et cetera? So there are always optimization problems, even if it seems like a yes/no question going in.
[00:27:16] So you pick that big, complicated investment problem, solve that one, just that one, and then it becomes the demonstration for the rest of the organization.
[00:27:27] Dr Genevieve Hayes: For listeners who wanna get in contact with you, Doug, what can they do?
[00:27:31] Douglas Hubbard: info@hubbardresearch.com is real straightforward. My books all have their own web pages on the site, howtomeasureanything.com. So if you go to howtomeasureanything.com, you'll see images of five books and you can click on each one of them.
[00:27:46] And of course you've got all the contact information there, but you also get all of the materials that I refer to in the books. Like in the books, I'll talk about a spreadsheet example for optimizing that technology regret forecasting problem, like how long you should put off a new technology or something like that.
[00:28:04] There's a spreadsheet for that. There's a spreadsheet for a very simple Monte Carlo simulation for a risk analysis. There's a spreadsheet for very simple Bayesian analysis with small samples, et cetera, or regression models. We just built all of those and they're sitting on there. In fact, people can download them without buying the book.
[00:28:20] For us it's really just a benefit for the readers. In the books, if something seems a little bit more mathematically complicated, we just leave that out of the book and say, look, here's an example.
[00:28:31] It's just fill in the blank. It's a spreadsheet you can download. We explain enough about it that they know how to use it, but we don't have to derive things from Kolmogorov's axioms for probability theory. So we're just gonna say, hey, it's fill in the blank.
[00:28:47] You wanna run a control experiment? Here's a spreadsheet for that.
[00:28:51] Dr Genevieve Hayes: I do recommend reading the book though, 'cause it's a very worthwhile read.
[00:28:54] Douglas Hubbard: Yeah. Yes, thanks.
[00:28:56] Dr Genevieve Hayes: That's it for today's episode of Value Driven Data Science. But if you want more from Doug, next week you can catch our value-based episode, where we explore simple techniques that data scientists can start using right now to get the most out of limited data.
[00:29:13] And if you found today's episode useful and think others could get benefit, please leave us a rating and review on your favorite podcast platform. That way we'll be able to reach more data scientists just like you. Thanks for joining us today, Doug,
[00:29:29] Douglas Hubbard: Thanks a lot Genevieve.
[00:29:30] Dr Genevieve Hayes: and for those in the audience, thanks for listening.
[00:29:33] I'm Dr. Genevieve Hayes, and this has been Value Driven Data Science.
