Episode 95: [Value Boost] Building Models That Work While Millions Are Watching
[00:00:00] Dr Genevieve Hayes: Hello, and welcome to Your Value Boost from Value Driven Data Science, the podcast that helps data scientists transform from technical implementers to strategic experts by contributing to the decisions that really matter. I'm Dr Genevieve Hayes, and I'm here again with Professor Steve Stern, official custodian of the Duckworth-Lewis-Stern cricket scoring system, to boost your strategic impact in less time than it takes to grab a coffee.
[00:00:30] In today's episode, you'll learn practical strategies for creating statistical models that remain reliable when the stakes are high, and the whole world is literally watching. Welcome back, Steve.
[00:00:42] Prof Steve Stern: Thank you.
[00:00:43] Dr Genevieve Hayes: Academics are often criticized for being out of touch with the real world. However, few academic data scientists, let alone industry data scientists, have ever had their models deployed during a Cricket World Cup, with millions of people watching and absolutely no room for error.
[00:01:01] That's a very different challenge from building a model that's only used to create results for a research paper. But you've actually done that, Steve. So today I'd like to explore some of the practical considerations that go into building a model that has to work perfectly in a real-time, high-stakes situation, specifically with reference to the DLS method.
[00:01:24] When you're designing a model that needs to work in a real-time, high-stakes situation, such as the Cricket World Cup, is there anything fundamentally different about your approach compared to building a model for, say, an academic research paper, or even for a standard business application?
[00:01:44] Prof Steve Stern: I don't think there's anything fundamentally different at its base, but the difference is that you need to be far more certain of your understanding of the model that you're putting forward. 'Cause as you say, the stakes are high, and if the model has a flaw, or it does something in a scenario that you hadn't thought of,
[00:02:04] then you can be caught out in a rather significant way that you typically wouldn't be in an academic setting. And because of that, I think it really is the case that you want to make sure that you have a simple enough model that you can understand it in its entirety.
[00:02:20] And I think DLS, if nothing else, and I know people won't believe this, is simpler than all the other models that we thought of potentially implementing.
[00:02:30] Dr Genevieve Hayes: Yeah, so it reduces the risk of it coming back and smacking you in the back of the head.
[00:02:33] Prof Steve Stern: Exactly right. Exactly right. Now, you can't reduce that risk to zero, but you can definitely control that risk. And I guess even in the case where it does make a mistake, you need to be in a position to be able to say, okay, I know why it did that. I didn't think of it beforehand, but now, sitting here, I know why it did that.
[00:02:50] I can readily fix that. Because otherwise, you basically get one chance. And if you create a catastrophe, that's it. You won't be asked back to the table.
[00:03:01] Dr Genevieve Hayes: It won't be DLS anymore. It'll just go back to DL.
[00:03:04] Prof Steve Stern: exactly.
[00:03:06] Dr Genevieve Hayes: I think it's interesting you're acknowledging that it does make mistakes, because it is impossible to create a model that is completely error free.
[00:03:16] What does "no room for error" in a model actually mean in practice?
[00:03:21] Prof Steve Stern: Well, I think you have to think about what type of errors there are. In my mind, there are two fundamental types of errors. There are mathematical flaws and inconsistencies: you have to spend enough time to be as certain as you can be that there are no mathematical inconsistencies in what you do.
[00:03:38] But the other type of errors are errors of extrapolation. In other words, the model is itself consistent, but it hasn't been tuned, or it hasn't been trained, on a broad enough set of data that it understands what will happen in extreme cases. And as you say, in some sense, you can't ever get to that point.
[00:03:57] There's always potentially going to be an extreme case. So if DLS were to make an error, I am fully confident that the answer would be that it was just trying to make a prediction in a case where it had simply never seen those things before. And it's interesting, that's becoming more likely these days, because the ICC has opened up the world of T20 cricket to a lot more countries than it used to.
[00:04:20] And there are now matches where teams score seven, and I don't think that I would expect DLS to work in that situation. I don't think anyone else would either. But that's a different situation than having a fundamental mathematical flaw. Those you have to root out before you ever put it out to the wide world.
[00:04:39] And that can take time. The first instance of the application of the Duckworth-Lewis method was on 1 January 1997, and they began thinking of it right after that famous match in 1992. So nearly five years went by before they actually got to put it out
[00:04:56] to a real test in a real situation. And the same was true of DLS. I started thinking about DLS in 2005, and it wasn't officially implemented until 2014. So you really have to be willing to do those deep dives into what your model's actually doing, so that it doesn't have mathematical inconsistencies, 'cause those will cause collapse in a wide range of situations.
[00:05:22] The extrapolation thing, there's not much you can do about that. I mean, obviously you can be extremely careful and you can do your cross-validation assessments and so forth, but you're never going to overcome an issue where there are situations where there's just not much data on which to base evidence-based decisions.
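The "errors of extrapolation" Steve describes can at least be detected operationally: flag any incoming match state that falls outside the envelope of the training data, so you know when the model is being asked to predict in territory it has never seen. A minimal sketch in Python (the features and numbers here are invented for illustration, not from DLS):

```python
import numpy as np

def extrapolation_flags(X_train, X_new):
    """Flag rows of X_new that fall outside the per-feature range
    seen in training -- a crude warning that the model would be
    extrapolating rather than interpolating."""
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return np.any((X_new < lo) | (X_new > hi), axis=1)

# Hypothetical match states: (overs_remaining, wickets_lost, run_rate)
X_train = np.array([[50, 0, 4.5], [30, 2, 5.1], [10, 6, 7.0]])
X_new = np.array([[20, 3, 5.0],    # inside the training envelope
                  [50, 0, 15.0]])  # run rate never seen in training
print(extrapolation_flags(X_train, X_new))  # [False  True]
```

A per-feature range check like this is deliberately crude; it won't catch unusual combinations of individually normal values, but it is cheap enough to run on every prediction.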
[00:05:39] Dr Genevieve Hayes: So how do you actually test and validate that your model is going to hold up in at least the majority of extreme situations, especially when you can't anticipate every possibility?
[00:05:50] Prof Steve Stern: Well, you do what I hope all mathematicians do: you build your model and tune and train it on existing data. And then you spend years looking at, well, what happened? What about this? And sensitivity analyses, and getting advice from cricket players and asking them, so what, in your experience, is the weirdest situation you've ever been in?
[00:06:11] So trying to figure out, well, where around the edges would you be most likely to go into strange territory, and looking at those. I had a rather interesting experience, it must have been in about 2016, when I was offering up the next version, or the tweaked version, to the ICC.
[00:06:28] And I said to them, you don't have to worry, I've tested this model and it works for any match with scores between 50 and 500. And then, not 18 months later, a team scored 498, and I thought, oops, maybe I should extend my investigations.
[00:06:44] And so now I guarantee them up to 650. So we'll see how those things go. But I think patience and perseverance, and not putting things out too soon. Making sure that you really understand what your model is doing, and where it's likely to go in those extreme cases.
[00:07:03] And you might not be able to tie it down in those extreme cases without data, but at least if something happens, you can go, yep, I was anticipating that something weird might happen out there, but I just didn't have any data on which to tie it down.
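The kind of stress test behind Steve's 50-to-500 (later 650) guarantee, sweeping a model across inputs far beyond anything observed and asserting basic sanity properties, might be sketched like this. `revised_target` here is a deliberately toy placeholder, not the actual DLS calculation:

```python
def revised_target(first_innings_score, resource_pct_remaining):
    """Toy par-score adjustment: scale the target by the share of
    resources the chasing side still has (placeholder formula only)."""
    return round(first_innings_score * resource_pct_remaining / 100) + 1

def stress_test(max_score=650):
    """Sweep a grid of extreme inputs and assert sanity properties:
    targets stay in a plausible range and never decrease as the
    first-innings score grows."""
    for score in range(50, max_score + 1):
        for resources in (25, 50, 75, 100):
            t = revised_target(score, resources)
            assert 1 <= t <= score + 1, (score, resources, t)
            assert t >= revised_target(score - 1, resources)
    return True

print(stress_test())  # True if every case in the grid passes
```

The point is not the toy formula but the discipline: property-based checks over a grid much wider than the data let you say, before deployment, exactly which input range the model has been verified on.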
[00:07:14] Dr Genevieve Hayes: So most data scientists listening to this won't be deploying models during sporting finals, but they might be building systems in areas such as finance or medical diagnosis, where mistakes can have very serious consequences. What's the most important lesson you learned from DLS that translates to those kinds of high-stakes situations?
[00:07:37] Prof Steve Stern: That's a great question. I think the first is that, if you've done enough searching and validating yourself, you have to trust yourself. If something appears to be a mistake, maybe it's not as much of a mistake as you might think, so you can go in and say, well, what actually happened here?
[00:07:56] I think that forensics on those scenarios is crucial. But then the other side of it is you need to be humble. You need to say, well, maybe there is something about what I have created that has a flaw that I wasn't able to see initially and go back and recreate it.
[00:08:12] So one of the things that happened in DLS interestingly enough, was that there was a match that was interrupted by rain and people were very disappointed with what happened with the correction. And what I noticed when I looked at the match was that I agreed that something seemed off.
[00:08:27] And so I went back and had a look. And what had happened was that the model structure I had chosen had essentially built in a particular parametric structure for the relative strength of the batters in a cricket team. As you go down the order, obviously, the first batters are better, then the middle batters are not quite as good, and so on.
[00:08:48] But what that parametric structure had done is it hadn't allowed sufficient flexibility for me to notice that, with modern cricketers, the bottom batters were getting better than my structure allowed. And so I released that parametric constraint to a certain degree. It allowed the model to be more data driven as to how powerful the batters at different levels were.
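The fix Steve describes, trading a rigid parametric form for position-level parameters estimated from the data, can be illustrated with a toy example. All numbers and the geometric-decay form are invented for illustration; this is not the DLS parameterisation:

```python
import numpy as np

# Illustrative per-position scores where the lower order (position 8)
# is stronger than a strict "strength always decays down the order"
# assumption would permit.
scores = {
    1: [52, 48], 2: [45, 50], 3: [44, 40], 4: [38, 35],
    5: [30, 33], 6: [28, 25], 7: [22, 26], 8: [24, 27],
}

# Rigid parametric structure: force geometric decay s_k = s_1 * r**(k-1)
s1 = np.mean(scores[1])
r = 0.88  # a single decay rate, assumed fitted elsewhere
rigid = {k: s1 * r ** (k - 1) for k in scores}

# Relaxed, data-driven structure: each position gets its own level
flexible = {k: np.mean(v) for k, v in scores.items()}

# The rigid model cannot represent position 8 out-hitting position 7;
# the relaxed one picks it up directly from the data.
print(rigid[8] < rigid[7])        # True by construction of the decay
print(flexible[8] > flexible[7])  # True: the data says otherwise
```

Fully freeing every parameter trades bias for variance, which is presumably why Steve only released the constraint "to a certain degree" rather than abandoning structure altogether.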
[00:09:14] And it solved the problem immediately. I wish that I had done that before that match came along. But sometimes, you know... It's interesting, I had a long discussion with a group I'm currently working with, who work at the Gold Coast University Hospital, and they're trying to predict, based on therapist notes, whether someone is likely to go into a mental health crisis.
[00:09:37] So it's very important work and very difficult work because it's obviously a very complicated problem. And because they're so conscientious, their worry is, what if we make a mistake? And I think sometimes you just have to say, well, nothing can be perfect and we have to turn it around.
[00:09:52] I say to them, yes, you might make a mistake, and yes, that is something that you should definitely feel strongly about, so that you want to avoid it. But let's look at the flip side: think of all the people you've helped. And, I mean, it's not as big a deal for a cricket match, but I still like to think, well, even if DLS makes a mistake, it's still helped a heck of a lot of other matches.
[00:10:15] Dr Genevieve Hayes: Yeah, you will make a mistake, just...
[00:10:17] Prof Steve Stern: Yeah, you have to live with that. But the question is whether the benefits outweigh those risks,
[00:10:24] and then whether you have enough wherewithal, both the skill and the humility, to go and fix it. The real problem is making the same mistake twice.
[00:10:32] That's the real problem.
[00:10:34] Dr Genevieve Hayes: And if you could give one piece of advice to a data scientist who's about to deploy a model in a high pressure, real time environment for the first time, what would it be?
[00:10:43] Prof Steve Stern: Close your eyes. No, I think I would just say to them: make sure that you feel like you've done all your due diligence. That's as much as you can do. You have to feel ready, and most people will know when that is. And I think that the enemy, from that perspective, is wanting to monetize.
[00:11:03] There's nothing wrong with wanting to monetize. Of course, we all want to make a living off of what we do. And there's absolutely nothing wrong with that. But if that's a priority, then I think you're already behind the eight ball.
[00:11:14] Dr Genevieve Hayes: That's a wrap for today's value boost. But if you want more insight from Steve, you are in luck. We've got a longer episode with Steve where we explore exactly what's involved in transforming technical data science solutions into globally adopted standards. And it's packed with practical advice for moving from technical execution to real strategic impact.
[00:11:38] You can find it now wherever you found this episode, or at your favorite podcast platform. Thanks for joining me again, Steve.
[00:11:46] Prof Steve Stern: You're very welcome, Genevieve. My pleasure.
[00:11:48] Dr Genevieve Hayes: And for those in the audience, thanks for listening. I'm Dr Genevieve Hayes, and this has been Value-Driven Data Science.