Episode 112: [Value Boost] Lies, Damned Lies and Stakeholders
[00:00:00] Dr Genevieve Hayes: Hello, and welcome back to Value-Driven Data Science, where data professionals become strategic experts. I'm Dr. Genevieve Hayes, and I'm here again with Derek Gibson, a data scientist and analytics expert who is the co-author of Data Duped: How to Avoid Being Hoodwinked by Misinformation and the author of the upcoming Data, AI, and the Noise: Searching for Truth in Information and Algorithms.
[00:00:30] Last week, Derek and I discussed strategies for identifying and stopping AI misinformation before it reaches your stakeholders. Today, in this Value Boost episode, we're exploring how data professionals can help their stakeholders avoid being duped by misleading data beyond AI outputs. Welcome back, Derek.
[00:00:52] Derek Gibson: Thank you.
[00:00:53] Dr Genevieve Hayes: It's only in the last few years that people have even had to consider learning how to defend themselves against AI-generated misinformation. However, misleading data has been around for far longer. The phrase, "Lies, damned lies, and statistics," for example, dates back to the late 19th century. Although learning to defend yourself against AI misinformation is vitally important, it's equally important to learn how to defend yourself against good old-fashioned data and statistical lies.
[00:01:23] This was the topic of your first book, Derek, which was co-authored with decision scientist Professor Jeff Camm, who previously appeared on episodes 80 and 81 of this podcast. In our last episode, you mentioned several ways in which you see people getting caught out by AI misinformation. Beyond those, what are some of the most common ways in which you see people getting caught out by data deception more broadly?
[00:01:51] Derek Gibson: In terms of things that data professionals would be looking at?
[00:01:56] I think it's the same things we've experienced before. In the corporate sense, a lot of decisions are made by looking at the average but not the median. It's something I've repeatedly seen leaders fail to check when they're looking at a report or an analysis: they don't really tease out the distribution of the data they're dealing with.
[00:02:17] Not only is data messy, but it's not always a bell-shaped curve, so figuring out how vulnerable we are when we make a decision based on the average, rather than the full distribution or a mean skewed by outliers, is a timeless skill. Also think about cherry-picking data, and our cognitive biases around that and around groupthink.
[00:02:43] It's very easy for us to look at numbers and have an initial reaction. Say you're in a room: you go with the crowd without having enough dialogue and discussion about what those numbers really mean.
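To make Derek's point about averages concrete, here is a minimal Python sketch that reports the median and range alongside the mean. The deal sizes are invented for illustration and do not come from the episode; with a single large outlier, the mean lands far from where most of the values actually sit.

```python
import statistics


def summarize(values):
    """Report the mean alongside the median and range, so skew is visible."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical deal sizes in $k: most are small, one is very large.
# These numbers are invented purely for illustration.
deal_sizes = [12, 15, 14, 13, 16, 15, 14, 400]

summary = summarize(deal_sizes)
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Here the mean (62.375) is more than four times the median (14.5), which is exactly the gap a decision-maker would miss by reading the average alone.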
[00:02:57] Dr Genevieve Hayes: I remember back when I was doing my PhD research, one thing I'd often find is that you'd do every hypothesis test you could think of in order to prove something. And usually what you'd find is that a couple of them would conclude your hypothesis is valid, a couple of them would reject it, and then you'd be stuck thinking, "Okay, what do I even conclude from this, because nothing's in agreement?"
[00:03:25] And I wasn't trying to deceive anyone, because all I was trying to do was pass my PhD. But I can imagine how, in some sort of business situation, if you had all these hypothesis tests, you could end up just saying, "Two of them say this, two of them say that. I want this particular outcome, so let's focus on the hypothesis tests that agree with me."
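Genevieve's scenario is closely related to the multiple-comparisons problem. The sketch below assumes the tests are independent and that every null hypothesis is actually true (a simplification; tests run on the same dataset are usually correlated, so the real figure will differ). Under those assumptions, the chance that at least one of k tests comes out "significant" at level α is 1 − (1 − α)^k: roughly 19% for four tests and 40% for ten at α = 0.05. The Bonferroni threshold α/k is shown as one common, conservative correction; it is not something discussed in the episode.

```python
import random


def family_wise_error(alpha, k):
    """Chance that at least one of k independent tests is 'significant'
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** k


def simulate(alpha, k, trials=100_000, seed=0):
    """Monte Carlo check: under a true null, each p-value is uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p_values = [rng.random() for _ in range(k)]
        if min(p_values) < alpha:  # at least one test "agrees with me"
            hits += 1
    return hits / trials


for k in (1, 4, 10):
    print(f"{k:>2} tests: exact {family_wise_error(0.05, k):.3f}, "
          f"simulated {simulate(0.05, k):.3f}, "
          f"Bonferroni threshold {0.05 / k:.4f}")
```

The point for stakeholders is that a handful of tests agreeing with the preferred outcome is weak evidence on its own if several others were run and set aside.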
[00:03:50] Derek Gibson: Yeah, I think you're onto it there. There are two things. One is that numbers bring a certain authority to a conversation that you don't get without them. So in a business situation, if someone brings numbers to the discussion, this may elevate the credibility of that person's position or conclusion.
[00:04:09] The other part is that numbers don't always behave, and you have to consider what the outliers are doing and how they might be affecting the decision. So, back to the business situation, what's important is that you're having that discussion and dialogue, and hopefully you're in a corporate culture where questions are welcomed and not pushed to the back of the room.
[00:04:35] Because, in my experience, numbers are never neatly aligned with the decision. Another example is when the numbers are presented in a way that supports the predetermined decision the people in the room have gathered to make. If the numbers are presented only as a percentage, say, or only as a gross number, they could be deceptive.
[00:04:57] Maybe not intentionally deceptive, but people have to ask questions like, "What's the denominator?" Why is someone only talking about how high the growth percentage is, when the actual units of sales are low? If those things aren't shown together, there's a risk of making the wrong decision, intentionally or not.
[00:05:17] Dr Genevieve Hayes: I remember once freaking out a room full of executives because I had a table showing insurance premium increases, and one or two of them were massive, like 78% or 92%. I remember the CFO seeing this and thinking, "This is terrible. We've got some of our clients getting a 78% increase in premium."
[00:05:39] What it actually came down to was a really low base: those clients had taken out the policy late in the year, so they'd only paid a part-year premium for the previous year. That produced this massive percentage increase, but in dollar terms it was maybe a $100 increase or something.
[00:06:00] And I wasn't trying to be deceptive. The standard way we reported it was in percentages. My mistake was that I just hadn't reviewed it and thought, "Okay, what impact is this gonna have on an executive who sees it?"
[00:06:14] Derek Gibson: Exactly. And I love how you positioned that, because it wasn't actually intentional on your part. It was just the way you were accustomed to presenting it, and then the follow-up question was, in nominal terms, "Is that a lot of money? Is it a big difference for people?"
[00:06:27] You know, sometimes this is done intentionally, and a great place to look for that is the news. Very often, to get your attention, a number will be quoted as a percentage rather than the base number, or as the number without the percentage. "Thousands affected by the latest storm": something like that might be the headline.
[00:06:49] But when you break down the numbers, maybe it's only a small percentage of people who are affected by whatever event they're reporting. So I've witnessed both intentional and unintentional misuse of numbers, and we just have to be a little more alert when we see those things and ask that follow-up question.
[00:07:04] Like in your story: it's so good that they asked, to figure out whether this is really an extreme percentage or just an artifact of a small denominator.
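One way to build in the follow-up question Derek describes is to never report a count without its denominator. This is a minimal sketch; the `describe` helper and the storm figures are hypothetical and only illustrate the idea.

```python
def describe(affected, population):
    """Report a count together with its denominator, never one without the other."""
    if population <= 0:
        raise ValueError("population must be positive")
    share = affected / population
    return f"{affected:,} of {population:,} affected ({share:.2%})"


# Hypothetical storm figures, invented for illustration.
print(describe(3_000, 2_500_000))
```

"Thousands affected" and "0.12% of the population affected" describe the same event; showing both lets the reader judge the scale.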
[00:07:13] Dr Genevieve Hayes: Yeah, exactly. And after that, I realized, "Oh, okay, that's a mistake in how I'm presenting things." And then I started putting in a column that's like, "And this is what the dollar difference is." And then that cleared it up for everyone going forward.
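Genevieve's fix, adding a dollar-difference column next to the percentage, can be sketched as follows. The two policies and their premiums are hypothetical: both rise by $100, but policy "B" has a part-year base, so the same dollar change shows up as an 83% jump.

```python
from dataclasses import dataclass


@dataclass
class PremiumChange:
    policy: str
    previous: float  # last year's premium in dollars
    current: float   # this year's premium in dollars

    @property
    def dollar_change(self):
        return self.current - self.previous

    @property
    def pct_change(self):
        return 100 * self.dollar_change / self.previous


# Hypothetical policies, invented for illustration. "B" was taken out late
# in the year, so its previous premium only covers part of a year.
rows = [
    PremiumChange("A", previous=2000.0, current=2100.0),
    PremiumChange("B", previous=120.0, current=220.0),
]

print(f"{'Policy':<8}{'Previous':>10}{'Current':>10}{'% change':>10}{'$ change':>10}")
for r in rows:
    print(f"{r.policy:<8}{r.previous:>10.2f}{r.current:>10.2f}"
          f"{r.pct_change:>9.1f}%{r.dollar_change:>10.2f}")
```

Putting both columns side by side makes it obvious that the alarming percentage is driven by the small denominator rather than by a large increase in what the client pays.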
[00:07:26] Derek Gibson: Excellent.
[00:07:29] Dr Genevieve Hayes: If you're a good data scientist, you should be trying to be honest in the data that you're presenting to people. But not all data that goes before executives is going to be reliable. Whether it's intentional or not, it might be misleading in some way.
[00:07:45] How do you help your stakeholders to become data skeptics in a way so that they don't get duped when you're not sitting next to them to guide them?
[00:07:57] Derek Gibson: So if the skeptic in that situation is the decision-maker (the executive, the leader, the manager), they have to be willing to ask, "What could be wrong with this data?" They don't need to see all of the painstaking work the data scientist has gone through, but they need to ask what's vulnerable about the data being used to support the decision.
[00:08:16] This hopefully leads to a more comfortable discussion about, perhaps, the data pipeline. Are there quality issues, or is something maybe just an artifact of whatever they're working on? Are there issues with the data sources? Are there interpretation issues? Did data have to be imputed rather than collected?
[00:08:31] Those sorts of things. But if the stakeholder making the decision doesn't have the ability to push back and question what could go wrong with this conclusion, or how these numbers might not lead us to the right decision, then I think that's to their detriment when it comes to making better decisions. The data is great, but it could also be misused in a lot of different ways.
[00:08:55] Dr Genevieve Hayes: So it's partly an education piece and partly just providing your stakeholders with a set of questions they can ask in order to deliver better outcomes for themselves and their organization.
[00:09:11] Derek Gibson: Absolutely.
[00:09:14] Dr Genevieve Hayes: And what would be your final piece of advice for stakeholders about consuming data, to help them avoid being data duped?
[00:09:20] Derek Gibson: Slow down. Slow down at the number that confirms what you already believe. What I'm really referring to is our natural confirmation bias. I see this as probably the most common data fail: back to the business context, the people in the room often have an idea of what they believe the conclusion's gonna be and what the data's gonna support.
[00:09:43] And when they see those numbers, they should stop and ask, "Why are these believable numbers?" Even if they're the ones they want to believe, which is the hard part, right? We wanna believe that we're right. We wanna believe we have all the answers and don't have to go to our data science team to get them, because as a leader, you intuitively knew what the outcome was going to be.
[00:10:03] But the real advice here is to slow down and ask about that number that's confirming what they think they already know, and then ask the opposite question: "How could these numbers disprove what I believe?"
[00:10:14] Dr Genevieve Hayes: So be more critical of numbers that tell you what you wanna hear than of numbers that tell you what you don't want to hear.
[00:10:21] Derek Gibson: Absolutely. If the numbers confirm what you already believed before you walked into the room, you need to ask why. And it may actually be a real opportunity. One, it might be an opportunity to have a great discussion with your data scientists about how to collect data. The other might be: are you really looking at the problem, opportunity, or question in the right way?
[00:10:41] Why is it that you already knew the answer before you asked the data team to do data analysis on it? You might be looking at the wrong question.
[00:10:48] Dr Genevieve Hayes: I mentioned the super scales at the gym that I used the other day. It was the first time I'd used them in four months, and they basically told me that even though I'd made no major changes to my diet and had still been going to the gym five days a week, everything had literally gone backwards.
[00:11:04] I'd put on fat and lost 400 grams of muscle. I was horrified at first, and then after two days of being miserable, I thought, "Maybe it's not me, maybe it's the scales."
[00:11:18] Derek Gibson: Maybe the data's wrong.
[00:11:19] Dr Genevieve Hayes: Yeah, maybe the data's wrong.
[00:11:20] Derek Gibson: So you need another data point.
[00:11:22] Dr Genevieve Hayes: Yeah. I had a conversation with Claude, and I asked, "Okay, how are these numbers actually calculated?" It turns out they calculate one thing, and everything else is derived from it with a formula. So once that one thing goes in the wrong direction, everything else goes in the wrong direction too. And then secondly, I asked, "Okay,
[00:11:39] let's assume I don't have a metabolic condition. Why would I end up seeing this?" And he said, "Did you do it at a different time of day? Did you do it before or after exercising? Had you eaten a large meal? What was the weather like?" And all these things. And eventually I realized that every other time I'd used the scales, it was first thing when I walked into the gym.
[00:12:03] This time it was at the end of my gym session. But it's interesting that I was just assuming the data was right.
[00:12:13] But once I started from the position of, "Well, I probably don't have a thyroid condition or some newly developed metabolic issue"...
[00:12:24] Derek Gibson: So the data doesn't align with the conclusion, right? What's unique about you, maybe unlike some of the other people we're trying to give advice to, is that you've already got this mental process in your head. You might not recognize it, but you're going through the steps of asking, "Could this data be right?"
[00:12:38] So now you're saying, how could this be possible, right?
[00:12:40] Dr Genevieve Hayes: Yeah. Does this make sense?
[00:12:43] Derek Gibson: I love that last phrase. That's what I wish leaders and decision-makers in the room would ask more often: "Does it make sense?" In my experience, they don't ask those questions, probably for two reasons. One is that they don't know how to challenge the numbers people, or ask them questions about the numbers, because it's been years since their Stats 101 class.
[00:13:05] The other reason is the hubris of believing the executive already knows what to do. I can't count the number of times I've been disappointed that my analytics teams were asked to confirm a decision that had already been made: the strategy's been put in place, they're gonna do the thing that's gonna grow sales, but there's no analysis before the decision.
[00:13:24] The analysis comes after the decision to confirm that it's been successful.
[00:13:28] Dr Genevieve Hayes: I'm pretty sure that I've seen a Dilbert comic to that effect.
[00:13:32] Derek Gibson: Yeah, I've seen it in real life. There probably is a Dilbert comic. But it's very frustrating, because there's then some inherent pressure to come to the "right" conclusion. Or it puts the data analysts in the uncomfortable position of bringing leadership the bad news that something's not working.
[00:13:52] And then you know what happens next, right? It's the attack on the data. When a very well-respected, high-level executive is challenged, it challenges not only their confirmation bias but their beliefs. And this is the mechanism behind misinformation, and why some of it persists.
[00:14:11] If people accept that something they've believed for so long is not true, it could cascade into making them question a lot of the other things that form the foundations of their beliefs. So they resist giving up a decision or truth they previously held. The executive then has to face, "Oh my goodness, could I have been wrong about something?
[00:14:31] And if I was, what else could I be wrong about?" So they don't go to, "Oh, the data suggests that maybe I didn't have the right answer." What they usually do is attack the data: "You must have been looking at the data wrong. There's probably something in there that you haven't accounted for."
[00:14:43] The tell is that, at the end of the meeting, you're directed to "have your team go back and look at that a little bit further. Maybe there's some seasonality. There's something in the trend you haven't looked at. Have you broken it down by the segments we're targeting?"
[00:14:56] Dr Genevieve Hayes: It's telling the data team how to do their job. Yeah.
[00:14:58] Derek Gibson: Yeah. "Have you looked at this? Have you looked at that? Have you looked at this?"
[00:15:01] And in cases where the data really is challenging their confirmation bias and original decision, they might eventually get there and accept it. But not without a lot of extra, maybe unnecessary, work. I've seen that a lot. So I don't know the solution other than to have a great organization and a great culture where maybe every leader who isn't on the technical team has to go through some sort of training.
[00:15:29] That might be the solution.
[00:15:30] Dr Genevieve Hayes: So that's it for today's episode with Derek. If you haven't already, listen to our previous episode where Derek and I discuss strategies for strengthening your AI defenses. You'll find it at valuedrivendatascience.com or on your favorite podcast platform. Thanks for joining me again, Derek.
[00:15:49] Derek Gibson: Thank you very much. It's been great being here.
[00:15:51] Dr Genevieve Hayes: And for those in the audience, thanks for listening.
[00:15:55] I'm Dr. Genevieve Hayes, and this has been Value-Driven Data Science.