Episode 96: Making Better Decisions with ML and Optimisation


[00:00:00] Dr Genevieve Hayes: Hello and welcome to Value Driven Data Science, the podcast that helps data scientists transform from technical implementers to strategic experts by contributing to the decisions that really matter. I'm Dr. Genevieve Hayes, and today I'm joined by Dr. Tim Varelmann. Tim is the founder of Bluebird Optimization and holds a PhD in mathematical optimization.
[00:00:26] He is also the creator of Effortless Modeling in Python with GAMSPy, the world's first GAMSPy course. In this episode, you'll learn practical strategies for combining machine learning and optimization to boost the business impact of your modeling solutions. So get ready to boost your impact, shape decisions, and own your expertise.
[00:00:50] Tim, welcome to the show.
[00:00:51] Dr Tim Varelmann: Hello. Hello. Thanks for having me.
[00:00:53] Dr Genevieve Hayes: Data scientists use optimization all the time, often without thinking about it. Every time we train a machine learning model, we're running an optimization algorithm to find the best parameters given a set of data. And we do this so that our models can make better predictions.
[00:01:09] This is just one type of optimization, but there's another type of optimization that most data scientists never even think about, let alone touch, which can be used to enhance the value of machine learning, using those predictions as inputs to a separate optimization problem to help stakeholders make optimal decisions.
[00:01:29] And that second layer of optimization is where the real business value often lies. Now, to be perfectly honest before meeting you, Tim, I was familiar with optimization in the context of machine learning for parameter tuning and in standalone decision making problems that didn't involve machine learning.
[00:01:49] But it never occurred to me that you could combine the two to create more value. However, before we get into discussing exactly how combining machine learning and optimization works, I'd like to take a step back and start with the basics. For any of our listeners who've only ever encountered optimization in the context of training machine learning models:
[00:02:11] How does optimization work in the decision making sense, and how does this differ from machine learning parameter optimization?
[00:02:19] Dr Tim Varelmann: The kinds of decisions that I help develop solutions for in my daily work with Bluebird Optimization are, uh, production planning, for example; logistics planning, when we have trucks to which we need to assign a bunch of parcels, we need to assign a driver and we need to assign a route, which also implies an order of clients and where we visit them; supply chain management, deciding how much inventory to stock up, things like that.
[00:02:45] Those are the kinds of decisions that I'm dealing with. And we can get the bridge to the optimization that is happening in machine learning. When we have a look at what the optimization really does, because in machine learning and the training part of machine learning, essentially we are minimizing a loss function.
[00:03:03] That's something that many data scientists will be familiar with, and there are no other constraints. We just want the minimal loss that we can achieve somehow. And, frankly speaking, from an optimization perspective, that algorithm isn't even that interesting, because it typically works stochastically. So you can have different training runs on exactly the same data, but the algorithm behaves a little differently and you might get slightly different outcomes from different training runs, and stuff like that.
[00:03:28] And you need more reliability, really, when you're dealing with production plans for big chemical companies, where you may be dealing with explosives and you have to really have tight temperature control. Otherwise the whole plant is gonna blow up. So a lot more is at stake in these decisions.
[00:03:44] And also you can deal with constraints. So you can say there are some decisions, or some combinations of properties of decisions that I could make, that are just not allowed. In machine learning you don't have this kind of constraint, where you don't allow part of the search region. And that of course makes the problem harder to solve from a mathematical point of view, but you can deal with these things if you have the kinds of decision problems that I work with and use mixed integer optimization, for example.
[00:04:11] Dr Genevieve Hayes: So what I'm hearing is, in the machine learning sense, basically you are optimizing parameters, which are a mathematical construct, whereas in the decision making sense, you're actually dealing with real world concepts, like optimizing the right temperature or the right combination of drivers for delivering parcels, or something like that.
[00:04:37] Is that right?
[00:04:38] Dr Tim Varelmann: Yes, absolutely. So you have decision variables that represent a yes or no decision, typically: is driver A assigned to truck B? That's a yes or no decision. And then typically you have some constraints where you have to say, hey, driver A can only be assigned to one of our five trucks, and that is a mathematical constraint that you can
[00:04:59] formulate, for example, by summing up those five yes or no decisions, which are implicitly represented as one or zero in the computer. And if you sum them up, that sum could be between zero and five, because all of these five answers could theoretically be yes. But in order to not allow that, you can say, hey, this sum of the five yes or no decisions has to be smaller than or equal to one.
[00:05:22] And that means only one of these decisions can be yes.
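To make the constraint Tim just described concrete, here is a small illustrative Python sketch (not code from the episode): each possible truck assignment for driver A is a 0/1 decision, and the constraint says those five values may sum to at most one.

```python
from itertools import product

# Toy illustration of the constraint from the episode: driver A may be
# assigned to at most one of five trucks. Each assignment is a 0/1
# decision variable x[t], meaning "is driver A on truck t?".
trucks = range(5)

def feasible(x):
    # The constraint: the sum of the five yes/no decisions must be <= 1,
    # so driver A sits on at most one truck.
    return sum(x) <= 1

# Enumerate all 2^5 = 32 candidate assignments and keep the feasible ones.
assignments = [x for x in product([0, 1], repeat=5) if feasible(x)]

print(len(assignments))  # 6: "no truck" plus one option for each of the 5 trucks
```

Real solvers never enumerate like this, of course; the enumeration just shows which combinations the constraint permits.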
[00:05:25] Dr Genevieve Hayes: They could also be continuous. For example, in the example you gave previously with the temperature in the explosives factory,
[00:05:31] Dr Tim Varelmann: Yeah, definitely you can have continuous variables as well.
[00:05:34] Dr Genevieve Hayes: What sorts of techniques do you typically use to solve problems such as that?
[00:05:37] Dr Tim Varelmann: So the big thing is mixed integer optimization, I'd say. There are other techniques for optimization, such as constraint programming, and there are also primal solvers, which have been coming up recently. But the majority is really mixed integer optimization, which is a very formal mathematical technique.
[00:05:56] Dr Genevieve Hayes: Back in the day when I was studying statistics as an undergrad, I took a course in managerial decision analysis and we were taught techniques such as linear programming and the simplex method. Do you ever use these techniques or are they too basic by your standards?
[00:06:11] Dr Tim Varelmann: No, the simplex method is probably the most fundamental basis, even for mixed integer programming. In fact, every mixed integer program... So mixed integer just means you have some continuous variables, such as the temperature you mentioned, and some yes/no decisions, for example, or just other
[00:06:29] decisions that have to take integral values: you can't send two and a half parcels to a client, you can either send two or three, and that's an integer decision variable. And if you have both of these, then you have a mixed integer optimization problem. And the way the solvers internally solve them is to actually
[00:06:45] pretend that we could send two and a half parcels out, because then you go from a mixed integer problem to a linear problem, on which you can use the simplex method to get a solution. But because you've simplified the problem, you're not done yet. Still, it's the first step that you would take to solving these problems.
[00:07:02] Dr Genevieve Hayes: So the first step by assuming that you can have two and a half drivers, that basically just gets you in the right ballpark and then you only have to test in that area within that ballpark. Is that right?
[00:07:14] Dr Tim Varelmann: Exactly. You mentioned area, so, I have a table here in front of me. Think of the table as a giant search area where you're searching for an optimal solution, and what these mixed integer algorithms do is essentially divide and conquer. You can say, hey, we are only allowed to take some discrete points on this table,
[00:07:33] because in the middle, no solution is allowed. But nevertheless, we allow the entire table to be searched, first of all, because that's amenable to being optimized by a simplex method. And then you already have some estimate. And now you can divide the table and say, now let's just look at the left half of the table.
[00:07:52] And there the simplex method will maybe give you a sharper answer. So if you're minimizing costs, it will give you a lower bound that might be better, so the lower bound is higher in this case. And if you say, hey, I know on the left-hand side my lower bound cannot be lower than, let's say, $80,000,
[00:08:12] and on the other side of the table you have already found a solution that is better, maybe at just $5,000, then you have proven that you don't need to look on the first side of the table anymore, and therefore you have reduced your search space, and that's super efficient.
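The table-splitting idea the two just walked through can be sketched as a toy branch and bound in Python. The parcel values, weights and capacity below are made up for illustration; the bound comes from the relaxation Tim mentioned, where fractional parcels are temporarily allowed.

```python
# Toy branch and bound, mirroring the "divide the table" idea.
# Illustrative problem (not from the episode): choose parcels to load so
# that value is maximised without exceeding the truck's weight limit.
values  = [60, 100, 120]
weights = [10, 20, 30]
capacity = 50

def relaxed_bound(i, weight, value):
    """Upper bound from the relaxation: fill remaining capacity with
    fractional parcels, which is easy to do greedily."""
    for j in range(i, len(values)):
        if weight + weights[j] <= capacity:
            weight += weights[j]
            value += values[j]
        else:
            # The fractional part is allowed only in the relaxation.
            value += values[j] * (capacity - weight) / weights[j]
            break
    return value

best = 0

def branch(i, weight, value):
    global best
    if weight > capacity:
        return                      # infeasible branch
    best = max(best, value)         # every partial load is itself feasible
    if i == len(values):
        return
    if relaxed_bound(i, weight, value) <= best:
        return                      # prune: this half of the "table" cannot win
    branch(i + 1, weight + weights[i], value + values[i])  # take parcel i
    branch(i + 1, weight, value)                           # skip parcel i

branch(0, 0, 0)
print(best)  # 220 (load the second and third parcels)
```

Production solvers layer an enormous bag of tricks on top of this skeleton, but the prune step is exactly the "stop looking at that half of the table" argument from the conversation.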
[00:08:24] Dr Genevieve Hayes: So it's basically just a matter of removing parts of the search space until it's small enough so that you can easily find the answer.
[00:08:31] Dr Tim Varelmann: I mean, in the details the math is super, super complex. But the very basic idea is exactly that, yes. One of the big companies in mixed integer optimization is called Gurobi. They got the name from the three founders: Gu for Zonghao Gu, Ro for Rothberg and Bi for Bixby. And these three founders, I think the saying is attributed to them, say mixed integer optimization is a bag of tricks, and that's literally true.
[00:08:54] Gurobi has just, in November of 2025, released their version 13, which is a hundred x faster than the original version. And just the jump from version 12 to version 13 was 16% faster, if I remember correctly. So this was already a world-class product in version 12, and they still have so many things to tweak that they can improve it,
[00:09:20] going from version 12 to version 13, by 16% runtime on average, because it's just this giant bag of tricks. So in the details you really have a plethora of options, but the very basic idea is exactly what you mentioned.
[00:09:32] Dr Genevieve Hayes: Yeah, I keep coming across the terms GAMS and Gurobi in the context of optimization problems. What are they, and how do they fit into all of this?
[00:09:41] Dr Tim Varelmann: Okay, so maybe let's start with Gurobi, because we just mentioned that Gurobi is the solver. So that's the algorithm that really solves the problem, that applies the simplex method, applies all the other bags of tricks, in order to come up with a solution in an efficient manner. And in the end it'll tell you: my recommendation is to assign Fred to truck two and Katie to truck three,
[00:10:04] and that's how we should do our logistics, because it will lead to minimal cost and also adhere to all of the constraints. Maybe Fred doesn't want to work in the afternoon and has a contract that allows him to do so, stuff like that. So that's what Gurobi does: actually solving the thing and coming up with a mathematically proven answer.
[00:10:20] Dr Genevieve Hayes: So it's the software behind it all.
[00:10:22] Dr Tim Varelmann: Yeah, exactly. GAMS is also a piece of software, but it's a different kind of software; it does a different job. So GAMS actually comes earlier in the process of solving an optimization problem, because GAMS collects the model. That is essentially a translation task. So let's say, if I go into a company where they have a logistics optimization problem, what I typically do is I listen to lots of people, ask questions, make notes, and then, when I've talked to everybody I needed to talk to, I go back into the office and get my hands dirty and write some software.
[00:10:58] And GAMS is an environment that helps me translate what I have learned in the real world into mathematics, so that the computer can understand it. And really, the output of GAMS is a model formulation. So that's a translation of the real world into mathematical terms that Gurobi can understand, which Gurobi will take as input and then start doing its thing.
[00:11:21] So GAMS comes earlier, and it just makes the software engineering, and the bookkeeping of all the variables and stuff like that, very easy.
[00:11:28] Dr Genevieve Hayes: So it's like accounting software for optimization problems.
[00:11:31] Dr Tim Varelmann: Yeah. I think for starters, that's a valid image to have. Yeah.
[00:11:35] Dr Genevieve Hayes: So in that example you gave right at the start of the episode, you were talking about how, if you had the drivers, each one would be a yes/no indicator variable, or be represented by one of those. And then you'd want to say, if we had five drivers, the sum of all of these variables has to equal five.
[00:11:54] That would be something that you would translate into GAMS, is that correct? And then that would go into Gurobi.
[00:12:01] Dr Tim Varelmann: Yes. So GAMS is the environment that allows you to write down this constraint: I have five drivers, I need to use five drivers, so I am summing up all the drivers I have and setting this sum equal to five. And you can do that anywhere. Gurobi also has an API where you could just formulate this one constraint.
[00:12:19] What GAMS really makes easy is if you have large scale problems. So let's say you don't just have one city where you do this optimization, but you have five cities. Maybe in Australia you do it in Perth, in Brisbane, in Melbourne and in Sydney, and you don't really want to write this constraint five times, once for each of those cities. It's always the same, essentially; it just depends on how many trucks you have in Adelaide, how many trucks you have in Melbourne, and you just have the data somewhere, how many trucks are available. But you don't really want to formulate this specific constraint, the sum of all drivers available has to equal the number of trucks that we have, so that all of the trucks are going.
[00:12:58] You don't want to write it down five times, and GAMS allows you to write it down just one time and say, hey, extend this over all of the cities that we have, and here is where you find all the data from the cities that you need, to automatically instantiate this constraint multiple times for me.
[00:13:15] Dr Genevieve Hayes: So you'd create a template constraint and then it would... yeah, I get it. Yeah, that makes sense. So up until now, we've been considering optimization on its own and ignoring the machine learning aspect. Now let's consider the situation where you combine machine learning with optimization in the decision making sense.
[00:13:34] How would that work exactly?
[00:13:36] Dr Tim Varelmann: So machine learning really shines with predictions, and many complex optimization problems can benefit greatly from good predictions. In an energy use case, for example, when you are optimizing the operation of an electricity grid, you might want to have a weather forecast and some information about that.
[00:13:55] And you need to translate the weather forecast into: where do we actually expect sunshine, and where do we expect wind? Because that influences how much renewable electricity we will have available, and where that electricity will be, because the topology of the electricity grid also plays an important role. And the raw information your forecast provides is just the weather forecast, and then you have several steps of processing this,
[00:14:20] which can all be interrelated, if you're unlucky, in a complex case. And all of these interrelations, and all of these processing steps from the raw information, are part of your optimization model, part of this translation that I just mentioned, where I translate what's happening in the real world into mathematical formulas. And you can really use machine learning for these predictions and then just work with this in the model.
[00:14:45] Just describe, in terms of math, how these predictions influence the decisions you want to make. And then you can inform the optimizer about all of these things at once. The optimizer will take all of them into account and provide you with an optimum accordingly.
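As a minimal sketch of that prediction-into-decision pipeline (all numbers invented): a stand-in "prediction" feeds the decision problem as data, and a brute-force search stands in for the solver.

```python
# Toy sketch of the pattern from the episode: a prediction becomes an
# input parameter of a downstream stocking decision.
history = [18, 22, 19, 24, 21, 20]       # past daily parcel demand (made up)

# Step 1, the "machine learning" stand-in: a trivial forecast model.
forecast = sum(history) / len(history)   # about 20.67

# Step 2, the optimization: choose a stock level to minimise cost, with
# asymmetric unit costs for overstocking and understocking. Brute force
# over a small range stands in for a real solver here.
hold_cost, shortage_cost = 1.0, 3.0

def cost(stock, demand):
    over = max(0.0, stock - demand)
    under = max(0.0, demand - stock)
    return hold_cost * over + shortage_cost * under

best_stock = min(range(15, 31), key=lambda s: cost(s, forecast))
print(best_stock)  # 21: running short is 3x as costly, so stock above the forecast
```

A real model would optimize against a demand distribution rather than a single point forecast, which is exactly where the uncertainty handling discussed next comes in.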
[00:15:02] Dr Genevieve Hayes: I get it. So with traditional decision making optimization, you are trying to optimize for decisions in a certain world, where you know how many parcels you're gonna have to deliver in the future, or what the weather is going to be like in the future, because you're living in this magic world where you know that it's definitely going to
[00:15:22] be rain or sun or whatever on a given day. Adding machine learning to the mix allows you to optimize for decisions in an uncertain world, where you don't know what the future is, so you need to make a prediction about the future. Is that right?
[00:15:37] Dr Tim Varelmann: Yes. That's at least the major use case of combining machine learning and optimization. You can obviously also use machine learning for different things; it's not just useful for forecasting and taking uncertainty into account. Sometimes you can think of things in the real world that you can't really describe with any physics,
[00:15:57] because you just have data available. So, one example from my practice: I've worked with an internet company. They wanted to optimize their websites, so they had some design decisions to make, and they wanted to optimize their revenue. And there is no physical model for how a human behaves on a website.
[00:16:14] All they have is data. But you can take all this data. That's something machine learning shines at: dealing with lots of data and turning that into useful insights. And you have something that you don't have any physics for, but you can still create a model that informs the optimizer about the human behavior that we have observed, at least.
[00:16:33] And this way you can use machine learning and optimization together in a way that's not necessarily optimization and uncertainty, but you just model things that you don't have any physics for and you only have data available.
[00:16:45] Dr Genevieve Hayes: That's awesome. How do you connect the two? Is there some special software or do you have to write custom software to bring the machine learning outputs into the decision model?
[00:16:56] Dr Tim Varelmann: It's up and coming. This whole idea of really combining them is super modern. Five years ago, maybe that was pre-pandemic, right? A lot fewer people were doing this combination than are doing it these days. And of course you then also have tools developing, and I think there's lots of potential for tools to come in the future, but we already start to see
[00:17:19] tools that support integrating machine learning into optimization. So GAMS, for example, has a Python interface called GAMSPy, which does a tremendous job, because now you're already in the Python world where all this machine learning is happening. Therefore, it makes sense to provide some tools that do this for you.
[00:17:35] But under the hood, what they all do is just... machine learning, let's say you have a neural network or a gradient boosted tree, whatever is going on in this model, you can also describe that with mathematical formulas, and nothing else is happening. Essentially, these tools are just software engineering things to provide you with nice, easy to use abstractions, so that you don't have to deal with formulating all these equations on your own.
[00:17:58] You can just say, I have a neural network, and that gives you a bunch of equations. But in the end, if you are looking at it from the Gurobi perspective, or any other solver's, they will get a bunch of equations that just describe what is going on in that neural network. So what are the activation functions?
[00:18:13] What are the layers in between? How do you combine the outputs from one layer into inputs of the next layer, and stuff like that? All of that is essentially just equations, and now we start to see tools, especially those bookkeeping tools, that make your life easier with this as well.
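To see why a trained network is "just equations" to a solver, here is a tiny ReLU network evaluated with nothing but sums and a max; the weights are made up for illustration. In a real MILP embedding, the max(0, x) would be encoded with auxiliary variables and big-M constraints rather than evaluated directly.

```python
# Illustration (invented weights): to a solver, a tiny trained ReLU
# network is nothing but these equations, affine layers plus max(0, x).
W1 = [[2.0, -1.0],   # hidden layer weights (2 inputs -> 2 neurons)
      [0.5,  1.5]]
b1 = [0.0, -1.0]
W2 = [1.0, -2.0]     # output layer weights
b2 = 0.5

def forward(x):
    # Hidden pre-activations: h_pre[j] = sum_i W1[j][i] * x[i] + b1[j],
    # which is a plain linear equation per neuron.
    h_pre = [sum(W1[j][i] * x[i] for i in range(2)) + b1[j] for j in range(2)]
    # ReLU activation; in a MILP embedding this becomes extra constraints,
    # here we simply evaluate it.
    h = [max(0.0, v) for v in h_pre]
    # Output: again just a linear equation combining the hidden layer.
    return sum(W2[j] * h[j] for j in range(2)) + b2

print(forward([1.0, 2.0]))  # -4.5
```

Tools like GAMSPy's machine learning integration generate exactly this kind of equation set from a trained model, so the modeler never writes it by hand.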
[00:18:27] Dr Genevieve Hayes: You said that this is only really a new thing that people have started doing in the last five years. But things like the weather, or the number of parcels that arrive, these were still uncertain five years ago. What were people doing before they started integrating machine learning forecasts into their decision making optimization problems?
[00:18:50] Dr Tim Varelmann: All kinds of things. You do what you can. Obviously hardware accelerates tremendously. Now we see especially lots of GPU growth, which has a tremendous impact on the data side. But also, CPUs become more and more powerful. And, as I mentioned, also on the algorithmic side: from Gurobi version one to Gurobi version 13, we had a 100x improvement.
[00:19:12] So if you had exactly the same hardware that we used to have back in the early two thousands, Gurobi these days would already be 100 times faster on that very old, what is it, Intel Pentium CPU. And yeah, using machine learning in optimization models really blows up the problem, and it can take you to a situation where your problem becomes too big to solve, essentially.
[00:19:37] And when you have less computational power than we have now, for example 10 years ago, then obviously that border of what is computationally possible is a lot tighter than it is these days, and therefore this only became popular just now.
[00:19:55] Dr Genevieve Hayes: Okay, so as you said, there's an extra cost to adding that machine learning layer to your solutions, in terms of financial, time and computational cost. How do you typically know when that extra cost is justified, and when you should just use the techniques that people were using five or so years ago?
[00:20:17] Dr Tim Varelmann: I think a higher resolution is always better. It's just a matter of can you afford it?
[00:20:21] Dr Genevieve Hayes: Yeah. So it's a matter of do you have the budget and do you have the time budget?
[00:20:26] Dr Tim Varelmann: Yeah, the time budget, and the computational budget as well, which translates into time in some manner.
[00:20:30] Dr Genevieve Hayes: And that would really depend on how important is the problem. So if it's something like trying to predict things in the explosives factory, I'd guess that would be a problem where the value of extra resolution would be very high. Whereas if it's just, I don't know, optimizing stock in a clothing shop in a local suburban mall, it might not be worth it.
[00:20:58] Dr Tim Varelmann: Another aspect is you might have just one of those explosive chemical plants and you might have hundreds of thousands of clothing stores. So maybe that scale makes the problem important again, but in principle I'm with you. Yeah.
[00:21:10] Dr Genevieve Hayes: Yeah, I get it. Okay. So for data scientists who've never done optimization before, beyond training machine learning models, what are the first steps that they could take to start learning this skill?
[00:21:23] Dr Tim Varelmann: Optimization models have a bit of a different mindset to them. It's a bit like the difference between procedural programming and declarative or functional programming, for those of you who are more into the software world. If you do a machine learning model or a data science model, the model more or less describes the computations that are happening inside the computer in order to evaluate the model.
[00:21:49] And that's what I mean by procedural programming. In an optimization world that's a little different. We mentioned these tools, gams and guro, and they have different purposes. Gams is really the tool that helps you set up the model, and that model is just a description of the properties of the final decision you want to have.
[00:22:08] So all of the constraints just describe what is not allowed, essentially, and everything that is not excluded remains allowed. And then the second thing that you need to describe is: how do you compare two decisions? Both of them are allowed, but one of them is better than the other. So what is a quality metric for a decision?
[00:22:26] And that's typically a dollar value, so the entire cost of that particular logistic transportation plan, maybe two of those plans are allowed, but one costs $10,000 more. Then the quality metric is just the entire cost, and that allows you to compare two of these allowed decisions to each other.
[00:22:44] So really you are just modeling what you want to achieve in the end, and you're not really specifying how to compute it because that's all implemented inside of Guro.
[00:22:53] Dr Genevieve Hayes: So really the first step would be understanding that whole different mindset and getting your head around that.
[00:22:59] Dr Tim Varelmann: Yeah, exactly. Starting with some simple examples, see how this goes. This is not rocket science; it's working with logic, and the basic idea is what we've already discussed. You can sum up a couple of yes and no decisions, and, as we've discussed, you can say, hey, I want to sum up these five yes or no decisions and say
[00:23:17] the sum has to be smaller than or equal to one, so that only one of these five decisions can be a yes. And that's the mindset shift you need to make, and that's where I think people should start.
[00:23:28] Dr Genevieve Hayes: If one of our listeners wanted to start incorporating optimization into their work tomorrow, what's a good problem type or use case that they could look out for, where combining machine learning and optimization would create immediate value?
[00:23:42] Dr Tim Varelmann: Whoa, whoa. Okay. So combining the two is hot at the moment, and I would not advise starting with that, really. But do try to find explicit decision problems. There's a difference between these decisions: when you have, for example, logistics, you have production planning. Which employee will be dealing with which production line, and do they have the skills,
[00:24:04] do they have the equipment, do they have the resources? Those are the kinds of decisions that optimization is great with. Try to familiarize yourself with the kinds of decisions where optimization is powerful. And once you've done that and seen a couple of examples, you'll see it everywhere.
[00:24:19] Go through your life, get up in the morning, and before you even have lunch, you will have found 10 of those optimization problems really.
[00:24:28] Dr Genevieve Hayes: So for listeners who wanna get in contact with you, Tim, what can they do?
[00:24:32] Dr Tim Varelmann: Yeah, absolutely. I have my website, Bluebird Optimization, available at bluebirdoptimization.com, where I have a blog. You can check that out. There are some deeper topics and also some fun topics. I have a blog post on helping Santa Claus with optimization; I have a blog post on solving a riddle from Harry Potter.
[00:24:48] So for those of you who are interested in that, you can read these things. Also, I will share a resource with you, or you can share it with your listeners, which is a three step guide for starting your transformation, going from data science to optimization. Actually, it's three steps and a bonus step, to be honest.
[00:25:08] And it's really built to instruct you when you really start off, to give you some small actions that you can use to build momentum. And I think that's very helpful for anyone who hasn't heard of optimization until now, but now thinks this is very interesting and wants to make their first steps and see what's out there.
[00:25:26] Dr Genevieve Hayes: And I'll put a link to that resource in the show notes for this episode.
[00:25:29] Dr Tim Varelmann: Awesome.
[00:25:31] Dr Genevieve Hayes: And there you have it. Another value packed episode to help you transform from technical implementer to strategic expert. If you enjoyed this episode, why not make it a double next week? Catch Tim's Value Boost, a 10-minute episode where he shares one powerful tip for creating real strategic impact right away.
[00:25:52] Make sure you're subscribed so you don't miss it. And if you found today's episode useful and think others could benefit, please leave us a rating and review on your favorite podcast platform. Thanks for joining me today, Tim, and for those in the audience, thanks for listening. I'm Dr. Genevieve Hayes, and this has been Value Driven Data Science.
