Episode 20: Using Data Science to Live Better for Longer

00:00:00 Dr Genevieve Hayes
Hello and welcome to Value Driven Data Science, brought to you by Genevieve Hayes Consulting. I'm your host, Dr Genevieve Hayes, and today I'm joined by Dr Torri Callan to discuss using data science to live better for longer. Torri is the data scientist at Australian health tech startup Uare,
00:00:21 Dr Genevieve Hayes
as well as working as a data scientist with fintech startups.
00:00:26 Dr Genevieve Hayes
Previously, he spent five years setting up AI and automated risk management for leading finance companies in Australia. Torri, welcome to the show.
00:00:37 Dr Torri Callan
Hi, thanks for having me.
00:00:39 Dr Genevieve Hayes
We all want to live long, happy and healthy lives.
00:00:43 Dr Genevieve Hayes
And in the age of technology, it comes as little surprise that people are turning to data to do just that.
00:00:51 Dr Genevieve Hayes
Between smartwatches, Oura rings, and fitness apps like Strava, we're all generating massive quantities of personal health and fitness data each day, sometimes literally in our sleep.
00:01:05 Dr Genevieve Hayes
But that data is only valuable if it can be converted into useful insights, and that's something that a lot of startups are now looking to get right.
00:01:15 Dr Genevieve Hayes
One such startup is Uare, spelled U-A-R-E,
00:01:22 Dr Genevieve Hayes
which, as I mentioned before, Torri, you're now the data scientist for. For listeners who haven't come across Uare before, can you give us an overview of what it does?
00:01:36 Dr Torri Callan
Uare was born out of the desire to make sense of the wealth, or deluge is probably a better word, of personal data that you get from your fitness and health trackers.
00:01:49 Dr Torri Callan
I think for those listeners who are familiar with
00:01:52 Dr Torri Callan
Strava, Garmin, Whoop or Apple,
00:01:56 Dr Torri Callan
you look through the menus of those apps and you get a lot of numbers thrown at you. For most people, I think it's probably too much.
00:02:06 Dr Torri Callan
You have a lot of numbers and a lot of data thrown at you, and it's difficult to contextualise all of that and make sense of
00:02:16 Dr Torri Callan
where you should be. And I think what's really challenging is knowing what people actually need to do about it. So what we're attempting at Uare is to
00:02:26 Dr Torri Callan
develop a broad-scope, holistic view of an individual that can contextualise all of your information and give you recommendations and feedback on what you need to do to improve your overall health,
00:02:42 Dr Torri Callan
wellbeing and longevity.
00:02:44 Dr Genevieve Hayes
What sort of recommendations does it produce?
00:02:47 Dr Torri Callan
The key thesis of the business is that a lot of the drivers of longevity and overall health come down to how long you're spending on exercise and activity. And then there's this key thing that we think is important
00:03:01 Dr Torri Callan
and that, as far as we can tell, is fairly unique: your functional performance is actually what drives quite a lot of your overall health,
00:03:11 Dr Torri Callan
and then, as you get older, how well you live in your old age. I suppose it's not a unique idea in the health space, because there are
00:03:19 Dr Torri Callan
plenty of practitioners who have spoken about the need to stay fairly fit and healthy, but what we're trying to do is
00:03:25 Dr Torri Callan
bring some recommendations around how well you're performing,
00:03:30 Dr Torri Callan
as well as how much you're doing, and then, at some point, also contextualise that with rest and sleep, making sure that base is covered.
00:03:40 Dr Torri Callan
Some of the insights we're working on at the moment, and keep in mind
00:03:43 Dr Torri Callan
this is very much a work in progress, have been around trying to give a baseline and a comparison of your performance in any particular activity
00:03:53 Dr Torri Callan
compared to what we think is achievable for someone of your age and gender. For example, say you went for a 5 kilometre run,
00:04:01 Dr Torri Callan
and take myself, at age 30 and as a male:
00:04:06 Dr Torri Callan
we know that the fastest someone can run that is roughly 12, I think 12 1/2 minutes.
00:04:12 Dr Torri Callan
And so we know that if I go and do that in 23 minutes, I might be at, let's say, 50% of the overall achievement level that I could actually pursue if I wanted to. And so what we could do is give that a score
00:04:27 Dr Torri Callan
of 50 out of 100. Or what we might want to do is say, well, actually the 12 1/2 minutes, and I think it's Joshua Cheptegei who set that world record,
00:04:37 Dr Torri Callan
is achievable by someone who has specialised in running, and in running that particular distance. But what we actually know is that someone who wants to be really fit and healthy into their old age
00:04:49 Dr Torri Callan
wants to be really fit and healthy across a number of different domains, in terms of
00:04:54 Dr Torri Callan
time domains and modal domains: not just being a really good runner, but being able to swim well, ride a bike well and lift well in the gym.
00:05:03 Dr Torri Callan
And so, I don't have exact numbers here, I'm approximating, but maybe 17 minutes is what we think is actually necessary for you to try and achieve for that performance. So what we can then say is, hey, you've done this 5 kilometre run.
00:05:19 Dr Torri Callan
We've looked at the heart rate you did it at, and we saw that you were sitting at what we might call Zone 3 or Zone 4. So we know that if you went as hard as you could
00:05:32 Dr Torri Callan
for that particular distance, you might actually be able to do it in 21 minutes, or 20.
00:05:38 Dr Torri Callan
And so we think that you're actually achieving 3/4 of your overall performance level. So that's one domain of insights. But what we also do at the same time is say, well, because you've done this activity,
00:05:53 Dr Torri Callan
you might not be at 100% of your achievement level, but that's fine, because you're accumulating this volume of training over time, and we know that's really useful for building an aerobic base and improving your cardiorespiratory system.
00:06:07 Dr Torri Callan
We also know it's really valuable just in terms of moving the body. There's all this data around how regular exercise and activity improves a lot of health outcomes and longevity when you look at the totality of someone's lifespan.
00:06:22 Dr Torri Callan
And so we can also say, well, you've actually done a really good job of achieving your activity minutes for this week, and you've had a really nice mixture of low- and moderate-intensity work and high-intensity work, both of which are quite important. And then we can summarise that and say, well, actually you're
00:06:42 Dr Torri Callan
achieving really well, because you're tracking
00:06:45 Dr Torri Callan
upwards in terms of your performance, so it's getting better, and you're doing the work to make your performance get better.
00:06:51 Dr Torri Callan
And even if you're well away from that performance level because of lifestyle factors, because you've been sedentary for most of your life, you're doing the right thing. And so we can give you some really valuable feedback on
00:07:05 Dr Torri Callan
the accumulation of minutes, which is a really positive aspect.
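The performance-scoring idea can be sketched as a ratio of a benchmark time to the actual time. This is a minimal illustration under stated assumptions, not Uare's method: the function name `performance_score`, the ratio formula and the cap at 100 are hypothetical, while the roughly 12.5-minute world-record and 17-minute "healthy generalist" figures are the approximate numbers quoted in the conversation.

```python
def performance_score(actual_minutes: float, benchmark_minutes: float) -> float:
    """Score an effort out of 100 relative to a benchmark time (hypothetical formula).

    Matching the benchmark scores 100, taking twice as long scores 50, and
    anything faster than the benchmark is capped at 100.
    """
    if actual_minutes <= 0 or benchmark_minutes <= 0:
        raise ValueError("times must be positive")
    return min(100.0, 100.0 * benchmark_minutes / actual_minutes)


# Approximate 5 km figures quoted in the conversation (minutes).
WORLD_RECORD_5K = 12.5   # open men's world record, roughly 12 1/2 minutes
HEALTH_TARGET_5K = 17.0  # rough target for a fit, healthy generalist

run_minutes = 23.0
print(round(performance_score(run_minutes, WORLD_RECORD_5K)))   # 54
print(round(performance_score(run_minutes, HEALTH_TARGET_5K)))  # 74
```

Under this formula a 23-minute 5 km scores about 54 against the world record, close to the "roughly 50" mentioned, and about 74 against the 17-minute target. A ratio of times is only one option; the conversation does not specify the exact scoring curve.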
00:07:10 Dr Genevieve Hayes
So if you had someone who'd spent most of their life sitting on the couch watching television and suddenly decided to try and run that 5 kilometres, obviously it wouldn't be possible to go from the couch to 5K in a single day, so it would give them positive feedback on the fact that doing anything is better than doing nothing.
00:07:31 Dr Torri Callan
One of the really exciting things as a data scientist is coming into this product really early on and thinking about a lot of the personalisation
00:07:42 Dr Torri Callan
and different personas from the start, which I find quite unique compared to other products I've worked on, where there's a really stable feature set
00:07:53 Dr Torri Callan
for a single cohort or a single view of someone, and then if you want to talk about personalisation or recommendation systems
00:08:02 Dr Torri Callan
or an intelligent way of serving up a product, you have to kind of tack that on top of what's already there.
00:08:09 Dr Torri Callan
So a lot of what
00:08:11 Dr Torri Callan
we've discussed and thought about, and are starting to implement, is having almost personalised streams depending on the characteristics of someone who comes to Uare and what lifestyle
00:08:22 Dr Torri Callan
stage they're at. One example of a personalisation strategy we can employ, and that we're working on at the moment, is this.
00:08:29 Dr Torri Callan
You have an individual who was potentially active in their teens and 20s, has reached their 40s or 50s, and hasn't been as active for the last 20 or so years.
00:08:41 Dr Torri Callan
You know, work and life get in the way; that sort of thing happens to most of us. And so what we can do is design key product
00:08:50 Dr Torri Callan
flows and key bits of information where we might not get into the whole performance aspect, because that's just not relevant for those people. But what
00:08:57 Dr Torri Callan
we can do is say,
00:08:58 Dr Torri Callan
well, if you started doing half an hour of activity a week, this is going to have this percentage impact on your longevity. We know, for someone...
00:09:08 Dr Torri Callan
I don't have the numbers off the top of my head, so hopefully people will forgive me for approximating, but we might know the life expectancy of someone in their mid-40s based on where they live. Let's say their life expectancy is 80, and we might be able to see what happens if
00:09:22 Dr Torri Callan
someone in that situation increased their activity levels to half an hour a week by setting some goals, saying, well, actually we want you to try and hit 150 minutes of activity in the week, and then beyond that started to get into ideas around how efficiently and how well they're moving.
00:09:43 Dr Torri Callan
One key thing that we've been able to
00:09:45 Dr Torri Callan
do so far is actually look at
00:09:47 Dr Torri Callan
incidental activity. So
00:09:50 Dr Torri Callan
Apple Health, I think, does a pretty reasonable job.
00:09:55 Dr Torri Callan
If you go into what they serve up, they'll give you information about how many steps you've done and how many flights of stairs you've climbed, and give you a comparison of how well you might be moving.
00:10:05 Dr Torri Callan
But a lot of other fitness trackers tend to ignore what you do in between intentional fitness activities.
00:10:12 Dr Genevieve Hayes
With Strava, you pretty much have to start an activity in order to record it.
00:10:18 Dr Torri Callan
And what tends to happen with Strava is it becomes quite performative, because
00:10:24 Dr Torri Callan
Strava, and personally I really like it, is a popular product, but I think it motivates people to display activities in a really performative way.
00:10:33 Dr Torri Callan
So what gets shown is how far you went in a particular activity, how far you ran, how far you swam, how far you rode, and then how quickly you did it. There's almost a disincentive
00:10:45 Dr Torri Callan
for people to do easy activities. A really common bit of feedback, and I've heard this from cycling groups quite a lot, is that once people get onto Strava,
00:10:54 Dr Torri Callan
they'll want to do every ride as hard and fast as they possibly can, because that looks better when it gets onto the app. What we actually know is that a lot of
00:11:05 Dr Torri Callan
training mileage is really useful at a really low intensity. You almost want to sit at a comfortable pace where you could have a conversation like the one we're having now while you're doing the activity, because that's what allows your body to accumulate good aerobic, cardiorespiratory fitness over many years.
00:11:27 Dr Torri Callan
That's why you see a lot of endurance athletes, say Ironman athletes and Tour de France cyclists, peak in their early 30s: they've had almost 10, maybe 15 years of doing a lot of training volume. It tends to be the case that once you've done all of that work, you actually
00:11:47 Dr Torri Callan
can get really, really
00:11:48 Dr Torri Callan
fit in a short amount of
00:11:49 Dr Torri Callan
time. So the focus
00:11:51 Dr Torri Callan
on trying to move really hard and really quickly in a short amount of time
00:11:58 Dr Torri Callan
is really useful, but can actually be detrimental if that's all you're doing. I think this is where tech kind of gets into the loop and actually starts to impact people's behaviour, but not in an intentional manner.
00:12:11 Dr Torri Callan
And I think that's one area where we know we can have a pretty big impact. So what we've started to work on is just saying to people,
00:12:18 Dr Torri Callan
the activity minutes are what matter: accumulating time moving, spending time outside, trying to do it with other people, because we know that social contact and being in a group of like-minded people is really valuable and really impactful on your health and wellbeing.
00:12:35 Dr Torri Callan
And that's the sort of feedback we want to give in a more intentional way. So,
00:12:39 Dr Torri Callan
going back to our earlier example,
00:12:42 Dr Torri Callan
for someone who's been sedentary, we might not show a performance level, and we might not show a lot of information around pace or heart rate, or try to give a really high-level analysis of someone's activity. All we might do is say,
00:12:57 Dr Torri Callan
for a run, you ran this far, you ran for this amount of time, and you added this amount of time onto
00:13:04 Dr Torri Callan
the total activity you need for the week, so you're now, let's say, another half hour closer to your goal.
00:13:09 Dr Torri Callan
And because you're that much closer to hitting your goal for this week, and you've spent the last, let's say, six weeks hitting your activity minutes goal, we know that it's going to have
00:13:21 Dr Torri Callan
a small but pretty valuable impact on your life expectancy.
00:13:25 Dr Torri Callan
We can give you some feedback to say that if you keep going, the impact on your longevity and life expectancy is going to keep going up and up.
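The weekly feedback described above (minutes added, progress towards a 150-minute weekly goal, and a run of consecutive weeks that hit it) can be sketched in a few functions. This is a hypothetical illustration: the Monday-start week, the function names and the sample log are assumptions, not details of the product.

```python
from datetime import date, timedelta

# Assumption: weeks start on Monday and the weekly goal is 150 minutes.
WEEKLY_GOAL_MINUTES = 150.0


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def weekly_totals(activities: list[tuple[date, float]]) -> dict[date, float]:
    """Sum activity minutes per week, keyed by the Monday that starts the week."""
    totals: dict[date, float] = {}
    for day, minutes in activities:
        key = week_start(day)
        totals[key] = totals.get(key, 0.0) + minutes
    return totals


def progress_message(totals: dict[date, float], today: date,
                     goal: float = WEEKLY_GOAL_MINUTES) -> str:
    """Describe progress towards this week's goal."""
    done = totals.get(week_start(today), 0.0)
    remaining = max(0.0, goal - done)
    return f"{done:.0f} of {goal:.0f} minutes this week, {remaining:.0f} to go"


def goal_streak(totals: dict[date, float], today: date,
                goal: float = WEEKLY_GOAL_MINUTES) -> int:
    """Count consecutive completed weeks, before the current one, that met the goal."""
    streak = 0
    week = week_start(today) - timedelta(weeks=1)
    while totals.get(week, 0.0) >= goal:
        streak += 1
        week -= timedelta(weeks=1)
    return streak


# Example: six weeks of 30-minute sessions, Monday to Friday, then one run this week.
log = [(date(2024, 1, 1) + timedelta(weeks=w, days=d), 30.0)
       for w in range(6) for d in range(5)]
log.append((date(2024, 2, 13), 30.0))
totals = weekly_totals(log)
print(progress_message(totals, date(2024, 2, 13)))  # 30 of 150 minutes this week, 120 to go
print(goal_streak(totals, date(2024, 2, 13)))       # 6
```

The streak only counts completed weeks, so a half-finished current week does not reset it.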
00:13:34 Dr Genevieve Hayes
One thing I've found, because I've used Strava in the past, is that it gets very intimidating after a while,
00:13:38 Dr Genevieve Hayes
for all of the reasons you just mentioned. You know, it feels like if you're not really pushing it, then you're just going to embarrass yourself.
00:13:47 Dr Genevieve Hayes
But I can see that what you're describing would be really motivating, because the prize is however many extra minutes or days you're going to get at the end of your life.
00:13:57 Dr Torri Callan
Yeah, exactly. And I think we need to be focused on who the person is. One thing I always keep in the back of my mind is that we're not treating people as a source of data, or as the activities they do or the devices they have. We're trying to see
00:14:13 Dr Torri Callan
people as people,
00:14:15 Dr Torri Callan
and trying to give them feedback based on where they are in their life and what they're trying to achieve.
00:14:22 Dr Torri Callan
And so for most people, all we want to do is say you've added another half hour of activity, and we just want you to keep going.
00:14:31 Dr Genevieve Hayes
As a data scientist, how do you manage to wrap your head around the idea that the data in front of you is not just data points,
00:14:39 Dr Genevieve Hayes
that those data points actually connect to real human beings? I ask because I know that's something a lot of data scientists struggle with.
00:14:46 Dr Torri Callan
The key is having a life outside of data science. For me, I have a pretty, I'd call it, robust training volume through the week. I have for most of my life, and so I've just continued doing that as I started to work.
00:15:00 Dr Torri Callan
And so, especially in this context, because it's so close to what I tend to look at day-to-day for fun and out of interest anyway, it's really easy for me to think about it as a person first and a data scientist second.
00:15:19 Dr Torri Callan
And so I
00:15:19 Dr Torri Callan
think maybe the key is this.
00:15:22 Dr Torri Callan
For people who are potentially newer to the field
00:15:26 Dr Torri Callan
or fairly deep down a specialty, because it can become quite engrossing, and I think quite a lot of data scientists have personalities somewhere between obsessive and really interested,
00:15:39 Dr Torri Callan
it becomes really easy to know a lot about a narrow set of tools and techniques and ways of looking at things,
00:15:49 Dr Torri Callan
and it's easy to miss the full context. I don't want to be too prescriptive here and say this is the best way of doing things.
00:15:57 Dr Torri Callan
I think certain approaches are really valuable in their time and place. If I were trying to reverse-engineer this approach for other people, I would suggest trying to have that view outside of
00:16:10 Dr Torri Callan
data science first. My background was in statistics, and this is something that,
00:16:15 Dr Torri Callan
because it's a more mature field, was perhaps taught a lot better. There are two questions I've always really liked repurposing; I think it was Don Rubin, the statistician, who came up with them
00:16:29 Dr Torri Callan
30-odd years ago.
00:16:31 Dr Torri Callan
He would always ask people, well, if you had no data at all, what would you do?
00:16:37 Dr Torri Callan
So before you come to me with an analysis, or an experiment you want to run, or a model you want to build, what would you do if you had absolutely no idea what was happening?
00:16:46 Dr Torri Callan
And then the second question he'd ask is, well, what if you had all the data available? So not just this small experiment or a simple model, but every bit of data you could possibly want. What would you do?
00:16:58 Dr Torri Callan
What we're trying to feel for with those two answers is, what's the baseline?
00:17:03 Dr Torri Callan
What's the default set of things we're going to do if we don't know much, if we have a lot of uncertainty,
00:17:10 Dr Torri Callan
and, well, we never have no data, but if we don't have a lot of rich and full information about a particular situation?
00:17:20 Dr Torri Callan
And then with the second question, you're trying to understand, well, what would
00:17:23 Dr Torri Callan
happen if you
00:17:24 Dr Torri Callan
had more information? If you had all of the data you wanted, what's actually going
00:17:28 Dr Torri Callan
to change? So, to bring it back to our example,
00:17:31 Dr Torri Callan
if I knew almost nothing,
00:17:36 Dr Torri Callan
I'd be reasonably
00:17:37 Dr Torri Callan
confident in saying that people need to do
00:17:40 Dr Torri Callan
at least a little bit of physical activity every day, and that if they spend time trying to get better at the physical activity they're doing, they're going to feel better,
00:17:51 Dr Torri Callan
feel happier, live better and live for longer.
00:17:56 Dr Torri Callan
And so if that's my baseline,
00:17:58 Dr Torri Callan
then as we gather more information from people's devices, all we're trying to do is make that advice a little bit more,
00:18:07 Dr Torri Callan
I suppose, high-fidelity. We're trying to give a little bit more precision to the information we give back to people.
00:18:13 Dr Genevieve Hayes
What you were saying before, about what happens if you have no data, is actually a good segue into another question I was going to ask you. Since you're working at a startup, there must have been a point where you had absolutely no data at all. How did you cope with trying to build a product from absolutely nothing?
00:18:34 Dr Torri Callan
"Absolutely nothing" is relative, right? It's nice to have the big database with millions or billions of rows of data, but actually, when it comes to physical activity and human performance, there's quite a bit of information out there in the world.
00:18:50 Dr Torri Callan
The starting point for us was actually just looking at a set of world records.
00:18:55 Dr Torri Callan
So we started with the open division for running, across a bunch of different time domains.
00:19:01 Dr Torri Callan
I think we looked at the 5 kilometre, the 10 kilometre, the half marathon and the marathon.
00:19:06 Dr Torri Callan
We compared those between women and men to get a sense of the difference between the two divisions, and then we started to compare them across masters divisions as well. So
00:19:20 Dr Torri Callan
we might know the marathon world record time for open men, but what does that look like for over-50s, over-60s and over-70s? What that did for us, because that information is fairly available, is give us a pretty good understanding of what will happen for people as they
00:19:38 Dr Torri Callan
get older, and what the difference is for men and women when it comes to physical performance.
00:19:43 Dr Torri Callan
And I think this gives everyone who comes through an understanding of what
00:19:49 Dr Torri Callan
the best possible level of performance looks like.
00:19:53 Dr Torri Callan
I think I mentioned
00:19:54 Dr Torri Callan
this right at the top, but
00:19:56 Dr Torri Callan
we're not necessarily saying that everyone needs to be achieving world record times, because I think that's probably outside the scope of health and longevity
00:20:05 Dr Torri Callan
once you get beyond a certain point of functional performance.
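Turning open and masters records into an age-specific baseline can be sketched as interpolation between age divisions. This is a minimal sketch under assumptions: linear interpolation is a choice made here rather than a method described in the conversation, treating the open record as the age-30 anchor is an assumption, and apart from the roughly 12.5-minute 5 km figure quoted earlier, the anchor times are placeholders, not real masters records. A separate anchor table per division (for example, women and men) would follow the same pattern.

```python
from bisect import bisect_left

# PLACEHOLDER (age, minutes) anchors for a 5 km benchmark. Only the ~12.5-minute
# value comes from the conversation; the rest are made-up illustrative numbers
# and are NOT real masters records.
HYPOTHETICAL_5K_BENCHMARKS = [(30, 12.5), (50, 14.0), (60, 15.5), (70, 17.5)]


def benchmark_for_age(anchors: list[tuple[int, float]], age: float) -> float:
    """Linearly interpolate a benchmark time for `age` from (age, minutes) anchors.

    `anchors` must be sorted by age; ages outside the table are clamped
    to the nearest anchor.
    """
    ages = [a for a, _ in anchors]
    if age <= ages[0]:
        return anchors[0][1]
    if age >= ages[-1]:
        return anchors[-1][1]
    i = bisect_left(ages, age)  # index of the first anchor at or above `age`
    (a0, t0), (a1, t1) = anchors[i - 1], anchors[i]
    return t0 + (t1 - t0) * (age - a0) / (a1 - a0)


print(benchmark_for_age(HYPOTHETICAL_5K_BENCHMARKS, 55))  # 14.75
```

Clamping at the ends avoids extrapolating beyond the divisions for which records exist.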
00:20:08 Dr Genevieve Hayes
And the people who are really pushing it are also going to do some damage to their bodies, so it's possibly not something you want to encourage.
00:20:17 Dr Torri Callan
Potentially. I think there are advantages to
00:20:21 Dr Torri Callan
trying to train at a fairly high level.
00:20:24 Dr Torri Callan
Maybe this is just my personality speaking, because I tend to be more of a generalist in most things. I'd
00:20:30 Dr Torri Callan
suggest that a lot of data scientists would be generalists in a lot of aspects as well, but I find it more interesting to try and be good at a number of different things than to be really good at one thing when it comes to physical activity.
00:20:44 Dr Torri Callan
Trying to be a really, really good runner
00:20:47 Dr Torri Callan
is, to me, not as interesting as trying to be a moderately decent runner and to be fairly competent
00:20:54 Dr Torri Callan
when I go to the gym, and
00:20:55 Dr Torri Callan
to be a fairly modest swimmer, but at least better than I was a year or two ago.
00:21:03 Dr Torri Callan
Yeah, maybe on the totality, and this is something we're still weighing up, the best version for everyone is to be fairly good, decently good, at quite a few different things.
00:21:17 Dr Torri Callan
And one interesting way to challenge yourself is just to find different activities to go and be good at.
00:21:23 Dr Genevieve Hayes
So you were talking about your starting point being a lot of those world records. Where did you start with regard to longevity? Were you looking at, say, actuarial tables to look at life expectancies and things like that?
00:21:39 Dr Torri Callan
Yeah, well, the
00:21:40 Dr Torri Callan
starting point for us was actually to look at the World Health Organization's recommendations on physical activity. Off the top of my head, about 150 to 300 minutes a week of physical activity is going to maximise the impact on longevity.
00:21:59 Dr Torri Callan
And beyond that point, what you want to do is start to optimise the mix of moderate and intense activity time. So
00:22:09 Dr Torri Callan
by intense, I mean, it is a little bit relative, but we've used anything over 75% of your maximum heart rate as intense activity,
00:22:18 Dr Torri Callan
and anything below that is what we call moderate. I suppose one interesting thing when you're doing this for the first time is that you make all these approximations and first cuts that seem reasonable, because even as I was talking through that, I could think of quite a few things we could improve.
00:22:33 Dr Torri Callan
But there's a lot of peer-reviewed literature, and practitioners in the health and wellness space, that speak about the value of staying physically active and give guidelines on what they think is reasonable.
00:22:46 Dr Torri Callan
And if you sample from enough of them, then you can start to triangulate between all the different recommendations
00:22:53 Dr Torri Callan
out there and get an idea of the common threads. And it does seem to be the case that, when it comes to longevity, you have,
00:23:02 Dr Torri Callan
I guess, the baseline, which is what you'd see in an actuarial table, and then you have shifts upwards or downwards from there based on how sedentary or active you are.
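The intensity split described above can be sketched as follows, assuming per-minute heart-rate samples taken from a recorded activity and a known or estimated maximum heart rate. The 75% cut-off comes from the conversation; the function names, the sample values, the "220 minus age" estimate and the convention that an intense minute counts double towards the 150 to 300 minute range are assumptions for illustration.

```python
def split_intensity(hr_per_minute: list[float], max_hr: float,
                    threshold: float = 0.75) -> tuple[int, int]:
    """Split one-minute heart-rate samples into (moderate, intense) minute counts.

    A minute counts as intense when heart rate is above `threshold` * `max_hr`;
    every other minute of the activity counts as moderate.
    """
    cutoff = threshold * max_hr
    intense = sum(1 for hr in hr_per_minute if hr > cutoff)
    return len(hr_per_minute) - intense, intense


def weekly_band(moderate: float, intense: float) -> str:
    """Place a week's activity relative to the 150-300 minute range.

    Assumption: one intense minute counts as two moderate minutes, a common
    convention for combining moderate and vigorous activity.
    """
    equivalent = moderate + 2 * intense
    if equivalent < 150:
        return "below"
    if equivalent <= 300:
        return "within"
    return "above"


age = 30
max_hr = 220 - age  # rough "220 minus age" estimate, an assumption, not a measurement
samples = [120, 135, 150, 160, 140, 145]  # made-up per-minute heart rates
print(split_intensity(samples, max_hr))  # (3, 3)
print(weekly_band(100, 30))              # within
```

With an estimated maximum of 190 bpm the cut-off is 142.5 bpm, so three of the six sample minutes count as intense.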
00:23:13 Dr Torri Callan
We've also started looking at data around resting heart rate, at what happens during sleep and at rest, and at whether or not including that information is useful for people.
00:23:26 Dr Torri Callan
I'd tend to say it probably is. So if you can start to give recommendations around trying to
00:23:34 Dr Torri Callan
pay off some sleep debt and make sure you're spending enough time sleeping, and potentially on sleep quality if the data is available, then we think that's going to be
00:23:44 Dr Torri Callan
useful as well.
00:23:45 Dr Torri Callan
There are a few challenges, again coming back to the idea that if we want tech to be in this loop,
00:23:53 Dr Torri Callan
we don't want it to be driving behavioural change in an unintentional way.
00:23:59 Dr Torri Callan
So trying to measure as many things as possible and trying to get someone to optimise them all at once is probably going to be overwhelming for most people.
00:24:10 Dr Torri Callan
And even if it's not overwhelming, I think,
00:24:12 Dr Torri Callan
and I've certainly experienced this with my own data, you can spend a lot of time and energy trying to work out what you need to be doing next. I think where a platform like ours can be really useful is that you can just
00:24:29 Dr Torri Callan
distil all of that down into the thing that we know is most high-leverage, and then borrow information from what other people have been able to do,
00:24:38 Dr Torri Callan
and say, well, this is actually what
00:24:40 Dr Torri Callan
You need to be working on.
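Collapsing many metrics into a single recommendation can be sketched by picking the metric that is furthest below its target. Using relative shortfall as a stand-in for "highest leverage" is an assumption made here (a real system would presumably weigh expected health impact as well), and the metric names, targets and messages are hypothetical; the 8-hour sleep and 150-minute activity targets echo figures from the conversation.

```python
from typing import Optional


def top_recommendation(current: dict[str, float], targets: dict[str, float],
                       messages: dict[str, str]) -> Optional[str]:
    """Return one message: the metric furthest below its target, relative to that target.

    Metrics with no data are skipped; if every metric meets its target, return None.
    """
    shortfalls = {}
    for metric, target in targets.items():
        value = current.get(metric)
        if value is None:
            continue  # no data for this metric, so don't recommend on it
        shortfalls[metric] = max(0.0, (target - value) / target)
    if not shortfalls or max(shortfalls.values()) == 0.0:
        return None
    worst = max(shortfalls, key=shortfalls.get)
    return messages[worst]


targets = {"sleep_hours": 8.0, "activity_minutes": 150.0}
messages = {
    "sleep_hours": "Sleep more: aim for around 8 hours a night.",
    "activity_minutes": "Keep adding activity minutes towards 150 this week.",
}
this_week = {"sleep_hours": 4.0, "activity_minutes": 120.0}
print(top_recommendation(this_week, targets, messages))
# Sleep more: aim for around 8 hours a night.
```

Here sleep is 50% below target and activity only 20% below, so the single message shown is about sleep.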
00:24:41 Dr Genevieve Hayes
So, just give one recommendation rather than a whole list of recommendations?
00:24:46 Dr Torri Callan
Yeah, exactly. You give one recommendation, instead of giving 10 metrics that we think are really useful and, without even telling you that they need to be optimised, implying that they need to be as high as they can be, because
00:25:00 Dr Torri Callan
they're all scores from zero to 100, or they're all plotted on graphs that go from
00:25:05 Dr Torri Callan
low to high and give you a week-to-week comparison. So it's
00:25:09 Dr Torri Callan
trying to say, hey, we actually know that the most high-leverage thing you need to do at the moment
00:25:15 Dr Torri Callan
is sleep more, because we can see you're sleeping 4 hours a night and we know you need to be around 8 or 9.
00:25:22 Dr Torri Callan
Well, for most people, anyway.
00:25:22 Dr Genevieve Hayes
It sounds like a lot of the data science you're doing here would be based predominantly on statistics. Is that
00:25:30 Dr Genevieve Hayes
right?
00:25:31 Dr Torri Callan
It's a little bit circular because my approach to data science has always been based very heavily in statistical modelling.
00:25:37 Dr Torri Callan
That's where I did my PhD and my undergrad studies, so my approach has always been fairly heavily grounded in the field.
00:25:46 Dr Torri Callan
One thing I've had to learn over time is to get really good at engineering practices. I'm still not good at them.
00:25:51 Dr Torri Callan
Get better, I should say: get better at a lot of engineering practices, because
00:25:55 Dr Torri Callan
when you're studying for your PhD, you can get away with a lot of crappy spaghetti code. It does what you need it to do, but if you change one thing, it'll break.
00:26:04 Dr Genevieve Hayes
I'm embarrassed about the code that I wrote for my PhD.
00:26:08 Dr Torri Callan
Yeah, it's crazy when you think about how little rigour goes into writing academic code, I think. And there's no version control.
00:26:18 Dr Torri Callan
I used to number scripts from one up to whatever I needed and run them all in sequence,
00:26:24 Dr Torri Callan
rely a lot on local files and local environment variables. Nothing was ever a function. Yeah, you do a lot of bad things, and then when you have to try and write code that sits in the production
00:26:34 Dr Torri Callan
environment and has to talk to other systems, you
00:26:37 Dr Torri Callan
very quickly work out what
00:26:39 Dr Torri Callan
you need to do
00:26:40 Dr Torri Callan
on that front.
00:26:41 Dr Genevieve Hayes
What does your tech stack look like?
00:26:44 Dr Torri Callan
One of the really fascinating bits that I've worked on is trying to get a lot of the data processing done in a really short amount of time. In other places I've worked,
00:26:54 Dr Torri Callan
a lot of
00:26:55 Dr Torri Callan
the tools you rely on have been batch tools that
00:27:01 Dr Torri Callan
process a whole set of data all at once, and those tools can be really useful even if you want to do really short, low-latency batch processing, say if you want to run something every 5 minutes.
00:27:12 Dr Torri Callan
But what we've been trying to do is have something that runs almost as soon as someone syncs or uploads data from a particular device. Particularly for activity scores or activity processing, we want to actually have that
00:27:29 Dr Torri Callan
in the moment. So as soon as you sync your device, you'll get that feedback and that information, and then we try to propagate all that information through the different metrics and recommendations we're trying to give. That's based primarily on a lot of SQS queues that are tied together.
00:27:50 Dr Genevieve Hayes
Is that an AWS service?
00:27:52 Dr Torri Callan
Simple Queue Service. It'll just add messages to a queue, and then
00:27:56 Dr Genevieve Hayes
It's sort of like Kafka.
00:27:58 Dr Torri Callan
Yeah, very similar. Like I mentioned, with my background I've had to learn what all of this is
00:28:04 Dr Torri Callan
As I've gone along, because I've found writing the actual code to do data processing fairly straightforward, but trying to get it to talk to an SQS queue and then pass information back to a database or to another SQS queue is
00:28:17 Dr Torri Callan
A little bit more challenging. But that's all part of the fun. So yeah, we rely on that because it allows us to do a lot of processing as close to real time as possible.
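As a rough illustration of the queue-driven pattern Tori describes, here is a minimal sketch using boto3's SQS client. The queue URLs and the message fields (`user_id`, `activity_minutes`) are hypothetical placeholders, not UARE's actual schema. The per-message logic is kept as a pure function so it can be tested without AWS.

```python
import json


def process_activity(message_body: str) -> dict:
    """Turn a raw 'device synced' message into a message for the next queue.

    The payload fields used here are hypothetical placeholders.
    """
    event = json.loads(message_body)
    minutes = float(event.get("activity_minutes", 0.0))
    return {
        "user_id": event["user_id"],
        "activity_minutes": minutes,
        "activity_hours": minutes / 60.0,  # a toy derived metric
    }


def poll_once(in_queue_url: str, out_queue_url: str) -> int:
    """Long-poll one queue, process each message, and forward results downstream."""
    import boto3  # imported lazily so process_activity() works without AWS installed

    sqs = boto3.client("sqs")
    response = sqs.receive_message(
        QueueUrl=in_queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    messages = response.get("Messages", [])
    for msg in messages:
        result = process_activity(msg["Body"])
        sqs.send_message(QueueUrl=out_queue_url, MessageBody=json.dumps(result))
        # Delete only after forwarding, so a crash mid-way means the message is retried.
        sqs.delete_message(QueueUrl=in_queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return len(messages)
```

In practice a Lambda function with an SQS trigger often replaces an explicit polling loop like `poll_once`; the loop just makes the receive, process, forward, delete sequence visible.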
00:28:26 Dr Torri Callan
And then most of the processing at the moment is done with pandas, and we're slowly introducing a lot of things that can be done just with really basic SQL, because I think that's going to be quite scalable over time and really easy to read.
00:28:41 Dr Torri Callan
I don't know how prevalent this is, but I have seen it for some people. There's a default temptation to reach
00:28:47 Dr Torri Callan
For the tool or the package that will answer all your problems all the time. So you're using pandas for data processing, or you're using Spark because it'll scale to millions of rows all at once.
00:28:59 Dr Torri Callan
But I've actually found that writing quite a lot of things in SQL makes it really, really easy to work out what it is you're trying to do, really easy to onboard new people if they need to be working on it, and really easy to
00:29:13 Dr Torri Callan
Debug, handle errors, add filters, you know, all the things that it
00:29:17 Dr Torri Callan
Was built for. So
00:29:19 Dr Torri Callan
Yeah, quite a few things I've since introduced just with really basic SQL, except
00:29:23 Dr Torri Callan
Perhaps some of the more complicated processing, where I've used a bit of pandas.
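To make the "stack queries and see the lineage" point concrete, here is a small sketch using Python's built-in `sqlite3`. The table name, columns, and the 30 to 230 bpm plausibility filter are hypothetical choices for illustration. Each common table expression (CTE) builds on the one before, so you can read top to bottom where every column comes from.

```python
import sqlite3

# Hypothetical raw table: one row per heart-rate reading.
SCHEMA = """
CREATE TABLE hr_samples (user_id TEXT, day TEXT, bpm INTEGER);
"""

# Each CTE stacks on the previous one, so the lineage of every column is explicit.
OBSERVED_MAX_SQL = """
WITH cleaned AS (
    -- drop implausible readings (threshold is an illustrative assumption)
    SELECT user_id, day, bpm FROM hr_samples WHERE bpm BETWEEN 30 AND 230
),
daily AS (
    SELECT user_id, day, MAX(bpm) AS max_bpm, COUNT(*) AS n_samples
    FROM cleaned
    GROUP BY user_id, day
)
SELECT user_id, MAX(max_bpm) AS observed_max_bpm, SUM(n_samples) AS total_samples
FROM daily
GROUP BY user_id
ORDER BY user_id;
"""


def observed_max_heart_rate(rows):
    """Load (user_id, day, bpm) rows into an in-memory DB and run the stacked query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO hr_samples VALUES (?, ?, ?)", rows)
        return conn.execute(OBSERVED_MAX_SQL).fetchall()
    finally:
        conn.close()
```

Debugging is a matter of swapping the final `SELECT` for `SELECT * FROM cleaned` or `SELECT * FROM daily` and inspecting that stage on its own.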
00:29:28 Dr Genevieve Hayes
That's really interesting, because I've actually spoken to quite a few people who say that they use a lot of SQL for their data science.
00:29:36 Dr Torri Callan
Yeah, I think writing SQL is almost a foundational skill in data science. Writing SQL almost in your sleep, being able to come up with a query that will answer something,
00:29:45 Dr Torri Callan
Is almost more valuable than trying to come up with the really cool bit of code or the nice model that will answer something. It's valuable because you can just stack it on top of other queries and
00:29:56 Dr Torri Callan
See the lineage and see what's happening
00:29:59 Dr Torri Callan
To different bits of the code. So I'd imagine most people are across this, but if you're not, that's probably one recommendation I'd give: just get really
00:30:07 Dr Torri Callan
Comfortable answering as much as you can in SQL.
00:30:09 Dr Genevieve Hayes
And so you don't use any statistical packages on top of that?
00:30:14 Dr Torri Callan
Most of the stats is done offline. I came up with a few models or calculations. I'll give one example.
00:30:24 Dr Torri Callan
We use someone's maximum heart rate.
00:30:27 Dr Torri Callan
To indicate if the activity they've been doing is moderate or intense, and what you'll have is a time series of people's heart rate data.
00:30:37 Dr Torri Callan
And you'll also have their age and gender.
00:30:41 Dr Torri Callan
What we can do is look at just your heart rate data over time and take the maximum of that.
00:30:47 Dr Torri Callan
And that could be your max
00:30:48 Dr Torri Callan
Heart rate.
00:30:49 Dr Torri Callan
But there are obviously sampling bias issues where, for example, if you're someone who only ever wears a Garmin watch when you're going
00:30:58 Dr Torri Callan
For an easy run
00:30:59 Dr Torri Callan
And you don't often do high-intensity activity, we're not actually going to see your true
00:31:04 Dr Torri Callan
Max heart rate.
00:31:05 Dr Torri Callan
On the other side, there are all these population measures of what your max heart rate should be, based just on age.
00:31:12 Dr Torri Callan
The very common one is that 220 minus your age is roughly what your
00:31:17 Dr Torri Callan
Maximum heart rate should be. There are updated versions of that, which we use, which don't round to 220 and use slightly different coefficients. I
00:31:28 Dr Torri Callan
Don't know exactly what it is off the top of my head, but it's
00:31:31 Dr Torri Callan
Similar in that way.
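For reference, the "220 minus age" rule is usually attributed to Fox and colleagues, and one widely cited update is Tanaka, Monahan and Seals (2001), which uses 208 minus 0.7 times age. Tori doesn't say which update UARE uses, so the Tanaka formula here is only an illustrative example of an "updated version" with a different intercept and coefficient.

```python
def hr_max_fox(age_years: float) -> float:
    """Classic rule of thumb: 220 minus age, in beats per minute."""
    return 220.0 - age_years


def hr_max_tanaka(age_years: float) -> float:
    """Tanaka et al. (2001) update: 208 minus 0.7 x age, in beats per minute.

    Shown as an example only; not necessarily the formula UARE uses.
    """
    return 208.0 - 0.7 * age_years


# The two formulas agree at age 40; above that, the Tanaka version predicts higher values.
for age in (25, 40, 60):
    print(age, hr_max_fox(age), round(hr_max_tanaka(age), 1))
```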
00:31:33 Dr Torri Callan
And so on one hand, we can see what your max heart rate is just based on what we get from your device. On the other side, we get
00:31:41 Dr Torri Callan
What we think it is from the population. What I then did was work on a model that tried to combine those two estimates, if you think of
00:31:53 Dr Torri Callan
Them as estimates,
00:31:54 Dr Torri Callan
And weighted them by
00:31:55 Dr Torri Callan
How often someone was using their device.
00:31:58 Dr Torri Callan
So say someone used their device literally 24/7.
00:32:02 Dr Torri Callan
We don't need an approximation of what the Max heart rate is. We can just read it straight off what the device is telling us.
00:32:10 Dr Torri Callan
And on the other side, if you have someone that wears their device for one or two hours every week.
00:32:16 Dr Torri Callan
We're not going to apply a lot of weight to the Max we see from their device. We're just going to look at what the population thinks.
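A minimal sketch of the weighting Tori describes, assuming the weight on the device value is simply the fraction of the observation window the device was worn, so 24/7 wear gives full weight to the device maximum and a couple of hours a week gives almost none. That weighting rule, the Tanaka population formula, and the CDC-style bands (64 to 76% of max heart rate for moderate, 77% and above for vigorous) are all illustrative assumptions, not UARE's actual model.

```python
def blended_hr_max(observed_max_bpm: float, age_years: float,
                   hours_worn: float, hours_in_window: float) -> float:
    """Weighted average of the device-observed max and a population estimate.

    Assumption: weight on the device value = share of the window the device was worn.
    """
    population_estimate = 208.0 - 0.7 * age_years  # Tanaka et al. (2001), illustrative
    weight = min(max(hours_worn / hours_in_window, 0.0), 1.0)
    return weight * observed_max_bpm + (1.0 - weight) * population_estimate


def intensity(bpm: float, hr_max: float) -> str:
    """Classify a heart rate by % of max (CDC-style thresholds, an assumption)."""
    pct = bpm / hr_max
    if pct >= 0.77:
        return "vigorous"
    if pct >= 0.64:
        return "moderate"
    return "light"


HOURS_PER_WEEK = 168.0
always_on = blended_hr_max(185.0, 40, hours_worn=168.0, hours_in_window=HOURS_PER_WEEK)
occasional = blended_hr_max(185.0, 40, hours_worn=2.0, hours_in_window=HOURS_PER_WEEK)
print(always_on, round(occasional, 1), intensity(150.0, always_on))
```

The blended estimate then feeds the moderate-versus-intense classification mentioned earlier, so a poorly sampled device maximum does not distort every activity score.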
00:32:22 Dr Genevieve Hayes
So this is sort of a credibility theory approach. Have you ever come across that before?
00:32:27 Dr Torri Callan
I don't know the terminology. I've come across weighted regression, or, yeah, I think in survey sampling
00:32:32 Dr Torri Callan
You'd probably do something similar, where you weight different surveys by either how credible they are or how much data you have.
00:32:41 Dr Genevieve Hayes
I have an insurance background and credibility theory was originally developed for workers compensation.
00:32:48 Dr Genevieve Hayes
And the idea is that it's basically a weighted average of industry experience and individual experience in order to calculate a workers compensation premium for a company.
00:33:00 Dr Genevieve Hayes
So it sounds like you're doing a similar thing, but instead of the individual experience of an organisation,
00:33:08 Dr Genevieve Hayes
You've got the individual experience of a person, and instead of industry experience, you've got the general population.
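For readers who haven't met credibility theory, the classical Bühlmann form of the weighted average Genevieve describes is below. Here $\bar{X}_i$ is the individual risk's average over $n$ periods of experience, $\mu$ is the collective (industry) mean, and $k$ is the ratio of the expected process variance to the variance of the hypothetical means. In the heart-rate example, $\bar{X}_i$ would loosely play the role of the device-observed maximum and $\mu$ the population formula, though Tori's wear-time weighting is not necessarily the Bühlmann $Z$.

```latex
\hat{\mu}_i = Z\,\bar{X}_i + (1 - Z)\,\mu,
\qquad
Z = \frac{n}{n + k},
\qquad
k = \frac{\mathbb{E}\!\left[\sigma^2(\Theta)\right]}{\operatorname{Var}\!\left[\mu(\Theta)\right]}
```

As $n$ grows, $Z \to 1$ and the estimate relies on the individual's own experience; with little data it falls back towards the collective mean.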
00:33:17 Dr Torri Callan
Yeah, I think the methods are many, but the concepts are few, right? I think you end up seeing the same thing across many different domains, which is kind of the nice bit about working in the field.
00:33:28 Dr Torri Callan
It does sound pretty close to what we're trying to achieve. We're just trying to weight different sources of data by how often we're seeing them and then, yeah, again,
00:33:37 Dr Torri Callan
By how useful we actually think they are.
00:33:40 Dr Genevieve Hayes
You know, as a data scientist, I hate merging data that's being collected from multiple sources because it never quite meshes properly.
00:33:49 Dr Genevieve Hayes
Is that problematic for you, given the fact that you'd be using data that was collected from multiple devices?
00:33:56 Dr Torri Callan
Yeah, it's really problematic, and it's the most significant challenge we're trying to address at the moment.
00:34:02 Dr Torri Callan
You'll have some devices that have a really rich data set.
00:34:07 Dr Torri Callan
So if you go for a run, it'll tell you the latitude and longitude of where you
00:34:11 Dr Torri Callan
Were, so we
00:34:12 Dr Torri Callan
Can give you GPS data.
00:34:14 Dr Torri Callan
It will tell you the pace that you're running at any given point in time. It'll tell you elevation changes.
00:34:19 Dr Torri Callan
And so we can give you a lot of feedback about that particular activity. So we might be able to say if you did a really hilly run.
00:34:26 Dr Torri Callan
We might be able to adjust the pace for the given gradient back to an equivalent performance level, because we know that,
00:34:34 Dr Torri Callan
You know, at a certain elevation change, a 6-minute kilometre is actually similar to a 4-minute kilometre on the flat.
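One published way to do this kind of gradient adjustment is the running energy-cost curve from Minetti et al. (2002), a polynomial in gradient fitted for slopes of roughly -45% to +45%. The sketch below uses it to convert an actual pace into a flat-ground pace at the same metabolic power. This is a swapped-in, illustrative model; the episode doesn't say which adjustment UARE uses.

```python
def running_energy_cost(gradient: float) -> float:
    """Energy cost of running in J per kg per metre (Minetti et al., 2002).

    gradient is rise over run as a fraction (0.08 = 8% uphill, negative = downhill).
    """
    i = gradient
    return 155.4 * i**5 - 30.4 * i**4 - 43.3 * i**3 + 46.3 * i**2 + 19.5 * i + 3.6


def flat_equivalent_pace(pace_min_per_km: float, gradient: float) -> float:
    """Pace on flat ground at the same metabolic power.

    At fixed power, speed is proportional to 1 / cost, so pace (time per km)
    is proportional to cost; rescale by the flat cost over the sloped cost.
    """
    return pace_min_per_km * running_energy_cost(0.0) / running_energy_cost(gradient)


print(round(flat_equivalent_pace(6.0, 0.08), 2))
```

Under this model, a 6-minute kilometre up a steady 8% grade comes out at roughly a 4-minute kilometre on the flat, which lines up with the example in the conversation. Real runs would apply the adjustment segment by segment along the GPS track.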
00:34:40 Dr Torri Callan
There's a lot of things we can do there, but then for other devices you don't get much information at all.
00:34:46 Dr Torri Callan
You might just get an activity name and then the amount of time, and that's it. I think the corner we kind of got painted into was trying to treat it all as the same and then have edge-case handling if things were missing.
00:35:03 Dr Torri Callan
But what I'm more open to now is actually handling different devices differently based on the data you have available, and then trying to make the UX and UI really
00:35:15 Dr Torri Callan
Individualised based on what data is available as well. Again, in that example, say you go for a run with a Garmin and you've
00:35:22 Dr Torri Callan
Got a heart rate monitor.
00:35:24 Dr Torri Callan
What you see in the front end of the app might show
00:35:28 Dr Torri Callan
How far you went, how quick you went, how quick you would have gone for the same run in flat conditions,
00:35:35 Dr Torri Callan
How much time you spent resting versus moving, and then we might be able to categorise whether it was an interval session, a time trial or a straight run, and give you some feedback on which heart rate zones you sat in.
00:35:47 Dr Torri Callan
And then for example, you do it with a different device and it just tells us, let's say distance and time.
00:35:53 Dr Torri Callan
Instead of showing an empty screen with a bunch of empty data.
00:35:57 Dr Torri Callan
Or pretending that we have more data than we do and trying to bulk it out with what's available, where we're heading is a view which just says you went for a run and you added 60 minutes to your activity minutes goal, and you're that much closer to the goal you were
00:36:14 Dr Torri Callan
Trying to achieve.
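A minimal sketch of choosing what to render from whatever fields a device actually supplied, rather than treating every device the same and patching gaps. The card names, the `Activity` fields, and the 100 m "hilly" threshold are hypothetical, chosen only to mirror the Garmin-versus-basic-device and hilly-versus-flat examples in the conversation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    """A synced activity; fields a device doesn't report stay None or empty."""
    activity_type: str
    duration_min: float
    distance_km: Optional[float] = None
    elevation_gain_m: Optional[float] = None
    heart_rate_bpm: List[float] = field(default_factory=list)


def select_cards(activity: Activity, hilly_threshold_m: float = 100.0) -> List[str]:
    """Pick summary cards based only on the data that is actually present."""
    cards = ["activity_goal_progress"]  # every activity counts toward the minutes goal
    if activity.distance_km is not None:
        cards.append("distance_and_pace")
    # Only show a gradient-adjusted pace when the run was meaningfully hilly.
    if activity.elevation_gain_m is not None and activity.elevation_gain_m >= hilly_threshold_m:
        cards.append("flat_equivalent_pace")
    if activity.heart_rate_bpm:
        cards.append("heart_rate_zones")
    return cards


print(select_cards(Activity("run", 60.0)))  # a device that only reports name and time
```

Keeping the decision in one function means a new device type only needs its fields mapped onto `Activity`; the front end renders whichever cards come back.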
00:36:16 Dr Genevieve Hayes
So there's something behind the scenes that selects a particular visualisation to show people based on what data is available?
00:36:23 Dr Torri Callan
Yeah, it's having the data product really tightly integrated with the rest of the product,
00:36:30 Dr Torri Callan
So that based on what data we have available, we can actually decide what goes into the front end and how that's visualised, and customise the appearance of it to some extent. You have,
00:36:42 Dr Torri Callan
Yeah, all these different options in the UX to say
00:36:47 Dr Torri Callan
This activity just contributed to the activity goal, or for this activity we're going to show these fields of information. And then we're starting to explore different customisations that you could do for different activities. So
00:37:00 Dr Torri Callan
For example, you go on a run with a lot of hills. We might want to show you
00:37:05 Dr Torri Callan
How much elevation change there was and how we think that run compared to what you would have done on the flat.
00:37:13 Dr Torri Callan
Whereas if you go on a run and it's fairly flat, say along the beach,
00:37:17 Dr Torri Callan
And there's no elevation, there's not much point showing you how much elevation you had, but maybe you'd be more interested in your overall pace or your heart rate or something like that. So
00:37:28 Dr Torri Callan
One of the nice bits about
00:37:30 Dr Torri Callan
Being the data scientist very early on in the piece is that you can work on building all of the data products in,
00:37:39 Dr Torri Callan
And have them really tightly coupled with what's in the back end and what's in the front end, and almost have them as part of the product rather than a nice add-on that comes later, way down the line.
00:37:51 Dr Genevieve Hayes
How early into the game did you join the startup?
00:37:53 Dr Torri Callan
Yeah, it was more or less from conception. I knew the cofounders through Manly Surf Club. They approached me as they were formulating the idea and looking for some really early investment.
00:38:04 Dr Torri Callan
And I helped out in bits and pieces in an ad hoc manner, just trying to test out the idea and seeing if we could get a little bit of data
00:38:14 Dr Torri Callan
To actually formulate what they were thinking about.
00:38:16 Dr Torri Callan
And then, yeah.
00:38:17 Dr Torri Callan
Over time, as they gathered a bit of investment, I did a little bit of advising on how to set up the team and how to lay out the back end and the front end of the thing, and then
00:38:28 Dr Torri Callan
At certain times I've had to come in and actually write some code to do a lot of the activity processing. As it's all evolved, I've been spending a little bit more time guiding the product and seeing where the product, and how it
00:38:45 Dr Torri Callan
Gets displayed, can actually fit with the data we have available.
00:38:47 Dr Torri Callan
And I think we're now at a point where it's a little bit more mature because we have an idea of what it looks like as.
00:38:53 Dr Torri Callan
An MVP.
00:38:55 Dr Torri Callan
And we can now see where all this personalisation and intelligent recommendation can actually feed into
00:39:02 Dr Torri Callan
The way in which the product is presented.
00:39:04 Dr Genevieve Hayes
To change the topic a bit, in addition to your work in the startup space, you've also completed a PhD in statistical and mathematical modelling.
00:39:14 Dr Genevieve Hayes
Now one of the things you mentioned to me when we first spoke was that a lot of your PhD work involved the use of Bayesian methods.
00:39:23 Dr Genevieve Hayes
And that you saw these methods as raising the level of rigour of statistics. Given my own statistical background, I'm very interested in learning more about your thoughts on this matter.
00:39:36 Dr Genevieve Hayes
But before we go down this path, could you provide our listeners with an overview of what's involved in Bayesian methods?
00:39:44 Dr Genevieve Hayes
In case they've never come across them before.
00:39:48 Dr Torri Callan
There are a number of different ways we could approach this
00:39:51 Dr Torri Callan
Topic, but what I might do is
00:39:53 Dr Torri Callan
Give an overview
00:39:54 Dr Torri Callan
Of how I came into
00:39:57 Dr Torri Callan
Bayesian methods. I think most people have a
00:40:01 Dr Torri Callan
Really broad-level understanding of what Bayes' theorem is, and then what I think gets presented is: this is what Bayes' theorem is,
00:40:08 Dr Torri Callan
This is how it falls out from the laws of conditional probability, and this is how you might actually use it. But that doesn't quite match up to what gets used in practice. The way in which I
00:40:21 Dr Torri Callan
Came across it and was taught it was through hierarchical regression modelling, or what some people call multilevel regression models.
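For listeners who want the textbook version Tori alludes to: Bayes' theorem follows from writing the joint probability of parameters $\theta$ and data $y$ in two ways. The posterior is the likelihood times the prior, normalised by the marginal probability of the data.

```latex
p(\theta, y) = p(\theta \mid y)\,p(y) = p(y \mid \theta)\,p(\theta)
\;\Longrightarrow\;
p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)},
\qquad
p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta
```

In practice the integral for $p(y)$ is rarely tractable for realistic models, which is why sampling methods such as MCMC, discussed later in the episode, do the heavy lifting.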
00:40:30 Dr Torri Callan
So these are regression models where you have data collected in clusters or groups, where a particular parameter may or may not be
00:40:42 Dr Torri Callan
Influenced by the group. So the canonical example that gets used is you want to look at the impact of
00:40:50 Dr Torri Callan
Whether a student had breakfast before an exam on their exam score. That's not quite canonical, but
00:40:57 Dr Torri Callan
You get the idea. We want to have some idea of whether something a student did was influential on their exam scores or achievement levels at school.
00:41:07 Dr Torri Callan
But what we want to do is then account for the variability you'd get between classes, because different teachers obviously have different impacts on students,
00:41:18 Dr Torri Callan
And the variability that you'd get between schools as well.
00:41:23 Dr Torri Callan
So if you imagine a typical regression model, it might be an intercept term plus a coefficient on the variable that you want to make some sort of inference about.
00:41:37 Dr Torri Callan
So in this example, the intercept would be the average exam score, and then the coefficient would be on
00:41:44 Dr Torri Callan
Whether that student had breakfast that morning. In a multilevel model, what you'd do is have an intercept for every class.
00:41:52 Dr Torri Callan
So let's say there are a hundred classes within your data set, and roughly 10 students in every class, although it doesn't have to be exact; it
00:42:02 Dr Torri Callan
Could vary. What you'd do then is add on the average performance level within each class before you try to decide the value of
00:42:12 Dr Torri Callan
The coefficient that you're interested in.
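Written out, the class-level model Tori describes might look like the following, where $y_i$ is student $i$'s exam score, $j[i]$ is that student's class, and $x_i$ is 1 if the student had breakfast and 0 otherwise. The notation is ours, not from the episode; a school level would add one more layer, with class intercepts drawn around school means.

```latex
\begin{aligned}
y_i &= \alpha_{j[i]} + \beta\,x_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0, \sigma^2), \\
\alpha_j &\sim \mathcal{N}(\mu_\alpha, \tau^2), & j &= 1, \dots, J
\end{aligned}
```

Here $\beta$ is the breakfast coefficient of interest, and $\tau$ controls how much class averages are allowed to differ from the overall mean $\mu_\alpha$.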
00:42:14 Dr Genevieve Hayes
OK, so it's a staged approach.
00:42:17 Dr Torri Callan
Yeah, that's how I think
00:42:19 Dr Torri Callan
Of it. Now, what you actually do, obviously, is estimate all of this at once, which is sort of what leads me into Bayesian
00:42:26 Dr Torri Callan
Approaches. What you'll often have with this sort of clustered data is a lot of groups
00:42:34 Dr Torri Callan
With not a lot of information. So you might only have one or two students in a lot of classes, just because of the way in which you've sampled the data, and then you might have some classes with a lot of students where it's easy to make that kind of inference. So what you'll end up doing is saying
00:42:51 Dr Torri Callan
The distribution of all of these class-level intercepts is governed by some kind of normal distribution.
00:43:01 Dr Torri Callan
Where I found it really valuable was that it allows you to fit a much larger class of models to a certain set of data, and the really interesting bit is you can actually start to think about models in a generative way.
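To see why a normal distribution over class intercepts helps with the one-or-two-student classes, here is the posterior mean of each class intercept when the hyperparameters are treated as known. This is only the conditional step, useful for intuition; a full Bayesian fit, as Tori notes, estimates $\mu$, $\tau$ and $\sigma$ jointly with everything else. The breakfast term is left out to keep the sketch short, and the numbers in the example are made up.

```python
from statistics import mean


def partially_pooled_intercepts(scores_by_class, mu, tau, sigma):
    """Posterior mean of each class intercept, treating mu, tau, sigma as known.

    Model: alpha_j ~ N(mu, tau^2), score ~ N(alpha_j, sigma^2).
    The result is a weighted average of the class mean and mu, with weight
    z = n / (n + sigma^2 / tau^2) on the class's own data.
    """
    estimates = {}
    for class_id, scores in scores_by_class.items():
        n = len(scores)
        z = n / (n + sigma**2 / tau**2)  # more students -> more weight on own mean
        estimates[class_id] = z * mean(scores) + (1 - z) * mu
    return estimates


data = {"small_class": [90.0], "big_class": [90.0] * 20}
print(partially_pooled_intercepts(data, mu=60.0, tau=5.0, sigma=15.0))
```

Both classes average 90, but the one-student class is pulled most of the way back towards the overall mean of 60, while the 20-student class keeps most of its own average. It is the same weighted-average form as the credibility idea discussed earlier in the episode.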
00:43:16 Dr Torri Callan
And if you have an idea of what the generative process is for a certain set of data, you could think about writing that down.
00:43:25 Dr Torri Callan
And then if you put
00:43:27 Dr Torri Callan
Prior distributions over all the parameters in whatever that data generating process is, you would be able to fit it to a certain set of data.
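"Writing down the generative process" can be taken literally: simulate data from the priors and the model, then check whether the fake data look plausible before fitting anything. This sketch does that for the breakfast example using only the standard library. Every prior and the 50% breakfast rate are hypothetical placeholders, not values from the episode.

```python
import random


def simulate_prior_predictive(n_classes=100, students_per_class=10, seed=0):
    """Run the breakfast example's generative story forwards from the priors.

    All priors below are hypothetical placeholders for illustration.
    Returns a list of (class_id, breakfast, score) tuples.
    """
    rng = random.Random(seed)
    mu = rng.gauss(65.0, 10.0)         # overall average exam score
    tau = abs(rng.gauss(0.0, 8.0))     # spread of class averages (half-normal)
    beta = rng.gauss(0.0, 5.0)         # effect of having breakfast
    sigma = abs(rng.gauss(0.0, 10.0))  # student-level noise (half-normal)
    rows = []
    for j in range(n_classes):
        alpha_j = rng.gauss(mu, tau)   # class-level intercept
        for _ in range(students_per_class):
            breakfast = 1 if rng.random() < 0.5 else 0
            score = rng.gauss(alpha_j + beta * breakfast, sigma)
            rows.append((j, breakfast, score))
    return rows


rows = simulate_prior_predictive(seed=1)
outside = sum(1 for _, _, s in rows if not 0.0 <= s <= 100.0)
print(len(rows), outside)
```

Counting simulated scores outside 0 to 100 is a quick check on whether the priors and the normal likelihood respect the kind of physical bounds Tori goes on to mention.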
00:43:35 Dr Torri Callan
And so it's a lot more powerful and.
00:43:37 Dr Torri Callan
Flexible, because you can,
00:43:39 Dr Torri Callan
For example, use nonlinear functional forms if you know that there are certain physical constraints on something you observe. So if a certain set of data is bounded below at
00:43:51 Dr Torri Callan
Zero and bounded above
00:43:53 Dr Torri Callan
At another threshold, that can be information that you incorporate into your model. You can also have monotonicity constraints, so you can say, well, I'm certain that this function starts at one value and ends at another, and it's monotonic in between, and I don't know exactly what values it goes through, but this is my
00:44:14 Dr Torri Callan
Rough guess, this is my approximation, which I apply as a prior distribution.
00:44:18 Dr Torri Callan
And yeah, it opens you up to a much larger class of modelling and allows you to be a little bit more
00:44:26 Dr Torri Callan
Prescriptive in what model you write, and I think it does.
00:44:30 Dr Torri Callan
Ultimately, allow you to be a little bit more nuanced.
00:44:32 Dr Torri Callan
In your inference.
00:44:34 Dr Genevieve Hayes
And how have you managed to make use of these techniques either in your PhD work or in your work as a data scientist?
00:44:41 Dr Torri Callan
In my PhD there's an interesting one. In my personal favourite chapter, we were looking at the impacts of IVF treatment and some other characteristics of pregnancy on birth weight. Essentially, birth weight is highly governed by the gestational age of the pregnancy. So
00:45:01 Dr Torri Callan
Pregnancies that go to,
00:45:03 Dr Torri Callan
Let's say, 20 or 30 weeks obviously tend to have premature babies that are quite small, and babies that go to full term are more likely to be around, well, you're more likely to see a higher-birth-weight baby. And there was this idea out in the literature that IVF,
00:45:21 Dr Torri Callan
Or assisted reproductive technologies (ART), so more than just IVF,
00:45:25 Dr Torri Callan
Was actually impactful on
00:45:27 Dr Torri Callan
Birth weight: they would tend to reduce the average birth weight by a certain amount.
00:45:33 Dr Torri Callan
So when you're doing this kind of analysis, and we had a bunch of data that we wanted to use to investigate that,
00:45:41 Dr Torri Callan
What we wanted to do was say, well,
00:45:44 Dr Torri Callan
Once you adjust for gestational age, do you see the same thing?
00:45:48 Dr Torri Callan
And so the typical way you'd do this with regression modelling is you'd add in a term for gestational age and then a term for whether ART was used in the pregnancy or not.
00:45:59 Dr Torri Callan
And then you might add another bunch of terms for other characteristics of the mother that you're interested in.
00:46:06 Dr Torri Callan
But we were able to apply this kind of thinking. If I go back a step: if you're going to do that sort of model, you obviously want gestational age to be a nonlinear term, because the impact of an extra week of gestation is not the same going from, let's say, 29 to 30 weeks as it is going from 39 to 40 weeks of pregnancy.
00:46:28 Dr Torri Callan
And there are lots of techniques you can use to get a nonlinear term.
00:46:31 Dr Torri Callan
You could use splines or Gaussian processes or polynomial terms; there's a bunch of different options.
00:46:38 Dr Torri Callan
But where I was able to apply a Bayesian approach with a nonlinear parameterised model was that we actually had an explicit model of gestational age and birth weight, and that was parameterised, I think, with a logistic curve with four parameters.
00:46:58 Dr Torri Callan
A logistic curve is an S-shaped curve. It starts at the lower asymptote,
00:47:03 Dr Torri Callan
Ends at the upper asymptote and looks like an S in between.
00:47:07 Dr Torri Callan
And the four parameters roughly govern where the lower asymptote is, where the upper asymptote is, where the midpoint of that shape is, and then how steep the shape is in between that.
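One common parameterisation of the four-parameter logistic curve Tori describes is sketched below: two asymptotes, a midpoint, and a steepness. The example values for birth weight (kg) against gestational age (weeks) are made-up placeholders for illustration, not fitted estimates from the PhD work.

```python
import math


def logistic4(x, lower, upper, midpoint, steepness):
    """Four-parameter logistic (S-shaped) curve.

    lower, upper: the two asymptotes; midpoint: x where the curve is halfway
    between them; steepness: how quickly it rises around the midpoint.
    """
    return lower + (upper - lower) / (1.0 + math.exp(-steepness * (x - midpoint)))


# Hypothetical placeholder values: birth weight in kg against gestational age in weeks.
for weeks in (24, 30, 36, 40):
    print(weeks, round(logistic4(weeks, lower=0.2, upper=3.6, midpoint=32.0, steepness=0.3), 2))
```

Because the curve flattens at both ends, one extra week matters much more near the midpoint than near full term, which is exactly the nonlinearity described in the 29-to-30 versus 39-to-40 week example.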
00:47:19 Dr Torri Callan
And then what you can do as you fit this model is have a regression on each
00:47:25 Dr Torri Callan
Parameter of that logistic curve.
00:47:28 Dr Torri Callan
So instead of having an overall ART effect, you could say, well, this is the effect of assisted reproduction on pregnancies that go to full term, because that applies to the upper asymptote of the S-curve. And this is the effect of assisted reproductive
00:47:47 Dr Torri Callan
Technologies in determining how quickly
00:47:52 Dr Torri Callan
Birth weights will increase between earlier-term and full-term pregnancies. That's what it actually does: it gives you a lot more information to conduct your inference on. So instead of looking at one parameter
00:48:06 Dr Torri Callan
And then looking at
00:48:08 Dr Torri Callan
The significance of that parameter (I won't go down that rabbit hole of significance and p-values), but
00:48:14 Dr Torri Callan
Instead of looking at one parameter and then conducting all of your scientific inference on that one parameter, you have a set of parameters that you can investigate for marginal differences, and decide on the whole if there is something you can observe.
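A sketch of "a regression on each parameter of the logistic curve": each of the four curve parameters gets its own baseline plus an ART effect. The coefficient names and values are hypothetical placeholders; in a Bayesian fit each baseline and effect would get a prior and be estimated jointly, giving a separate posterior for each ART effect.

```python
import math


def logistic4(x, lower, upper, midpoint, steepness):
    """Four-parameter logistic curve (same parameterisation as a standard 4PL)."""
    return lower + (upper - lower) / (1.0 + math.exp(-steepness * (x - midpoint)))


def expected_birth_weight(gest_weeks, art, coef):
    """Mean birth weight where every curve parameter has its own linear predictor.

    art is 1 for an ART-assisted pregnancy, else 0. coef maps each parameter name
    to (baseline, art_effect); all names and values are hypothetical.
    """
    params = {name: base + effect * art for name, (base, effect) in coef.items()}
    return logistic4(gest_weeks, params["lower"], params["upper"],
                     params["midpoint"], params["steepness"])


# Hypothetical coefficients: ART shifts only the upper (full-term) asymptote.
COEF = {
    "lower": (0.2, 0.0),
    "upper": (3.6, -0.1),
    "midpoint": (32.0, 0.0),
    "steepness": (0.3, 0.0),
}
for weeks in (24, 40):
    diff = expected_birth_weight(weeks, 1, COEF) - expected_birth_weight(weeks, 0, COEF)
    print(weeks, round(diff, 3))
```

With the effect placed on the upper asymptote, the ART difference is close to the full -0.1 kg at 40 weeks but small at 24 weeks, which a single overall ART coefficient could not express.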
00:48:30 Dr Genevieve Hayes
That's actually really interesting. It's something that I'd like to give a go in my own work, and I can imagine a lot of other listeners would want to give it a try.
00:48:40 Dr Genevieve Hayes
For anyone who is interested in applying these sorts of techniques in their work, where would you recommend they begin?
00:48:47 Dr Torri Callan
Statistical Rethinking is a really good textbook to start with. I think it's by Richard McElreath; I don't know if I've pronounced his name right. But "Statistical Rethinking" in your favourite search engine will get you to the right place.
00:49:00 Dr Torri Callan
He's written that textbook as an introduction to statistics, but it's through a Bayesian workflow. So instead of teaching what a t-test is and
00:49:10 Dr Torri Callan
Then teaching what
00:49:10 Dr Torri Callan
An OLS regression is, he'll just teach you the basics through
00:49:15 Dr Torri Callan
Bayesian methods, and so I think it's a really good place to start. The textbook shows you where
00:49:22 Dr Torri Callan
MCMC comes from, how certain choices of prior distribution impact the resulting posterior under certain conditions, and also the different classes of models that often get used.
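To give a feel for "where MCMC comes from", here is a random-walk Metropolis sampler in plain Python for the simplest possible case: the mean of normally distributed data with known standard deviation and a normal prior. The data, step size and prior are made up for illustration. Because this case has a closed-form posterior, you can check the sampler's answer against it.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_sd=10.0):
    """Unnormalised log posterior: normal likelihood (known sigma), N(0, prior_sd^2) prior."""
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data) / sigma**2
    log_prior = -0.5 * mu**2 / prior_sd**2
    return log_lik + log_prior


def metropolis(data, n_iter=20000, step=0.5, burn_in=2000, seed=42):
    """Random-walk Metropolis: propose a nearby value, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for it in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability exp(candidate - current), capped at 1.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if it >= burn_in:
            draws.append(mu)
    return draws


data = [1.8, 2.4, 2.1, 1.6, 2.6]
draws = metropolis(data)
# Closed form for this conjugate case: (sum(y)/sigma^2) / (n/sigma^2 + 1/prior_sd^2).
exact = sum(data) / (len(data) + 1 / 10.0**2)
print(round(sum(draws) / len(draws), 2), round(exact, 2))
```

Stan uses a far more efficient sampler (a variant of Hamiltonian Monte Carlo), but the idea of turning an unnormalised posterior into a stream of draws is the same.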
00:49:36 Dr Torri Callan
So there's a lengthy introduction to hierarchical models, similar to the one I gave
00:49:41 Dr Torri Callan
At the top, but
00:49:43 Dr Torri Callan
A lot more professionally done, and then an introduction to the idea of Gaussian processes, which are a really valuable tool, and then
00:49:51 Dr Torri Callan
A sort of introduction to nonlinear models,
00:49:54 Dr Torri Callan
Or what I call nonlinear, fully parameterised models, which is sort of the example that I just gave before. So I'd start there. I think if
00:50:03 Dr Torri Callan
Online learning is more your flavour than textbooks, there's the Stan language, which I use. It's a statistical language for Bayesian modelling. You can write models that can then be called from either R or Python, so
00:50:19 Dr Torri Callan
Either option can be used for the same model. What you'll do is write a model in Stan and then pass data across from either R or Python, compile the model and estimate it, and then gather the posterior samples back in whatever language you were using before, to analyse.
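A sketch of that workflow from Python with CmdStanPy, using the breakfast example as a hierarchical Stan model. The priors, variable names and file location are hypothetical choices, and actually calling `fit_breakfast_model` assumes `cmdstanpy` and a CmdStan installation are available. The data-packaging step runs on its own, which is where the 0-based (Python) versus 1-based (Stan) indexing difference usually bites.

```python
import tempfile
from pathlib import Path

# Hierarchical breakfast model with non-centred class intercepts.
# Priors are hypothetical placeholders.
STAN_PROGRAM = """
data {
  int<lower=1> N;
  int<lower=1> J;
  array[N] int<lower=1, upper=J> class_id;
  vector[N] breakfast;
  vector[N] score;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] z;
  real beta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] alpha = mu + tau * z;
}
model {
  mu ~ normal(65, 20);
  tau ~ normal(0, 10);
  z ~ std_normal();
  beta ~ normal(0, 10);
  sigma ~ normal(0, 15);
  score ~ normal(alpha[class_id] + beta * breakfast, sigma);
}
"""


def make_stan_data(scores, breakfast, class_ids):
    """Package Python lists for Stan. Assumes class ids are 0..J-1; Stan indexes from 1."""
    return {
        "N": len(scores),
        "J": max(class_ids) + 1,
        "class_id": [c + 1 for c in class_ids],
        "breakfast": [float(b) for b in breakfast],
        "score": [float(s) for s in scores],
    }


def fit_breakfast_model(scores, breakfast, class_ids, seed=1):
    """Compile and sample with CmdStanPy; returns posterior draws as a pandas DataFrame."""
    from cmdstanpy import CmdStanModel  # requires cmdstanpy plus a CmdStan install

    stan_file = Path(tempfile.mkdtemp()) / "breakfast.stan"
    stan_file.write_text(STAN_PROGRAM)
    model = CmdStanModel(stan_file=str(stan_file))
    posterior = model.sample(data=make_stan_data(scores, breakfast, class_ids), seed=seed)
    return posterior.draws_pd()
```

The returned DataFrame has one column per parameter (for example `beta`, `tau`, `alpha[1]`), so the posterior can be summarised like any other data set.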
00:50:36 Dr Genevieve Hayes
So you call it from Python or R in order to fit the model?
00:50:40 Dr Torri Callan
Yeah, exactly. So you'll call the model and you'll pass it a set of data, and you'll specify in the model what data you're expecting.
00:50:48 Dr Torri Callan
And so Stan will take that set of data, take the model you've written, apply its version of MCMC to generate a bunch of posterior samples from the model, and then it will pass that back.
00:51:02 Dr Torri Callan
If you're working in R, it will pass that back to R and you'll have a data frame of posterior samples
00:51:08 Dr Torri Callan
That you can work with.
00:51:10 Dr Torri Callan
And you can almost analyse it as you would any other data set.
00:51:13 Dr Genevieve Hayes
That's interesting. And is there some sort of Python library that connects the two languages, like PyStan or something?
00:51:20 Dr Torri Callan
Yeah, exactly. So PyStan and RStan have connections to a stable version, and then there are
00:51:28 Dr Torri Callan
Packages called CmdStanPy and CmdStanR that actually speak to the most recent release of the Stan language,
00:51:36 Dr Torri Callan
If you want to use more up-to-date features. And if you start looking at the language resources there, there's documentation on basic model building and basic usage of the language, and if your
00:51:49 Dr Torri Callan
Preference is to learn
00:51:53 Dr Torri Callan
The theory of things as you go, which is
00:51:56 Dr Torri Callan
Really mine at the moment, then that's actually a really nice way to start, because there's enough information there that you can pick up on the fundamental ideas of Bayesian models at the same time as learning how to write them in the language.
00:52:08 Dr Torri Callan
And there are wrappers. Well, there's definitely a wrapper in R called brms, and that will allow you to write
00:52:15 Dr Torri Callan
Bayesian models using regression model syntax, if you're more familiar with that.
00:52:20 Dr Genevieve Hayes
So you just use standard regression model syntax?
00:52:24 Dr Torri Callan
Yeah, exactly. And it's highly flexible because it can import a few different functions from other packages that allow you to use splines or Gaussian processes if you want to do some sort of nonlinear modelling. And I've used it before when you have
00:52:42 Dr Torri Callan
An idea about a functional form that you want to write out yourself. I don't know the exact terminology; I call it a nonlinear, fully parameterised model, so something like a logistic curve, where you
00:52:55 Dr Torri Callan
Want to do a regression of a certain Y variable on this logistic curve of your X variable.
00:53:01 Dr Torri Callan
You can just write that out in normal regression syntax and it will do all the estimation for you, and there's a nice set of functions.
00:53:08 Dr Torri Callan
That kind of.
00:53:09 Dr Torri Callan
Wrap around the output you get from the model. That will allow you to conduct a lot of your inference pretty easily and.
00:53:17 Dr Torri Callan
Quickly, I think there's an equivalent in Python, but I'm not 100% sure of the name.
00:53:23 Dr Genevieve Hayes
I want to go off and have a go at this after this episode's finished.
00:53:26 Dr Torri Callan
I highly recommend it. I think it forces you to be a little bit more considered and precise in how you do your modelling, and, like I said, it opens you up to a lot more generative modelling, which I think,
00:53:43 Dr Torri Callan
I know it's your background, but I think if you've studied regression models and statistics for long enough,
00:53:51 Dr Torri Callan
You start to see where typical regression models are a little bit of a
00:53:56 Dr Torri Callan
Procrustean bed: you try to fit your data and your question into an answer of "does my regression model give me a significant coefficient, and if so, then my theory is correct."
00:54:07 Dr Torri Callan
So this does kind of open you up a little bit to be a little bit more considered and perhaps creative in how you do your modelling, which I really like.
00:54:15 Dr Genevieve Hayes
Is there anything on your radar in the AI data and analytics space that you think is going to become important in the next three to five years?
00:54:23 Dr Torri Callan
Building systems that can use data to give real,
00:54:26 Dr Torri Callan
Intelligent feedback, and having that really tightly coupled to a product, is going to be a
00:54:34 Dr Torri Callan
Real key feature
00:54:36 Dr Torri Callan
Of a lot of the AI and data space, and I think being able to do that
00:54:41 Dr Torri Callan
En masse across a lot of different businesses will be valuable. Obviously the leaders like Google and Facebook and a few others have been able to do this pretty well.
00:54:52 Dr Torri Callan
But for teams that don't have a large analytics stack, it's been a little bit more difficult. I think the challenge most data scientists have faced
00:55:02 Dr Torri Callan
Is that across the business, there are a number of business requirements that they need to address. So even though they're highly capable and technically gifted, you end up answering finance's questions on how much revenue we had last month, and then
00:55:20 Dr Torri Callan
Answering marketing's questions on how many people we signed up, and you have to crawl before you walk before you run.
00:55:27 Dr Torri Callan
But what I.
00:55:28 Dr Torri Callan
Forecast happening is that that will become easier and easier to solve because of the amount
00:55:33 Dr Torri Callan
Of tools available.
00:55:35 Dr Torri Callan
And it it kind of opens up room for the next stage of technical sophistication.
00:55:42 Dr Torri Callan
And I think what that looks like is being able to come up with not necessarily predictions from machine learning, but
00:55:53 Dr Torri Callan
different sources of insight that can actually be utilised by a product in almost an optional manner. We spoke about quite a few examples
00:56:02 Dr Torri Callan
that we're starting to look at with UARE, but
00:56:06 Dr Torri Callan
there are other cases you can think of for operational-type requirements. I know churn modelling has been a real staple for a lot of data science teams, but this is not just about predicting who's likely to churn; it's also about giving a recommendation of where that should go. So
00:56:25 Dr Torri Callan
if someone's likely to churn, you might want to place that person into whatever messaging service your customer service team is using, and actually build out that system end to end. Or you might want to have some
00:56:36 Dr Torri Callan
notification go out in the app, or you might want to actually modify the UX and the UI of the app you're working on, like how things actually get displayed, based on some prediction you generate. So it's almost about building an end-to-end system that uses the data you have available
00:56:57 Dr Torri Callan
to make the prediction, which I think most people have their heads around, but then building a system around that to actually do something with it, because I think most of the data we have available is valuable
00:57:12 Dr Torri Callan
only inasmuch as you can make decisions off the
00:57:15 Dr Torri Callan
back of it.
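The end-to-end idea Torri describes, where a churn prediction feeds straight into a decision about what to do with each user, can be sketched in a few lines of Python. This is a minimal illustration only, not UARE's system: the churn probabilities are assumed to come from some upstream model that isn't shown, and the action names and thresholds are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical downstream actions a churn prediction could trigger."""
    CUSTOMER_SERVICE_QUEUE = "customer_service_queue"  # hand off to the support team's messaging tool
    IN_APP_NOTIFICATION = "in_app_notification"        # push a nudge inside the app
    ADJUST_UI = "adjust_ui"                            # change how the app displays things for this user
    NO_ACTION = "no_action"                            # leave the user alone


@dataclass
class User:
    user_id: str
    churn_probability: float  # assumed output of an upstream churn model (not shown)


# Illustrative thresholds only; in practice these would be tuned against the cost of each action.
HIGH_RISK = 0.7
MEDIUM_RISK = 0.4
LOW_RISK = 0.2


def route(user: User) -> Action:
    """Turn a churn score into a concrete decision rather than just a number."""
    if user.churn_probability >= HIGH_RISK:
        return Action.CUSTOMER_SERVICE_QUEUE
    if user.churn_probability >= MEDIUM_RISK:
        return Action.IN_APP_NOTIFICATION
    if user.churn_probability >= LOW_RISK:
        return Action.ADJUST_UI
    return Action.NO_ACTION


def run_pipeline(users: list[User]) -> dict[Action, list[str]]:
    """Group user IDs by the action each one should receive."""
    plan: dict[Action, list[str]] = {action: [] for action in Action}
    for user in users:
        plan[route(user)].append(user.user_id)
    return plan


if __name__ == "__main__":
    users = [User("u1", 0.85), User("u2", 0.55), User("u3", 0.25), User("u4", 0.05)]
    for action, ids in run_pipeline(users).items():
        print(action.value, ids)
```

The point of the sketch is the shape rather than the numbers: the model's output is only one input to a routing step, and each branch corresponds to something the business can actually do, whether that's a support hand-off, an in-app message or a change to the interface.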
00:57:17 Dr Genevieve Hayes
And what final advice would you give to data scientists looking to create business value from data?
00:57:23 Dr Torri Callan
I think it's being curious about what you're working on and trying to bring some level of passion to what you're doing, almost as a data scientist in service of other people's issues. Taking on that service role, where you can solve a lot of people's problems with the tools and
00:57:44 Dr Torri Callan
insights you have available,
00:57:46 Dr Torri Callan
kind of sets you up in a really nice place. I mean, as I'm speaking, I'm reflecting on this. I've
00:57:52 Dr Torri Callan
had a few successes and quite a few failures at doing this, and I think the successes have been where
00:57:59 Dr Torri Callan
I've been able to work with someone and find that particular pain point, or
00:58:07 Dr Torri Callan
potentially an unknown unknown, in that they're unaware of what isn't working for them but you are aware of it. Being able to bring that to someone is really valuable.
00:58:18 Dr Torri Callan
And I think, as I've gone on, it's not just about bringing in a set of insights, and it's also not about bringing in a set of predictions. It's about working on
00:58:27 Dr Torri Callan
what you can actually build and thinking about it
00:58:31 Dr Torri Callan
almost, from a tech perspective, as a product manager would, to say, "Well, if I were in charge of the product, this is what you could actually build," and bringing that to someone.
00:58:41 Dr Torri Callan
I think the
00:58:43 Dr Torri Callan
successes that I've had
00:58:44 Dr Torri Callan
in my career have always come from being able to work really closely with a product manager or a product owner, and being able to be very prescriptive about what can actually be built.
00:58:56 Dr Torri Callan
So instead of just giving them a base level of insight or a base level of recommendations, give them the option of a few different things you can build.
00:59:06 Dr Torri Callan
And I think if you're going to be doing that for someone, you'd want to be fairly passionate about and fairly interested in the product, and in solving those problems for the business you're working in and for the people who are using it.
00:59:22 Dr Genevieve Hayes
So it's a product driven approach to problem solving.
00:59:26 Dr Torri Callan
It's my bias, to be fair. I'm sure you'd get different answers from different people, but my perspective is that you have almost a
00:59:36 Dr Torri Callan
change management,
00:59:38 Dr Torri Callan
optional approach to product development: you can look at a problem in a product, or in an operational part of a business, and make some really foundational changes to how that's done.
00:59:53 Dr Genevieve Hayes
For listeners who want to learn more about you or get in contact, what can they do?
00:59:59 Dr Torri Callan
LinkedIn is probably the best way. I'm not too big on the socials or blog writing, even though I've thought about doing it plenty of times. But yeah, LinkedIn: if you search my name, it'll come up. Otherwise, I'm happy to share an email.
01:00:13 Dr Genevieve Hayes
And I'll put a link to the UARE homepage in the show notes for this episode.
01:00:21 Dr Genevieve Hayes
Thank you for joining me today.
01:00:23 Dr Torri Callan
Yeah. Pleasure. Thanks for having me.
01:00:26 Dr Genevieve Hayes
And for those in the audience, thank you for listening. I'm Dr Genevieve Hayes, and this has been Value Driven Data Science, brought to you by Genevieve Hayes Consulting.
