Episode 18: Making AI Commercially Viable


00:00:00 Dr Genevieve Hayes
Hello and welcome to Value Driven Data Science, brought to you by Genevieve Hayes Consulting. I'm Dr Genevieve Hayes, and today I'm joined by Dr Jeroen Vendrig to discuss the challenges of making AI commercially viable.
00:00:17 Dr Genevieve Hayes
Jeroen is the chief technology officer of Prooftech,
00:00:22 Dr Genevieve Hayes
an Australian technology startup specialising in the development of AI-driven software for damage detection and assessment for high-value assets.
00:00:33 Dr Genevieve Hayes
He has over 20 years experience in video analytics for world leading R&D labs.
00:00:40 Dr Genevieve Hayes
And has over 25 patents in force.
00:00:43 Dr Genevieve Hayes
Jeroen, welcome to the show.
00:00:46 Dr Jeroen Vendrig
Thanks for having me, Genevieve. I've listened to some episodes and they're very interesting, and I hope we can give the same value to your listeners.
00:00:53 Dr Genevieve Hayes
I hope so too. Many data scientists dream of using their skills to develop groundbreaking AI technology, but few manage to translate those dreams into commercially viable products. To be honest, I suspect most data scientists haven't got the faintest idea of where to even begin
00:01:13 Dr Genevieve Hayes
doing so. Yet this is something you've managed to successfully achieve through your own startup, Prooftech.
00:01:21 Dr Genevieve Hayes
And that's something I'd really like to explore in this episode.
00:01:25 Dr Genevieve Hayes
However, for listeners who haven't come across Prooftech before, could you begin by telling us a bit about it and how it makes use of AI in its products?
00:01:36 Dr Jeroen Vendrig
So as you said, my background is in computer vision, so I've always been looking for problems to attack in that domain, and I found a co-founder who is not technical at all
00:01:48 Dr Jeroen Vendrig
but had a problem, basically, with damage that needed to be detected, and so we joined forces to do that.
00:01:57 Dr Jeroen Vendrig
And what we do, basically — technically there are two key things here. One is we detect anomalies in our datasets, and those anomalies eventually are likely to correspond to damage, which is what
00:02:08 Dr Jeroen Vendrig
our users are interested in. But the second aspect to it, which is often overlooked, is that not only do we detect anomalies,
00:02:17 Dr Jeroen Vendrig
we try to project them back to the real-world object, if you like. And to make it concrete, we do a lot of work with cars as assets,
00:02:26 Dr Jeroen Vendrig
and we monitor them over time. If we find an anomaly in one part of an image, and next week we do it again and we find an anomaly in a part
00:02:35 Dr Jeroen Vendrig
of the image, we somehow need to say that's the same spot on the car.
00:02:39 Dr Jeroen Vendrig
And we do a lot of work on that as well. So those are the key things that our AI engines work on now.
00:02:47 Dr Jeroen Vendrig
At my last count — I don't actually even know exactly — we have four or five neural networks, plus some other, more traditional computer vision techniques in our system. So there's a big pipeline where each part of it
00:03:00 Dr Jeroen Vendrig
has a different task to fulfil, so it's not just one AI module that is being deployed.
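To make the shape of such a system concrete, here is a minimal, runnable sketch of a multi-stage pipeline like the one described: several specialised detectors feeding one shared downstream step that projects image-space findings back onto the physical asset. All the names and the toy stand-in "models" here are illustrative assumptions, not Prooftech's actual design.

```python
# Sketch of a multi-detector pipeline with a shared projection stage.
# Stage implementations are stand-ins; only the structure is the point.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Detection:
    bbox: Tuple[int, int, int, int]                     # (x, y, w, h) in image coords
    score: float                                        # anomaly confidence
    asset_coord: Optional[Tuple[float, float]] = None   # same spot on the real object

def detect_anomalies(frame) -> list:
    """Stage 1 (stand-in): one of several networks flags candidate anomalies."""
    return [Detection(bbox=(40, 60, 12, 12), score=0.91)]

def project_to_asset(det: Detection, pose) -> Detection:
    """Stage 2 (stand-in): map an image-space detection onto the object,
    so the same spot can be matched against next week's capture."""
    x, y, w, h = det.bbox
    det.asset_coord = (x + pose[0], y + pose[1])        # toy projection only
    return det

def run_pipeline(frames_with_pose):
    results = []
    for frame, pose in frames_with_pose:
        for det in detect_anomalies(frame):
            results.append(project_to_asset(det, pose))
    return results

print(run_pipeline([("frame0", (100.0, 0.0))]))
```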
00:03:07 Dr Genevieve Hayes
So it's four or five sequential — I assume convolutional — neural networks?
00:03:12 Dr Jeroen Vendrig
That's correct. So there are different ways to acquire data, for example, so different networks are
00:03:17 Dr Jeroen Vendrig
built for that, and eventually they all come together and go through the same process.
00:03:22 Dr Genevieve Hayes
Do you have to use different types of networks for different types of vehicles — so, for example, for a car versus a motorbike?
00:03:30 Dr Jeroen Vendrig
You could, but I don't think so. We haven't tried motorbikes. You could specialise, but in the end there is power in having it more
00:03:40 Dr Jeroen Vendrig
generic. An interesting example here is: we've been doing this from the start for the use case of cars, and when we started out, we were actually doing this with cars in our own driveway. And we got a question from somebody: well, can you detect damage on infrastructure assets, like mobile phone towers?
00:04:00 Dr Jeroen Vendrig
Well, it wasn't made for that. But you know what? When we do this in our driveway, we actually detect dents in the fence. We don't do that anymore, because we mask everything out except the car at the moment,
00:04:11 Dr Jeroen Vendrig
but that's how we knew: yes, we can do that, because in the end it's an anomaly in a surface. And even if it's a very different use case,
00:04:19 Dr Jeroen Vendrig
you can use the same model for that. If you have a big enough dataset — and spoiler alert, usually in AI you don't have that — but if you do, yes, then you can specialise, and there would be a benefit.
00:04:32 Dr Jeroen Vendrig
But when you have smaller datasets, it's actually better to use the same model for different purposes.
00:04:39 Dr Genevieve Hayes
I remember reading — this was a couple of years ago, and I think it might be an urban myth — about one of the earliest neural networks, which was used for detecting American tanks versus enemy tanks. And apparently one of the reasons why it didn't work was because the images they had of the American
00:05:00 Dr Genevieve Hayes
tanks were all nice, clear images that were taken in good light
00:05:04 Dr Genevieve Hayes
and up close, whereas the images of the enemy tanks were at a distance, in battle, etcetera, etcetera. Did you find, when you first started building your neural networks, that there were issues to do with where the photos of the cars were taken?
00:05:23 Dr Jeroen Vendrig
No — but on that topic, what you say is, I think, a true
00:05:27 Dr Jeroen Vendrig
story, and in fact, yesterday one of my staff members came to me with exactly that problem.
00:05:33 Dr Jeroen Vendrig
We found out that we'd put something into the neural networks — we'd masked out the background to focus on our area of interest —
00:05:42 Dr Jeroen Vendrig
and this particular new network that we tried actually started to learn the shapes of what we masked out, rather than the content.
00:05:50 Dr Jeroen Vendrig
And that's a similar case. Now, we happen to know from domain knowledge that in this case the shape doesn't actually matter — it doesn't correlate with the labels we want to detect. So yes, these problems happen regularly.
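To illustrate the failure mode: if masks are applied with shapes that systematically vary with the label, a model can learn the mask outline instead of the content. A common mitigation — not necessarily what Jeroen's team did — is to make the mask shape uninformative, for instance by drawing it independently of the label. A toy sketch:

```python
# Toy illustration: apply a region-of-interest mask whose shape is drawn
# independently of the label, so the outline carries no class information.
import numpy as np

rng = np.random.default_rng(0)

def apply_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out everything outside the region of interest."""
    return image * mask

def random_rect_mask(shape, rng) -> np.ndarray:
    """A rectangular mask sampled without reference to the label."""
    h, w = shape
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    y1, x1 = rng.integers(h // 2, h), rng.integers(w // 2, w)
    mask = np.zeros(shape, dtype=np.float32)
    mask[y0:y1, x0:x1] = 1.0
    return mask

image = rng.random((64, 64))
masked = apply_mask(image, random_rect_mask(image.shape, rng))
print(masked.shape, float(masked.max()))
```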
00:06:02 Dr Genevieve Hayes
You've had experience working in both academia and the commercial world. What are the key differences you've found between doing data science or AI in an academic setting compared to doing it in the commercial world?
00:06:17 Dr Jeroen Vendrig
Let me narrow it down a bit, because that's a very broad question. So I've been in commercial R&D with a focus on computer
00:06:23 Dr Jeroen Vendrig
vision, and there are plenty of differences there, so we can start with that. It will be different in other fields, or if you just apply AI as a component rather than doing the R&D on it.
00:06:35 Dr Jeroen Vendrig
The key difference won't surprise you: there's a business problem or product vision, and unlike in academia, you can't really pick your own problem.
00:06:44 Dr Jeroen Vendrig
So it's the use case that drives everything. Having said that, you still have some freedom in that — to go in the direction where the best business opportunity matches technical feasibility.
00:06:56 Dr Jeroen Vendrig
And there are many consequences of that, which drive everything you do, basically. The first one is your problem definition. The second one —
00:07:05 Dr Jeroen Vendrig
and this is a bit more specific to computer vision — is the constraints that you can apply. The third one is more generic; I call it budget, but you should think of it as
00:07:15 Dr Jeroen Vendrig
scalability, the deployment. The fourth one is data, your favourite topic. And then there is context. Maybe we can go through them one by one, but those are things that just aren't explored in academia.
00:07:29 Dr Jeroen Vendrig
The first one I mentioned is the problem definition, and I know you've been talking with other guests about it as well, and it's a very general problem.
00:07:36 Dr Jeroen Vendrig
In fact, it's not even specific to AI, but it's a real battle to find out what the actual business objectives are behind the stated objectives. And often the people in the business
00:07:49 Dr Jeroen Vendrig
think along with you, and they basically say: well, the problem is that I want a solution and I don't have this
00:07:55 Dr Jeroen Vendrig
solution. So you've got to bring them back to: well, what's the actual problem? Because they often don't understand that themselves.
00:08:02 Dr Jeroen Vendrig
To be frank, even though I've been in this business world for quite a while, it still surprises me that the business people don't understand their own problem. You really need to guide them through it. And one of the reasons, I believe,
00:08:16 Dr Jeroen Vendrig
is that in the business world, everything is quite fluid. Everything is negotiable. There's a large extent of hand-waving going on. But if we go all the way down, what we're going to end up with in the end is a loss function that represents this business objective, and those two are very far apart.
00:08:32 Dr Jeroen Vendrig
So you need to formalise what these problems are, step by step. And I actually have a plan of steps to do that, which I try to apply if I can, to basically ease people in and get a nice transition to the point where the business people can sign off on something that is formalised, and where the technical people understand it enough that they can take it over.
00:08:53 Dr Jeroen Vendrig
And usually I would be the bridge between those.
00:08:56 Dr Genevieve Hayes
That's a bridge between a business problem and an analytics problem.
00:09:00 Dr Jeroen Vendrig
Correct. So I use something which I call key technology indicators. They're basically high-level evaluation criteria that measure success — but they're not the actual evaluation criteria; they're the ones that business people can understand. The simple ones are like: I want results within five seconds. So the
00:09:20 Dr Jeroen Vendrig
dimension there is time. But usually you get more involved in the use case — everybody likes to use accuracy, but accuracy itself, as you know, is a technical measure that often is not suitable, so you've really got to go into depth:
00:09:33 Dr Jeroen Vendrig
what does that mean? Once you've got them, you've got a way to assess the project's success at the end.
00:09:39 Dr Jeroen Vendrig
But usually those key technology indicators are not easy to quantify — you usually can't run your experiments on them — so you have to translate them into proxies.
00:09:52 Dr Jeroen Vendrig
And once you've made that translation, then basically you have both the business side and the technical side happy,
00:09:59 Dr Jeroen Vendrig
and they can do their work on top of that.
00:10:02 Dr Genevieve Hayes
Could you give an example of one
00:10:03 Dr Genevieve Hayes
of the proxies?
00:10:04 Dr Jeroen Vendrig
Well, the proxies — for example, the simple ones that everybody knows are recall and precision. Those are very hard to understand for people on the business side. I mean, I've picked a hard one here, because there's immediately a trade-off in there,
00:10:19 Dr Jeroen Vendrig
and trade-offs are not something that they're very comfortable with. But what you usually want to do, if it's a suitable measure, is try to fix one of those.
00:10:28 Dr Jeroen Vendrig
So you go to the use case and you can say: OK, well, maybe we can fix the precision at, say, 50%, right? And translate it differently. And now you can use recall
00:10:39 Dr Jeroen Vendrig
as the measure — because one measure they can understand — and that is how you marry those.
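Here is a minimal sketch of that translation in code: fix precision at a level the business signs off on, then report recall as the single number they track. It uses scikit-learn, which Jeroen mentions later for evaluation metrics; the 50% floor and the toy scores are just illustrative values.

```python
# Report the best recall achievable while holding precision at or above
# an agreed floor: a "proxy" metric the business side can sign off on.
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision(y_true, y_score, target_precision=0.5):
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # Keep only operating points meeting the precision floor (the last
    # precision/recall point has no associated threshold, hence [:-1]).
    ok = precision[:-1] >= target_precision
    if not ok.any():
        return 0.0, None
    best = np.argmax(recall[:-1] * ok)   # best recall among valid points
    return recall[best], thresholds[best]

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.5])
r, t = recall_at_precision(y_true, y_score, target_precision=0.5)
print(f"recall {r:.2f} at score threshold {t}")
```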
00:10:45 Dr Jeroen Vendrig
But often you get more complex ones. We actually — this was not with Prooftech — had one where we considered patenting the evaluation criterion, because it was quite convoluted, but it represented what the business needed. This had to do with tracking people — a particular way of tracking
00:11:05 Dr Jeroen Vendrig
people that was suitable for the applications that business had.
00:11:10 Dr Genevieve Hayes
Did you end up patenting it in the end?
00:11:12 Dr Jeroen Vendrig
To be frank, I don't remember. We might not have — there are some enforceability issues with that. But the reason we considered it was that we thought we could actually even use it as a marketing thing: we could set the baseline with it and basically force competitors to use this evaluation criterion,
00:11:34 Dr Jeroen Vendrig
which would give us an advantage, because we'd been thinking about it for a year longer than they had. But sorry, I don't remember if it ended up being patented.
00:11:41 Dr Genevieve Hayes
As someone from an academic background, what I'm hearing you say-
00:11:46 Dr Genevieve Hayes
-ing is that the way you look at things in the commercial or business world is in terms of patents — how can you patent IP — whereas in the academic world, a lot of the focus is on publishing research in academic papers. Is that a good analogy?
00:12:05 Dr Jeroen Vendrig
Yes, in some ways. So I was previously working for Canon, and Canon is top three in patenting in the world, so everything — at least in the R&D departments — everything is about
00:12:13 Dr Genevieve Hayes
OK.
00:12:17 Dr Jeroen Vendrig
that. And you're absolutely right in saying that. If we looked at a problem, sometimes we would say: oh, we can solve that, but there's no way to patent it.
00:12:26 Dr Jeroen Vendrig
Then we wouldn't do it. We would send it to another department, but it would not be done
00:12:30 Dr Jeroen Vendrig
by the R&D department.
00:12:32 Dr Jeroen Vendrig
But the other side of the story is that what you need to do for patents is not so different from publishing a paper — and in fact, it actually forces you to do better science,
00:12:44 Dr Jeroen Vendrig
because the thing about patents is, it's not going through peer review; it's going to a reviewer from the Patent Office,
00:12:52 Dr Jeroen Vendrig
and his standard response is basically going to be:
00:12:56 Dr Jeroen Vendrig
the work that you did is obvious. And that's a particular legal term — the word reflects what it means.
00:13:04 Dr Jeroen Vendrig
What you can't do in patents is just take some existing techniques, put them together and say: OK, I've got something new. Which you do — but that's not quite good enough for a patent. On the other hand, in academia, you can do that,
00:13:17 Dr Jeroen Vendrig
and there are lots and lots of papers that do. And maybe sometimes there's value in it, but I stopped reading them myself. Whereas in patents, you really have to focus on making a contribution
00:13:30 Dr Jeroen Vendrig
yourself. So you can still put things together, but there needs to be something difficult about putting them together, right?
00:13:37 Dr Jeroen Vendrig
Something that you need to be, you know, creative or intelligent about. I call it the glue — you can patent the glue.
00:13:45 Dr Genevieve Hayes
It's the spark of inspiration.
00:13:47 Dr Jeroen Vendrig
Yes — although, you know, it's 90% perspiration. And those are the academic papers that get the high citations, right? That would be the equivalent.
00:13:59 Dr Genevieve Hayes
Like Larry Page and Sergey Brin's Google algorithm patent.
00:14:04 Dr Jeroen Vendrig
Yeah, and there are still enough papers like that, but they're a few out of the maybe millions of papers being published. So that really focuses you on your contribution to the art.
00:14:18 Dr Genevieve Hayes
Once you've got the patent, do companies typically publish any of the research as research papers? Or do you just keep the patent locked away in a safe so that no one knows
00:14:30 Dr Genevieve Hayes
about it?
00:14:31 Dr Jeroen Vendrig
Getting a patent is very hard, and it takes a very long time — it can take like five years, so nobody is interested in the paper anymore. But what usually happens is you can file for the patent,
00:14:43 Dr Jeroen Vendrig
so you have the application, and it's kind of like timestamping it. Once you've timestamped it, you can publish. Now, Canon
00:14:51 Dr Jeroen Vendrig
was very conservative, so we didn't do that much. But other companies are much quicker at
00:14:56 Dr Jeroen Vendrig
that, and then you can basically publish that paper, and it goes through the community before the application is even made visible.
00:15:06 Dr Genevieve Hayes
And you often see that with things like Facebook or Google. Often there'll be research papers where every single researcher belongs to one of those companies.
00:15:15 Dr Jeroen Vendrig
And you can — you can come back a year later and search for the applications, bearing in
00:15:20 Dr Jeroen Vendrig
mind that there's something underlying that paper, which is usually multiple patents.
00:15:25 Dr Genevieve Hayes
When you're performing commercial research, do you draw much on existing academic research?
00:15:30 Dr Genevieve Hayes
So on other journal articles.
00:15:33 Dr Jeroen Vendrig
Yes — though things are different now from when I did my PhD. Right now it's kind of the expectation that people publish their code.
00:15:42 Dr Jeroen Vendrig
It's not required, but most people do, so it's even better than a paper, right? You can go to the actual code and they usually have a nice explanation of it as well.
00:15:53 Dr Jeroen Vendrig
And so that might be the first port of call, and then you say: well, this is really going somewhere —
00:15:59 Dr Jeroen Vendrig
now I'm going to read the paper. So it's kind of the other way around. There's definitely a lot of interesting stuff out there.
00:16:07 Dr Jeroen Vendrig
And I also love to have a look at the site Papers with Code — I'm not sure if you're familiar with that?
00:16:12 Dr Genevieve Hayes
Oh yeah, yeah, I know that.
00:16:14 Dr Jeroen Vendrig
It has some disadvantages, so you have to be careful using it.
00:16:18 Dr Jeroen Vendrig
But on the one hand, it's great — it's where everything comes together. You can see how they benchmarked, and you can go straight to the GitHub from there, or to the papers on arXiv. So it's a fantastic world to develop in at the moment.
00:16:35 Dr Genevieve Hayes
One of my previous guests works as an AI engineer for a metaverse
00:16:40 Dr Genevieve Hayes
company, and he commented that he would often look at the code associated with research papers, but he often found that, because the code was written by an academic and not a professional software developer, if he was going to incorporate it into his work, he had to rewrite it so that it was
00:17:01 Dr Genevieve Hayes
suitable for production.
00:17:04 Dr Genevieve Hayes
Is that a challenge that you've encountered?
00:17:07 Dr Jeroen Vendrig
Throughout my career? Well, definitely. So we've actually done a little work with universities, and it may surprise you, but sometimes we wouldn't even run the code they produced —
00:17:19 Dr Jeroen Vendrig
we could already see it was going to be a lot of work to run it. Sometimes, you know, best case, we would use it as a refer-
00:17:26 Dr Jeroen Vendrig
-ence; it would never come anywhere near a product. But then even the R&D code — the commercial R&D code, which
00:17:33 Dr Jeroen Vendrig
is closer to production quality — would usually be thrown away, or used only for experiments, when we put something in, especially if we put something on a chip, because you can't really afford any mistakes there.
00:17:46 Dr Jeroen Vendrig
There might be three or four versions of the code before it ends up in the product, though nowadays, with software, it's much shorter.
00:17:54 Dr Jeroen Vendrig
And I have to say, I understand your guest's observation, but some of the stuff that is published in academia is actually pretty good.
00:18:04 Dr Jeroen Vendrig
The software quality of what PhD students produce has much improved, so some stuff is actually usable and can go into production — maybe not on a chip, but if you run it on the cloud
00:18:15 Dr Jeroen Vendrig
you can easily replace it. And to be frank, we have some open-source code that we use — you can't launch a project without having 300 open-source dependencies — and we don't have many issues
00:18:27 Dr Jeroen Vendrig
with that. However, some of the published things look interesting and never make it, because indeed the code is just not good enough to try.
00:18:36 Dr Jeroen Vendrig
And then, yeah, you can improve it, or you can just take something that is slightly different but that works, and take the path of least resistance.
00:18:48 Dr Jeroen Vendrig
I do find that sometimes these academic publications — or sorry, the code for them — are very much geared to the publication, and it's not so easy to repurpose for something else.
00:19:01 Dr Jeroen Vendrig
But, you know, that's not why these people made it. So if you want that, it should be your job. And if you're nice, you share it with the world as well.
00:19:10 Dr Genevieve Hayes
What programming languages do you typically use?
00:19:13 Dr Jeroen Vendrig
As the CTO, I'm pretty hands-on — I'm across everything. I've used many, and on a given day I can be in, like, three different languages.
00:19:23 Dr Jeroen Vendrig
Python is the most important one, for our back end — with the understanding, of course, that the libraries we use, OpenCV, TensorFlow, Torch, where all the grunt work is done, are written in other languages. For our front end we use TypeScript, which is basically JavaScript, and then we also have apps
00:19:43 Dr Jeroen Vendrig
where we use Java and Swift. Then, we are fully on the Amazon cloud, and there are sometimes some weird Amazon languages
00:19:52 Dr Jeroen Vendrig
that we have to use to do our thing. They're phasing them out,
00:19:55 Dr Jeroen Vendrig
thank goodness.
00:19:56 Dr Genevieve Hayes
Oh yeah.
00:19:56 Dr Jeroen Vendrig
But the data science part of it is basically done in Python.
00:20:01 Dr Genevieve Hayes
OK, for the data scientists who are listening, what are the most important Python packages from your point of view?
00:20:08 Dr Jeroen Vendrig
Well, I already mentioned OpenCV, TensorFlow and Torch, but we actually use pandas a lot as well. Now, with pandas we don't actually create models or anything,
00:20:21 Dr Jeroen Vendrig
but for understanding the data it is a very powerful tool — the Swiss Army knife for data. So we use that a lot.
00:20:29 Dr Genevieve Hayes
Does scikit-learn come into it?
00:20:31 Dr Jeroen Vendrig
Yes, we use that as well. And again, it's not really for the actual models, but it has some tools in there, especially evaluation metrics, that we use. And sometimes we even use its sibling, scikit-image, for some image processing.
00:20:46 Dr Genevieve Hayes
OK, I haven't used scikit-image. I've used OpenCV for image
00:20:51 Dr Genevieve Hayes
processing, though.
00:20:52 Dr Genevieve Hayes
One thing that I was thinking: you often hear about these academic-commercial collaborations, and I've seen a couple of them in organisations that I've worked for, but I've never really seen them be all that successful. What are your thoughts on those?
00:21:10 Dr Jeroen Vendrig
It's hard, but I have managed to do a couple of successful ones. In fact, when I left the University of Amsterdam, where I did my PhD, one of the things I'd worked on got commercialised. So that's my experience from the university side,
00:21:26 Dr Jeroen Vendrig
and after that I've been doing it from the other side. When I was at university, it was a bit ad hoc, and it was actually quite funny, because I made something,
00:21:36 Dr Jeroen Vendrig
I went on a sabbatical, and I came back after a few months, and the commercial guys we were working with gave a very enthusiastic story about this product.
00:21:46 Dr Jeroen Vendrig
And they'd sold it to a big German outfit,
00:21:50 Dr Jeroen Vendrig
and it was all great, and it had a name and everything. And I said: well, that's great — and this happened in the few months I was away? Can you
00:21:57 Dr Jeroen Vendrig
show me a
00:21:57 Dr Jeroen Vendrig
demo? And they looked at me and said: no, this is your stuff, that you made before you left. And that is the key point: if you come from a university viewpoint,
00:22:09 Dr Jeroen Vendrig
you've got to have those good sales guys — or maybe business development would be a better way to name them — because they can really make that match, and reformulate what you did in a way
00:22:21 Dr Jeroen Vendrig
that, as an academic, you just can't. Because as an academic, you want to say: oh yeah, but this and this, and —
00:22:27 Dr Jeroen Vendrig
no, that's not exactly how it is. So they skim over all of that, keep the essence, and package it as the value for the company, right? They're not going to talk about CNNs or
00:22:41 Dr Jeroen Vendrig
the technology behind it; they say: this is going to be the value for you as a company. And that's how it's done.
00:22:47 Dr Jeroen Vendrig
Now, they productised it — they actually took the prototype. I just told you: don't do that. But they did take the university code.
00:22:54 Dr Jeroen Vendrig
But I checked on them last year, and it's still a successful business. They've diversified into completely different things
00:23:00 Dr Jeroen Vendrig
now, but it's a company with 50 people that basically came out of that original
00:23:05 Dr Jeroen Vendrig
product. That's pretty good — a very successful spin-off, yes.
00:23:10 Dr Genevieve Hayes
Where I've seen them be unsuccessful is when it ends up in this sort of buck-passing exercise:
00:23:18 Dr Genevieve Hayes
The organisation wants the academics to come up with brilliant stuff and then the academics want to be told what to do by the organisation.
00:23:27 Dr Genevieve Hayes
And it just ends up
00:23:29 Dr Genevieve Hayes
With money being thrown at the academics and everyone trying to pretend that nothing's happening and yeah, it gets swept under the rug eventually.
00:23:39 Dr Jeroen Vendrig
That's an interesting experience, because coming from the commercial side, I see exactly the opposite.
00:23:46 Dr Genevieve Hayes
OK.
00:23:47 Dr Jeroen Vendrig
Going back to your question of what makes a successful one: you actually need an overlap between the research interests of the university group, or the individuals,
00:23:57 Dr Jeroen Vendrig
And the potential to solve that problem for the company.
00:24:00 Dr Jeroen Vendrig
But it requires the company to be quite mature in thinking about that: they're dealing with R&D, so there's no certainty.
00:24:07 Dr Jeroen Vendrig
So if they can't accept the risk, then something is wrong — you shouldn't deal with the university for that. Or rather, there are special departments at universities that do those things, but not the research part. On the other hand,
00:24:20 Dr Jeroen Vendrig
what happens if you're not aligned? In my experience, it's
00:24:23 Dr Jeroen Vendrig
this: there are some very good salespeople amongst the professors as well, so they're just going to tell the company:
00:24:30 Dr Jeroen Vendrig
yeah, we can do that — whatever the company says. And then once they've got the contract in, they just kind of shape the problem until it fits whatever they intended to do for their research. And that is not necessarily a happy marriage
00:24:44 Dr Jeroen Vendrig
later on. So you want to get those things clear from the start, but that requires the company side to understand the academics to some extent.
00:24:54 Dr Jeroen Vendrig
So if the academics say: these are the problems we want to pursue, this is what we're good at —
00:25:00 Dr Jeroen Vendrig
then the company has to be able to understand that enough to say: oh, that can match this range of problems.
00:25:07 Dr Jeroen Vendrig
And if you get that together, then you can actually have a very successful collaboration because everybody's aligned.
00:25:12 Dr Genevieve Hayes
I think that's an interesting point — that idea of the skill set matching a particular range of problems. Because one of the things, when I'm teaching data science, is I always tell my students the business problem has to drive the solution, not the other way round. But you also have the situation where you've got someone who specialises
00:25:34 Dr Genevieve Hayes
in a particular
00:25:36 Dr Genevieve Hayes
skill set — like, for example, you specialise in computer vision; it would be a waste for you to go and work for a company that wasn't trying to solve computer vision problems.
00:25:48 Dr Genevieve Hayes
So I think even though the problem has to drive the solution, people with a particular skill set need to.
00:25:56 Dr Genevieve Hayes
match their skill set with a company that requires that skill set to solve the range of problems it's looking at.
00:26:03 Dr Jeroen Vendrig
Yes, although you always have to be careful, because maybe a computer vision approach can be applied to a non-computer-vision problem, right?
00:26:11 Dr Jeroen Vendrig
And if you want to do that, actually going to a university would be the right thing to do, because they can think at a bit more abstract level and jump between domains. But yes, in general, you're right.
00:26:23 Dr Genevieve Hayes
How would you apply a computer vision solution to a non-computer-vision problem?
00:26:28 Dr Jeroen Vendrig
So in the end, in computer vision — well, let's say with an image — you have two-dimensional data where the elements of the data are related to each other.
00:26:38 Dr Jeroen Vendrig
So if you have other problems like that, you can actually call them an image, right? Even though it might not be the traditional image as we know it.
00:26:47 Dr Jeroen Vendrig
And there might be other problems like that — sorry, off the top of my head I can't think of any — but there is often migration between topics, and computer vision itself borrows a lot from natural language processing.
00:27:01 Dr Jeroen Vendrig
So basically, if you want to know what's going to happen two years from now in computer vision, just check the state of the art
00:27:07 Dr Jeroen Vendrig
in natural language processing.
00:27:09 Dr Genevieve Hayes
OK — so in natural language you've got a one-dimensional sequence of letters, whereas in computer vision you've got a two-dimensional matrix of pixels. Is that right?
00:27:19 Dr Jeroen Vendrig
Yeah, yeah. Or at least. Yeah, sometimes, yeah.
00:27:21 Dr Genevieve Hayes
Or at least two dimensional, possibly three if you've got colour.
00:27:25 Dr Jeroen Vendrig
Yes — so that's why these techniques can't be used one-to-one.
00:27:30 Dr Jeroen Vendrig
But you see, for example, the transformers that have shown great success in natural language processing — they have now already been transformed, if you like,
00:27:41 Dr Jeroen Vendrig
into vision trans-
00:27:42 Dr Jeroen Vendrig
-formers. They're not exactly the same, but the concepts behind them — the ideas behind them — are the same.
00:27:48 Dr Jeroen Vendrig
So at a very high level, they look at the context and the self-attention within parts of the whole — and that can be a word in a sentence, or it can be a spot in an image. And that's how these things
00:28:02 Dr Jeroen Vendrig
transition. Now, not everything maps, so I'm not sure — convolutional neural networks, I think, are used in natural language processing as well,
00:28:11 Dr Jeroen Vendrig
but those might actually have gone the other way around and been applied
00:28:15 Dr Jeroen Vendrig
to those situations.
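To make the shared idea concrete, here is a minimal sketch showing that the same self-attention block doesn't care whether its tokens started life as word embeddings or as flattened image patches (the vision-transformer trick). The sizes are arbitrary toy values, and this is only an illustration of the concept, not any specific model discussed here.

```python
# One attention module, two kinds of token sequence: words and image patches.
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4
attention = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# "Sentence": a batch of one, with 10 word embeddings.
words = torch.randn(1, 10, embed_dim)

# "Image": 3x32x32, cut into 8x8 patches, flattened, then linearly
# projected to the same embedding size (as in a ViT).
image = torch.randn(1, 3, 32, 32)
patches = image.unfold(2, 8, 8).unfold(3, 8, 8)          # (1, 3, 4, 4, 8, 8)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 16, 3 * 8 * 8)
to_embed = nn.Linear(3 * 8 * 8, embed_dim)
patch_tokens = to_embed(patches)                          # (1, 16, embed_dim)

# Identical operation on both token sequences.
word_out, _ = attention(words, words, words)
patch_out, _ = attention(patch_tokens, patch_tokens, patch_tokens)
print(word_out.shape, patch_out.shape)   # (1, 10, 64) (1, 16, 64)
```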
00:28:16 Dr Genevieve Hayes
So I always thought that image processing was the easier use case and that it went from image processing to natural language. So that's interesting that you say it's the other way around.
00:28:26 Dr Jeroen Vendrig
Ohh, tell me. Tell me why? Because I would think that natural language processing is much easier.
00:28:31 Dr Genevieve Hayes
It just feels like there are more image processing use cases — or successful image processing use cases — but maybe that's just
00:28:40 Dr Genevieve Hayes
an outside observer's point of view rather than an insider's point of view.
00:28:44 Dr Jeroen Vendrig
OK. Well, that's very interesting. So I think image processing is very hard. And when you start — so you said natural language processing starts with letters, but actually you start with words,
00:28:57 Dr Jeroen Vendrig
so there's already so much semantic information, and basically you've narrowed it down so much already. Whereas when you get an image, you have
00:29:04 Dr Jeroen Vendrig
all these pixels
00:29:06 Dr Jeroen Vendrig
that by themselves are meaningless, and then you have these groups of pixels that you have to give a meaning.
00:29:11 Dr Jeroen Vendrig
But if the light is slightly different, these pixels completely transform, even though what you see has the same meaning.
00:29:18 Dr Jeroen Vendrig
Those kinds of problems you don't really have in natural language. That's why I think it's a much harder problem.
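As a tiny illustration of the point about lighting, here is a toy sketch: a modest brightness change rewrites essentially every pixel value even though the scene's meaning is unchanged — whereas a word keeps its identity across contexts. The values are arbitrary.

```python
# A small lighting change alters (almost) every pixel, not the semantics.
import numpy as np

rng = np.random.default_rng(1)
scene = rng.integers(0, 200, size=(4, 4)).astype(np.float64)

brighter = np.clip(scene * 1.3 + 10, 0, 255)   # slightly stronger light
changed = np.mean(brighter != scene)
print(f"{changed:.0%} of pixel values changed")  # typically 100%
```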
00:29:24 Dr Genevieve Hayes
What you were just saying reminds me of that example I've seen on the Internet, where people take an image and change one pixel,
00:29:32 Dr Genevieve Hayes
and it causes an object detector to think a dog's a banana, or something insane.
00:29:37 Dr Jeroen Vendrig
Absolutely, yeah. And this is very hard. And it goes back to the first question that you had:
00:29:44 Dr Jeroen Vendrig
in computer vision, constraints are very important. And I actually have — well, it's not formal, but I have a checklist of constraints that can apply to a problem, and there are like 40
00:29:56 Dr Jeroen Vendrig
of them that commonly come up. When I come to a new domain, I actually go over all these 40 and check which ones apply.
00:30:03 Dr Jeroen Vendrig
Not all of them apply, but you often end up with like 20 constraints that you need to put in place just to be able to tackle the computer vision problem in some way.
00:30:14 Dr Genevieve Hayes
Could you give us some examples of some of those constraints?
00:30:17 Dr Jeroen Vendrig
At the extreme end of computer vision you have machine vision — that's where you control everything. You control the lighting, for example, and lighting is a very important one: not only can it blind your cameras — of course, you can't do anything with that —
00:30:32 Dr Jeroen Vendrig
But it can completely change how you represent what is happening in the.
00:30:37 Dr Jeroen Vendrig
real world. Then there's what we call working more "in the wild" — that's the area I work in —
00:30:43 Dr Jeroen Vendrig
where you don't have a whole lot of control over the circumstances. You can still do things about that — you can provide some shade, for example.
00:30:52 Dr Jeroen Vendrig
If people are taking the images, you tell them: do it in the shade. And everybody knows it: don't take photos against direct sunlight. So that's already constraining it a
00:31:01 Dr Jeroen Vendrig
bit, but you can do much more than that. In fact — so one of the applications of Prooftech is where people take the images themselves:
00:31:09 Dr Jeroen Vendrig
they go around the car and take images. And if that's the only brief we give them, it's going to be a complete disaster —
00:31:15 Dr Jeroen Vendrig
we can't do much with that from a computer vision perspective. So we put constraints on them. We tell them
00:31:20 Dr Jeroen Vendrig
how to go
00:31:22 Dr Jeroen Vendrig
along the car; we actually have a little neural network that tells them how close they should be to the car, and makes sure they have the right angle on things — because as soon as you change the angle, you get all sorts of 3D effects that make the complexity, well, not infinite, but a whole lot higher. And what you see, for example,
00:31:42 Dr Jeroen Vendrig
in the manual — nobody reads those pages —
00:31:45 Dr Jeroen Vendrig
if you buy a Canon product, there are a lot of exclusions. Simple ones say: don't use this at night. Or sometimes they say: yes, you can use it at night.
00:31:53 Dr Jeroen Vendrig
Those are all constraints under which the application works and has been tested. And maybe you can try what happens if you don't put a constraint in place, right — what happens if you do use it at night?
00:32:05 Dr Jeroen Vendrig
Maybe it works, maybe it doesn't, but that's not what the work has been focusing on. And when we look at product releases,
00:32:12 Dr Jeroen Vendrig
we often say: OK, we applied those 20 constraints, and in version two we can lift some of those constraints;
00:32:18 Dr Jeroen Vendrig
in version three, we lift more of them. But at some point we reach the limit and say: OK, now certain constraints need to stay in place, otherwise the problem is too open-ended. And that is actually what you
00:32:32 Dr Jeroen Vendrig
spend a lot of time
00:32:33 Dr Jeroen Vendrig
on. And in terms of evaluation criteria — constraints are not evaluation criteria, but they're kind of hanging in the same space, so they're very important. And when we come to the difference between business and academia, this is an important one, because you can't pick your constraints. You have to negotiate what is reasonable with your
00:32:54 Dr Jeroen Vendrig
users. You can't just determine them yourself — you're completely free to do that in academia — but if you put the wrong constraints in, then nobody will buy your product.
00:33:03 Dr Genevieve Hayes
With these neural networks that underpin Prooftech's products, there must have been a point where you just had no data to work with.
00:33:13 Dr Genevieve Hayes
How do you deal with the situation where you have absolutely no data at all? I mean, how did you create your original data set?
00:33:22 Dr Jeroen Vendrig
Yeah, absolutely. So when we first started Prooftech, we had zero data. It didn't scare me, because it was not the first time I'd been in that situation. In fact, I did my PhD in the 90s, and back then there were no standard datasets.
00:33:37 Dr Jeroen Vendrig
In fact, I think there was one standard image, Lena — but I was in video, so I couldn't even use that.
00:33:48 Dr Jeroen Vendrig
So I've always been working on making my own datasets. I can't even think of a situation where I started out with a usable dataset.
00:33:57 Dr Jeroen Vendrig
If you're lucky, there is data — but often you don't have any data, and if there is data, there are no
00:34:03 Dr Jeroen Vendrig
labels. We just talked about how all these pixels are pretty meaningless, so without labels you're really flying blind;
00:34:10 Dr Jeroen Vendrig
you're not going to make any sense of it. And then, in the very lucky situation that you do have labels, they're inconsistent, and usually you actually end up throwing them away and relabelling. So, creating your own
00:34:23 Dr Jeroen Vendrig
dataset — it depends a bit on the situation. I've done a lot of work in surveillance, so security settings,
00:34:31 Dr Jeroen Vendrig
where it's very hard to get your hands on actual footage. So we re-enact: sometimes we walked ourselves in front of video cameras, and we've even, through a talent agency, hired a bunch of actors to walk around, dressed in different things, etcetera — that was for one particular topic we were working on.
00:34:52 Dr Jeroen Vendrig
I can tell you, I now understand what goes on on a film set, because the logistics are enormous.
00:34:58 Dr Jeroen Vendrig
But basically we did a couple of days of recording like that, just to create our own dataset and have that variety in it.
00:35:06 Dr Jeroen Vendrig
For what we're doing now, we were interested in damage, so we actually went to a yard near the airport and just photographed a lot of rental cars. Fortunately, they do have a lot of damage.
00:35:19 Dr Jeroen Vendrig
Initially, I would also sometimes go around my neighbourhood. I don't know what it is — I think there are bad drivers in my neighbourhood — but there are lots of interesting damages on the cars.
00:35:29 Dr Jeroen Vendrig
And that's how we started things and could build our
00:35:32 Dr Jeroen Vendrig
model. Now, at the same time — and I guess in parallel — I worked with a company that had this concept of a P0.
00:35:40 Dr Jeroen Vendrig
I'm not sure if it's a common term — have you heard of it? The P stands for product.
00:35:43 Dr Genevieve Hayes
No, I haven't.
00:35:46 Dr Jeroen Vendrig
So it's product zero. Product zero doesn't have any fancy data models in it; its only purpose is to collect data.
00:35:55 Dr Jeroen Vendrig
It goes into the data stream of some use case and taps it off. What they were very good at is making that P0 very
00:36:04 Dr Jeroen Vendrig
nice, so that people were compelled to use it — because that's the whole point: use our P0, and in return you give us your data. We start collecting it, and maybe we ask some questions,
00:36:16 Dr Jeroen Vendrig
so we get some labels from it, and then we can power up models. And once you have a first model that is halfway decent,
00:36:25 Dr Jeroen Vendrig
you can show it and convince people: hey, look, this is what we can do with it. It's not quite there yet, but there is a reward if you share your data with us, if you help us with it.
00:36:36 Dr Jeroen Vendrig
And it keeps growing and growing, and you get this cycle of getting data in. What it means, compared to academia — or to data science where you have big datasets already — is that there's inevitably a bit of bias in your datasets, because you're creating them; you're going with the flow, so to speak,
00:36:57 Dr Jeroen Vendrig
of where you can get the data.
00:36:58 Dr Jeroen Vendrig
So it's not necessarily the distribution of the real world. That's one of the dangers you have there.
00:37:06 Dr Jeroen Vendrig
I do indeed find that hard. So now we have a good model and we want to make it better.
00:37:11 Dr Jeroen Vendrig
False positives are very easy to get feedback on, but missed detections — basically, you don't know what you missed;
00:37:18 Dr Jeroen Vendrig
you usually don't get feedback on them. So those are the hard ones when you have this kind of biased dataset-acquisition approach.
00:37:27 Dr Genevieve Hayes
If you've got missed detections, would you even know that you had a missed detection in order to record it as a false negative?
00:37:36 Dr Jeroen Vendrig
Absolutely not. And in fact, look at what we saw two months ago: we made a new model — not just another iteration; we actually started from scratch with it.
00:37:46 Dr Jeroen Vendrig
We ran it on our datasets, and we had false positives. So we looked at
00:37:49 Dr Jeroen Vendrig
the false positives —
00:37:51 Dr Jeroen Vendrig
except they're not false positives; they're true detections. We just hadn't labelled them. In fact,
00:37:56 Dr Jeroen Vendrig
we have one very small dataset, a test set — it's small because it's very expensive to make — that we've gone through multiple times,
00:38:07 Dr Jeroen Vendrig
and say after going through it four times, on the fifth time we still find new things that we, as humans, had missed before.
00:38:15 Dr Jeroen Vendrig
We've basically gone over these things with a microscope, and we still find new things in there. And just for context, when I say that we detect damage, this is not a big crash —
00:38:25 Dr Jeroen Vendrig
we're detecting very small damages. They can be a few millimetres, maybe half a centimetre — and that's exactly why we make these things, because that's too hard for humans.
00:38:36 Dr Jeroen Vendrig
A human
00:38:36 Dr Jeroen Vendrig
can do
00:38:37 Dr Jeroen Vendrig
it, but not at scale. And this is very common: people can do these tasks for about 10 minutes, and after that they tune out.
00:38:45 Dr Jeroen Vendrig
And that's what makes it so hard to do that labelling. So yeah, we don't know what we missed.
00:38:51 Dr Genevieve Hayes
I was going to ask: do humans do your labelling? So there's no magic way of doing it other than humans?
00:39:00 Dr Genevieve Hayes
I found, when I was building machine learning models in a particular organisation I was working in, the hardest thing was always convincing people to label the dataset for me, and you could never get it done.
00:39:14 Dr Jeroen Vendrig
And that's why — so, we haven't cracked that. What we're trying, similar to that P0 concept, is that you have to reward them for it.
00:39:21 Dr Jeroen Vendrig
And I don't mean playing music or whatever, but getting them business value from it. So in our case we say: well, if you do label this,
00:39:31 Dr Jeroen Vendrig
we'll actually automatically generate reports for you that you need. And that's how we're trying to convince them to give that kind of feedback.
00:39:39 Dr Genevieve Hayes
Yeah, I've heard some people have tried using things like Amazon Mechanical Turk, and I've never used it myself, but I've heard the results you get from it are terrible.
00:39:49 Dr Jeroen Vendrig
I think it depends on what you're trying to do. If you want to label cats versus dogs, I'm sure it
00:39:55 Dr Jeroen Vendrig
works fine. But in our case, experts do not agree on what the label should be, and there's no way a Mechanical Turk worker is going to manage
00:40:04 Dr Jeroen Vendrig
it. And in fact, we do use labelling parties overseas — basically low-cost countries — but they're not random people; these people have been trained for a few days
00:40:15 Dr Jeroen Vendrig
in order to follow
00:40:16 Dr Jeroen Vendrig
the guidelines that we provide them. There are like 25 pages of guidelines right now — that's a lot of pages partly because there are images in there, but there's a lot in it, and it's still not enough. So it's worth spending a lot of time on that.
00:40:33 Dr Genevieve Hayes
With the products that you're developing, they're obviously
00:40:37 Dr Genevieve Hayes
products that are used by real people in the end. At what point do you get your potential end users involved in looking at your products?
00:40:48 Dr Jeroen Vendrig
There are two parts to the product, right? There's how people interact with it, and then there's technically having the chops to do that properly.
00:40:58 Dr Jeroen Vendrig
I think this was mentioned in one of your other podcasts, but I'm a big fan of the lean startup and the minimum viable product — you shouldn't take it literally, but there are lots of good ideas in it,
00:41:08 Dr Jeroen Vendrig
and it actually has a bit of a scientific basis. One of the lessons from that is: get early user feedback.
00:41:16 Dr Jeroen Vendrig
You don't have to build the actual product as you envision it. It's very easy to do nowadays with clickable proto-
00:41:23 Dr Jeroen Vendrig
-types. Anybody can use Figma — you don't need to be a technical person for it. You can basically give some select users a clickable prototype, and you get a lot of feedback on what they like and don't.
00:41:37 Dr Jeroen Vendrig
In our case — I've always worked business-to-business — most of the discussion is about how it's going to affect the workflow: how does it fit your workflow?
00:41:46 Dr Jeroen Vendrig
And those things surface. In fact, for some questions you could ask them but don't get a good answer to — like we discussed
00:41:53 Dr Jeroen Vendrig
at the beginning —
00:41:54 Dr Jeroen Vendrig
if you show them this clickable prototype, then they'll say to you: oh no, that's not how we do it; this is how we do it. And you finally get your answer.
00:42:02 Dr Jeroen Vendrig
So they're very powerful, and that's what we do. One of my staff is actually making one as
00:42:08 Dr Jeroen Vendrig
we speak —
00:42:10 Dr Jeroen Vendrig
we've used one before, and we're making a second version of it to get more feedback from our clients on how they're going to use our technology. Because the key component that we're working on in parallel actually hasn't changed,
00:42:23 Dr Jeroen Vendrig
But how we present it to the users is going to be different based on the feedback.
00:42:29 Dr Genevieve Hayes
Does knowing your product will ultimately be used by real people impact the way you look at AI product development, right from the start?
00:42:38 Dr Jeroen Vendrig
Yes, and that simply comes back to those evaluation criteria. I go very far with that — probably further than most people do — but I really think about how this is going
00:42:50 Dr Jeroen Vendrig
to be used:
00:42:52 Dr Jeroen Vendrig
basically, what's the UX that's going to sit on top of the technical components — and then I work my way back.
00:42:59 Dr Jeroen Vendrig
A simple example of that: let's say you have rankings. In academia, that's how you do it — here are our top-five or top-ten ranking results.
00:43:10 Dr Jeroen Vendrig
I, on the other hand, might look at top nine or top twelve. So — very simply — why nine or twelve?
00:43:16 Dr Jeroen Vendrig
Why not a round number? Well, usually we present things in a three-by-three grid or a four-by-
00:43:22 Dr Jeroen Vendrig
Three grid, right?
00:43:23 Dr Genevieve Hayes
Oh yeah.
00:43:23 Dr Jeroen Vendrig
That's what the user is going to see. They're not going to see ten results; they're going to see that. But that's a small shift, right?
00:43:31 Dr Jeroen Vendrig
But there are bigger ones. A very interesting example actually comes from our live system. We got some feedback, and it said: the false positives are no good. We said: well, yeah, we know that; nobody likes those. No, no, no — that's not
00:43:46 Dr Jeroen Vendrig
what I mean.
00:43:47 Dr Jeroen Vendrig
It turns out there are acceptable false positives and non-acceptable false positives. So it's not so easy to deal with, but it's very important to the user acceptance of your system, and we're taking some measures to deal with it. And when I heard that, it reminded me of something from
00:44:07 Dr Jeroen Vendrig
a very long time ago — this is 25 years ago. A television manufacturer added voice recognition to their televisions. You could give them some very simple commands —
00:44:17 Dr Jeroen Vendrig
I mean, back then this was very advanced — and it had an avatar, a little human that would respond to your voice.
00:44:26 Dr Jeroen Vendrig
But the voice recognition wasn't that good, so it messed up, and people didn't like it. And the interesting thing they did is they replaced the avatar — the actual technology stayed exactly the same.
00:44:37 Dr Jeroen Vendrig
Instead of a little virtual human, it was now a dog, and all of a sudden people accepted it and said: yeah, this is great. They were asked:
00:44:44 Dr Jeroen Vendrig
well, what about when it doesn't always get it right? And they said: oh yeah, but everybody knows dogs don't understand everything.
00:44:51 Dr Jeroen Vendrig
So a change not in the technology, just in how it's presented, can make a huge difference.
00:44:58 Dr Genevieve Hayes
I keep thinking of — do you remember Clippy, the Microsoft Word virtual assistant? Everyone hated Clippy, and yet we're
00:45:06 Dr Genevieve Hayes
all happy using virtual assistants now. It's just —
00:45:10 Dr Genevieve Hayes
There was something about Clippy.
00:45:14 Dr Jeroen Vendrig
Yes. Well, viewers can't see it, but I see some nostalgia on your face.
00:45:22 Dr Genevieve Hayes
So what do you believe are the most valuable skills for data scientists who are looking to build a career in developing commercial AI based technology?
00:45:33 Dr Jeroen Vendrig
Well, continuing on the topic that we just talked about: I actually think the data preparation, and all the preparation that you do, is more important than the actual data science that you do on it.
00:45:47 Dr Jeroen Vendrig
And partly this is because we now have AutoML and all those things that can do a lot
00:45:54 Dr Jeroen Vendrig
of what used to be manual work — but if you put the wrong data in, or you put the wrong evaluation criteria in, they're not going to help.
00:46:02 Dr Jeroen Vendrig
And I think that is where — maybe that's not a new skill, but more focus needs to be on that.
00:46:08 Dr Jeroen Vendrig
And I've noticed that people in academia and data scientists get very distracted by getting 0.01 more out of their model.
00:46:18 Dr Jeroen Vendrig
In my experience — and Andrew Ng of Landing AI, famous for his Stanford course, actually has a presentation in which he backs this up with data — he can show that an amount of effort spent on improving the model gives a 0.02
00:46:34 Dr Jeroen Vendrig
improvement, while less effort spent on massaging the data a bit better gives a ten-percentage-point improvement. That's where I think data scientists should focus.
00:46:46 Dr Jeroen Vendrig
Now, I know — it doesn't happen in my world, but I've seen in data science that there are data engineers who are separate from data scientists.
00:46:55 Dr Jeroen Vendrig
I don't completely understand the difference, but where I did see it, I would say: let's give the data engineers a bit of training — you get better results than training the data scientists to be data engineers, partly because they don't want to.
00:47:11 Dr Jeroen Vendrig
So if you're a pure data scientist, you have to be very good — or, you know, you might be surpassed by the data engineers who have a bit of additional training.
00:47:22 Dr Jeroen Vendrig
Another point is: look at the actual data. That's not a skill, but it should
00:47:27 Dr Jeroen Vendrig
be a habit. I find, even in my team, I have to tell people —
00:47:31 Dr Jeroen Vendrig
they come with all these numbers, etcetera, and I say: yes, but have you gone to the actual data and looked at it? Because, as we talked about, we don't have the distribution of the actual world, right?
00:47:43 Dr Jeroen Vendrig
So there might be something wrong with the distribution; your numbers are not going to tell you that. You have to pass some human judgement
00:47:49 Dr Jeroen Vendrig
on that. But also, we often work in a feature space, and we might not have the right features,
00:47:55 Dr Jeroen Vendrig
so you have to go back and look there, rather than spend months and months squeezing out something impossible.
00:48:03 Dr Genevieve Hayes
There's a paper that I found — another guest pointed me in the direction of it — where the people who wrote the paper actually demonstrated how you could
00:48:12 Dr Genevieve Hayes
create all these different datasets with exactly the same summary statistics. You've got some that are just straight lines of data, that type of thing — but one of them is actually a dinosaur.
00:48:23 Dr Jeroen Vendrig
OK.
00:48:29 Dr Genevieve Hayes
And I actually show that to my class, because the point I want to make is: if you're just looking at the means and standard deviations, they do not tell you that you've got a dinosaur in there.
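The paper described sounds like the "Datasaurus Dozen" (Matejka and Fitzmaurice, 2017). The same lesson can be checked in a few lines with Anscombe's classic quartet: two visibly different datasets with near-identical summary statistics. The values below are Anscombe's sets I and II.

```python
# Same means, variances and correlation; very different shapes.
import numpy as np

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for name, y in [("I", y1), ("II", y2)]:
    print(name, f"mean={y.mean():.2f}", f"var={y.var(ddof=1):.2f}",
          f"corr={np.corrcoef(x, y)[0, 1]:.3f}")
# Both print mean ~7.50, var ~4.13, corr ~0.816 — yet set I is roughly linear
# and set II is a clear parabola. Only plotting or inspecting the data shows it.
```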
00:48:41 Dr Jeroen Vendrig
That's right. Yeah — so make that a habit. Of course, you should still look at the numbers
00:48:47 Dr Jeroen Vendrig
as well, and sanity-check with those, yeah.
00:48:48 Dr Genevieve Hayes
Oh yes, definitely.
00:48:50 Dr Genevieve Hayes
And look out for the dinosaurs.
00:48:53 Dr Jeroen Vendrig
So, the other thing — it doesn't happen in AI that often, but in more traditional data science I've noticed that some data scientists, even fresh ones from uni, can't really code, which surprised me. I hope that the new intake is not like that anymore. But even if they can code,
00:49:13 Dr Jeroen Vendrig
software engineering skills can be quite useful, and I think you talked about your own journey on that, and that it was a bit of a revelation.
00:49:23 Dr Jeroen Vendrig
So I'm not saying data scientists should be software engineers, but some of the thinking in there can be very useful.
00:49:31 Dr Jeroen Vendrig
And what surprised me when I worked with data science students is that, for me, coming from a computer science background, there's not really a difference between scripts and programs.
00:49:44 Dr Jeroen Vendrig
And that's because I learned how to programme, and hence scripting comes naturally; I don't need to make any effort for it. What I hadn't realised is that the other way around,
00:49:53 Dr Jeroen Vendrig
It's not obvious at all. We often start out with scripts, right? You're just toying around with things, and scripts are much better for that than a completely software-engineered thing.
00:50:03 Dr Jeroen Vendrig
But at some point you say, oh yeah, let's take this to the next level, and you turn it into more software-engineered
00:50:09 Dr Jeroen Vendrig
Code. You can do repeatable experiments, parameterise it and everything like that. And I noticed that these data science students weren't able to do that, and it's not just ability; they had never thought about doing that.
00:50:23 Dr Jeroen Vendrig
It's not a hard skill to learn. If you can code, basic software engineering is something you should be able to pick up quite quickly.
00:50:30 Dr Genevieve Hayes
I think it's because a lot of data scientists do all their work in Jupyter notebooks, so they've never had that experience of actually working with script files directly.
00:50:41 Dr Jeroen Vendrig
Yeah. So I would encourage them to take those scripts and turn them into code. The actual programming won't be that big a deal now that you have Copilot, et cetera,
00:50:51 Dr Jeroen Vendrig
Which can help you a lot with that. But what you need to do is basically know, what do I want to achieve with this, and somehow express that.
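As a minimal sketch of that script-to-programme step, here is what turning a throwaway snippet into a parameterised, repeatable experiment might look like. The function names, flags and defaults are illustrative assumptions, not code from Jeroen's actual pipeline:

```python
# Before: a throwaway script with hard-coded values, e.g.
#   data = load("images/"); model = train(data, lr=0.001); print(evaluate(model))
# After: the same experiment, parameterised and repeatable.
import argparse
import json
import random


def run_experiment(data_dir: str, learning_rate: float, seed: int) -> dict:
    """Placeholder for the real load/train/evaluate pipeline."""
    random.seed(seed)  # fix the seed so reruns give the same result
    # ... load data from data_dir, train with learning_rate, evaluate ...
    return {"data_dir": data_dir, "learning_rate": learning_rate,
            "seed": seed, "score": random.random()}  # dummy score


def main() -> None:
    parser = argparse.ArgumentParser(description="Repeatable experiment runner")
    parser.add_argument("--data-dir", default="images/")
    parser.add_argument("--learning-rate", type=float, default=0.001)
    parser.add_argument("--seed", type=int, default=42)
    args = parser.parse_args()

    result = run_experiment(args.data_dir, args.learning_rate, args.seed)
    # Log the full configuration alongside the result, so any run can be
    # reproduced exactly from its printed record.
    print(json.dumps(result, indent=2))


if __name__ == "__main__":
    main()
```

Run as, say, `python experiment.py --learning-rate 0.01 --seed 7`; because every parameter is logged with the score, experiments become repeatable rather than one-off notebook cells.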
00:51:01 Dr Genevieve Hayes
Knowing what you now know about startup life, would you recommend it to our listeners?
00:51:08 Dr Jeroen Vendrig
Yes, I would recommend it, because otherwise I would
00:51:11 Dr Jeroen Vendrig
Leave it, of
00:51:12 Dr Jeroen Vendrig
Course. But it's not for the faint of heart.
00:51:15 Dr Jeroen Vendrig
I actually had some discussion with our investors, because data scientists have found that the best age for startup founders is 42. It's a beautiful
00:51:27 Dr Jeroen Vendrig
Number, of course.
00:51:28 Dr Jeroen Vendrig
That shouldn't deter anybody, young or old. But what it does tell me, as an interpretation, is that getting some experience under your belt
00:51:39 Dr Jeroen Vendrig
In a bigger company probably helps you a lot when you do your startup. So technically you might be ready for it.
00:51:46 Dr Jeroen Vendrig
So let's say you're the CTO and you focus on the AI part of it. But there is a lot of organisational stuff that you may not have been exposed to as a younger person, and that's coming at you.
00:51:59 Dr Jeroen Vendrig
Right. Everything comes down to you when you're a founder of the business.
00:52:03 Dr Jeroen Vendrig
Until you're big enough to hire specialists for that. I wouldn't really trade in my experience at larger companies; I'm happy for that to be part of my journey.
00:52:14 Dr Jeroen Vendrig
Maybe I should have left a little bit earlier, but at the same time it's very exciting to work at startups because of the flexibility you have. I have actually done a release.
00:52:24 Dr Jeroen Vendrig
While on the phone with a client who had an issue with something, so you can just do
00:52:29 Dr Jeroen Vendrig
That. Those are the less interesting things, of course, but you can basically hear something from a client and say, OK, yeah, that's an interesting problem. Let's do something about
00:52:38 Dr Jeroen Vendrig
That. There's no paperwork involved; you just do it. There's no skunkworks, or everything is skunkworks. Being that close to customers makes it very rewarding to do those things, whereas in bigger companies you're usually very far away from customers. So that's the part I would recommend pursuing.
00:52:58 Dr Genevieve Hayes
That statistic you gave, about how the optimal age to start a startup is 42, I've heard before, and what I think is interesting is, are you familiar with Erikson's stages-of-life work?
00:53:14 Dr Genevieve Hayes
So Erikson was a psychologist, and he basically divided the human lifetime into all these different stages, and there are different things that you achieve at different stages.
00:53:25 Dr Genevieve Hayes
So it starts with basically all the different stages of infancy, and the stages after that are basically consistent with primary school,
00:53:34 Dr Genevieve Hayes
Secondary school and stuff like that.
00:53:37 Dr Genevieve Hayes
But once you get beyond eighteen, he divides adult life into three stages: early adulthood, middle adulthood, and late adulthood. Late adulthood is basically your retirement stage, so let's just forget about that.
00:53:54 Dr Genevieve Hayes
But with the early and middle adulthood, early adulthood is basically from when you're about 18 until you're about 40, and that's doing all the things that you need to do to set yourself up for the rest of your.
00:54:08 Dr Genevieve Hayes
Life: getting an education, getting experience, working in jobs, and, if you're interested in having a family, finding someone, things like that. And then, at around age 40, it transitions into middle adulthood, which is when you're doing whatever it is that's
00:54:28 Dr Genevieve Hayes
Going to be your life's achievement.
00:54:32 Dr Genevieve Hayes
It might be raising a family, or it might be starting a startup. So it just felt to me, when I heard that statistic, that 42 is consistent with where Erikson puts the start of the middle adulthood phase.
00:54:47 Dr Jeroen Vendrig
Yeah, that's very interesting, and you might be right there. Listeners who are young shouldn't be discouraged by that. But what I think might happen is,
00:54:56 Dr Jeroen Vendrig
You might actually be well suited to bring a startup to a certain level and then merge with a bigger company for the time where you need that life experience.
00:55:06 Dr Jeroen Vendrig
So that's another way to do it. And there are advantages to being young as
00:55:11 Dr Jeroen Vendrig
Well, because you have more energy. So, to be frank, I can't do it.
00:55:16 Dr Jeroen Vendrig
I can't pull an all-nighter anymore.
00:55:17 Dr Genevieve Hayes
I could never pull an all-nighter, even when I was in high
00:55:20 Dr Genevieve Hayes
School, OK.
00:55:24 Dr Genevieve Hayes
What final advice would you give to data scientists looking to create business value from data?
00:55:30 Dr Jeroen Vendrig
Yeah, I'll repeat it: don't take the data as a given. Don't take it as fixed. You can make your own data set.
00:55:36 Dr Jeroen Vendrig
I think with the language AI breaking through recently to the bigger public, there's been a lot of discussion
00:55:42 Dr Jeroen Vendrig
That these engines are basically stuck in their feature space. People call it, like, they're not conscious; that's a way of saying it.
00:55:49 Dr Jeroen Vendrig
They can't go out and sense new data, right? And maybe they shouldn't. But as a data scientist, you can do that for them, and you can do that in a responsible way.
00:55:59 Dr Jeroen Vendrig
So that's what I would recommend to pay attention to.
00:56:04 Dr Genevieve Hayes
For listeners who want to learn more about you or get in contact, what can they do?
00:56:09 Dr Jeroen Vendrig
Yeah, LinkedIn is the best way to reach me. My name is pretty unique, so I will be easy to find. And yeah, if you want to chat more about AI or
00:56:20 Dr Jeroen Vendrig
Startups. I'm happy to do that. And if you're in Sydney, you can find me at several events as well that are happening here in the ecosystem.
00:56:29 Dr Genevieve Hayes
And I'll link to your LinkedIn page in the show notes.
00:56:33 Dr Jeroen Vendrig
Thank you.
00:56:33 Dr Genevieve Hayes
Well, thank you for joining me today.
00:56:36 Dr Jeroen Vendrig
Thanks for the interesting questions.
00:56:38 Dr Genevieve Hayes
I had a great time. I learned a lot from this, and for those in the audience, thank you for listening.
00:56:44 Dr Genevieve Hayes
I'm doctor Genevieve Hayes and this has been value driven data science brought to you by Genevieve Hayes Consulting.
