Episode 15: Graph-Powered Data Science

00:00:00 Dr Genevieve Hayes
Hello and welcome to Value Driven Data Science, brought to you by Genevieve Hayes Consulting. I'm your host, Dr Genevieve Hayes, and today I'm joined by guest Dr Alessandro Negro to talk about graph-powered data science. Alessandro, welcome to the show.
00:00:19 Dr Alessandro Negro
Thank you and thank you for inviting me.
00:00:22 Dr Genevieve Hayes
Alessandro has literally written the book on graph data science. In addition to being the chief scientist at GraphAware, the world's number one Neo4j consultancy, and managing director at GraphAware Italy, he is also the author of Graph-Powered Machine Learning
00:00:42 Dr Genevieve Hayes
and the author of the recently released Knowledge Graphs Applied.
00:00:47 Dr Genevieve Hayes
Now I haven't read Knowledge Graphs Applied yet, but I have read Graph-Powered Machine Learning and I'll just say this is an excellent book that I would recommend to any data scientist looking to get started with graph data science. Not only does it provide all the necessary theory in a manner that's
00:01:08 Dr Genevieve Hayes
very easy to understand,
00:01:10 Dr Genevieve Hayes
it also gives worked examples, including the Python and Cypher source code that can be used to produce them. And yeah, when I worked through that book, I created my own use case and, you know, just created a Sherlock Holmes knowledge graph and I was able to use that
00:01:30 Dr Genevieve Hayes
code to build it on my own home laptop. Something you should be very proud of, Alessandro.
00:01:36 Dr Alessandro Negro
Yeah, I'm definitely happy that
00:01:37 Dr Alessandro Negro
you found it useful, you know, for concrete use cases and for practising with your graphs.
00:01:45 Dr Genevieve Hayes
One thing it's probably worth calling out before we go too far is that when we're talking about graphs, we're not talking about histograms or pie charts here, are we?
00:01:56 Dr Alessandro Negro
Yes, definitely, you know, because this is something where the language can sometimes cause confusion. So actually, when we say graph, it is just nodes and relationships. Generally we refer to histograms as charts. So just to be clear, we will use charts for a histogram or pie chart or whatever else
00:02:17 Dr Alessandro Negro
is a graphic, and graphs for whatever is a model that represents our business use cases through nodes and relationships.
00:02:29 Dr Genevieve Hayes
Yeah, so it's sort of like a network like a social network.
00:02:33 Dr Alessandro Negro
Exactly, a social network is an example of a graph application.
00:02:37 Dr Genevieve Hayes
Yeah, so if everyone just thinks Twitter or Facebook or LinkedIn, then they're probably going to be fine.
00:02:43 Dr Alessandro Negro
Yep, that's OK.
00:02:44 Dr Genevieve Hayes
How did you first become interested in working with graphs?
00:02:48 Dr Alessandro Negro
It happened many years ago, I would say.
00:02:53 Dr Alessandro Negro
And the first time that I was reasoning in terms of graphs was because I was designing a sort of multi-layer hierarchy representing this agents and subagents
00:03:09 Dr Alessandro Negro
world, you know, in which you have agents having under themselves many other people per area, for example, and such.
00:03:17 Dr Alessandro Negro
And I found out that the best way to represent this was of course using a graph, because this allowed me to sort of represent in an exact way
00:03:28 Dr Alessandro Negro
the reality as it is, and so I was, let's say, exposed for the first time to graphs, and specifically to Neo4j, which at the time
00:03:37 Dr Alessandro Negro
was like version 0.9 or such, and from this first meeting many, many other ideas
00:03:49 Dr Alessandro Negro
came, and so I built my first recommendation engine
00:03:54 Dr Alessandro Negro
on top of
00:03:55 Dr Alessandro Negro
Neo4j, as my first experience of applying data science, let's say, to the graph world.
00:04:03 Dr Genevieve Hayes
Did the organisation that you were doing work for already have Neo4j, or did you have to actually do the exploration in order to discover that Neo4j was the best tool for this use case?
00:04:15 Dr Alessandro Negro
Well, at that time
00:04:16 Dr Alessandro Negro
this was just a night-and-weekend project, you know. It was my personal interest in the field of data science first and then graphs. So I was just playing around and I built a career by this and item.
00:04:35 Dr Alessandro Negro
In prod.
00:04:36 Dr Genevieve Hayes
So Neo4j, that's a graph database which, I take it, as the name would suggest, is specially designed for holding graph data, so the nodes and the relationships.
00:04:48 Dr Genevieve Hayes
Is it feasible to work with graph data if you don't have a graph database underpinning it?
00:04:55 Dr Alessandro Negro
Well, in theory
00:04:56 Dr Alessandro Negro
it is possible. It depends on the size, from my point of view, in the sense that
00:05:03 Dr Alessandro Negro
there are many graph databases that offer, let's say, a graph interface, so they reason in terms of nodes and relationships,
00:05:13 Dr Alessandro Negro
but behind the scenes they have, I don't know, whatever relational database or key-value data store and such. Of course there are,
00:05:22 Dr Alessandro Negro
there are pros and cons in any approach. Let's say that the so-called graph-native
00:05:30 Dr Alessandro Negro
databases like Neo4j store the data as a graph, so literally they have a list of nodes and for each node they store its relationships, and so on and so forth. So literally they have this adjacency list storage mechanism that
00:05:48 Dr Alessandro Negro
makes the traversal of this graph
00:05:52 Dr Alessandro Negro
faster because, of course, while you are, for example, finding a shortest path or you are navigating a graph starting from a node,
00:06:00 Dr Alessandro Negro
this is much faster because you don't have to go in a table or in a key-value store and look up that node and all its relationships, and then the other node and all its relationships and such, because
00:06:12 Dr Alessandro Negro
you have all these attached to each node. So you start from a node, then you see all the relationships,
00:06:17 Dr Alessandro Negro
navigate these relationships and you move farther from this. So in terms of graph traversal, this type of storage mechanism is much faster. The drawback is of course that it
00:06:29 Dr Alessandro Negro
cannot be, let's say, sharded, so you cannot spread it across multiple machines, because it is much more complicated. You know, there is no easy way unless the graph can be easily split into independent subgraphs. So other graph databases leverage different data structures.
00:06:49 Dr Alessandro Negro
These are, again,
00:06:50 Dr Alessandro Negro
based on a key-value store, for example, for sharding the database, which means dividing it into pieces and storing them on different servers.
00:07:00 Dr Alessandro Negro
That of course has some other advantages that are definitely not for graph traversal, but for certain types of graph analytics. So the way in which you store this graph
00:07:11 Dr Alessandro Negro
has a direct impact on the efficiency of certain types of use cases versus others.
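The storage trade-off described above can be illustrated with a minimal Python sketch. It is not how Neo4j or any other database is implemented; the toy graph, the node names and the shard assignment are all hypothetical. Each node keeps its own list of neighbours, so a traversal such as a shortest-path search only follows the relationships attached to the node it is visiting, while splitting the nodes across machines leaves some relationships crossing the boundary between shards.

```python
from collections import deque

# Hypothetical toy graph: each node keeps its own list of neighbours,
# mimicking the "relationships attached to each node" storage idea.
ADJACENCY = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}

# Hypothetical assignment of nodes to two servers (shards).
SHARD_OF = {"alice": 0, "bob": 0, "carol": 0, "dave": 1, "erin": 1}


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        # Only the relationships stored with the current node are read.
        for neighbour in adjacency[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def cross_shard_edges(adjacency, shard_of):
    """Count undirected edges whose two endpoints live on different shards."""
    crossing = set()
    for node, neighbours in adjacency.items():
        for neighbour in neighbours:
            if shard_of[node] != shard_of[neighbour]:
                crossing.add(frozenset((node, neighbour)))
    return len(crossing)


path = shortest_path(ADJACENCY, "alice", "erin")
crossing = cross_shard_edges(ADJACENCY, SHARD_OF)
```

Every edge counted by `cross_shard_edges` is a hop that a traversal would have to make between servers, which is one way to see why a graph that does not split into independent subgraphs is hard to shard.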
00:07:19 Dr Genevieve Hayes
What's the largest graph database you've come across in your work?
00:07:23 Dr Alessandro Negro
Well, we definitely created and stored big databases. A few of them had, like, billions of nodes and relationships, and they were related to certain law enforcement use cases
00:07:43 Dr Alessandro Negro
in which you have to collect data from a huge number of data sources, and hence you have a big database to handle in this case.
00:07:55 Dr Genevieve Hayes
I'm guessing that something like Twitter or Facebook has a Neo4j or similar database underpinning their operations.
00:08:04 Dr Alessandro Negro
Well, yeah, they may. They have a graph database, but in both cases they created their own graph database data structure.
00:08:15 Dr Alessandro Negro
Both Twitter and Facebook have, let's say, their own version of a graph database that they created by themselves. Other companies
00:08:24 Dr Alessandro Negro
rely on
00:08:26 Dr Alessandro Negro
Neo4j, but these, let's say,
00:08:29 Dr Alessandro Negro
big social network providers have their own because they have very specific types of analysis to do, and so they created their own.
00:08:39 Dr Genevieve Hayes
Yeah, and I mean a big tech company has the financial capability to develop their own graph database, whereas your average company doesn't.
00:08:49 Dr Alessandro Negro
Yeah, of course.
00:08:51 Dr Alessandro Negro
They have the resources that they need to build it by themselves, and let me say also that they started a bit earlier than Neo4j. You know, Twitter was there before Neo4j, so they had this need before Neo4j sort of democratised
00:09:09 Dr Alessandro Negro
the concept of graph database to all the other companies. You know, as usual you have the early adopters, and definitely Facebook and Twitter
00:09:18 Dr Alessandro Negro
were in this area, and Neo4j literally took this idea and made a product out of it that other companies
00:09:29 Dr Alessandro Negro
can use. And the same did Twitter, in some way or the other: if you want, you can access the software that they use for storing the graph database, but it's not,
00:09:40 Dr Alessandro Negro
indeed, you know, for any use case. It has a very specific set of features and a very specific set of tasks that you can accomplish with that database. Instead, Neo4j,
00:09:52 Dr Alessandro Negro
let's say, since they were making a business out of this, they made it, and they are still making it,
00:10:00 Dr Alessandro Negro
generic, let's say. It's for solving multiple types of problems rather than just one.
00:10:06 Dr Genevieve Hayes
So far we've touched on two use cases for graph databases: the social network use case, and you also mentioned a law enforcement use case that you'd come across. What other use cases have you come across for graph databases?
00:10:22 Dr Alessandro Negro
Well, I would say many, really. The recommendation that I mentioned before, which is close to my heart since I started my career in this area with a recommendation engine, is definitely something that is,
00:10:40 Dr Alessandro Negro
you know, very active, not only because it empowers complex types of recommendation engines, but it also solves complex issues around this type of machine learning task. I'm thinking specifically of cold start or contextual recommendation. So these kinds of problems can be
00:11:00 Dr Alessandro Negro
solved in an easier way if you are using a graph database. But apart from recommendation, which is still a hot topic in the graph space, there are many others that are jumping out. I'm thinking about fraud detection, for example. I'm thinking of criminal intelligence, which we were discussing before.
00:11:21 Dr Alessandro Negro
But also, very recently, there is a new trend that I would define as the knowledge graph trend, in which, you know, the semantic web encounters the graph space, and from this
00:11:36 Dr Alessandro Negro
merging of ideas, this knowledge graph idea, you know, was born in some way, and this has literally opened many other domains to this graph way of thinking, because imagine that
00:11:55 Dr Alessandro Negro
you have a medical use case. The knowledge graph can literally help you in gathering data
00:12:05 Dr Alessandro Negro
from various types of data sources, literature as well as ontologies as well as structured data sources, and combining them in this big single source of knowledge on which a clinician or researchers can, let's say, rely
00:12:24 Dr Alessandro Negro
for making any type of analysis, but also for exploration purposes and speeding up current research, for example. This is another very relevant use case, so in the biological or specifically biomedical space, the
00:12:40 Dr Alessandro Negro
graphs, so these specific
00:12:42 Dr Alessandro Negro
types of graphs, are becoming a sort of standard, and the same is true, for example, in the financial sector, in the banking sector, where again they are using this knowledge graph,
00:12:54 Dr Alessandro Negro
again, this single source of knowledge, for offering not only fraud detection, which I mentioned before, but also advanced services
00:13:01 Dr Alessandro Negro
to their customers. There is this concept of customer 360
00:13:05 Dr Alessandro Negro
that is jumping
00:13:06 Dr Alessandro Negro
out here and there, in which what they are doing is collecting all the information around a user and performing cross-selling, for example, or performing advanced types of suggestion, recommendation again, as well as tailoring a certain type of offering to
00:13:26 Dr Alessandro Negro
them, based on their specific needs
00:13:29 Dr Alessandro Negro
or on the needs that they could have in the future. In all these cases, there is something in common, which is the ability of the graph, and specifically the knowledge graph, to aggregate data from different types of data sources, both structured and unstructured, and offer
00:13:50 Dr Alessandro Negro
a unique view, let's say a global view, on the...
00:13:55 Dr Genevieve Hayes
So I'm just still trying to visualise this. It's very easy to visualise the idea of a social network because you've got, you know, the nodes being people and the edges being the connections between me and someone who's my Facebook friend, for example. But with a knowledge graph,
00:14:16 Dr Genevieve Hayes
I'm guessing that the nodes would be individual concepts,
00:14:21 Dr Genevieve Hayes
for example, a person or a place, or a disease if we're talking about medical research.
00:14:29 Dr Alessandro Negro
Yes, exactly.
00:14:30 Dr Genevieve Hayes
Would a relationship be something like, you know, if we're talking about a tennis player, say, Novak Djokovic has played tennis at the Wimbledon tennis court, for example? Would that
00:14:41 Dr Genevieve Hayes
be right?
00:14:42 Dr Alessandro Negro
Well, that's exactly what it is, you know. Let me give you a broad overview. The graph as it is, is a very, very simple mathematical concept, you know.
00:14:53 Dr Alessandro Negro
It is just a set of nodes and relationships, or a set of vertices and edges if you prefer, so as a mathematical concept it is super simple.
00:15:04 Dr Alessandro Negro
You know, everybody can understand it. Then what happens with the social network that you were mentioning before is that we are adding a sort of semantics on top of this.
00:15:13 Dr Alessandro Negro
Yeah, so
00:15:15 Dr Alessandro Negro
we are saying that nodes represent people and relationships represent social relationships between people, friendship or working and whatever. You know, the knowledge graph is exactly the same concept. It is a graph, nothing more, nothing
00:15:35 Dr Alessandro Negro
less, but we apply much more semantics on top of it, you know. And according to the domain in which you are, these nodes represent different concepts. So if we are in the biomedical space, as we were mentioning, nodes can be genes, diseases, proteins,
00:15:52 Dr Alessandro Negro
treatments, whatever, and the relationships are, for example, a biological connection between a gene and a related protein, or it can be a relationship between proteins because they interact together, or between diseases because they are connected somehow.
00:16:10 Dr Alessandro Negro
And then these diseases can
00:16:12 Dr Alessandro Negro
be connected to related
00:16:14 Dr Alessandro Negro
genes. There are well-known connections, for example, between genes and diseases, and so on. So we can say that the knowledge graph is literally a set of interconnected entities with their attributes
00:16:27 Dr Alessandro Negro
and relevant relationships between these nodes, and then concepts that are specific for a domain.
00:16:36 Dr Genevieve Hayes
And with that example of the diseases and the genes, I'm imagining you could have something like a disease like COVID which is connected to this particular, I don't know, protein,
00:16:47 Dr Genevieve Hayes
and then you could have that protein also linked to this protein and this disease, and that would allow you
00:16:53 Dr Genevieve Hayes
to find similarities between diseases, and presumably, and I know nothing about medical research, so I'm just making this up, but presumably that would help a medical researcher to identify connections which
00:17:08 Dr Genevieve Hayes
might help them to create some sort of novel treatment for this particular disease.
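The disease and protein example above can be sketched in a few lines of Python. This is an illustration only: the node identifiers, labels and relationship types are hypothetical and do not come from any real biomedical source. The knowledge graph is just labelled nodes plus typed relationships, and finding diseases that share a protein is a two-hop walk over those relationships.

```python
# Hypothetical nodes, each with a label saying what kind of concept it is.
nodes = {
    "gene_a": {"label": "Gene"},
    "protein_x": {"label": "Protein"},
    "protein_y": {"label": "Protein"},
    "disease_1": {"label": "Disease"},
    "disease_2": {"label": "Disease"},
}

# Hypothetical typed relationships, stored as (source, type, target) triples.
relationships = [
    ("gene_a", "ENCODES", "protein_x"),
    ("protein_x", "INTERACTS_WITH", "protein_y"),
    ("protein_x", "ASSOCIATED_WITH", "disease_1"),
    ("protein_x", "ASSOCIATED_WITH", "disease_2"),
    ("protein_y", "ASSOCIATED_WITH", "disease_2"),
]


def related_diseases(disease_id):
    """Diseases that share at least one associated protein with disease_id."""
    # First hop: proteins associated with the starting disease.
    proteins = {
        source
        for source, rel_type, target in relationships
        if rel_type == "ASSOCIATED_WITH" and target == disease_id
    }
    # Second hop: other diseases associated with any of those proteins.
    return {
        target
        for source, rel_type, target in relationships
        if rel_type == "ASSOCIATED_WITH"
        and source in proteins
        and target != disease_id
    }


similar_to_disease_1 = related_diseases("disease_1")
```

In a graph database the same question would normally be written as a query rather than as Python set comprehensions; the sketch only shows the shape of the data and of the traversal.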
00:17:14 Dr Alessandro Negro
Yes, exactly. There is an interesting point about this example, you know, because first of all there is a concept that I like to mention all the time that I
00:17:24 Dr Alessandro Negro
am speaking about graphs.
00:17:25 Dr Alessandro Negro
You know, once you have all your data stored in the form of
00:17:29 Dr Alessandro Negro
a graph, every single node and every single relationship could be an access point for your
00:17:34 Dr Alessandro Negro
analysis, for your exploration, exactly as you mentioned. You know, I have a specific disease in mind. I would like to explore the surroundings, let's say, around this COVID.
00:17:44 Dr Alessandro Negro
That's one perfect and, I would say, one of the most common use cases for graphs, or usages
00:17:52 Dr Alessandro Negro
if you prefer. Then of course there are other types of more complex analysis, and again you mentioned these, you know, in your example. So one of the
00:18:04 Dr Alessandro Negro
major use cases, I would say, in the biological space is the so-called drug
00:18:09 Dr Alessandro Negro
repurposing, which means that you have drugs or compounds already existing and you would like to see if existing drugs, existing compounds, can be used for a new disease.
00:18:22 Dr Alessandro Negro
This is exactly what happened for COVID, you know, if you remember, when we were using hydroxychloroquine, for example, as a way of
00:18:29 Dr Alessandro Negro
treating COVID. Then we discovered that was not the case, but unfortunately at the time we didn't have the knowledge that we have right now.
00:18:38 Dr Alessandro Negro
This is the classical use case in which we are using complex machine learning, let's say, tasks for
00:18:47 Dr Alessandro Negro
performing this type of drug repurposing, which, translated, is no more, no less than a so-called link prediction. So you have a graph with existing links, existing, let's say, relationships, and you would like to predict unseen or unused relationships, and in this case,
00:19:07 Dr Alessandro Negro
instead of doing what we were mentioning before, like a simple exploration, you're doing a deep analysis of your graph
00:19:14 Dr Alessandro Negro
in order to accomplish a much more complex task, which is, like in this case, link prediction. So you are literally discovering
00:19:26 Dr Alessandro Negro
new relationships that are hidden somewhere in the structure of the graph, for example, you know, and drug repurposing is a classical example.
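Link prediction as described above can be made concrete with a deliberately simple Python sketch. It does not use the complex machine learning models mentioned in the conversation; it swaps in a neighbourhood-overlap heuristic (Jaccard similarity) purely to show what "predicting an unseen relationship" means. All drug, protein and disease names are hypothetical.

```python
# Hypothetical existing links: which proteins each drug targets,
# and which proteins each disease is associated with.
drug_targets = {
    "drug_a": {"p1", "p2"},
    "drug_b": {"p3"},
    "drug_c": {"p2", "p4"},
}
disease_proteins = {
    "known_disease": {"p1"},
    "new_disease": {"p2", "p4"},
}
# Hypothetical drug-to-disease links that already exist in the graph.
known_treatments = {("drug_a", "known_disease")}


def jaccard(a, b):
    """Size of the overlap divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_links(disease):
    """Score every drug not yet linked to the disease; highest score first."""
    scores = []
    for drug, targets in drug_targets.items():
        if (drug, disease) in known_treatments:
            continue  # the link already exists, nothing to predict
        score = jaccard(targets, disease_proteins[disease])
        if score > 0:
            scores.append((drug, score))
    # Sort by descending score, then by name so the order is deterministic.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


ranked = rank_candidate_links("new_disease")
```

Here the top-ranked candidate is the drug whose targets overlap most with the proteins of the new disease. A score like this only suggests a candidate link for investigation; it says nothing about whether a drug actually works.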
00:19:34 Dr Genevieve Hayes
So if a new version of COVID came out, so COVID-23, God help the world, let's hope that doesn't happen, but that would be something that previously had not existed in that graph.
00:19:46 Dr Genevieve Hayes
But given whatever limited information we had on that, we could then predict what previous drugs it is linked to
00:19:55 Dr Genevieve Hayes
and then hopefully come up with some treatment for it very quickly so that the world doesn't end up in another series of lockdowns.
00:20:03 Dr Alessandro Negro
Yeah, exactly, this is in reality what happened already with COVID-19. If you think of AstraZeneca, for example, which was one of the first companies producing a vaccine, they literally used a knowledge graph for producing their vaccine. So there are plenty of talks about this specific topic, so it happened already.
00:20:24 Dr Alessandro Negro
Of course, hopefully,
00:20:25 Dr Alessandro Negro
like, the next time, which, as you said, we wish won't happen again, but in that case knowledge graphs can definitely play another key role in the, say, discovery of new cures for diseases or in finding existing
00:20:45 Dr Alessandro Negro
drugs that can
00:20:46 Dr Alessandro Negro
help.
00:20:47 Dr Genevieve Hayes
One knowledge graph that I'm familiar with is the Google Knowledge Graph. So for any listeners out there who aren't familiar with it, whenever you search for something on Google, like, for example, a person's name or a city location, you'll often get that box down the side of the page if you're using the desktop version,
00:21:08 Dr Genevieve Hayes
or at the top of the screen if you're using it on mobile, and it'll give you key facts about that person or that location. What is the practical application of that Google Knowledge Graph beyond providing interesting facts about locations and people when you search?
00:21:27 Dr Alessandro Negro
Well, definitely I would say that knowledge graphs were introduced to the world, for this specific type of usage, by Google for the first time. You know, if you search for knowledge graphs on this Google Trends, let's
00:21:46 Dr Alessandro Negro
say, feature
00:21:46 Dr Alessandro Negro
available in Google, you will notice that around 2012
00:21:51 Dr Alessandro Negro
you will see, let's say, a spike that is related to the introduction for the first time of this concept. After that,
00:21:58 Dr Alessandro Negro
nothing was the same, and they had an interesting way of introducing this concept, which was searching for things instead of searching for strings. You know, that is exactly what you described.
00:22:12 Dr Alessandro Negro
You know, if I'm searching for a specific concept, I don't want to get only the list of documents mentioning
00:22:19 Dr Alessandro Negro
this specific word or set of words, that is searching for strings, but I would like to get exactly that specific thing, so the box on the side of the search,
00:22:30 Dr Alessandro Negro
well, is literally the thing, hopefully, or the things that we were searching for, and that changed dramatically
00:22:39 Dr Alessandro Negro
the way in which they were offering these search results, and it is entirely powered by a knowledge graph, and it is definitely one of the most relevant usages
00:22:51 Dr Alessandro Negro
of knowledge graphs for their specific case.
00:22:54 Dr Genevieve Hayes
At the end of your book Graph-Powered Machine Learning,
00:22:58 Dr Genevieve Hayes
you go through a use case of how to build a knowledge graph from scratch. Would you be able to give the listeners a condensed version of how they'd go about building a knowledge graph?
00:23:10 Dr Genevieve Hayes
Because one of the things that I thought was really cool about that book was, even though obviously someone like me couldn't build
00:23:17 Dr Genevieve Hayes
a knowledge graph the size of Google's, it was pretty cool to be able to build my own Sherlock Holmes knowledge graph just on my laptop on the weekends.
00:23:27 Dr Alessandro Negro
Let me say that these, you know, two chapters were so useful to many people that we decided to write another entire book on that topic.
00:23:38 Dr Alessandro Negro
So I would say that
00:23:39 Dr Alessandro Negro
Knowledge Graphs Applied, that is the
00:23:41 Dr Alessandro Negro
book we are
00:23:42 Dr Alessandro Negro
working on in these months, started
00:23:48 Dr Alessandro Negro
exactly from this idea, you know, from the last two chapters of the previous book, in which I was building this knowledge graph, and extended it to, let's say, another 600 pages, more or less.
00:24:01 Dr Alessandro Negro
The reason is what you mentioned. You know, this is definitely one of the major concerns that many people and many companies have.
00:24:10 Dr Alessandro Negro
You know, how can I build a knowledge graph? Well, let me say that there are two major, not issues, but approaches,
00:24:20 Dr Alessandro Negro
and both of them are valid, and they should in some way merge. On one side, of
00:24:24 Dr Alessandro Negro
course, you can have
00:24:25 Dr Alessandro Negro
structured data sources, CSV files or relational databases or many other sources that are structured by.
00:24:35 Dr Alessandro Negro
And for this it's relatively simple. Once you identify
00:24:40 Dr Alessandro Negro
the key
00:24:41 Dr Alessandro Negro
entities or the key concepts, as we were saying before, that you would like to store in the knowledge graph,
00:24:48 Dr Alessandro Negro
and you have also identified the relationships and the global schema, then it's pretty
00:24:55 Dr Alessandro Negro
straightforward to
00:24:57 Dr Alessandro Negro
load this data in the form of a graph, you know. Generally everybody can do it.
00:25:04 Dr Alessandro Negro
Then there is another interesting area that is much more complicated, but definitely more satisfying, that is, the conversion of so-called unstructured sources into a knowledge graph, and that's where you could have more fun, as I said, because imagine that you have a text, you know.
00:25:24 Dr Alessandro Negro
The text, as I said, is generally referred to as unstructured, but in reality our
00:25:29 Dr Alessandro Negro
languages have a lot of structure inside, you know, our grammar, syntactic dependencies and such, so you can leverage this structure and literally extract an enormous amount of information from the text. A typical example is the so-called named entities, which means that you should recognise in a text
00:25:51 Dr Alessandro Negro
if, let's say, a couple of words are, how to say, a person rather than a location rather than a company, and so on and so forth, or a disease. And apart from recognising these entities, you should also be able to recognise the relationships
00:26:10 Dr Alessandro Negro
between these entities, you know. Some are simple to extract and others are more complicated, because the simplest example is the connection between subject, verb and object, and then you can extract the easy relationship between the subject and the object, for example,
00:26:29 Dr Alessandro Negro
of a specific sentence. Others are a bit more complicated to extract
00:26:34 Dr Alessandro Negro
but still doable. So this task is called entity relationship extraction, and can be accomplished by using rules, as is presented in the book,
00:26:44 Dr Alessandro Negro
but you can also create complex, let's say, machine learning models to extract these sorts of relationships
00:26:54 Dr Alessandro Negro
between entities.
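The simplest rule mentioned above, a relationship read off a subject, verb and object, can be sketched in Python as follows. This is a toy under stated assumptions and not the rule set from the book: the tokens are assumed to have been tagged already by some upstream part-of-speech tagger, the tags are hand-written here, and the single rule only fires when an entity sits directly on each side of a verb.

```python
# Hand-tagged tokens for illustration; in practice the tags would come
# from a part-of-speech tagger, which is assumed and not shown here.
TAGGED_SENTENCE = [
    ("Sherlock", "PROPN"),
    ("Holmes", "PROPN"),
    ("visited", "VERB"),
    ("London", "PROPN"),
    (".", "PUNCT"),
]


def chunk_entities(tagged):
    """Merge runs of consecutive proper nouns into single ENTITY chunks."""
    chunks = []
    for word, tag in tagged:
        if tag == "PROPN" and chunks and chunks[-1][1] == "ENTITY":
            chunks[-1] = (chunks[-1][0] + " " + word, "ENTITY")
        elif tag == "PROPN":
            chunks.append((word, "ENTITY"))
        else:
            chunks.append((word, tag))
    return chunks


def extract_svo(tagged):
    """Rule: ENTITY, VERB, ENTITY in a row becomes a (subject, verb, object) triple."""
    chunks = chunk_entities(tagged)
    triples = []
    for i in range(1, len(chunks) - 1):
        (subj, subj_tag), (verb, verb_tag), (obj, obj_tag) = chunks[i - 1 : i + 2]
        if (subj_tag, verb_tag, obj_tag) == ("ENTITY", "VERB", "ENTITY"):
            triples.append((subj, verb, obj))
    return triples


triples = extract_svo(TAGGED_SENTENCE)
```

Each triple maps directly onto the graph: subject and object become nodes and the verb becomes the relationship between them. Sentences with passives, pronouns or intervening words need richer rules or a trained model, which is the harder case described above.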
00:26:55 Dr Alessandro Negro
And there is, I'd say, a third task,
00:26:58 Dr Alessandro Negro
again related to this area of conversion from unstructured to knowledge graph,
00:27:03 Dr Alessandro Negro
that is the so-called named entity disambiguation or entity linking, which means that you are connecting these extracted entities to a sort of knowledge base. So for example,
00:27:15 Dr Alessandro Negro
if you are extracting the word diabetes, then you should be able to connect it to the right
00:27:22 Dr Alessandro Negro
type of diabetes, you know, and so on and so forth. So this connection between an entity extracted from a text and, let's say, the well-known entity in a knowledge base allows you not only to know more about that specific entity, more than just the name,
00:27:42 Dr Alessandro Negro
but also to extract connections between this entity and other entities inside the text or inside the knowledge base that you have.
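The diabetes example above can be turned into a small Python sketch of entity linking. Everything in it is an assumption made for illustration: the knowledge base identifiers, the aliases and the context keywords are invented, and the disambiguation rule (pick the candidate whose keywords overlap most with the sentence) is a simple stand-in for the statistical models used in practice.

```python
# Hypothetical knowledge base: two entries share the alias "diabetes".
KNOWLEDGE_BASE = {
    "kb:diabetes_type_1": {
        "aliases": {"diabetes"},
        "context": {"autoimmune", "insulin", "juvenile"},
    },
    "kb:diabetes_type_2": {
        "aliases": {"diabetes"},
        "context": {"obesity", "insulin", "resistance", "adult"},
    },
}


def link_entity(mention, sentence):
    """Return the knowledge base id that best matches the mention, or None."""
    words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
    candidates = [
        (len(entry["context"] & words), kb_id)
        for kb_id, entry in KNOWLEDGE_BASE.items()
        if mention.lower() in entry["aliases"]
    ]
    if not candidates:
        return None  # the mention is not known to the knowledge base
    best_score, best_id = max(candidates)
    # With no supporting context the mention stays unlinked rather than guessed.
    return best_id if best_score > 0 else None


linked = link_entity(
    "diabetes",
    "The patient developed diabetes linked to obesity and insulin resistance.",
)
```

Once the mention is linked, everything the knowledge base already knows about that entry becomes available for the extracted entity, which is the benefit described above.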
00:27:51 Dr Genevieve Hayes
I'm just guessing this is how Google did their knowledge graph, so they could have taken basically every web page, extracted the named entities from those web pages, or even just from something like Wikipedia, and then used that to connect nodes and entities and build their knowledge graph.
00:28:09 Dr Alessandro Negro
Yes, exactly. I would say that this Wikipedia that you mentioned is still the most relevant knowledge base that everybody is using
00:28:18 Dr Alessandro Negro
in many cases, you know, I would say in many generic cases, like Google, for example. You know this already, that whenever you search for some well-known name,
00:28:30 Dr Alessandro Negro
the first box that we were discussing before will be a Wikipedia page, so definitely, you know, Wikipedia represents the main source of this knowledge graph for things. There is only one drawback, which is related to the,
00:28:50 Dr Alessandro Negro
let's say, to the specific domains that you could have, you know, on your path. For example, if you are speaking about a medical domain or other very tiny domains,
00:29:03 Dr Alessandro Negro
unfortunately, the availability of a well-known and well-structured knowledge base is, let's say, less probable, which means that you need to build your own knowledge base.
00:29:16 Dr Alessandro Negro
You need to build your own mechanism for extracting relevant information from text. We will of course build its own.
00:29:24 Dr Alessandro Negro
Named entity recognition models and entity relationship extraction models on generic applications, on generic domains, not very specific ones.
00:29:33 Dr Genevieve Hayes
So it's basically the same as what you find with any of those pre-built models.
00:29:37 Dr Genevieve Hayes
The pre-built models are designed to cater for the generic use case that the majority of people want to use.
00:29:44 Dr Genevieve Hayes
But if you have a very specific organisation based application you're going to have to build your own use case.
00:29:53 Dr Alessandro Negro
Yes, exactly, that perfectly represents what happens every day. You know, it's rare for a specific company, like, I don't know,
00:30:02 Dr Alessandro Negro
in the financial sector or in the law enforcement sector, to be able to rely on existing models, because they are too generic. You know, they have specific needs; they would like to recognise
00:30:12 Dr Alessandro Negro
specific entities in the text that are just not available in the generic language models available on Hugging Face, for example. So they have to build their own.
00:30:23 Dr Genevieve Hayes
Yeah, I was recently at a conference where there was a woman from Ambulance Victoria speaking, and she was saying how Ambulance Victoria had to build their own named entity recognition model, or no, sorry, it was a sentiment analysis model, because the generic models did not understand the way paramedics speak.
00:30:43 Dr Alessandro Negro
Exactly, this is a very, very common problem. That's why we are partnering with a company called
00:30:50 Dr Alessandro Negro
UBIAI,
00:30:52 Dr Alessandro Negro
which offers a sort of annotation tool, you know, in which domain experts can just go through tonnes of documents or fewer documents and annotate entities and relationships that are relevant for their specific domain and automatically build models, language models,
00:31:12 Dr Alessandro Negro
out of this annotation. This is a very, very relevant task because, as we said, once you approach specific domains with this specific problem, you need to build your own language model, and these tools
00:31:30 Dr Alessandro Negro
allow you to do this specific task, so that through annotation you can create your own model to recognise what matters for you, for the domain that you are trying to handle.
00:31:42 Dr Genevieve Hayes
Yeah, in my previous job we were working in a very specific domain, and one of the biggest challenges we found was finding individuals within the organisation who understood the data well enough to annotate it and who were prepared to spend all the hours or days that it
00:32:02 Dr Genevieve Hayes
would require in order to annotate that data.
00:32:06 Dr Alessandro Negro
Well, I can't say what is more difficult: to find people with the right expertise, or to convince them to spend time on a laptop or a computer, you know,
00:32:17 Dr Alessandro Negro
performing the annotation. Well, let me say that this is hard everywhere. What we are trying to do is to make
00:32:27 Dr Alessandro Negro
this process more
00:32:29 Dr Alessandro Negro
automated, which means that through, I don't know, a dictionary, for example, you can feed the first annotations and create a sort of feedback loop.
00:32:43 Dr Alessandro Negro
You know, while you are updating, the language model is built, and this language model can be used for pre-annotating the next set of documents, so that
00:32:50 Dr Alessandro Negro
really, the amount of time concretely required for the real users, real people, to provide feedback could be reduced, you know, and so they will be less
00:33:02 Dr Alessandro Negro
annoyed by this task. So really it's hard to say what is more complicated, because you are right, you know, convincing them to spend hours
00:33:13 Dr Alessandro Negro
in front of a
00:33:13 Dr Alessandro Negro
computer to annotate,
00:33:15 Dr Alessandro Negro
it's not that simple.
00:33:17 Dr Genevieve Hayes
So what you're saying is, if someone's already annotated Australia as the name of a country
00:33:23 Dr Genevieve Hayes
before, and every time Australia comes up it's always annotated as a country name,
00:33:29 Dr Genevieve Hayes
then it could skip over that and just focus on, I don't know, if it's never come across the name of a small country like, I don't know, Liechtenstein, for example, which doesn't come up as often.
00:33:42 Dr Alessandro Negro
Yeah, exactly. That's basically the idea.
00:33:45 Dr Alessandro Negro
So let's say the dictionary
00:33:47 Dr Alessandro Negro
base is much simpler because
00:33:50 Dr Alessandro Negro
if the name
00:33:51 Dr Alessandro Negro
matches, then you know what it is, and this can be used for training a more complex language model, not dictionary based.
00:33:59 Dr Alessandro Negro
And again, this language model can be used to pre-annotate. Of course the dictionary generally,
00:34:06 Dr Alessandro Negro
let's say, has
00:34:07 Dr Alessandro Negro
a higher precision.
00:34:10 Dr Alessandro Negro
So you know, if Australia is recognised as a
00:34:15 Dr Alessandro Negro
key entity, it will always be like this. You know, there are few chances that it is wrong, but the recall is very limited,
00:34:24 Dr Alessandro Negro
you know, which means that you won't be able to recognise all the names of the locations, for example, you know, because you don't have a dictionary containing all of them.
00:34:36 Dr Alessandro Negro
Of course, locations are not a good example, but I think you understood
00:34:39 Dr Alessandro Negro
what I mean.
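The precision and recall trade-off described here can be made concrete with a minimal Python sketch. Everything in it is an assumption for illustration: the tiny dictionary, the example sentence, the gold annotations and the function names are hypothetical, and it is not the annotation tool or pipeline discussed in the episode.

```python
import re

# Hypothetical gazetteer: lower-cased surface form -> entity label.
COUNTRY_DICTIONARY = {"australia": "COUNTRY", "italy": "COUNTRY"}


def pre_annotate(text, dictionary):
    """Return (token, label) pairs for every token found in the dictionary."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [(t, dictionary[t.lower()]) for t in tokens if t.lower() in dictionary]


def precision_recall(predicted, gold):
    """Compare predicted annotations with gold (human) annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


text = "She flew from Australia to Liechtenstein via Italy."
# Gold annotations, as a human annotator might provide them (illustrative).
gold = [("Australia", "COUNTRY"), ("Liechtenstein", "COUNTRY"), ("Italy", "COUNTRY")]

predicted = pre_annotate(text, COUNTRY_DICTIONARY)
precision, recall = precision_recall(predicted, gold)
print(predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every dictionary match is correct, so precision is perfect, but Liechtenstein is missed because it is not in the dictionary, so recall is lower. A human correcting the pre-annotations would add the missing entity, which is the feedback loop described above.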
00:34:39 Dr Alessandro Negro
Yeah, that's why, on the other side, the language model could give you the opposite: it could have a high recall. So in theory this language model is capable of,
00:34:50 Dr Alessandro Negro
let's say, catching more names, but at the same time it could be wrong, you know, because the structure of the
00:34:58 Dr Alessandro Negro
sentence could suggest
00:34:59 Dr Alessandro Negro
that that specific entity is a location, for example, but it might not be.
00:35:04 Dr Alessandro Negro
It's just that it seems to be a location, but it's not, and that's where, again, the humans can not only
00:35:11 Dr Alessandro Negro
add annotations but can also correct annotations, you know, and then in this process you can have this sort of human in the loop, in which you are, you know, let's say, concretely helping the machine
00:35:24 Dr Alessandro Negro
to understand the human language. I will say that, based on my personal experience, all this work has a really good payoff.
00:35:33 Dr Alessandro Negro
You know, because what you can get out of
00:35:35 Dr Alessandro Negro
this is a
00:35:36 Dr Alessandro Negro
custom language model that nobody has, for example. So there is a lot of value resulting from this
00:35:44 Dr Alessandro Negro
effort, really, specifically for the tiny domains that you were mentioning before. You know, this is a key step to extract relevant information and then build a knowledge graph out of your text.
00:36:01 Dr Genevieve Hayes
And I could imagine some startup company, for example, going to the trouble of building one of those knowledge graphs, and then they could build some sort of product around that which would presumably if it's the right product and people really want it, it would be unique and allow them to charge quite a high.
00:36:23 Dr Alessandro Negro
Yes, yes, there are many, many companies, you know, that are doing this for a living. We are in contact with a few of them, in which, you know, what they have as business value is literally the right expertise. So they have on one side domain experts that are,
00:36:43 Dr Alessandro Negro
let's say, engaged for annotating documents, and also for building ontologies, for example. So not only annotating documents, but also creating relevant information in the form of
00:37:00 Dr Alessandro Negro
ontologies, right, connexions between key concepts. And on the other side they also have technical people that could help, I don't know, a pharmaceutical company, for example, to leverage these language models and these ontologies in the right way for building complex,
00:37:20 Dr Alessandro Negro
let's say, applications, for example. You know, so that's an enormous
00:37:25 Dr Alessandro Negro
way, let's say.
00:37:27 Dr Alessandro Negro
There are enormous opportunities for many small companies, you know, to build a niche domain language, for example, and offer this to their customers. So it's a new world, I would say, of opportunities for these companies.
00:37:42 Dr Genevieve Hayes
I've come across graph databases in my own work, and that was in a relatively large organisation within Australia. But from speaking to other people, I know many data scientists have never come across graph databases.
00:38:00 Dr Genevieve Hayes
How prevalent are graph databases at the moment?
00:38:03 Dr Alessandro Negro
Still not that much, in the sense that it is growing. I mean, in the last 10 years, definitely. It's much easier now to find people that are experts in, or at least aware of, this new area. But still, you know, I would say that the data science field is so huge that
00:38:24 Dr Alessandro Negro
everything is very specific. For example, you have many data scientists working in the NLP space, you know,
00:38:34 Dr Alessandro Negro
like not only extracting the relevant information, but also building question answering systems and so on and so forth.
00:38:41 Dr Alessandro Negro
Then you have, let's say, data scientists that are experts in fraud detection, and again this is a huge area, you know, in which people
00:38:54 Dr Alessandro Negro
really specialise in that specific field, or in recommendation, or in many other, let's say, high-level sets of applications, and
00:39:04 Dr Alessandro Negro
let's say that
00:39:04 Dr Alessandro Negro
a graph in this space could be a help in each of these verticals, but it is still not so well understood. You know, because it's not only a niche, it's really a new arrow in their bow.
00:39:24 Dr Alessandro Negro
They could use, for example, graphs for improving a recommendation engine, solving, I don't know, a cold start problem, for example. The same could be for fraud detection. You know, they can use graphs for revealing
00:39:42 Dr Alessandro Negro
rings, which means people that are connected to each other, you know, who are trying to accomplish a
00:39:48 Dr Alessandro Negro
certain type of fraud.
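One simple way to picture "revealing rings" is to group accounts that are linked to each other and look at the larger groups. The sketch below is only an assumption about how that might look: the account names, the meaning of a link (for example a shared phone number or address) and the size threshold are all hypothetical, and real fraud detection would use much richer signals.

```python
from collections import defaultdict, deque


def connected_groups(edges):
    """Group nodes into connected components using breadth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)

    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(group)
    return groups


# Hypothetical links: two accounts share a phone number or an address.
links = [
    ("acct1", "acct2"),
    ("acct2", "acct3"),
    ("acct3", "acct1"),
    ("acct4", "acct5"),
]

groups = connected_groups(links)
# Arbitrary illustrative threshold: groups of three or more deserve a closer look.
candidate_rings = [g for g in groups if len(g) >= 3]
print(candidate_rings)
```

The point of the sketch is that the grouping falls out of the connections themselves, which is hard to see when the same records sit in separate tables.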
00:39:50 Dr Alessandro Negro
So it's not that you have to use one or the other, but they can be combined in many domains for offering better services to
00:40:02 Dr Alessandro Negro
Their internal company or to their users. Unfortunately, this is still not perceived like this. You know, there are not that many conferences speaking about graphs or knowledge
00:40:15 Dr Alessandro Negro
graphs, and they are not trending just yet. It will take time. But definitely I see that the trend is very clear.
00:40:24 Dr Alessandro Negro
You know, you can see the number of companies using graphs or leveraging graph technologies for their advanced services. It will come. It's just that we need
00:40:35 Dr Alessandro Negro
more time, and definitely, you know, books, like other books or other people's books, can help in this process.
00:40:45 Dr Genevieve Hayes
Does the prevalence of graph database uptake differ by country?
00:40:50 Dr Alessandro Negro
Well, we definitely noticed that there are some differences between countries. For example, when we first landed in Australia, we noticed that it was a sort of greenfield for us, you know, unlike the US, where this concept was very well established.
00:41:10 Dr Alessandro Negro
But you know, in the US they are always cutting edge
00:41:15 Dr Alessandro Negro
in
00:41:15 Dr Alessandro Negro
technology. In Australia it was a bit more complicated for us to convince people that this could be the way to go, but I will say that after
00:41:25 Dr Alessandro Negro
a while we noticed that this generated a lot of interest, and now we have different
00:41:31 Dr Alessandro Negro
companies working with us.
00:41:33 Dr Alessandro Negro
And we are offering our services, and even our, let's say, teaching effort, to them in order to educate them to use graphs as, again, another technology that can be
00:41:46 Dr Alessandro Negro
useful in many, many different scenarios.
00:41:50 Dr Genevieve Hayes
Next thing I want to explore is how can machine learning be applied to a graph database?
00:41:57 Dr Alessandro Negro
OK, this is an interesting question, because I see, let's say, that graph databases and machine learning can use each other
00:42:07 Dr Alessandro Negro
in different ways.
00:42:09 Dr Alessandro Negro
Let me say that on one side, graph databases can be used for organising
00:42:17 Dr Alessandro Negro
your data before applying any machine learning model. You know, some of the major
00:42:26 Dr Alessandro Negro
tasks in machine learning are data preparation, data cleaning and, let's say, feature engineering. These are complex tasks, you know, that sometimes take more than 80% of the data scientist's time. In this sense, graphs can help you, as I mentioned before, you know, to collect
00:42:47 Dr Alessandro Negro
the data, but not in the same way in which, for example, the data lake was doing before, because in the data lake what happened was just that people
00:42:55 Dr Alessandro Negro
were putting all their data in whatever structure, you know, and then the data scientists, the poor data scientists, had to literally go through an enormous set of tasks for cleaning, improving and enriching before even starting
00:43:15 Dr Alessandro Negro
thinking about any machine learning model.
00:43:20 Dr Alessandro Negro
Graphs, and knowledge graphs specifically, have semantics applied to this data, so it's not only data, it's organised data, which means that you know that a person is
00:43:34 Dr Alessandro Negro
a person with the
00:43:35 Dr Alessandro Negro
relevant, let's say, connexions, and with the relevant
00:43:39 Dr Alessandro Negro
attributes. It's totally different, you know, it's a really well organised source of
00:43:44 Dr Alessandro Negro
truth that you can then use for performing data cleaning, but also for extracting the features that you need for the next step.
00:43:53 Dr Alessandro Negro
So in this case, graphs can help you really in the early stages of your processes or your analysis.
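As a rough illustration of extracting features from organised, connected data, the following Python sketch derives a few per-node features from a list of typed relationships. The triples, relationship names and chosen features are all hypothetical; they only show the kind of thing that becomes easy once the connections are explicit.

```python
from collections import Counter

# Hypothetical knowledge graph as (source, relationship, target) triples.
triples = [
    ("alice", "WORKS_AT", "acme"),
    ("alice", "KNOWS", "bob"),
    ("bob", "WORKS_AT", "acme"),
    ("bob", "KNOWS", "carol"),
    ("carol", "LIVES_IN", "melbourne"),
]


def node_features(triples, node):
    """Derive simple tabular features for one node from its relationships."""
    outgoing = Counter(rel for src, rel, _ in triples if src == node)
    incoming = Counter(rel for _, rel, dst in triples if dst == node)
    return {
        "out_degree": sum(outgoing.values()),
        "in_degree": sum(incoming.values()),
        "knows_count": outgoing["KNOWS"] + incoming["KNOWS"],
        "has_employer": int(outgoing["WORKS_AT"] > 0),
    }


for person in ("alice", "bob", "carol"):
    print(person, node_features(triples, person))
```

Each resulting dictionary is one row of a feature table that could be passed to an ordinary machine learning model in the next step.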
00:44:01 Dr Alessandro Negro
Other than that, what you can do,
00:44:04 Dr Alessandro Negro
the other way around, is to literally leverage graphs
00:44:08 Dr Alessandro Negro
for building your machine learning models. You know, you can use graph algorithms, for example, directly. If you imagine the social network case, you know, you can easily use the network to identify key people, for example. You know, this is a classical example, but you can also use it
00:44:28 Dr Alessandro Negro
for identifying
00:44:30 Dr Alessandro Negro
clusters, like communities, inside, let's say, the graph.
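To make "identify key people" concrete, here is a minimal pure-Python PageRank sketch (PageRank is one of the algorithms named later in the conversation). The small "follows" network and the parameter values are assumptions for illustration, and a graph database library would normally provide a tuned implementation rather than this hand-rolled loop.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict of node -> list of followed nodes."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a small baseline share (random jump).
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, neighbours in graph.items():
            if neighbours:
                share = damping * rank[node] / len(neighbours)
                for neighbour in neighbours:
                    new_rank[neighbour] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical social network: who follows whom.
follows = {
    "ann": ["dan"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["ann"],
}

ranks = pagerank(follows)
key_person = max(ranks, key=ranks.get)
print(key_person, ranks)
```

In this toy network "dan" comes out on top because three people point to him, which matches the intuition of a key person in a social graph.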
00:44:35 Dr Alessandro Negro
Of course this
00:44:35 Dr Alessandro Negro
is useful not only in social network analysis.
00:44:39 Dr Alessandro Negro
Imagine, for example, if you are storing protein-to-protein interactions in your graph database, you know, and you perform community detection.
00:44:49 Dr Alessandro Negro
In this case, what you are recognising are sets of proteins that are generally well connected together, and they can be, for example, connected to a well-defined
00:45:03 Dr Alessandro Negro
set of diseases.
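A toy version of community detection on an interaction graph might look like the sketch below. It uses a simple label propagation scheme with a fixed visiting order and a deterministic tie-break, which is a stand-in chosen for brevity and not the Louvain method mentioned later in the episode. The protein names and interactions are made up, and the result on real data would depend heavily on the algorithm used.

```python
from collections import Counter


def label_propagation(graph, max_rounds=20):
    """Each node repeatedly adopts the most common label among its neighbours."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[n] for n in graph[node])
            if not counts:
                continue
            best = max(counts.values())
            # Deterministic tie-break (an arbitrary choice): largest label wins.
            new_label = max(label for label, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=sorted)


# Hypothetical protein interaction graph: two tight triangles joined by one link.
interactions = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p3", "p5", "p6"],
    "p5": ["p4", "p6"],
    "p6": ["p4", "p5"],
}

communities = label_propagation(interactions)
print(communities)
```

On this small graph the two densely connected triangles end up as separate communities, which is the kind of grouping one would then compare against known disease associations.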
00:45:04 Dr Alessandro Negro
So you can literally create models of your reality based on graph algorithms. Recently, for example, there is this new trend called graph neural networks.
00:45:16 Dr Alessandro Negro
You know, in this case what you do is to store your information, again, in the form of a graph. Then you apply this neural network
00:45:24 Dr Alessandro Negro
model and you are able to, let's say, move literally from the graph space to a multidimensional space. You are vectorizing, for example, these nodes, and these vectors
00:45:36 Dr Alessandro Negro
are the input of a complex model that you can, for example, use for building a classifier. You can also build link prediction, as we were discussing before. So literally, you know, you can use graphs in many areas of your machine learning tasks.
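The idea of moving from the graph space to vectors can be hinted at with a single round of neighbourhood averaging, which is the aggregation step at the heart of many graph neural networks. This sketch deliberately leaves out the learned weights and non-linearities that a real graph neural network would have, and the graph and feature values are hypothetical.

```python
def mean_aggregate(graph, features):
    """Replace each node's vector with the mean of its own and its neighbours' vectors."""
    updated = {}
    for node, neighbours in graph.items():
        vectors = [features[node]] + [features[n] for n in neighbours]
        dim = len(features[node])
        updated[node] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return updated


# Hypothetical graph and two-dimensional starting features.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 1.0]}

embeddings = mean_aggregate(graph, features)
print(embeddings)
```

After one round, each vector already mixes in information from the node's neighbours. Stacking several such rounds, with trainable parameters in between, gives node vectors that a classifier or a link prediction model can consume.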
00:45:56 Dr Alessandro Negro
As I said, you can use them as an input, or you can use them as a core element
00:45:59 Dr Alessandro Negro
of your, let's say,
00:46:03 Dr Alessandro Negro
machine learning tasks. Or you can use them even for exploration, you know, sometimes even for understanding how certain types of models are working. You know that recently there is also this new trend related to explainable AI, you know, because
00:46:22 Dr Alessandro Negro
if you are offering recommendations, nobody cares. You know, nobody even asks,
00:46:29 Dr Alessandro Negro
you know, how the Netflix recommendation engine is working. I don't care, you know. If Netflix recommends this or that, I can say, oh wow, this is very relevant for me. Or I don't care if a self-driving car is driving me somewhere. Well, I have no idea how it works, you know, how this car
00:46:49 Dr Alessandro Negro
reads all the environment variables and converts these into a path. You know, of course I care that I would
00:46:56 Dr Alessandro Negro
like to reach a specific place in a safe way, but no more than that. But imagine if you are a doctor and you have to recommend a specific treatment to a patient based on a machine learning model. Well, you would like to know how this
00:47:16 Dr Alessandro Negro
specific treatment has been produced by the machine. In this sense, graphs can help you to better understand certain types of internals, let's say, of the models, and so,
00:47:29 Dr Alessandro Negro
since they apply this semantics on top of data, it's easier for you and for the machine to explain how certain types of decisions have been taken by the machine. That, you know, allows, in this case, the doctor to understand why this is
00:47:49 Dr Alessandro Negro
coming out from the machine and, of course, to be more confident before healing the patient with a specific treatment, for example.
00:47:57 Dr Genevieve Hayes
Yep, so that's because people can actually look at the graph itself and say.
00:48:02 Dr Genevieve Hayes
Yeah, and say this node connects to this node and et cetera.
00:48:07 Dr Alessandro Negro
Well, that's basically the most relevant one, but of course, you know, in certain cases you can explore a huge area of the graph in one shot and understand exactly where these decisions are coming from.
00:48:21 Dr Alessandro Negro
You know, so,
00:48:22 Dr Alessandro Negro
but yeah, it is exactly exploration that allows you to discover
00:48:27 Dr Alessandro Negro
certain types of decisions.
00:48:28 Dr Genevieve Hayes
And I could imagine that's also very important in the financial and legal.
00:48:33 Dr Genevieve Hayes
Domains because, well, if someone's going to be sent to gaol for something they wanna know why. And if someone's gonna be penalised financially, obviously.
00:48:43 Dr Alessandro Negro
Yeah, absolutely. This is a critical aspect, you know. Really, this explainable AI is coming up here and there more and more often, even in the criminal intelligence that you were mentioning. You know, there are some studies in which, for several reasons, they noticed that certain types of machine learning algorithms were a bit
00:49:03 Dr Alessandro Negro
biassed, you know.
00:49:04 Dr Alessandro Negro
So by introducing this explainability they were able to understand why these models were biassed by certain types of, I don't know, characteristics of the people,
00:49:17 Dr Alessandro Negro
for example, you know, and this allowed them to fine-tune, for example, and such. So this is becoming really relevant information to know about, you know,
00:49:26 Dr Alessandro Negro
how these models are working. Because the more we are using these tools, of course, the more ethical issues are jumping out, and explainability is a key aspect that allows
00:49:39 Dr Alessandro Negro
us to judge once the machine is providing a certain type of output and then take the right decisions, you know, whether it's biassed or not. This will allow them to really make the best use of
00:49:54 Dr Alessandro Negro
these tools.
00:49:55 Dr Genevieve Hayes
Yeah, and avoid a data scandal in the process.
00:49:58 Dr Alessandro Negro
Of course, of course, because you know what happens then is that, because of a mistake, all the processes are then considered not valuable, you know, even though you spent years and years, and just
00:50:11 Dr Alessandro Negro
because for a certain reason the system is not performing well, because the data that we provided is not correct, then, you know, the entire process is thrown away, and this is definitely not what we want as data scientists or as machine learning engineers.
00:50:28 Dr Genevieve Hayes
I was reading a book earlier today and one of the quotes they had in it was the author was saying that he couldn't believe the number of times he'd been asked.
00:50:39 Dr Genevieve Hayes
If the wrong data goes into a particular model, will the model still spit out the right answer?
00:50:45 Dr Alessandro Negro
Yeah, well, this is especially mentioned in my book, you know, and I like to mention this in many of my talks, you know, that of course the final quality of your model is definitely dependent on the quality of the input data.
00:51:01 Dr Alessandro Negro
Yeah, that's absolutely true, and unfortunately not all the people, even in the data scientist role, think of the, let's say, input data at the earlier stages, and again, that's where I really see that the value of graphs can
00:51:22 Dr Alessandro Negro
shine, you know? Because of course if you can look at the data from a different perspective and navigate it in a simple way, maybe this will
00:51:32 Dr Alessandro Negro
force many of us to think of the data from a different perspective, you know, before using it for, let's say, feeding complex machine learning, because unfortunately machine learning,
00:51:46 Dr Alessandro Negro
as a
00:51:46 Dr Alessandro Negro
generic concept, is an inductive process. You know, it tries to generalise from
00:51:52 Dr Alessandro Negro
sample data. There is this nice example in which, you
00:51:57 Dr Alessandro Negro
know, you have a
00:51:58 Dr Alessandro Negro
bag and you are taking pennies out of it.
00:52:02 Dr Alessandro Negro
After three runs, you know, three tests, the machine learning will say, OK, all the coins in the bag are pennies.
00:52:11 Dr Alessandro Negro
It's because,
00:52:12 Dr Alessandro Negro
you know, there is nothing else that it can say, but in reality it is not like this. So unfortunately this input data problem should be considered more and more rather than less and less.
00:52:24 Dr Genevieve Hayes
Yeah, and just because something's happened every day, forever, still doesn't mean it will happen tomorrow. I remember I used to teach Bayesian statistics, and one of the questions I used to get the students to answer was: what is the probability that the sun will rise tomorrow, given it's risen every day since the world.
00:52:45 Dr Alessandro Negro
That's an interesting question.
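One classical way to answer the sunrise question, and the penny example before it, is Laplace's rule of succession: assuming a uniform prior on the unknown probability, after s successes in n trials the probability of success on the next trial is (s + 1) / (n + 2). The episode does not say which answer was expected of the students, so the sketch below is only one standard treatment, with a hypothetical function name and an arbitrary day count.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Probability that the next trial succeeds, under a uniform prior."""
    return Fraction(successes + 1, trials + 2)


# Three pennies drawn in three draws from the bag.
print(rule_of_succession(3, 3))

# The sun has risen on each of n observed days (n chosen arbitrarily here).
n_days = 10_000
print(float(rule_of_succession(n_days, n_days)))
```

Even after three pennies in a row, the estimate for a fourth penny is 4/5 rather than certainty, and the sunrise probability gets close to 1 without ever reaching it, which is the inductive caution both speakers are pointing at.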
00:52:49 Dr Genevieve Hayes
Suppose a data scientist who's listening to this programme got really interested in graph data science and knowledge graphs. What steps could they take to get started in this field?
00:53:00 Dr Alessandro Negro
Well, I would say that there are plenty of books in this area, so not only my book, but there are many others
00:53:08 Dr Alessandro Negro
in which you can,
00:53:09 Dr Alessandro Negro
you know, find a useful beginning example, you know, with which you can just start looking at a small data set and start working with this data set and understand the basic algorithms, for example, like, I don't know, PageRank, or community detection like Louvain
00:53:29 Dr Alessandro Negro
and such. And I think that once you start looking at the power of these,
00:53:35 Dr Alessandro Negro
let's say, tools,
00:53:38 Dr Alessandro Negro
so not only the graph databases, but also
00:53:40 Dr Alessandro Negro
the algorithms, you will
00:53:41 Dr Alessandro Negro
fall in love with them and start using them more and more. Again, I don't,
00:53:47 Dr Alessandro Negro
I don't want to say that graph databases can solve all the issues, but definitely they should be part of any data scientist's background, you know, knowing that there is a third way of doing certain types of things.
00:54:03 Dr Alessandro Negro
And maybe with time, you know, certain types of practises will become a sort of standard, and then, let's say, for certain types of applications like,
00:54:15 Dr Alessandro Negro
as I said, recommendation, for example, or fraud detection. So, simple basic databases, and then you will see that you will ask for more and more.
00:54:24 Dr Genevieve Hayes
Well, one thing I found really useful when I was getting started with graph databases was the Neo4j sandboxes.
00:54:31 Dr Alessandro Negro
Oh yes.
00:54:32 Dr Genevieve Hayes
Yeah, so these are temporary environments that you can create that have Neo4j preloaded.
00:54:38 Dr Genevieve Hayes
And they come with test data, and you can experiment with the various graph algorithms in them.
00:54:44 Dr Alessandro Negro
Yeah, absolutely, there are plenty of examples in that sense, but also, you know, if you search for data sets, you know, there are plenty of examples of simple graph databases that you can easily import. You know, they are mostly in a CSV format,
00:55:05 Dr Alessandro Negro
with a clear explanation of what they
00:55:08 Dr Alessandro Negro
contain, and you can really import them in the easiest way. So, like, run a LOAD CSV command, for example, in Neo4j, and once you have the database, and you also have this Graph Data Science library that is available with Neo4j, you can easily run
00:55:28 Dr Alessandro Negro
over 70 different algorithms
00:55:31 Dr Alessandro Negro
on top of
00:55:32 Dr Alessandro Negro
the database and see what comes out, and definitely you will find something interesting, you know, some story, let's say, around the data to tell, and that's a good start.
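The import step can be sketched without a running database. The Python below parses a small, made-up edge list in CSV form into an adjacency structure, and shows as an unexecuted string what a corresponding Cypher `LOAD CSV` statement could look like. The file name, the `Person` label, the `KNOWS` relationship and the column names are all assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge list, as it might appear in a downloaded CSV file.
EDGES_CSV = """source,target
holmes,watson
holmes,moriarty
watson,mary
"""

# What an equivalent import might look like in Cypher (not executed here;
# the file path, label and relationship type are illustrative assumptions).
LOAD_CSV_QUERY = """
LOAD CSV WITH HEADERS FROM 'file:///edges.csv' AS row
MERGE (a:Person {name: row.source})
MERGE (b:Person {name: row.target})
MERGE (a)-[:KNOWS]->(b)
"""


def load_edges(csv_text):
    """Build an undirected adjacency map from CSV rows with source/target columns."""
    graph = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        graph[row["source"]].add(row["target"])
        graph[row["target"]].add(row["source"])
    return graph


graph = load_edges(EDGES_CSV)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
print(degree)
```

Once the data is in graph form, even a simple degree count starts to tell a story about who is central, and a graph algorithm library can take over from there.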
00:55:46 Dr Genevieve Hayes
So is there anything on your radar in the AI data and analytics space that you think is going to become important in the next three to five years?
00:55:54 Dr Alessandro Negro
Well, definitely, as I mentioned, there is this explainable AI that is clearly recurring more and more often. You know, because since, let's say, more domains are approaching, let's say, the graph space specifically, but also machine learning in general, you know,
00:56:14 Dr Alessandro Negro
questions are coming in this sense, you know: how can I explain why the machine is taking a certain type of decision? Another very relevant aspect is
00:56:25 Dr Alessandro Negro
also what we were discussing before about this annotation process, you know, with the specific goal of extracting relevant information out of text. Annotation is a key, let's say, step to help people to, let's say, teach the
00:56:46 Dr Alessandro Negro
machine how to recognise certain types of things. Once you have the entities
00:56:50 Dr Alessandro Negro
in the graph,
00:56:50 Dr Alessandro Negro
well, you know, many, many things can be done on top of it. But unfortunately, without this step, clearly it will be difficult to do so. So again, annotation.
00:57:05 Dr Alessandro Negro
Explainable AI I mentioned already. I also see a lot of interest around these question answering systems, you know, and in this area specifically it's not all about,
00:57:19 Dr Alessandro Negro
you know, there is this new trend about
00:57:21 Dr Alessandro Negro
ChatGPT, in which you ask something and you get
00:57:24 Dr Alessandro Negro
a very big paragraph that describes what you asked for, but there is another,
00:57:32 Dr Alessandro Negro
let's say, tiny
00:57:34 Dr Alessandro Negro
area in which, when you ask a question, you would like to get a precise
00:57:40 Dr Alessandro Negro
answer, like a number, like the name of a disease, and not an entire paragraph, you know, and in this area specifically, graphs have a key role.
00:57:50 Dr Alessandro Negro
Because what happens
00:57:51 Dr Alessandro Negro
in many of these, let's say, studies, is that what they do is to take the question and convert it literally into a query,
00:58:00 Dr Alessandro Negro
generally a SPARQL query or a Cypher query, and then they use this query to access a graph and get literally the answer, so a set of nodes or a set of
00:58:12 Dr Alessandro Negro
relationships coming out from this, let's say, from this
00:58:16 Dr Alessandro Negro
graph. So it's a
00:58:17 Dr Alessandro Negro
totally different type of question answering system, because in the first case you ask ChatGPT and obtain an explanation. It is cool,
00:58:25 Dr Alessandro Negro
it is fine, absolutely. I also tried it a couple of days ago and it's super, you know, fun
00:58:33 Dr Alessandro Negro
to get the answers even to complex questions like, what is the meaning
00:58:37 Dr Alessandro Negro
of life? You know, we
00:58:38 Dr Alessandro Negro
played a lot. Definitely useful, but there are many, many other use cases in which you don't want to get a paragraph. You would like to get an answer, a number, again, a specific set of numbers,
00:58:50 Dr Alessandro Negro
for example, you know, based on your question. In this case, knowledge graphs are playing a key role, because
00:58:59 Dr Alessandro Negro
they contain information, let's say, structured in a way that is not text. You know, it's like nodes and relationships, so it's much easier for the model to extract out of these specific answers that are
00:59:11 Dr Alessandro Negro
not paragraphs or.
00:59:13 Dr Genevieve Hayes
So if I asked, what's the population of Australia, it would extract the keywords population and Australia, convert that into some sort of query and return 20-something million.
00:59:25 Dr Alessandro Negro
Yeah, exactly. That's the purpose of
00:59:27 Dr Alessandro Negro
this question answering system.
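The question-to-query idea can be sketched in a few lines of Python. Everything here is a simplification under stated assumptions: the regular expression only handles "population of X" questions, the Cypher template and the `Country` label are hypothetical, the query is looked up against a toy in-memory dictionary rather than a real graph database, and the population figure is an illustrative placeholder, not an official statistic.

```python
import re

# Toy stand-in for a knowledge graph: node name -> properties.
# The population value is an illustrative placeholder only.
KNOWLEDGE_GRAPH = {"Australia": {"label": "Country", "population": 26_000_000}}

# Hypothetical parameterised Cypher query that the question is mapped to.
CYPHER_TEMPLATE = "MATCH (c:Country {name: $name}) RETURN c.population AS answer"


def question_to_query(question):
    """Map a 'population of X' question to a (query, parameters) pair."""
    match = re.search(r"population of ([A-Z][A-Za-z ]+?)\??$", question.strip())
    if match is None:
        return None
    return CYPHER_TEMPLATE, {"name": match.group(1).strip()}


def run_on_toy_graph(params):
    """Answer the query from the in-memory stand-in instead of a database."""
    node = KNOWLEDGE_GRAPH.get(params["name"])
    return None if node is None else node["population"]


query, params = question_to_query("What's the population of Australia?")
print(query, params)
print(run_on_toy_graph(params))
```

The answer that comes back is a single value read from a node property, not a generated paragraph, which is the contrast with a conversational system being drawn here.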
00:59:28 Dr Alessandro Negro
You know, totally different from a
00:59:30 Dr Alessandro Negro
chatbot, that of course has, let's say, other types of issues, like keeping the conversation going, you
00:59:35 Dr Alessandro Negro
know, eventually keeping
00:59:36 Dr Alessandro Negro
context. But in this specific case you would like to get the number, you know, because you have a specific question.
00:59:44 Dr Alessandro Negro
You don't want a paragraph describing where Australia is, when it was discovered. You know, you would like to have a number, like, OK, this is the population.
00:59:56 Dr Alessandro Negro
You know, for these types of questions, let's say, ChatGPT or similar types of conversational AI bots cannot be helpful.
01:00:08 Dr Genevieve Hayes
What final advice would you give to data scientists looking to create business value?
01:00:12 Dr Genevieve Hayes
From data.
01:00:14 Dr Alessandro Negro
First of all, focusing on, let's say,
01:00:17 Dr Alessandro Negro
the business case.
01:00:19 Dr Alessandro Negro
You know, because what we noticed
01:00:21 Dr Alessandro Negro
in the past,
01:00:23 Dr Alessandro Negro
even for us, is that when you look at a certain type of domain or a problem, what
01:00:28 Dr Alessandro Negro
you do first is to
01:00:29 Dr Alessandro Negro
collect all the data that you can and then try to get
01:00:32 Dr Alessandro Negro
out some answer.
01:00:34 Dr Alessandro Negro
We should start from a totally different approach. You know, we should use this CRISP-DM, let's say, approach, you know, that is the Cross-Industry Standard Process for Data Mining.
01:00:46 Dr Alessandro Negro
That is a very good standard, you know, even when you come to machine learning in general. The interesting thing is
01:00:53 Dr Alessandro Negro
that you should
01:00:54 Dr Alessandro Negro
always start from the business case, so you should first understand the business and the goal that this business has. So ask yourself, OK,
01:01:03 Dr Alessandro Negro
what is the value that they would like to get out of this? Once you have this information, you should look at the data and extract only the relevant information that you need,
01:01:12 Dr Alessandro Negro
you know, the relevant portion of this data, that is, the bare minimum to accomplish the task that you would like to accomplish. It is totally different than before.
01:01:22 Dr Alessandro Negro
I mentioned already this data lake issue. It was exactly this bottom-up approach, in which what you do is to say, OK, I have this bunch of data,
01:01:30 Dr Alessandro Negro
let's put everything together and then data scientists will do their job. And it was a
01:01:35 Dr Alessandro Negro
nightmare, you know?
01:01:36 Dr Alessandro Negro
Because you have this data lake, full of useless data, or whatever, transactional data, unstructured data, and these poor data scientists have to really go through this lake and find very tiny data, you know, distributed across this huge
01:01:57 Dr Alessandro Negro
set of information. Start with the problem, you know, and focus on the value that the solution to this problem can deliver. Then go back to the data and say, OK,
01:02:09 Dr Alessandro Negro
where is the minimum set of information that I can extract, and how can I extract and use it for solving my problem?
01:02:18 Dr Alessandro Negro
Most probably it won't be enough,
01:02:20 Dr Alessandro Negro
but at least you will reach
01:02:21 Dr Alessandro Negro
your scope immediately, and then you can reiterate and, for example, extend the set of data that you're using, or verify that the results are correct,
01:02:30 Dr Alessandro Negro
and iterate again and again and again, and finally you will get the result much faster than starting instead by looking first for your data and spending 80% of your time just cleaning
01:02:39 Dr Alessandro Negro
things that you don't care about. So that's my personal suggestion.
01:02:43 Dr Genevieve Hayes
Basically, you're better off trying to fish in a barrel rather than trying to fish in the whole Pacific Ocean.
01:02:50 Dr Alessandro Negro
Absolutely yes.
01:02:51 Dr Genevieve Hayes
So that's about all we've got time for today, Alessandro. For listeners who want to learn more about you or get in contact, what can they do?
01:03:01 Dr Alessandro Negro
Well, definitely they can send me an email at alessandro@graphaware.com, or they can search for my name on LinkedIn or just on Google, because there are plenty of my talks, and use my book if they want, or my books now. I mean, the second one is still in MEAP, which means that it's still not fully available.
01:03:22 Dr Alessandro Negro
And you know, there are plenty of references for them to learn from. And if they have any questions they can reach out to me through LinkedIn or through email or whatever other means they prefer. Even come to Lecce and visit me in the office.
01:03:40 Dr Genevieve Hayes
And I'll put a link to your LinkedIn page and to your books in the show notes.
01:03:44 Dr Alessandro Negro
Thank you for that.
01:03:46 Dr Genevieve Hayes
So thanks very much for joining me here today, Alessandro.
01:03:51 Dr Alessandro Negro
Thank you for inviting me again. It was a great pleasure speaking with you. Definitely a lot of interesting questions.
01:03:57 Dr Genevieve Hayes
And for those in the audience, thank you for listening. I'm Dr Genevieve Hayes and this has been Value Driven Data Science, brought to you by Genevieve Hayes Consulting.
