Episode 46: Empowering Democracy with LLMs
[00:00:00] Dr Genevieve Hayes: Hello and welcome to Value Driven Data Science brought to you by Genevieve Hayes Consulting. I'm Dr. Genevieve Hayes and today I'm joined by Vikram Oberoi to discuss how he is using LLMs to improve the accessibility of city council meetings in New York City.
[00:00:17] Vikram is a software engineer, fractional CTO and co-owner of Baxter HQ, a boutique early-stage tech product development firm. He also built and operates citymeetings.nyc, an LLM-powered tool to make New York City Council meetings accessible. Vikram, welcome to the show.
[00:00:39] Vikram Oberoi: Thank you. Thanks for having me.
[00:00:40] Dr Genevieve Hayes: With all the reports about the spread of misinformation and disinformation on social media, sometimes it feels like one of the biggest threats to democracy is technology, but no technology is inherently good or bad.
[00:00:53] It's how you use it that matters. And just as technology has the potential to harm democracy, it also has the potential to enhance it. Now, probably the most talked-about technology of the past 18 months has been generative AI and large language models, or LLMs. And Vikram, you've recently been using these technologies to enhance people's access to New York City Council meetings through your work on citymeetings.nyc.
[00:01:23] We're going to spend a lot of time talking about the more technical aspects of that project in this episode. However, to begin with, can you give us an overview of what citymeetings.nyc is from a non-technical perspective? So, for someone who's not a data scientist or software engineer.
[00:01:45] Vikram Oberoi: Sure. So citymeetings.nyc is a site that breaks very long municipal meetings, on the order of 2-to-10-hour-long New York City Council meetings, into granular, useful chunks that are generally no more than 5 minutes long.
[00:02:03] Each of these chunks is accompanied by a title and a summary that faithfully represents what was stated on the record in that part of the meeting. A segment will generally cover one topic, or one Q&A exchange, or standalone remarks by a council member, or a piece of public testimony that someone gave, and it's also accompanied by the video right there, so that viewers can dig in and watch the video for themselves or read the transcript if they want.
[00:02:31] And so that's what citymeetings.nyc is. The other thing to add is that all of these are linkable. So for the first time ever in New York City, you can link to a specific portion or moment in a New York City Council meeting that may have run anywhere from one to 10 hours.
[00:02:45] Dr Genevieve Hayes: So previously, if you wanted to explore the New York City Council meetings, did you have to read through 10 hours worth of transcripts or watch 10 hours worth of video footage?
[00:02:54] Vikram Oberoi: Correct, yeah. Those were the only ways you could do it. Well, three ways. One, you could attend in person, which reporters do and have done in the past, though there's a lot less budget for that today; a lot of local news is dying out. You can also watch the videos online on the New York City Council website.
[00:03:11] So you can download 2-to-10-hour videos that you then have to watch, or you can access the transcripts, which come out a few days to sometimes a couple of weeks later. Those transcripts are very high quality, but they are hundreds of pages long.
[00:03:24] Dr Genevieve Hayes: And even if you're a reporter, you're not going to go through hundreds of pages of every single city council meeting. Right. Right.
[00:03:29] Vikram Oberoi: Yeah, it's impossible. During budget season in New York City, that's when the longest meetings occur; these are routinely 5 to 10 hours long. There's a lot of public testimony at these meetings, a lot of council questioning. In probably the busiest week this year there were something like 20 of those meetings, and there's no way
[00:03:47] that a reporter would be able to cover all of that.
[00:03:49] Dr Genevieve Hayes: One thing I've taken to doing recently, because the media has become so skewed one way or the other, is that for anything important I try to watch the actual event, an important press conference, for example. But it sounds like it would be impossible to do that for these New York City Council meetings. What the website is providing is an opportunity for someone like me, who wants to see the actual event, to just see the good bits.
[00:04:20] Vikram Oberoi: Yeah, absolutely. That's exactly what I'm trying to do. I am a huge fan of going back to primary source material when possible. It's not like what transpires at a city council meeting is particularly difficult to understand, it's all very easy to access if you can see it and get to the point that matters to you.
[00:04:34] And what I'm trying to do is get you to that part.
[00:04:38] Dr Genevieve Hayes: What I find, now that I've started focusing on primary sources, is that often the press will focus on something that isn't even the most important part of what actually transpired, and they'll leave out the bits that I think are the most important.
[00:04:52] Vikram Oberoi: I find that to be the case too. There are many things that get covered in New York City; all the major stories will get covered here. As an example, last year there was a slate of bills related to boosting transparency for the NYPD, and there was one particular bill in there that was very controversial, called the How Many Stops Act, which the mayor vetoed.
[00:05:14] And then the City Council overrode the veto. But in that whole process and in that coverage, there was actually very little coverage of the rest of that package. And the rest of that package is really interesting: over the course of this year and next year, we are going to receive a whole slew of reports around NYPD actions and performance that we did not have before, which is all great.
[00:05:38] A lot of these day-to-day, routine, useful things that the city council does end up getting under-covered, if that's a word. A lot of things just don't get reported. And I know that some of these are niche issues, but they are things that the city council, that our electeds, are working on.
[00:05:57] I think it's useful to be able to look at that.
[00:06:00] Dr Genevieve Hayes: So who is the target audience for citymeetings.nyc?
[00:06:05] Vikram Oberoi: So engaged citizens are one group who do want to look at this, and I think New York City has a lot of them. I'm one of them. I'm also part of a housing advocacy organization, so there's a bunch of folks in there, and I know folks in my community who care about what goes on at these meetings and in other governing bodies.
[00:06:23] But then the other audience is journalists. I'd like journalists to start using this. There's a newsroom that linked to citymeetings.nyc for the first time as a source, and I completely encourage that kind of use case. This is meant to be an authoritative source of primary source material for stuff like this.
[00:06:41] That was very exciting for me. Journalists have reached out and have asked me for features, which I have not yet fulfilled, but I plan to. Then there are the folks on my newsletter. There are only about 150 people on it today, but I've done virtually no marketing, and the audience on that newsletter is quite broad.
[00:06:57] There are attorneys on it, real estate and land use attorneys especially. There are advocacy groups of all kinds on it, there are lobbying groups on it, and there are also former city council candidates and government employees who want to follow this sort of thing. All of those collectively, I think, are the audience.
[00:07:18] Dr Genevieve Hayes: So citymeetings.nyc started life as a side project for you. A tool for researching government meetings is an unusual side project for a software engineer, and looking at some of the previous side projects listed on your website, for example the video game Landlubbers, which I had a good time playing around with in preparation for this meeting,
[00:07:41] it seems unusual even for you. What got you interested in democracy and city council meetings in the first place?
[00:07:50] Vikram Oberoi: So I actually became a citizen recently; in late 2020 I became a U.S. citizen. And ever since then, I have paid a little bit more attention to politics, especially local politics around things that I care about, and I've gotten steadily more interested in how our city runs and how these decisions are made.
[00:08:10] And even before I started working on citymeetings, I had already, every now and then, gone to the New York City Council website to look at a new piece of legislation, like you do, you go and look at that primary source material. I would too. So I already had this interest.
[00:08:24] The other thing that happened was that when GPT-4 hit the scene, every time a new language model came out, the first thing I would do was see how effectively I could apply it to public records, because the government puts out so much information. New York City in particular is exceptional at this.
[00:08:43] We have a great open data law and program. We have a very open city council. We have the open meetings law, which requires that anyone can attend a meeting, and, I don't know what the requirements are on recording, but they encourage it. And so there are a lot of recorded local government body meetings.
[00:08:59] So there's so much out there, and it's all transparent in name, which is great; it's a good start. But in reality, it's impossible to understand what's going on because there's too much. And so every time a new language model came out, I was like, huh, I wonder how well I can apply it to this use case of making some of this stuff more accessible.
[00:09:18] And initially it was just for me. I was like, let me just write some API calls, throw in some context, and see how this works. GPT-4, at its initial release, had too small a context window and it was too expensive. But come December, GPT-4 Turbo came out, and that was vastly cheaper and had a longer context window, and that is what I built the first version of citymeetings on.
[00:09:41] So yeah, to answer your question, it started with this interest, and with this specific interest in using LLMs on public data. It began as a newsletter, where I was trying to look at all of the city council activity that I could find, bills that were passed or introduced, the meetings, and I would try to write something useful based off of that information every week.
[00:10:01] What I realized in that process were really two things. One, there's a much broader context that the city council operates in that I didn't have just looking at all those documents. Journalists will go and look at the context around a bill that was passed, what the history is around it, or what the communities thought about it.
[00:10:18] That's not something you can glean from public records documents alone using LLMs, so I wasn't able to do quite as good a job as I wanted to. But I also realized that it was virtually impossible to cover municipal meetings, at which a lot of decisions are made and a lot of statements are made on the record, not just by electeds, but by many groups in the city who come and advocate for something on someone's behalf.
[00:10:45] And so that's when I decided I would pivot and say, you know what, let me just try and see if I can make meetings accessible, and that's where citymeetings started.
[00:10:54] Dr Genevieve Hayes: Are you going to do versions of citymeetings.nyc that cover cities other than New York?
[00:11:01] Vikram Oberoi: I have thought about it, and there are newsrooms that have approached me to bring this to their municipalities. I would like to; that's something I'm definitely considering, and it may happen in the future. There are a lot of challenges to being able to do that, and what I would like to do first is test out feasibility and how to make this sustainable in New York City alone.
[00:11:23] New York City has all the ingredients for me to test it. It has a wide set of governing bodies, all the way from the state to the City Council to even smaller local governing bodies in the city, which are quite heterogeneous in the content they cover as well as in how procedural and formal they are.
[00:11:43] So it will give me a good sense of how well I can cover all these different types of meetings before I spread out to other cities. Also, because New York is the largest city in the U.S. and so many people in media and advocacy are here, I can test out ways to make it sustainable here before going elsewhere.
[00:11:59] I'm not closed to the idea. In fact, I would like to do it. I just don't think it's feasible right now, and I want to be able to test out more in New York City alone before I go do that.
[00:12:05] Dr Genevieve Hayes: So even though I do find democracy and government fascinating, and I can see a lot of benefits to the tool from an accessibility point of view, what I believe the listeners of this show are most interested in is the AI aspects of this project.
[00:12:22] So how does citymeetings.nyc work under the hood?
[00:12:26] Vikram Oberoi: So I gave a talk at a civic tech conference in New York back in late March, and I published all the annotated slides for that talk. In it, I walk through in great detail how citymeetings works under the hood. Some of the details there are stale, so we can talk a little bit more about what my new approaches look like.
[00:12:45] But at a high level, the way citymeetings works is that I follow the same process for every meeting. First, I will transcribe the meeting, and in particular I will get a diarized transcript, which means not only will I get what words were stated, I'll get speaker labels that say speaker one spoke these words, speaker two spoke these words, which gives me some ingredients to be able to identify who said what.
[00:13:06] After getting the diarized transcript, I will use a bunch of AI prompts to identify each of the speakers in it. I want to know that speaker 1 was Council Member Ayala and speaker 2 was Council Member Restler, not just speaker 1 and speaker 2. I have a bunch of prompts, which I've published and you can see, that I use to do that now.
[00:13:26] That part actually hasn't undergone much change in the last several months. Once I do that, I review the speakers. I have an interface that allows me to take a look and make sure they're correct, because it can get things wrong, or it can hallucinate, or transcription errors can propagate and cause issues with the speaker names or organizations that I've extracted.
[00:13:44] The next step is that I extract all the chapters, and chapter extraction is the process that has undergone the most iteration and has been by far the most challenging part of this project. I have a chapter extraction process, and then I review the chapters as well.
[00:14:02] There might be issues there. There might be errors in the transcript that propagate to the titles or summaries, or there might be hallucinations, so I want to make sure those are right. I try to review all of it. Things might fall through the cracks sometimes, but I put in a good-faith effort to make sure that the information in those summaries faithfully represents what was stated on the record.
[00:14:22] Then, after that review, I'll write a summary, also using AI, and I'll review that; that takes very little time. And then I'll publish it, and all that content is what you see when you click on a meeting today on citymeetings.nyc.
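For readers who want a concrete picture of the pipeline Vikram describes here, a minimal sketch, assuming the Anthropic Python SDK, might look something like the following. The prompts, model id, and diarization helper are illustrative placeholders, not the actual citymeetings.nyc code.

```python
# A minimal sketch of the pipeline described above, assuming the Anthropic
# Python SDK. The prompts, model id, and the diarization stub are
# illustrative placeholders, not the actual citymeetings.nyc code.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-3-5-sonnet-20240620"  # assumed model id


def ask(prompt: str) -> str:
    """Send one prompt to the model and return its text response."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def get_diarized_transcript(video_url: str) -> str:
    """Placeholder for whichever transcription service produces
    speaker-labelled ("SPEAKER 1: ...") transcripts."""
    raise NotImplementedError


def process_meeting(video_url: str) -> dict:
    # 1. Transcribe with speaker labels.
    transcript = get_diarized_transcript(video_url)

    # 2. Identify speakers: map "SPEAKER 1" to "Council Member Ayala", etc.
    speakers = ask(
        "Identify each speaker in this diarized New York City Council "
        "transcript. Return a mapping of speaker label to name and role.\n\n"
        f"<transcript>{transcript}</transcript>"
    )
    # ...human review of the speaker mapping happens here...

    # 3. Extract chapters: short, user-facing segments with title and summary.
    chapters = ask(
        "Split this transcript into chapters of no more than about five "
        "minutes, one per topic, Q&A exchange, or piece of public testimony. "
        "For each chapter give a title, a faithful summary, and start/end "
        "times.\n\n"
        f"<speakers>{speakers}</speakers>\n"
        f"<transcript>{transcript}</transcript>"
    )
    # ...human review of the chapters happens here...

    # 4. Meeting-level summary, then publish.
    summary = ask(f"Summarize this meeting based on its chapters:\n{chapters}")
    return {"speakers": speakers, "chapters": chapters, "summary": summary}
```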
[00:14:33] Dr Genevieve Hayes: What do you mean by a chapter, when you talk about chapter extraction?
[00:14:36] Vikram Oberoi: Oh, good question. Chapter is the word that I use to describe this. There might be another one, but are you familiar with YouTube chapters?
[00:14:42] Dr Genevieve Hayes: Yeah, I think so. They're the breaks that you can see throughout a video?
[00:14:46] Vikram Oberoi: Exactly. That's exactly what they are. I think YouTube calls those chapters, so I've borrowed that terminology. YouTube will extract chapters, or the creators of videos can add their own chapters if they want, and those chapters help people navigate to specific parts of a video that are relevant to them.
[00:15:04] YouTube's version of what a good chapter looks like, or a creator's version of what a good chapter looks like for their video, is actually quite different from what I want a good chapter to be. The idea is the same, in that it's a chunk that is semantically coherent in some way and covers a specific topic or event.
[00:15:21] But in my case they're a lot shorter, and I want them to cover distinct moments in a meeting, based on the procedures the meeting follows or the topics that come up. For example, when there is a section in which members of the public are giving testimony, I want chapters that cover each person's testimony.
[00:15:38] I don't want a chapter that says, hey, here's a bunch of public testimony, here's a bunch of highlights. That's not actually useful. What I want to see is a much more granular version.
[00:15:45] Dr Genevieve Hayes: So it'd be person one's testimony and then person two's testimony, et cetera.
[00:15:50] Vikram Oberoi: Correct, yeah. And then for the council questioning sections of a meeting, where, after a member of an agency gives some testimony, there's typically a long section at city council hearings where council members take turns, I will generally extract two different types of chapters.
[00:16:08] One is a generic Q&A section, which is an exchange between a council member and the agency member, and the other is what I call remarks, which are standalone remarks by a council member. I want to be able to capture those, and I think the reason I've settled on that is that I want to focus on understanding what the council members are doing as much as possible, and the exchanges give context around the topic.
[00:16:31] So I want to be able to eventually show, hey, Council Member Crystal Hudson, my council member, here's all the stuff that she's said. In the future, you could use that as a way to understand her stance on various issues in a much more granular way than you get from voter guides, for example.
[00:16:48] Anyway, that's what I mean by chapter, and that's kind of my intent with chapters.
[00:16:51] Dr Genevieve Hayes: So they're intended from the point of view of helping people to understand what's going on rather than to do with the chunks that you're feeding into the LLM models.
[00:17:01] Vikram Oberoi: Correct. I would think about those as two distinct things. When I say chapter, it is the user-facing, granular section of a meeting that covers a topic or a specific event that someone might care about. In terms of what I feed the LLMs, I would just call those chunks, and chunking more generally.
[00:17:16] Dr Genevieve Hayes: Right. So each chapter would be divided into multiple chunks?
[00:17:20] Vikram Oberoi: No, each chapter is not divided into chunks. When a user goes to citymeetings.nyc and they click on a meeting, there's a list of chapters that they can go look at. They can just scroll down, and each chapter is a title, the video, and the summary; it's really a start time and an end time. The chapters aren't broken down any further into chunks.
[00:17:38] The chapter is like, as small as it gets.
[00:17:41] Dr Genevieve Hayes: Okay, so even when you're putting it into the LLM, you don't break it down further into chunks?
[00:17:48] Vikram Oberoi: Oh, sorry. I break down the transcripts a great deal in order to extract useful chapters. Once I've extracted useful chapters, I don't break those down any further. I do all my segmenting and chunking in service of creating useful chapters, which generally are no more than 5 minutes long. This is more a question of semantics and terminology.
[00:18:10] I feel like I've explained this a few times to a few different people, and I use a lot of different words, like sections and chunks and chapters, and it's really easy to mix those up. So let's think about chapters as user-facing, the thing you see on citymeetings, and let's think about chunks as what I give the LLM. And then I may use the word section.
[00:18:31] And when I say section, I mean a section of a government meeting in which one thing happens. So just for future reference, I think that might help the conversation.
[00:18:40] Dr Genevieve Hayes: Okay, got it. The other thing you kept mentioning when you were describing how citymeetings.nyc works was the issue of hallucinations, and this is something I was wondering about, because we all know ChatGPT hallucinates, as do all LLMs. And while this isn't going to be a big deal if you're asking it to draft an email for you,
[00:19:06] I would imagine that this could be a real deal breaker when you're dealing with something like city council meetings. How do you deal with this? How do you make sure it doesn't just put words into a council member's mouth that they never said?
[00:19:22] Vikram Oberoi: Yeah. So the reality right now is that I actually just review everything, which is not really scalable. But I do see a path to making it scalable, and I'll talk about that in a sec. The situation right now is that after I've generated chapters, I will go and read all of them, and sometimes there are about 100 to 200 chapters in these meetings.
[00:19:39] They're pretty quick to review. I have tools that allow me to see the chapters side by side, and I will try to skim as much as possible to make it fast. But yeah, that is not scalable. I do think there's a way to make this scalable in the long run, and here's how. When I review these chapters, I note the classes of errors or hallucinations.
[00:20:00] I will log those errors, and I generally find three classes of errors when I'm looking at a summary or a title. The most common error is one where it's actually a transcription error and not a hallucination: a transcription error has propagated to the chapter summaries, and maybe the name of a program they're talking about, or an agency, is just misspelled because the transcription model has not heard it before.
[00:20:30] A really common error I see with the transcripts I get is that the transcription service I use has never heard of NYCHA, which is the New York City Housing Authority, the public housing agency here in New York City. In those meetings, instead of NYCHA, I will see the word "Nitro",
[00:20:46] N-I-T-R-O, littered everywhere. And that can lead to, not hallucinations exactly, but summaries where they're not talking about Nitro, but they're talking about things related to Nitro, whatever that actually means. They might also just have Nitro in the summaries. But that's error one, and it is actually relatively easy to fix by making sure that you analyze the meeting up front.
[00:21:05] Every time you extract a chapter, you say: here are all the agencies and authorities mentioned at this meeting, here's how they're spelled, and here's what the mis-transcriptions look like. LLMs are actually really good at making those corrections, so I do that first.
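A rough sketch of that class A fix, assuming the same prompt-building approach: front-load a glossary of known agencies and their likely mis-transcriptions before asking for a summary. The glossary entries and wording below are illustrative, not the actual citymeetings.nyc prompts.

```python
# A small sketch of the "class A" fix described above. The glossary entries
# and prompt wording are illustrative, not the actual citymeetings.nyc prompts.
KNOWN_ENTITIES = """\
- NYCHA (New York City Housing Authority): often mis-transcribed as "Nitro".
- DCWP (Department of Consumer and Worker Protection): hypothetical example of another entry.
"""


def chapter_summary_prompt(chapter_text: str) -> str:
    """Build a summarization prompt that front-loads known mis-transcriptions."""
    return (
        "Summarize this portion of a New York City Council meeting transcript.\n"
        "The transcript was generated automatically and contains known "
        "mis-transcriptions. Wherever a garbled form below appears, use the "
        "corrected name in your summary:\n"
        f"<glossary>\n{KNOWN_ENTITIES}</glossary>\n"
        f"<chapter>\n{chapter_text}\n</chapter>"
    )
```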
[00:21:15] That's class A. Class B is where the summary doesn't make sense. It has hallucinated something, in a sense: the summary is referring to things that are in the context I provided it, but it's not really an accurate summary of the chapter.
[00:21:35] As an example, sometimes I will provide not only the section of transcript that is the chapter, I will also provide preceding sections of the transcript, or preceding chapters, because those are relevant context for getting a useful chapter. So if they're talking about a rezoning proposal at Willets Point, where they're going to build a soccer field, someone's public testimony doesn't mention Willets Point or the rezoning.
[00:22:01] All they say is, I love soccer and I support it. If I don't give the model all that context, it'll say this chapter is about how much someone loves soccer. It won't say this person is in favor of the Willets Point proposal because they like soccer. However, the downside of providing that extra context is that it can cause hallucinations.
[00:22:20] The model will look at previous parts of the context and give summaries that are irrelevant to the chapter at hand. This turns out to be the most common form of actual hallucination, or error; I don't even know if you'd call it a hallucination, but in my mind it kind of is. And then the third class is an outright hallucination, where I never gave it this information and it just spat out something from its model.
[00:22:43] This is actually very rare; it does not happen much. It used to happen a lot more, like eight months ago. I use Claude 3.5 Sonnet exclusively today, and I think the frontier models have gotten very good at sticking, as much as possible, to the context they've been provided. So if you give it context, it uses those guardrails and answers from them as much as possible.
[00:23:04] So I actually very rarely see that error. What I do see, much more often, are errors of class A and B, which I described earlier.
[00:23:11] Dr Genevieve Hayes: Yeah, I think it's interesting what you just said about your class C errors, because that's something I've noticed. When ChatGPT was first released, there were so many hallucinations in that. And now, if I'm trying to create an example of a hallucination, I actually struggle to do it.
[00:23:29] Vikram Oberoi: Yeah, I struggle to do that too. The frontier models have gotten very good; I don't run into that error as much. The one thing I did not talk about was, given these errors, how do I deal with them more scalably? It's not possible to keep doing this, reading 200 chapters for every meeting.
[00:23:44] It's not possible to scale. The way to do this is to start bringing LLMs into the review process, which adds its own potential errors, because LLMs as judges are going to be wrong sometimes too; they're unreliable. However, if I'm going through this process and classifying all these errors as I go, I'm also building a dataset for classifying each specific type of error.
[00:24:07] I've got some input data: here's the transcript and the context that I gave the language model, here's the summary that came out, and here's my assessment of whether or not it's good. The more I actually do that, the more data I've created for an LLM to do that same review. I have not implemented these LLM reviews yet.
[00:24:28] But I am collecting the data so that when I get there, one, I can write a prompt with sufficient examples, and two, I have a dataset that allows me to test that prompt over time.
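A small sketch of what logging those review judgements might look like, so they can later seed few-shot examples for an LLM reviewer and serve as a test set. The schema and error labels are assumptions based on the conversation, not the actual citymeetings.nyc data model.

```python
# A sketch of the review log described above: each human judgement is stored
# alongside the model's inputs and output. Field names and error labels are
# assumptions, not the actual citymeetings.nyc schema.
import datetime
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ChapterReview:
    context_given: str      # the transcript/context the model saw
    generated_summary: str  # what the model produced
    verdict: str            # e.g. "ok", "transcription_error", "context_bleed", "fabrication"
    notes: str = ""
    reviewed_at: str = field(
        default_factory=lambda: datetime.datetime.utcnow().isoformat()
    )


def log_review(review: ChapterReview, path: str = "chapter_reviews.jsonl") -> None:
    """Append one review to a JSONL file for later use as few-shot examples
    or as a test set for an LLM reviewer."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(review)) + "\n")
```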
[00:24:38] Dr Genevieve Hayes: That's interesting. I was actually reading about a product on the internet that advertises itself as basically being a tool that allows you to check for hallucinations. I wasn't sure how that worked under the hood, but it sounds like it would be very similar to what you've described.
[00:24:58] Vikram Oberoi: I don't know what that product does. And in my context, with class A and class B errors as I've described them, I'm actually not even sure people would count those as hallucination errors. Class A is a case where the context is wrong:
[00:25:15] the transcript is just incorrect, and even humans might get that wrong, so I don't really count that as a hallucination. Class B, I feel, is the sort of hallucination where it's like, I told you to look at the transcript, but you looked at a bunch of other stuff instead. Anyway, I don't know what that product does, but this process of aligning an LLM to your specific preferences is a pattern that a lot of developers who use LLMs
[00:25:39] have talked about and are using. There's a really prominent example that I read about at a company called Honeycomb, an observability company, where they built a feature that allows people to describe the query they want to generate, like, I want to see requests that took over this many milliseconds to this endpoint, and it will generate the Honeycomb query for you and then execute it.
[00:26:05] And I read this post where they talk about how they got this to work as well as possible. There was one engineer on the team in particular who had helped develop this, and as part of the process there was a spreadsheet with a bunch of queries, the prose versions, then the generated query, and then he would comment on each to say whether or not it was good or bad.
[00:26:30] That column in particular, those comments, allowed them to align the LLM and turn it into a reviewer that was as close as possible to this engineer, who didn't have time to go and review every single query that came in. So I think they actually use this reviewer in production, and it's a reviewer aligned as closely as possible to this
[00:26:51] senior engineer's version of how they might assess a query that's been generated by the system. That's kind of similar to what I might do.
[00:26:59] Dr Genevieve Hayes: So it's sort of like how reinforcement learning with human feedback is used to increase the accuracy of these models. They're using that again in this Honeycomb tool that you just described, but they're building a supervised learning model from the human feedback, because the humans aren't scalable.
[00:27:20] Vikram Oberoi: Yes, I think that's basically what it is. I'm actually not super familiar with what happens in reinforcement learning with human feedback. I gather that it might be very similar. But yeah, that's, that's probably what they're doing.
[00:27:32] Dr Genevieve Hayes: Honestly, I've never actually seen it done, but my understanding of it is that it's basically the equivalent of judging Olympic gymnastics. So it's, you know, is this good or bad?
[00:27:43] Vikram Oberoi: There's some subjectivity: is this good or bad, or here's a score. Yeah, that's exactly what this is. You create all those responses, you do a bunch of legwork to create a hundred different responses or a thousand different responses, and that allows you to scale it.
[00:27:59] So in my case, a huge bottleneck will end up being, I can't review all this stuff. So let me build out the reviewer. And that's what that process looks like.
[00:28:07] Dr Genevieve Hayes: And if you expanded to cities beyond New York City, you'd definitely not be able to review it.
[00:28:11] Vikram Oberoi: Yeah, I definitely would not be able to do it. In all likelihood, I'd probably need to have folks who are in those municipalities who understand the issues. I'm sure there'll be versions of this review that need to be a little more local, especially in smaller municipalities where maybe a lot of the concepts aren't in the language model itself. New York City is big enough that most of the time the language model knows what the person is talking about.
[00:28:32] Dr Genevieve Hayes: So you mentioned previously that you're using Claude 3.5 Sonnet under the hood. Why Claude, and why not GPT, for example?
[00:28:41] Vikram Oberoi: So I used GPT-4 Turbo initially, for most of this year. I switched over to Claude 3 when it came out to test it on my specific tasks; it didn't work substantially better, so I just kept using GPT-4 Turbo. But then when I used 3.5 Sonnet, there was a step change in how accurate
[00:29:06] my chapter extraction process was, and now I use it exclusively. My code allows me to change these models pretty easily. The other thing, a little bit of work I had to do, is that whenever you do change the underlying models you use, there's an amount of prompt engineering that you need to redo. For example, Claude models typically like context to be wrapped
[00:29:27] in XML tags; that's just friendlier for them. So you have to make sure the context is wrapped in those tags, and you've got to do some amount of prompt engineering per model, so it's not trivial to change it all the time. But Sonnet 3.5 was consistently so much better. I haven't measured it rigorously, but anecdotally, having added a lot of meetings to citymeetings and used my tooling to do this, it is vastly better.
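A minimal sketch of keeping the model swappable, assuming both official Python SDKs are installed. The model ids and prompt wording are illustrative; the only point being made is that context gets wrapped in XML tags for Claude.

```python
# A sketch of per-provider prompt adaptation as described above. Model ids
# and prompt wording are illustrative assumptions.
import anthropic
from openai import OpenAI

anthropic_client = anthropic.Anthropic()
openai_client = OpenAI()


def extract_chapters(transcript: str, provider: str = "anthropic") -> str:
    task = "Split this New York City Council transcript into short, titled chapters."
    if provider == "anthropic":
        # Anthropic's models respond well to XML-tagged context.
        prompt = f"{task}\n\n<transcript>\n{transcript}\n</transcript>"
        message = anthropic_client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    else:
        prompt = f"{task}\n\nTRANSCRIPT:\n{transcript}"
        response = openai_client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
```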
[00:29:46] And so anecdotally, it is vastly better. I also, I think the other reason I've started using cloud is that I love it's writing way more. I like it. I don't love it's right. I like it's writing way more than I do open as a model writing. I think Claude is a better writer than any of opening as a model writer.
[00:30:03] Dr Genevieve Hayes: I would agree with that. Yeah. About a month ago, I saw three webinars, one after the other, where people were saying, Claude's fantastic, I'm using this instead of GPT. And so I thought, okay, I'll give it a go. I started asking questions of both Claude and ChatGPT, and I think I did Gemini, Perplexity and Pi all at once, just asked exactly the same questions, saw what they produced and compared them.
[00:30:31] And Claude's responses are just so much better
[00:30:35] Vikram Oberoi: Yeah, they are, they're much better, and they're more consistent. That is a great reason; that's one of the big reasons I want to use Anthropic's models a lot.
[00:30:43] Dr Genevieve Hayes: And Claude actually sounds like a human wrote it, whereas ChatGPT sounds like ChatGPT wrote it.
[00:30:50] Vikram Oberoi: Totally. There are so many hallmarks of a ChatGPT response. I can't fully describe what those hallmarks are to you; I think they're generally going to be kind of verbose, and they might have specific words in there. "Delve" is a really common one. There are other markers of ChatGPT having written something, but I find that if there's some writing and it was written by ChatGPT, I will almost certainly identify it as having been written by it.
[00:31:10] Dr Genevieve Hayes: Do you use any other generative AI models in citymeetings.nyc, or is it just Claude?
[00:31:16] Vikram Oberoi: I use Claude, and I'll probably switch over to OpenAI's models here and there. I have tried using Gemini, and honestly, the reason I don't use it is that I haven't had enough time to really test it out properly. I find that the response times are slower, and it's not so obviously better that I should use it.
[00:31:36] However, there are tasks in the chapter extraction process now that would benefit from Gemini's massive context window. Gemini Pro has a 2 million token context window and Gemini Flash has a 1 million token context window, and in my case, all my transcripts are about 50,000 to 100,000 tokens.
[00:31:59] There's a part of my chapter extraction process now, which isn't something I used to do, where I will spend five minutes giving the AI a leg up by marking out a few sections of the meeting: hey, here's where the opening remarks are, here's where the agency testimony is, here's where the council questioning is,
[00:32:17] and here's where the public testimony is. For most sessions it takes about five minutes to go and do that with the tools I built, and then everything else works much better just because I've given it those hints. Now, I could probably use Gemini to generate those sections, with enough examples, and get decent performance. The big issue with smaller context windows is that when you have a lot of context, say a 60,000-token transcript that I want a process to work accurately on, and I have a 120,000-token window, I don't have a lot of space to give examples of the task being done well.
[00:32:53] And I find that if I'm adding five to ten diverse, illustrative examples of the task, I always get way better and more consistent performance. So a million-token context window would potentially allow me to do that. I haven't tested it, and I'm a little skeptical that it'll work, because so far when I've fed really large contexts to language models the performance has dropped considerably, but I honestly haven't tested this with Gemini yet.
[00:33:19] I'd like to do that, and I can see a version where I would use Gemini for the context window length alone.
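A sketch of that "leg up" step, assuming the section boundaries are recorded as simple timestamped labels, whether marked by hand or generated by a long-context model; the labels, times, and wording below are illustrative only.

```python
# A sketch of passing section hints into chapter extraction, as described
# above. The section labels, times, and prompt wording are illustrative,
# not the actual citymeetings.nyc tooling.
SECTION_HINTS = [
    {"label": "Opening remarks",     "start": "00:00:00", "end": "00:07:30"},
    {"label": "Agency testimony",    "start": "00:07:30", "end": "00:55:00"},
    {"label": "Council questioning", "start": "00:55:00", "end": "02:40:00"},
    {"label": "Public testimony",    "start": "02:40:00", "end": "04:10:00"},
]


def chapter_extraction_prompt(transcript: str) -> str:
    """Build a chapter-extraction prompt that includes the section hints."""
    hints = "\n".join(
        f"- {s['label']}: {s['start']} to {s['end']}" for s in SECTION_HINTS
    )
    return (
        "Split the transcript into chapters of no more than about five minutes.\n"
        "Use these high-level sections as guidance and keep each chapter "
        "within a single section:\n"
        f"<sections>\n{hints}\n</sections>\n"
        f"<transcript>\n{transcript}\n</transcript>"
    )
```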
[00:33:25] Dr Genevieve Hayes: So just use Gemini for your pre-processing and then Claude for the real work.
[00:33:31] Vikram Oberoi: Correct.
[00:33:31] Dr Genevieve Hayes: Don't you love that we're at this point in history where we're having a conversation about using multiple different LLMs to perform a task?
[00:33:40] Vikram Oberoi: It is wild to me. I could not have conceived of this a year and a half ago. I couldn't really even have conceived of it a year ago, I think; I was still getting my feet wet and trying to understand the limitations, the capabilities and the constraints.
[00:33:56] But yeah, it's crazy. LLMs have transformed my work in such a way that I am unwilling to go back. I am able to produce a lot more, and I'm able to pursue more ambitious ideas. citymeetings, not just the product itself but everything I've shipped, would simply not have been possible if I didn't have language models helping me virtually every step of the way.
[00:34:15] Dr Genevieve Hayes: One thing I find interesting is that when ChatGPT was first released, everyone was using it as a toy, you know, write a poem in the style of Dr. Seuss about cleaning my teeth or something stupid like that. It was garbage, but people, without even being told, have instinctively figured out all these use cases.
[00:34:37] And a lot of people say, oh, it's just a bubble and it'll go away, because people are using it as a toy. And it's like, are you actually paying attention? People are using this to make themselves better at their work without any formal training. This is going to stick.
[00:34:54] It's not a fad.
[00:34:56] Vikram Oberoi: I totally agree with you. I don't think LLMs are going away anytime soon. There probably is a bit of a bubble; it feels way overhyped at times. I also think there is a lot of work you need to do in order to wield them effectively. You need to fail a lot, and you need to budget time to decide, by default, to use them for a task even if you think they might fail, if you want to get good at using language models. There's a bit of an uphill battle for people to be able to adopt them.
[00:35:24] And I think when people have this expectation, based off all the hype, that language models will do all your work for you, and they use one for the first time and it does something dumb, because language models do really dumb things frequently, they decide they don't want to use them because they think they're stupid.
[00:35:39] A lot of people approach language models trying to use them, their expectations based off the massive hype don't match their immediate reality, and they bail. I think that's a big mistake, and I think a lot of people are missing out on a huge opportunity because of it.
[00:35:55] Dr Genevieve Hayes: Yeah, I think it's the people who are using the LLMs to basically cheat and do their work a hundred percent for them that are getting bad results. For example, the student who goes in and says, here's my essay topic, write me a whole essay on, I don't know, Hamlet's motivations.
[00:36:14] You have to use them as an assistant. And when you use them as an assistant and have all these conversations back and forth, then you can produce incredible work, incredibly fast, that you wouldn't be able to do otherwise.
[00:36:28] Vikram Oberoi: Totally. And the thing is, there's no formal training on this. Folks kind of have to come to that discovery on their own, unless they have someone a little more experienced next to them who's discovered it, or they're reading a lot of stuff about it. Yeah, I agree.
[00:36:40] I think the core skill of being able to decompose a problem so that it is manageable by the AI still needs to be there. And yeah, if you don't do that, it doesn't work.
[00:36:51] Dr Genevieve Hayes: What really helped me was a webinar where someone was demonstrating how they used it. Once I could see how they used it, and their use case was completely different from mine, that was when it clicked: oh, right, I could use it this way. And once that clicked in my mind, I was off.
[00:37:07] Vikram Oberoi: Yeah. There are a couple of use cases I've seen that have been like that. I use a code editor called Cursor. Are you familiar with it?
[00:37:15] Dr Genevieve Hayes: Yeah.
[00:37:16] Vikram Oberoi: Yeah. So I use Cursor extensively. I use AI tools for virtually everything. I've even configured new sections of meetings and new prompts entirely using Cursor prompts as well.
[00:37:28] So I've used it extensively, and it really dawned on me how I had been under-utilizing some of its features when I saw a video on Twitter of someone demonstrating it in their use case. That's been really helpful. I actually have an anecdote about the person who I think might be the most skilled prompt engineer I know, who is a trainer at the gym that I go to and is also an influencer.
[00:37:51] She has done incredibly impressive work with ChatGPT. Let's call this person Betty, so that I don't out her as a trainer-influencer who uses AI for, actually, a lot of good work.
[00:38:05] So Betty has about a 200,000-person following on Instagram, and she has a bunch of videos where she walks around the city. She's got this brand of hers where she talks about living life in New York City, and she loves when she's in Italy, so she talks about that too.
[00:38:19] Sorry, I'm doing a poor job of articulating her brand, but she has one, and she has articulated it in her prompts; she's added her personality and her brand to the system. The way her business works is that she finds advertisers and sponsors to sponsor specific pieces of content that go out to those roughly 200,000 followers, and that requires a bunch of lead generation on her part.
[00:38:41] The way she does that now is she takes her articulation of her brand. Then she goes and looks at, let's say, Bose speakers; let's say she wants to pitch to them. She'll look at Bose's brand guidelines, she'll find copy from a recent ad, she'll find copy from their website, and then she'll ask ChatGPT to write a description of what that brand is and how it works.
[00:39:04] So she's decomposed these two problems. She's like, I'm going to describe these two brands as well as I can, as quickly as I can: Betty's brand, which she's already written up in ChatGPT, and this summary of the Bose brand. Then she'll take those and use them to generate an email that is a tailored pitch to Bose, where she says, hey Bose, I know you're the brand manager,
[00:39:25] I'm an influencer, I have this many people in my following, and here's my pitch for a video I can do sponsoring Bose that accounts for the brand guidelines you've talked about all over your website and for my videos. And she'll write whole storyboards, and she'll paint a very clear picture, and she's able to do this
[00:39:44] process very fast. So she's able to send way more emails out, but she also gets a higher hit rate, because her emails are illustrative of her work; they paint a clear picture, and that kind of work is very difficult to do at that scale. I know lead gen is a really common use case for AI right now, but in Betty's case, she has iterated on these prompts for
[00:40:11] a month, two months at this point, to get them to a point where they work as consistently as possible. We chatted about all of our prompt engineering strategies, and they boil down to the same things that I do for citymeetings as well. It's just awesome. I love seeing people out in the wild who have really dived in and figured out how to use them
[00:40:30] Well, yeah.
[00:40:31] Dr Genevieve Hayes: Are there any tips that you can give our audience about how to write effective prompts?
[00:40:35] Vikram Oberoi: Yes. In fact, in that talk I gave in March, which is on my blog, I have a section where I give folks a crash course on writing effective prompts, and I really give them two specific tips, with examples. Tip one is that you need to say what you actually want, and that can sometimes be hard, because when you're starting out with a task, you may not actually know what you want the end result to be. A good way to start with a prompt when you don't know precisely what you want is to use a proxy.
[00:41:03] An example of a proxy would be, explain this to me like I'm five. This is a great proxy that encodes a lot of stuff in that one sentence. It encodes the vocabulary that should be used in the answer, it encodes how long the answer should be, and it encodes the level of abstraction at which something should be explained.
[00:41:21] And so I have a bunch of proxies that I reach for for different kinds of tasks. A proxy that I use for summarization really commonly is to say, write this in the style of an Axios article. Axios has all these hallmarks of how they write their articles.
[00:41:37] There are bullet points with bold headings, and there's always something like "the bottom line" or "at a glance". And if I ask the LLM to write a summary in an Axios style, I will always get a more useful summary right off the bat than I would otherwise. However, you will reach a point, if you have a more complicated task, where you have more specific requirements.
[00:41:58] At that point, you kind of have to take the training wheels off, which is to stop using the proxy and get pedantic. I have examples of a prompt that I wrote to summarize New York City Council bills on my website. So tip one is to say what you actually want.
[00:42:14] What you'll find is that you will hit a limit of your ability to articulate what the output of a prompt should be. In my case, I was iterating on this "summarize a New York City Council bill" prompt, and I had a version that worked pretty well on the example that I provided in my presentation.
[00:42:30] The issue is that there is a lot of stuff that happens in New York City bills that I just did not account for. And what you'll find is if you take that prompt and you apply it to every single bill, you're going to find all kinds of errors that crop up because you didn't account for the variations in those bills.
[00:42:45] Or you didn't specify how many bullet points you wanted in the key points, or maybe the key points around the bill were too verbose. And so your next step, after you have taken "say what you actually want" as far as possible, is to provide 5 to 10 diverse, illustrative examples.
[00:43:04] And that is tip two. This is the highest-leverage activity that I do when I'm writing a prompt that I want to work as well as possible, as consistently as possible. And a great place to start when you're doing this is to use the prompt that you wrote in part one, to summarize the bill, to give you a starting point.
[00:43:25] It's a lot easier to be presented with a summary and comment on what you like and don't like. If you like it, you can just stick it in the examples; if you don't like it, you edit it however you want to and then stick it in the examples. You do this repeatedly, up to 10 examples, and you will find that that prompt works much more effectively. The benefit of doing this is that you stare at your source data a lot and you start to understand the contours of it. So you will look at a bunch of bills, and some bills will have more complicated fine and penalty structures, and, oops, in the first version of our prompt we did not talk about bills that repeal sections of the administrative code.
[00:44:05] So now we need to encode that in our prompt as well. You go through this process where, if you want a prompt to work well over a wide range of data, you have no option but to really just stare at a lot of it. So this process is systematic: it forces you to iterate on your prompt, and it forces you to stare at a lot of data to do that.
[00:44:24] And this is the best way to go about writing a prompt that works consistently well over a lot of data. Now, if you're doing a one-off task, I would not do that. For example, for this podcast, I prepared using Claude. I took your outline of the podcast episode, and I took three transcripts of your recent episodes that were AI-related.
[00:44:44] I threw them into a Claude project, and I wrote about a thousand words on how I'm thinking about citymeetings overall, trying to cover most of your questions. And then I said, all right, Claude, let me prepare for this interview. Can you just pretend to be Dr.
[00:44:58] Genevieve Hayes and start asking me questions? And that was really helpful. I even asked it to give me feedback, and the feedback was really good; it helped me articulate a lot of these things more effectively. So yeah, I wouldn't follow this process of prompt iteration for something like that.
[00:45:15] But when you do want something to work really consistently, you need to be very systematic in your approach. Say what you actually want. Give a lot of examples. That's really it. On top of that, there are all these techniques you can use, but those are the most important things.
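A sketch of those two tips combined, explicit requirements plus a bank of worked examples, written as a bill-summary prompt builder. The example bill, summary, and requirements are placeholders, not the actual prompt from his site.

```python
# A sketch of "say what you actually want" plus 5-10 diverse examples, as
# described above. The example bill, summary text, and requirements are
# placeholders, not the actual citymeetings.nyc bill-summary prompt.
EXAMPLES = [
    {
        "bill": "Int 0123 - Requires quarterly reporting on street tree pruning.",
        "summary": (
            "What it does: mandates quarterly pruning reports.\n"
            "Key points:\n"
            "- Reports go to the Council and are published online.\n"
            "Bottom line: more visibility into maintenance backlogs."
        ),
    },
    # ...in practice, five to ten diverse examples covering fines, repeals,
    # reporting requirements, and so on...
]


def bill_summary_prompt(bill_text: str) -> str:
    shots = "\n\n".join(
        f"<example>\n<bill>{e['bill']}</bill>\n<summary>\n{e['summary']}\n</summary>\n</example>"
        for e in EXAMPLES
    )
    return (
        "Summarize the New York City Council bill below in the style of an "
        "Axios article: short bolded headings, brief bullet points, and a "
        "'bottom line'. Use no more than five bullets. If the bill repeals "
        "sections of the administrative code or sets fines or penalties, "
        "call that out explicitly.\n\n"
        f"{shots}\n\n<bill>\n{bill_text}\n</bill>"
    )
```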
[00:45:28] Dr Genevieve Hayes: The other one I've found that works really well is break up the task into as many sub tasks as you can. So rather than saying, write me an entire essay, say, okay, write me this paragraph about this topic first, and then write this paragraph about this topic, etc.
[00:45:47] Vikram Oberoi: Totally, yes, that is super important. It is really important to decompose your task; I should add that to the talk. I find that to be really important too. I think one of the challenges with that is figuring out how to decompose a task for language models.
[00:45:59] It's not always clear what they are going to be good at and what they're going to be bad at. So you kind of need to feel it out a little bit and build an intuition.
[00:46:06] Dr Genevieve Hayes: Yeah. And that's just practice, basically.
[00:46:09] Vikram Oberoi: Totally.
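A sketch of that decomposition tip: rather than one "write the whole thing" prompt, ask for an outline first and then draft each piece separately. The ask() helper below assumes the Anthropic SDK, as in the earlier sketches, and the prompts are illustrative.

```python
# A sketch of decomposing a writing task, as discussed above: outline first,
# then one prompt per outline point. Model id and prompts are illustrative.
import anthropic

client = anthropic.Anthropic()


def ask(prompt: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model id
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


def draft_document(topic: str) -> str:
    # Step 1: ask for a short outline.
    outline = ask(f"Write a five-point outline for a short piece on: {topic}")
    # Step 2: draft each outline point separately, passing the outline as context.
    sections = [
        ask(
            "Write one tight paragraph for this outline point:\n"
            f"{point}\n\nFull outline for context:\n{outline}"
        )
        for point in outline.splitlines()
        if point.strip()
    ]
    return "\n\n".join(sections)
```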
[00:46:11] Dr Genevieve Hayes: So what final advice would you give to data scientists looking to create business value from data?
[00:46:17] Vikram Oberoi: Sure. Since we're talking about language models, I'll give a language-model-focused answer. If you want to get good at using language models and build an intuition for what they can be used for in your day-to-day, and what they maybe should not be used for in your day-to-day, you have to default to using them for virtually everything.
[00:46:33] And you've got to budget time so that you are forced to learn. At this early stage, when you've got a bunch of work to do and you're like, oh yeah, I need to just do my work and finish it, it can be hard to say, you know what, I'm going to spend 10 minutes figuring out if the language model can do this for me.
[00:46:48] But I think you actually do need to do that, and barrel through a lot of the frustration you might run into early on, to discover whether or not it's good at the task you want, or whether it could be if you decomposed the problem even further or approached it a little differently.
[00:47:02] So that's how I've gotten good at using language models, and how I built an intuition around where to use them and where not to use them. I have this toolbox of language model tricks that I bring to virtually everything I do now, and honestly, it has brought a ton of joy to my work.
[00:47:20] I feel like I'm able to do more. I like building things, I like writing things, and language models help me do more of that and help me tackle even more ambitious ideas. citymeetings would not have been possible without them, not just the product itself, but all the code that I've written.
[00:47:40] Dr Genevieve Hayes: For listeners who want to learn more about you or get in contact, what can they do?
[00:47:44] Vikram Oberoi: To find out more about me or read my writing, including the blog post that I referenced in this podcast episode, you can visit vikramoberoi.com. Or you can follow me on Twitter or Threads, where I write more often. Occasionally I post to Bluesky as well. You can find me there, and you can find all of that on vikramoberoi.com.
[00:48:01] Dr Genevieve Hayes: Thanks for joining me today, Vikram.
[00:48:03] Vikram Oberoi: Thanks Genevieve. I appreciate it.
[00:48:05] Dr Genevieve Hayes: And for those in the audience, thank you for listening. I'm Dr. Genevieve Hayes, and this has been Value Driven Data Science, brought to you by Genevieve Hayes Consulting.