125 – AI Summaries

The IDEMS Podcast

Description

As a large language model, I can’t summarise this episode… Lily and David delve into the topic of AI summaries, considering issues of misinformation, the impact on organisational reputations, and the complexities of determining responsibility.

[00:00:00] Lily: Hello, and welcome to the IDEMS podcast. I’m Lily Clements, a Data Scientist, and I’m here with David Stern, a founding director of IDEMS. Hi, David.

[00:00:14] David: Sorry, my voice is a bit iffy today, I’d say. I’ve recovered from a seasonal illness, but my voice is still struggling.

[00:00:24] Lily: That’s okay, hopefully we’ll be able to understand what you’re saying at the very least.

[00:00:29] David: Absolutely. What’s the plan for today?

[00:00:32] Lily: The plan for today? Well, I thought we could talk about these AI summaries.

[00:00:37] David: Oh, yeah.

[00:00:38] Lily: Well, there’s two related stories but I guess we can start with Google’s AI summary.

[00:00:43] David: And this was the news article I think we both saw not so long ago, which might have inspired this thinking. What was it we discussed? This is where Google was, not Google, was this Google or Apple? I thought it was the Apple one that the BBC was complaining about.

[00:01:01] Lily: Yes, yeah, that was the Apple one, but let’s start on that one then. Apple was summarising BBC articles.

[00:01:08] David: Yes.

[00:01:08] Lily: And it was using an AI tool, or my understanding was it was using some kind of AI to summarise the article. And this led to incorrect summaries.

[00:01:18] David: Yes.

[00:01:19] Lily: Which then obviously upset the BBC, because they don’t want their name to be attached to these incorrect summaries.

[00:01:27] David: Exactly. And it was something very interesting, like saying some leader had died or something like that when they hadn’t. It was these really rather clearly incorrect statements that were being associated with…

[00:01:40] Lily: And another one was this story from the States of Luigi shooting someone, you know, shooting the CEO or something, but instead the summary said Luigi had shot himself.

[00:01:54] David: Exactly. And actually, knowing how AI works, it doesn’t surprise me that it can get these things wrong. And this is something where I understand, therefore, that as an established news provider like the BBC, your reputation’s really at risk if you’re not in control of what’s associated with your brand, so to speak.

[00:02:27] Lily: Yeah, absolutely. Now, I wasn’t on the receiving end of any of these summaries, but I know that if I had seen one of them, I would have gone to look for the article and been frustrated that, well, the BBC said this, and now I can’t find the article.

[00:02:42] David: Yeah.

[00:02:44] Lily: So I completely agree with you. I hope the silver lining, at least, is that it highlights how AI summaries can mislead or misinterpret, and, at least temporarily, encourages people not to take everything the AI says it has summarised at face value.

[00:03:05] David: Well, that’s certainly something which anyone who’s consuming AI generated content can choose to do for themselves. The more important question is who’s responsible for what AI generated summaries are putting out there?

So there’s this idea that the BBC’s name is being associated with what is misinformation. Now, is it the BBC’s responsibility to ensure that it doesn’t get misinterpreted? If its name is associated with it, it carries reputational risk. But they can’t take responsibility, because they cannot control what the AI will do. And so when something like this goes wrong, who is responsible, and who might take, or have to take, legal responsibility? That’s a really difficult question.

[00:04:07] Lily: And I know that there’s been a lot of thinking on this. In terms of who takes responsibility when AI behaves in certain ways, there’s been a lot of interest in that for many years, well beyond this. There have been countless scandals where the question comes down to, okay, well, who’s responsible? Is it the data scientist who coded it? As a data scientist, I’m going to say no.

[00:04:32] David: And I think this is something where, over the next few years, a lot of these things are going to have to be settled in lawsuits, almost certainly, to determine that responsibility, because I don’t think there’s any conceptual way to do it at this point in time. And it’ll be really interesting to see where people determine that responsibility lies.

[00:04:56] Lily: Absolutely. I know that we’ve had discussions before about generative AI, kind of making up false claims and false articles. And there we’ve kind of said, well, as the end user, if you’re going to use generative AI to say these things, you should do the research to check it.

So I guess, in this case with Apple, who is at the end that can do that research to check it? Well, it can’t be the BBC, because they can’t control what’s going on here. And it shouldn’t be the recipient, the person receiving that AI summary; they shouldn’t need to check it.

[00:05:33] David: And so it’s really difficult. Is it then the creator of the AI who is responsible for the messages that are transmitted by it? My guess is that as soon as a lawsuit happens which determines that the AI creator is legally responsible for the accuracy of these generative AI summary statements, almost all of those summaries will disappear overnight. So I don’t know that the judges will be brave enough to determine that responsibility lies with the tech platforms either.

And so therefore, where does that leave us? It’s not clear.

[00:06:15] Lily: And then the scary thing is, okay, these are fairly small cases: it gives you a summary of a news article, and that’s not so bad. But what if you were the parent of a child who’s in the article, and you received that summary with the misinformation in it? Or what if those summaries weren’t about news articles but about something else, and it actually had an impact?

[00:06:46] David: Yeah, who knows? The potential here is huge. And these are still early days; these things are going to evolve in ways we don’t understand yet.

But what I think is important is that we need to remember that, if you understand what generative AI is doing, it is not surprising that these mistakes leading to misinformation are happening. Because these systems are not understanding in the way a human would understand and then summarising; they are just summarising, and there’s a fundamental difference.

[00:07:23] Lily: And there is, but it’s still hard to grasp. I mean, I’m a human, I know how humans think. I say think, but AI doesn’t think like we do. It’s still hard to get your head around how it’s actually doing it.

[00:07:38] David: Yeah.

You had another example, you weren’t going to choose the Apple example. So what was the one you were going to start with?

[00:07:45] Lily: So I was going to start with Google’s AI summary. I’m not sure if you’ve seen this, or how much you use it, but when you google something it gives you this AI summary at the top. And at first I was continually ignoring it, because I wanted to ignore it: I don’t want to read your summary, I want to find the article, I want to find the relevant information.

But now I find myself drawn to reading it. I don’t even know when that happened; I think I just started accidentally reading it, and now I do read it. I don’t know if we can turn it off. I hope I can, and if I can, I will. Because then, yesterday, the day before recording this, I was climbing indoors, bouldering, at the usual gym we climb at. Everything was fine, and then I had a fall from one of the walls, and I had to go to A&E as a result. Everything’s fine.

[00:08:30] David: If I understand correctly, your ankle is not totally fine. But everything’s fine, I mean, you’re healthy enough to record this podcast with me.

[00:08:39] Lily: I can record and I can do everything but move, and I’ll have to wait a little bit longer until I can do some exercise again, but that’s fine, yes. Mentally, and otherwise physically, everything is okay.

And while I was there, they wanted to know, okay, what was the height of the wall you came off? Because there’s some kind of threshold, I think it was around 15 foot, but I can’t remember. If it was above that threshold, then they needed to check me over for damage to my back and things, though they were going to check anyway, I guess. The landing you fall onto is very soft, falls happen all the time.

[00:09:15] David: Yeah.

[00:09:16] Lily: And so I googled it. I just googled the name of the place I went climbing and asked how tall the bouldering walls are. I know, it’s not too bad. And the Google AI summary came up saying 22 metres, and I didn’t think about that at first.

[00:09:30] David: It’s pretty high.

[00:09:32] Lily: Yeah, at that height you’d definitely want to check me over.

[00:09:36] David: Absolutely.

[00:09:37] Lily: And I just wasn’t thinking in the moment, I was like, 22 metres. Then I said to them, it says here 22 metres, but that definitely can’t be right. And then I read a bit further, and it said 22 foot, or 20 foot. So I was like, okay, it says here it was about 20 foot, but again, that seems a little bit too tall. In the end I think it was about 15 foot.

But what happened was, I was like, okay, this summary is saying something that’s clearly incorrect. So I went onto their website to see if I could find it there. And on their website, I came across an article saying that they were going to install a 22 metre bouldering wall for you to climb up. The article was dated the 1st of April, and it ended with Happy April Fools’ Day.

So it’s like, oh! This 22 metre wall obviously isn’t real. Like, I knew that. But the AI summary can’t decipher it, Google’s AI tool can’t tell that this is an April Fools’ article.

[00:10:38] David: Oh, wonderful! Oh, that’s such a nice example. I will steal and use that example in so many cases from now on. Anytime I’m teaching about responsible AI, that example’s gonna come out. Oh, that’s a wonderful one.

[00:10:54] Lily: I felt like such a sucker for falling for it as well. I should know this, I should know not to just read this stuff. Anyway, yes, so it was an April Fools’ joke.

[00:11:04] David: But now you’re going to get guidelines for your sites and so on: if you’re going to have an April Fools’ joke, make sure it’s labelled as such, so that the AI can know this isn’t true, or something like that. Oh, this is great, I love it. I can just see how we’re going to have to change the way we design and post things on websites in the future, to mark this is actual information and this is false information. Oh, I love it.

[00:11:33] Lily: Yeah, it’ll get interesting. I’ve heard before that the Google AI tool has summarised satire articles.

[00:11:42] David: Yes.

[00:11:43] Lily: And obviously there’s a danger in that. But I haven’t heard about it doing to an article on an April Fool’s joke, and I am interested to see where we go with this. But again, like, who’s responsible? Because obviously, where I went bouldering, they’re not responsible. They let out a fun article a while ago as a joke, and a clear joke, because they’re not going to put in a 22 meter wall.

I guess I’m the responsible one, because I read the summary, and then I knew, no, that’s not right. And then I had to look elsewhere. And the Google summary says at the bottom as well, in very small writing, it’s like, this may not be accurate. And I’m like, okay.

[00:12:23] David: Oh, this is good. I love this example. And you’re right. I mean, it’s clear that if we want these AI summaries to be trustworthy, then we’re going to have to put in place so many structures to make the data they’re consuming distinguishable. I love this because, as you say, it is so easy to understand how this mistake can happen. And yet, at the same time, it’s so difficult to imagine building reliable AI that could avoid giving out misinformation like this, out of context.

[00:13:14] Lily: So how can you build AI that checks?

[00:13:18] David: I suppose one of the things that would help there: you then went to find the source, you went to the website.

[00:13:26] Lily: Yeah.

[00:13:26] David: If that 22 metres they gave had had a reference attached, you could trace back where that specific piece of information came from, and it could then be humanly validated, or not, as a reliable source of information in some form. And that could over time feed into the AI algorithms using it or not using it.

This is how Google Maps works. If you go on Google Maps, it asks you, have you been to this place? What is it? And so on. So actually, if you started applying that to some of the AI summaries, then you could maybe insert elements of validation which would, over time, make the summaries more reliable.
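
A minimal sketch of the kind of validation prompt David is describing here, purely to illustrate the idea: readers who are shown a summary are asked whether it matched what they found, and the answers are stored against that summary. The question wording, names and threshold are assumptions for the example, not how Google’s systems actually work.

    from collections import Counter

    # Votes collected from readers shown an AI summary, Google-Maps style:
    # "Did this summary match what you found on the site?"
    votes = Counter()

    def record_vote(summary_id: str, accurate: bool) -> None:
        votes[(summary_id, accurate)] += 1

    # A few hypothetical responses about the 22 metre wall summary.
    record_vote("gym-wall-height", accurate=False)
    record_vote("gym-wall-height", accurate=False)
    record_vote("gym-wall-height", accurate=True)

    if votes[("gym-wall-height", False)] > votes[("gym-wall-height", True)]:
        print("Flag this summary and its sources for review")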

[00:14:20] Lily: Which is, going back to a much earlier podcast, adding humans in the loop to do that validation, that checking.

[00:14:29] David: Exactly. And you, as somebody who’s read that now and looked and questioned it, would be part of that validation effort, which is exactly what happens in many other tools. And at the moment, that isn’t happening enough with our large language models.

So what you’re giving here is, to me, a wonderful example of a way to insert humans in the loop that could actually make AI reliable, in ways I hadn’t considered before. This is actually very powerful and very interesting. Oh, yes, I like this. This is a fantastic example, so much more fun than the news article I was thinking about.

And what’s interesting, of course, is that once you get down to the sort of question you had, you’re no longer in the realm of big data, because there isn’t that much data which directly relates to your question.

[00:15:38] Lily: Yeah.

[00:15:38] David: And so actually, the fact that you could trace back the source of that information, and in this case trace it to a source of misinformation, this is the sort of thing where you’re back to human problems, which is where you want humans in the loop.

[00:15:59] Lily: Yeah, so I guess I googled a specific place for the height of their walls because I thought maybe they’d say it on their website. So I suppose for these specific things, because there’s less data, it is more likely to be skewed by April Fools’ articles and things.

[00:16:15] David: Skewed by a single source of information, that’s the key thing. If the AI summary is getting its information from a specific source, being able to point to that reference in that source and identify it would then enable it to be validated, or not.

[00:16:36] Lily: Yeah. Yeah, absolutely.

[00:16:40] David: Now, the question is, and I don’t know this, how easy would it be to add those direct links to those references? Let’s say for facts and figures, because what you’re talking about here is a fact, it’s a number, which originated in a specific source. You found the source material, so that is where it came from, almost without a doubt. The AI didn’t invent this, it found it in its memory bank. So in that case, how easy would it be to keep the metadata of the source articles associated with those facts and figures?
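
Again purely as an illustrative sketch, assuming nothing about how Google or Apple actually store things: the idea is that each fact or figure surfaced in a summary keeps the metadata of the article it was extracted from, so it can be traced back and screened for obvious warning signs, such as a 1st of April publication date. All names and the URL below are hypothetical.

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class SourcedFact:
        claim: str        # the statement surfaced in the summary
        value: str        # the specific figure quoted
        source_url: str   # the article the figure was extracted from
        published: date   # publication date of that article

    def needs_review(fact: SourcedFact) -> bool:
        """Crude screen: flag facts whose source was published on the 1st of April."""
        return fact.published.month == 4 and fact.published.day == 1

    wall_height = SourcedFact(
        claim="Height of the bouldering walls",
        value="22 metres",
        source_url="https://climbing-gym.example/news/new-wall",  # hypothetical
        published=date(2024, 4, 1),
    )

    if needs_review(wall_height):
        print(f"Check the source before trusting '{wall_height.value}': {wall_height.source_url}")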

[00:17:25] Lily: Other generative AI tools have now started to give direct citations inline, if you ask them.

[00:17:32] David: Absolutely. And this is something where, particularly with the more academic tools that I’ve been using, this was a huge criticism in the past; a huge amount of work has happened which now enables this, and you can build it in.

Could this be built in at the scale of Google’s summaries? I guess it’s just a matter of time and effort; yes, I presume it could. And then you’re getting to something which is able to build on the tools on which Google is built, Google Maps and other Google tools, where you can get this human information coming in at scale from people in the population, validated in ways which serve their own needs.

[00:18:15] Lily: Interesting. Yeah, to me, it makes sense, and maybe that is something that they have in the pipeline, or planned.

[00:18:24] David: I don’t know in this case.

[00:18:25] Lily: I know that the Google summary, for me anyway, felt like it came out of nowhere. Just one day I was like, oh, apparently we have this now. I didn’t know it was coming. I’m sure there was some word about it, and that some people knew it was coming. But since it came out of nowhere for me, it was a bit like, oh, do I use this? How reliable is this? No one had spoken about it to me, at that point anyway.

[00:18:53] David: And I suppose the two examples that we have, this one and the Apple summary related to the BBC, are both instances where AI is being used to summarise information which is more widely available, for rapid consumption.

[00:19:17] Lily: Yeah.

[00:19:18] David: And this is a really common use of these AI tools now. And the key question, which is really at the heart of this, is: what’s going to happen in terms of actual responsibility for accuracy? We’ve got these two nice examples, one in the news, the other very personal, of cases where these summaries have got it wrong.

These are not surprising. This is exactly what one would expect. You know, you use it, you’re expecting that it will sometimes get it wrong. It’ll of course often get it right, but it’ll sometimes get it wrong.

And it brings up all these interesting questions: how do you reduce how often it gets things wrong? Who takes responsibility when it does go wrong and that leads to negative knock-on effects? Is that something somebody is legally responsible for, even if they put up a disclaimer that this might not be accurate?

I would be interested to know whether Apple also put that disclaimer on. Even so, in the BBC example, I would doubt that that would be seen as legally enough, given that they were associating the BBC’s brand with the summaries.

So my guess is that, in these two cases, if I were in Apple’s or Google’s shoes on this, I don’t see any way I could imagine the BBC case not ending up legally at fault. If I were the one creating a summary and associating it with another legal entity, whatever disclaimer I had, I would be worried that I would be found at fault.

In the Google case for you, I think that disclaimer they have is probably enough.

[00:21:30] Lily: Yes.

[00:21:30] David: But I think it’s interesting that in that case, I can see more easily how to fix it.

[00:21:37] Lily: Yeah.

[00:21:38] David: Because all you need is for this to be flagged enough times by enough people as misinformation, or as incorrect, and even without human intervention on the algorithm side or anything like that, that should be enough to start skewing the distribution in favour of other results.
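
A final sketch of the skewing David describes: if each candidate source carries a count of misinformation flags from readers, downweighting by that count gradually pushes the summary towards other results, without anyone editing the algorithm by hand. The weighting rule and the page names are illustrative assumptions only.

    # Candidate sources for a query, with how often readers have flagged each as wrong.
    sources = [
        {"url": "climbing-gym.example/april-fools-wall", "relevance": 0.9, "flags": 40},
        {"url": "climbing-gym.example/facilities",       "relevance": 0.8, "flags": 0},
    ]

    def score(source: dict) -> float:
        """Downweight a source as misinformation flags accumulate."""
        return source["relevance"] / (1 + source["flags"])

    best = max(sources, key=score)
    print("Summarise from:", best["url"])  # the unflagged page now wins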

[00:21:59] Lily: And I guess with the Apple cases, where do the humans, where can that feedback loop come in?

[00:22:06] David: It can’t, because, and there are so many differences, but the whole point of that one is that it was news, so it was at a moment in time. So there’s no time for any human feedback to come in on it.

[00:22:23] Lily: Yeah, by definition?

[00:22:26] David: By definition. Whereas with your information, my guess is that for that particular climbing question, you’re probably going to get something like that coming up every so often over long periods of time.

[00:22:40] Lily: Yeah.

[00:22:41] David: And so it’s something where the algorithm has time in between to learn and respond to feedback, if that comes in, even if not everyone who looks at it gives the sort of feedback you might have given, flagging that, no, the reference you’ve got for this is misinformation, whatever it is. Now of course, then there are all sorts of problems of, well, what happens if people deliberately misuse it?

Now, that’s a malicious attack on your AI system, that’s a whole different thing to deal with. So, I’m not saying that adding a random human input to this is necessarily a good thing.

[00:23:21] Lily: By deliberately misuse it, do you mean deliberately flagging articles as saying…

[00:23:27] David: So, you found that the article it came from was basically an April Fools’ article which misstated the height of the wall. And you could flag that as being misinformation or whatever it is, and over time that information would be ignored by the algorithm and so on.

Now say I’m part of a group of people who don’t believe in climate change. When we search, you know, what change in temperature there is, articles get referred back to the Intergovernmental Panel on Climate Change, which we flag as misinformation. And we make sure that we and all our friends do it, and so now the algorithm learns from that.

[00:24:10] Lily: But what if, could the algorithm give some kind of accuracy note, like: this is what we’re saying, we’re saying it’s 22 metres, but by the way, this has previously been flagged as misinformation? So that the user can then say, okay, this has been flagged as misinformation, let me just check by clicking on your linked article and see for myself whether I believe it’s misinformation or not.

[00:24:39] David: We’re getting into a design process. Yes, there are all sorts of things that could be designed, and I think there are things that could be done. I don’t know what would work, what would be sensible, what people would want, because the whole point is, if you’re actually wanting to go to the site, why don’t you just skip the Google summary and go straight to the sites?

[00:24:55] Lily: Yes, because the Google summary comes up at the very top, but then so do the Google ads and you know.

[00:25:02] David: Anyway, I think that is a beautiful example of these summaries, the way they are entering into the systems, and the fact that we are still in very early days. This will be interesting, because some of this is going to play out in lawsuits for sure. It’s a difficult topic.

[00:25:20] Lily: Good. Well, it’s been very interesting discussing it with you, David, as always. So thank you very much.

[00:25:26] David: No, thank you. And thank you for bringing that second example.