How To Use Gemini AI To Summarize YouTube Videos


A follow-up question about the final score was answered correctly, but Gemini got the scorer of the first touchdown wrong: The AI suggested it was Jahan Dotson. Dotson was shown reaching the end zone in the highlights with the score at 0-0, but the touchdown was ruled out, an example of the nuances that AI doesn’t necessarily pick up on.

Gemini did successfully identify when the Kansas City Chiefs got their first points, and even included a timestamp linking straight to the touchdown in the YouTube clip. It also got the name of the scorer right. It seems Gemini is heavily reliant on the commentary for sports clips, which isn’t surprising.

Summarize Video Contents

The AI can pick out video details—if they’re mentioned in the audio.

Photograph: David Nield

Next, we tried putting Gemini up against a behind-the-scenes featurette for The Grand Budapest Hotel, directed by Wes Anderson. The clip runs to four-and-a-half minutes, and Gemini fired back some replies almost instantly: It identified the name of the film being talked about, and the main beats of the clip’s narrative.

However, it’s all reliant on the audio (or the transcript) again—there doesn’t seem to be any analysis of the actual video contents. The AI couldn’t say who the talking heads were in the video, even though their names were shown on screen, and wasn’t able to say who the director was (even though this was also mentioned in the video description).

On the plus side, Gemini did do an impressive job of summing up the audio of the video. It correctly identified some of the filmmaking challenges that were mentioned throughout, and provided timestamps for them, from looking for a set to represent the Grand Budapest to filling it with extras.

Summarize Interviews

Gemini can provide timestamps for the specified video.

Photograph: David Nield

Finally, we tried Google Gemini with an interview: Channel 4 in the UK speaking to Charlie Brooker and Siena Kelly about the latest series of Black Mirror (perhaps appropriate for an article on AI). Gemini proved itself very capable at picking out the talking points, and adding timestamps, though of course the whole video is mostly talking.

Again though, there’s no context about anything outside of the audio or the transcript. Gemini AI couldn’t say where the interview took place, or how the participants were acting, or anything else about the visuals of the video—which is worth bearing in mind if you use it yourself.

When the answers you want are in the audio of a YouTube video and its associated transcript, Gemini works really well at summarizing and providing accurate answers (provided the commentators mention when a touchdown is ruled out, as well as when one is scored). For any kind of visual information, you’re still going to have to watch the video yourself.


