Video transcription
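This example demonstrates how to transcribe the content of a video using the Gemini API. Note: Inline upload, as used here, only works while the total request (video plus prompt) stays under 20 MB. For larger videos, you must upload the file through the File API instead.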
Import the Gemini API, plus os (to read the API key) and requests (to download the video)
from google import genai
from google.genai import types
import os
import requests
Initialize the Gemini client with your API key
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
video_url = "https://download.samplelib.com/mp4/sample-5s.mp4"
Download the video file.
Read the video file as bytes for inline upload.
response = requests.get(video_url, timeout=30)
response.raise_for_status()  # stop here if the download failed
video_bytes = response.content
Define our prompt
prompt = (
    "Transcribe the audio from this video, giving timestamps for "
    "salient events in the video. Also provide visual descriptions."
)
Create a Gemini request with the video and our prompt.
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=types.Content(
        parts=[
            types.Part(text=prompt),
            types.Part(inline_data=types.Blob(data=video_bytes, mime_type="video/mp4")),
        ]
    ),
)
Print the model's response
print(response.text)
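For videos over the 20MB limit, the inline approach above no longer applies. The following is a minimal sketch of the File API path, not a definitive implementation. It assumes the limit is 20 * 1024 * 1024 bytes (the exact byte count is an assumption) and that the video is already on disk. The helper names `needs_file_api` and `transcribe_uploaded_video` are hypothetical and introduced only for illustration.

```python
import os
import time

# Assumption: "20MB" is taken as 20 * 1024 * 1024 bytes. The real limit applies
# to the whole request, so leave some headroom for the prompt.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def needs_file_api(size_bytes, limit=INLINE_LIMIT_BYTES):
    """Return True when a video is too large to send inline."""
    return size_bytes > limit


def transcribe_uploaded_video(path, prompt):
    """Upload a local video through the File API, then request a transcription."""
    # Imported here so needs_file_api() can be used without the SDK installed.
    from google import genai

    client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
    video_file = client.files.upload(file=path)

    # Uploaded videos are processed asynchronously; poll until processing ends.
    while video_file.state.name == "PROCESSING":
        time.sleep(2)
        video_file = client.files.get(name=video_file.name)
    if video_file.state.name != "ACTIVE":
        raise RuntimeError(f"Video processing ended in state {video_file.state.name}")

    # The uploaded file handle takes the place of the inline bytes.
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[video_file, prompt],
    )
    return response.text
```

A caller would check `needs_file_api(os.path.getsize(path))` first and fall back to the inline request shown above when it returns False.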
Running the Example
First, install the Google Gen AI SDK and the requests library
$ pip install google-genai requests
Then run the program with Python
$ python video_transcription.py
Okay, here's the transcription and visual descriptions of the video:
**Video Description:**
The video pans up from a low angle showing a park with lush green trees. Sunlight filters through the leaves. In the distance, cars and a bus can be seen on a road next to the park. There is a paved walkway and low bushes.
**Timestamps:**
* **0:00** Camera starts panning up showing a park with trees and sunlight.
* **0:04** The camera reaches its highest point in its view.
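If you want the timestamps as data rather than text, the response can be post-processed. This is a rough sketch that assumes the model keeps the bullet format shown in the output above, which is not guaranteed from run to run. The function name `parse_timestamps` is hypothetical.

```python
import re

# Matches lines such as "* **0:04** The camera reaches its highest point."
TIMESTAMP_LINE = re.compile(r"^\*\s+\*\*(\d+):(\d{2})\*\*\s+(.*)$")


def parse_timestamps(text):
    """Return (seconds, description) pairs for each timestamped bullet line."""
    events = []
    for line in text.splitlines():
        match = TIMESTAMP_LINE.match(line.strip())
        if match:
            minutes, seconds, description = match.groups()
            events.append((int(minutes) * 60 + int(seconds), description))
    return events
```

Lines that do not match the pattern are skipped, so an empty result is a sign that the model formatted its answer differently.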