Audio

Audio summarization

This example demonstrates how to summarize the content of an audio file using the Gemini API.

Import the necessary libraries

from google import genai
from google.genai import types
import requests
import os

Replace with your actual API key

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

Define a descriptive User-Agent following Wikimedia's policy

user_agent = "GeminiByExample/1.0 (https://github.com/strickvl/geminibyexample; contact@example.org) python-requests/2.0"

Download the audio file from the URL

url = "https://upload.wikimedia.org/wikipedia/commons/1/1f/%22DayBreak%22_with_Jay_Young_on_the_USA_Radio_Network.ogg"
headers = {"User-Agent": user_agent}
response = requests.get(url, headers=headers)
response.raise_for_status()  # Raise an exception for bad status codes

Save the content to a variable Note: If your audio file is larger than 20MB, you should use the File API to upload the file first. The File API allows you to upload larger files and then reference them in your requests.

audio_bytes = response.content

You can now pass that audio file along with the prompt to Gemini

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        "What is this audio about?",
        types.Part.from_bytes(
            data=audio_bytes,
            mime_type="audio/ogg",
        ),
    ],
)

print(response.text)

Running the Example

First, install the Google Generative AI library and requests

$ pip install google-genai requests

Then run the program with Python

$ python audio-summarization.py
This audio is about travel tips, particularly what to do in the event of a disaster while on vacation.
The speaker emphasizes the importance of staying informed about the destination, traveling with a buddy,
having a plan in place, and investing in travel insurance. They also mention the importance of connecting
with home base and knowing the location of the American Red Cross in case of emergencies.

Further Information

Gemini docs link 1