PDFs and other data types

PDF and CSV data analysis and summarization

This example demonstrates how to use the Gemini API to analyze data from PDF and CSV files.

Import necessary libraries

from google import genai
from google.genai import types
import httpx
import os

Initialize the Gemini client with your API key

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

We start with the PDF analysis. Download the PDF file

pdf_url = "https://www.princexml.com/samples/invoice/invoicesample.pdf"
pdf_data = httpx.get(pdf_url).content

Prompt to extract main players from the PDF

pdf_prompt = (
    "Identify the main companies or entities mentioned in this invoice. "
    "Summarize the data."
)

Generate content with the PDF and the prompt

pdf_response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(data=pdf_data, mime_type="application/pdf"),
        pdf_prompt,
    ],
)

Print the PDF analysis result

print("PDF Analysis Result:\n", pdf_response.text)

Moving on to the CSV analysis now. You'll note that the process is very similar. You can also pass in code files, XML, RTF, Markdown, and more. We download the CSV file here.

csv_url = "https://gist.githubusercontent.com/suellenstringer-hye/f2231b3383538bcb1a5b051c7908f5b7/raw/0f4e0733a434733cda8e749bbbf33a93c2b5bbde/test.csv"
csv_data = httpx.get(csv_url).content

Prompt to analyze the CSV data

csv_prompt = "Analyze this data and tell me about the contents. Summarize the data."

Generate content with the CSV data and the prompt

csv_response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=[
        types.Part.from_bytes(
            data=csv_data,
            mime_type="text/csv",
        ),
        csv_prompt,
    ],
)

Print the CSV analysis result

print("\nCSV Analysis Result:\n", csv_response.text)

Running the Example

First, install the Google Generative AI library and httpx (for downloading files)

$ pip install google-genai httpx pandas

Then run the program with Python

$ python pdf_csv_analysis.py
PDF Analysis Result:
 The main company mentioned in the invoice is Sunny Farm.
CSV Analysis Result:
 Okay, I've analyzed the provided data. Here's a summary of its contents:
**Data Format:**
*   The data appears to be in CSV (Comma Separated Values) format.
*   The first line is a header row defining the fields.
*   Each subsequent line represents a record containing information about a person.
**Fields Present:**
The data includes the following fields for each person:
1.  **first\_name:** The person's first name.
2.  **last\_name:** The person's last name.
3.  **company\_name:** The name of the company they are associated with.
4.  **address:** The street address.
5.  **city:** The city.
6.  **county:** The county.
7.  **state:** The state.
8.  **zip:** The zip code.
9.  **phone1:** The primary phone number.
10. **phone2:** A secondary phone number.
11. **email:** The email address.
12. **web:** The website address (presumably for the associated company).

PDF and CSV data analysis and summarization

Running the Example

Further Information