Gemini by Example
Images

Image segmentation

This example demonstrates how to use the Gemini API to perform image segmentation on a picture of a cat.

Import the Gemini SDK and the other libraries we need. Make sure Pillow and requests are installed!
from google import genai
import os
import requests
from PIL import Image
from io import BytesIO
import json
import base64

Initialize the Gemini client with your API key
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))

Define the prompt for image segmentation, focusing on cats
prompt = """
Give the segmentation masks for the cat in the image.
Output a JSON list of segmentation masks where each entry contains the 2D
bounding box in the key "box_2d", the segmentation mask in key "mask", and
the text label in the key "label". Use descriptive labels.
"""

Download a random cat image from cataas.com
image_url = "https://cataas.com/cat"
response = requests.get(image_url)
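# Optional (not in the original example): fail fast if the download did not succeed
response.raise_for_status()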
cat_image = Image.open(BytesIO(response.content))

Save the original image
original_filename = "cat_original.png"
cat_image.save(original_filename)
print(f"Original image saved as: {original_filename}")

Call the Gemini API to generate content with the image and prompt
response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25", contents=[cat_image, prompt]
)

Print the response containing segmentation information.
print(response.text)

Overlay the mask on the original image and save the results. First, extract the JSON part of the response (the model may wrap it in a markdown code fence)
response_text = response.text
if "```json" in response_text:
    json_str = response_text.split("```json")[1].split("```")[0].strip()
elif "[" in response_text and "]" in response_text:
    start = response_text.find("[")
    end = response_text.rfind("]") + 1
    json_str = response_text[start:end]
else:
    json_str = response_text
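
As an alternative sketch (assuming the model honors it for this prompt), the request config can ask for JSON output directly, which makes the fence-stripping above unnecessary:
from google.genai import types
response = client.models.generate_content(
    model="gemini-2.5-pro-exp-03-25",
    contents=[cat_image, prompt],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)
json_str = response.text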

Parse JSON data
mask_data = json.loads(json_str)
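
Before indexing into the list, a minimal guard (not part of the original example) makes the empty case explicit:
if not mask_data:
    raise SystemExit("No segmentation masks were returned for this image.")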

Get the first mask in the list
first_mask = mask_data[0]

Extract base64 encoded mask
mask_base64 = first_mask.get("mask", "")
if "base64," in mask_base64:
    mask_base64 = mask_base64.split("base64,")[1]

Decode and load the mask image
mask_bytes = base64.b64decode(mask_base64)
mask_image = Image.open(BytesIO(mask_bytes))

Convert the cat image to RGBA and the mask to grayscale
cat_image = cat_image.convert("RGBA")
mask_image = mask_image.convert("L")  # Convert mask to grayscale
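
Gemini's segmentation masks are typically soft grayscale maps (values 0-255). If you prefer a hard cutout, you could threshold the mask before using it as alpha; the cutoff of 127 below is an arbitrary choice, not part of the original example:
# Optional: build a hard 0/255 mask (pass hard_mask to putalpha below instead of mask_image)
hard_mask = mask_image.point(lambda v: 255 if v > 127 else 0)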

Create a bright colored overlay (bright pink)
overlay = Image.new(
    "RGBA", mask_image.size, (255, 0, 255, 128)
)  # Bright pink, semi-transparent

Use the mask as the overlay's alpha channel; putalpha replaces the flat alpha set above, so the pink shows only where the mask is non-zero
overlay.putalpha(mask_image)

Resize the overlay to match the original image if needed
if overlay.size != cat_image.size:
    overlay = overlay.resize(cat_image.size)

Overlay the colored mask on the original image. Save both images.
result = Image.alpha_composite(cat_image, overlay)

mask_filename = "cat_mask.png"
mask_image.save(mask_filename)

merged_filename = "cat_with_mask.png"
result.save(merged_filename)
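
The response also includes a 2D bounding box for each mask. A sketch of drawing it on the composite with Pillow, assuming box_2d is [y_min, x_min, y_max, x_max] normalized to 0-1000 (the convention used in Gemini's object-detection examples):
from PIL import ImageDraw
draw = ImageDraw.Draw(result)
y0, x0, y1, x1 = first_mask["box_2d"]
w, h = result.size
# Convert the normalized 0-1000 coordinates to pixel coordinates
box = (x0 / 1000 * w, y0 / 1000 * h, x1 / 1000 * w, y1 / 1000 * h)
draw.rectangle(box, outline=(255, 0, 255), width=3)
result.save("cat_with_box.png")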

Running the Example

First, install the Google Gen AI SDK
$ pip install google-genai
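The script reads your API key from the GEMINI_API_KEY environment variable, so export it before running (placeholder value shown)
$ export GEMINI_API_KEY="your-api-key"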
Then run the program with Python
$ python cat_segmentation.py
# Expected output (example):
# [{"box_2d": [100, 50, 900, 750], "mask": "base64_encoded_png_data", "label": "Main Coon Cat"}, ...]

An illustration of the output from the example code (the cat image with the pink segmentation mask overlaid).

Further Information