
Quickstart

batchling is a Python library that lets you turn any async GenAI workflow into batched inference, saving you money on workloads that can complete asynchronously. You typically want batched inference for any process that does not need an immediate response and can wait up to 24 hours (most jobs complete within a few hours).

Example use-cases

The range of use-cases you can tackle effortlessly with batchling is extremely wide; here are a few examples:

  • Embedding text chunks for your RAG application overnight
  • Running large-scale classification with your favourite GenAI provider and/or framework
  • Running any GenAI evaluation pipeline
  • Generating huge volumes of synthetic data at half the cost
  • Transcribing or translating hours of audio in bulk

Things you might not want to run with batchling (yes, there are some):

  • User-facing applications such as chatbots, where you typically want fast answers
  • A whole agentic loop with a large number of calls
  • Full AI workflows with many sequential calls (each additional step can add another asynchronous batch cycle, up to 24 hours in the worst case)

Installation

batchling is available on PyPI as batchling; install it with either uv or pip:

uv add batchling
pip install batchling

Art Metadata Generation Example

batchling integrates smoothly with any async function doing GenAI calls or within a whole async script that you'd run with asyncio.

Let's try to generate metadata for famous pieces of art from the National Gallery of Art.

For this example, we'll use the following three art pieces:

  • Portrait of a Woman in Striped Dress
  • Portrait of a Napoleonic Officer in Dress Uniform
  • Self-Portrait with Palette

For each art piece, we will generate the following metadata:

  • author name
  • name of the art piece
  • period in which it was created
  • movement to which it belongs
  • material used to create the piece
  • list of tags representing visual elements of the art piece
  • short description about the background
  • fun fact about it

Let's suppose we have an existing script art_metadata.py that uses the OpenAI client to make parallel calls with asyncio.gather, generating metadata for the three art pieces:

art_metadata.py
import asyncio

from dotenv import load_dotenv
from openai import AsyncOpenAI
from pydantic import BaseModel, Field


load_dotenv()


class ArtMetadata(BaseModel):
    author: str = Field(
        description="The name of the artist who created the artwork.",
        examples=["Vincent van Gogh", "Leonardo da Vinci", "Michelangelo", "Pablo Picasso"],
    )
    name: str = Field(
        description="The title of the artwork.",
        examples=["The Starry Night", "The Scream", "The Last Supper", "The Mona Lisa"],
    )
    period: str = Field(
        description="The period or time of the artwork. Be as precise as possible.",
        examples=["1600's", "1758", "Renaissance"],
    )
    movement: str = Field(
        description="The movement or style of the artwork.",
        examples=["Impressionism", "Baroque", "Rococo", "Neoclassicism"],
    )
    material: str = Field(
        description="The material or medium of the artwork.",
        examples=["Oil on canvas", "Watercolor", "Gouache", "Pastel"],
    )
    tags: list[str] = Field(
        description="A list of tags or keywords that describe the artwork.",
        examples=["landscape", "portrait", "still life", "war scene", "historical event"],
    )
    context: str = Field(description="A short text describing the context of the artwork.")
    fun_fact: str = Field(description="A fun fact about the artwork.")


client = AsyncOpenAI()


async def generate_art_metadata(image_url: str):
    input_query = [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_image",
                    "image_url": image_url,
                },
            ],
        }
    ]
    return await client.responses.parse(
        input=input_query,
        model="gpt-5-nano",
        text_format=ArtMetadata,
    )


async def build_tasks():
    image_urls = [
        "https://api.nga.gov/iiif/a2e6da57-3cd1-4235-b20e-95dcaefed6c8/full/!800,800/0/default.jpg",
        "https://api.nga.gov/iiif/9d8f80cf-7d2c-455a-9fb9-73e5ce2012b2/full/!800,800/0/default.jpg",
        "https://api.nga.gov/iiif/54ee6643-e0f9-4b92-a1d2-441e5108724d/full/!800,800/0/default.jpg",
    ]
    return [generate_art_metadata(image_url) for image_url in image_urls]


async def enrich_art_images() -> None:
    tasks = await build_tasks()
    responses = await asyncio.gather(*tasks)
    for response in responses:
        print(response.output_parsed.model_dump_json(indent=2))
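
The concurrency pattern in this script (build one coroutine per input, then await them all together with asyncio.gather) is plain asyncio. Here is a self-contained sketch of that pattern, with a made-up fake_metadata_call standing in for the OpenAI request:

```python
import asyncio


async def fake_metadata_call(image_url: str) -> dict:
    # Stand-in for a network call: sleep briefly, then return a result.
    await asyncio.sleep(0.01)
    return {"url": image_url, "status": "done"}


async def run_batch(urls: list[str]) -> list[dict]:
    # Build one coroutine per URL, then run them concurrently,
    # mirroring build_tasks + asyncio.gather in the script above.
    tasks = [fake_metadata_call(url) for url in urls]
    return await asyncio.gather(*tasks)


results = asyncio.run(run_batch(["a.jpg", "b.jpg", "c.jpg"]))
```

asyncio.gather preserves input order, which is why the printed results above line up with the image URLs in build_tasks.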

To switch this async execution to batched inference, simply run your script with the batchling CLI, targeting the enrich_art_images function:

batchling art_metadata.py:enrich_art_images

To selectively batchify parts of your code, you can use the batchify function, which exposes an async context manager.

First, add this import at the top of your file:

+ from batchling import batchify

Then, modify the async function enrich_art_images to wrap the asyncio.gather call in the batchify async context manager:

 async def enrich_art_images() -> None:
     tasks = await build_tasks()
-    responses = await asyncio.gather(*tasks)
+    async with batchify():
+        responses = await asyncio.gather(*tasks)
     for response in responses:
         print(response.output_parsed.model_dump_json(indent=2))
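
As an aside, if you are curious why the only change is one async with line: the pattern of wrapping a gathered batch in an async context manager can be sketched with a toy stand-in (toy_batchify below is illustrative only, not batchling's actual implementation):

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def toy_batchify():
    # Toy stand-in, NOT batchling's implementation: an async context
    # manager can run setup before the gathered calls and teardown
    # after, so the call-site only gains an `async with` line.
    print("entering batch context")
    try:
        yield
    finally:
        print("leaving batch context")


async def work(n: int) -> int:
    await asyncio.sleep(0.01)
    return n * 2


async def main() -> list[int]:
    tasks = [work(n) for n in range(3)]
    async with toy_batchify():
        results = await asyncio.gather(*tasks)
    return results


out = asyncio.run(main())
```

Because the context manager brackets the awaited gather, everything awaited inside the with block can be intercepted as a single batch, while the surrounding code stays untouched.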

Output:

Portrait of a Woman in Striped Dress
{
  "author": "",
  "name": "Portrait of a Woman in Striped Dress",
  "period": "Early 20th century",
  "movement": "Post-Impressionism / Fauvism influence",
  "material": "Oil on canvas",
  "tags": [
    "portrait",
    "woman",
    "stripes",
    "bold color",
    "flower bouquet",
    "interior",
    "fauvism",
    "post-impressionism"
  ],
  "context": "A seated young woman in a striped red-and-blue blouse and a polka-dot blue skirt, holding flowers, set against a flat mint-green background. The brushwork and color treatment emphasize flat planes and expressive color typical of Post-Impressionist/Fauvist portraiture.",
  "fun_fact": "The composition and bold color palette resemble early 20th-century French modernist works; exact attribution is difficult from a single image."
}
Portrait of a Napoleonic Officer in Dress Uniform
{
  "author": "",
  "name": "",
  "period": "Early 19th century (Napoleonic era)",
  "movement": "",
  "material": "Oil on canvas",
  "tags": [
    "portrait",
    "military",
    "uniform",
    "Napoleonic era",
    "European",
    "blue coat",
    "epaulettes",
    "medal",
    "sash"
  ],
  "context": "A formal portrait of a high-ranking military officer in full dress uniform, set in an ornate interior with classical furnishings, suggesting a studio or official setting typical of early 19th-century noble or military portraiture.",
  "fun_fact": "The subject wears a star medal and epaulettes common to high-ranking officers of Napoleonic Europe; the rich interior and ceremonial attire indicate status and prestige rather than battlefield activity."
}
Self-Portrait with Palette
{
  "author": "Vincent van Gogh",
  "name": "Self-Portrait with Palette",
  "period": "1889",
  "movement": "Post-Impressionism",
  "material": "Oil on canvas",
  "tags": [
    "Self-portrait",
    "palette",
    "brushwork",
    "van Gogh",
    "Post-Impressionism",
    "blue background",
    "oil on canvas"
  ],
  "context": "A self-portrait painted by Vincent van Gogh during his stay at the Saint-Paul asylum in Saint-Rémy-de-Provence in 1889. The image shows van Gogh holding a palette, set against a swirling blue background that demonstrates his bold, expressive brushwork and distinctive color sense.",
  "fun_fact": "Van Gogh produced hundreds of self-portraits, using them to study color and emotion; this piece is notable for its cool blue tones contrasting with the warmer tones of his hair and beard."
}

You can run the script and see for yourself; small batches like this typically complete within 5-10 minutes.

Next Steps

Now that you've seen batchling in action, head over to the following sections of the documentation to learn more: