OpenAI APIs

OpenAI Batch Processing

Introduction

OpenAI Batch Processing allows you to submit multiple API requests for asynchronous processing at 50% of the cost of standard requests. All requests in a batch must be of the same type (all embeddings, all chat completions, etc.) and are organized in a JSONL (JSON Lines) file. Requests are processed as resources become available and are guaranteed to complete within 24 hours.
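
Under the hood, each line of the batch file is a single JSON request object. For reference, a request line for the embeddings endpoint looks roughly like this (the custom_id is an arbitrary identifier you choose); the builders shown in the examples below construct this file for you:

{ "custom_id": "embedding_001", "method": "POST", "url": "/v1/embeddings", "body": { "model": "text-embedding-3-small", "input": "Hello world" } }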

How Does Batch Processing Work?

The batch processing workflow follows these key steps:

  1. Create and upload the batch file to OpenAI
  2. Create a batch job from the uploaded file
  3. Monitor the batch status
  4. Retrieve results once processing is complete

Examples

Example 1

Create multiple embedding tasks and upload them as a batch file. All tasks must be of the same type:

// List of texts to create embeddings for
def texts = [
    "Hello world",
    "Machine learning is fascinating",
    "Natural language processing enables AI",
    "Batch processing saves costs",
    "OpenAI provides powerful AI models"
]

// Build the batch file with all tasks
// All tasks must be of the same type (all embeddings in this case)
def batchFile = openai.newBatchFileRequestBuilder("embeddings_batch")

// Create embedding requests and tasks using a closure
texts.eachWithIndex { text, i ->
    def embeddingRequest = openai.newEmbeddingRequestBuilder()
            .model("text-embedding-3-small")
            .input(text)
            .user("test-user")
            .build()

    def task = openai.newBatchTaskForEmbeddingsBuilder("embedding_${String.format('%03d', i + 1)}")
            .task(embeddingRequest)
            .build()

    batchFile.addTask(task)
}

// Build and upload the batch file
def uploadedFile = openai.uploadBatchFile(batchFile.build())
out << "File uploaded with ID: ${uploadedFile.id}"

This example shows how to create multiple embedding requests and tasks in a loop, then combine them into a single batch file. Each task is assigned a unique custom ID that you can use to track and identify individual requests in the batch results.

Example 2

After uploading a batch file, create a batch job to start processing:

// Use the file ID from the uploaded batch file
def fileId = "file-xxx"

// Create batch parameters for embeddings endpoint
def builder = openai.newBatchCreateParamsForEmbeddingsBuilder(fileId)

// Create the batch job
def batch = openai.createBatch(builder.build())
out << "Batch created with ID: ${batch.id}"
out << "<br>Status: ${batch.status}"

The batch is processed asynchronously; check its status later to retrieve the results.
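
If a script needs to wait for the results, a minimal polling sketch could look like this (assuming the script environment allows sleeping between checks; it only uses the retrieveBatch call shown in the next example):

// Poll the batch until it reaches a terminal state (sketch only)
def terminalStatuses = ["completed", "expired", "cancelled"]
def currentBatch = openai.retrieveBatch(batch.id)
while (!(currentBatch.status in terminalStatuses)) {
    sleep(60000)  // wait 60 seconds between checks
    currentBatch = openai.retrieveBatch(batch.id)
}
out << "<br>Final status: ${currentBatch.status}"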

Example 3

Once a batch job is created, you can check its status and retrieve results:

// Use the batch ID from the previous script
def batchId = "batch_xxx"

def batch = openai.retrieveBatch(batchId)
out << "Batch ID: ${batch.id}"
out << "<br>Status: ${batch.status}"
out << "<br>Created at: ${batch.createdAt}"
out << "<br>Completed at: ${batch.completedAt}"

// Check if batch is completed
if (batch.status == "completed") {

    // Retrieve batch file results
    def results = openai.retrieveBatchFileResult(batch)

    // Display batch results
    out << "<br><h3>Batch Results:</h3>"

    // Process results
    results.taskResults.each { result ->
        def embeddingResult = result.response
        out << "<br><strong>Task ID:</strong> ${result.customId}"
        out << "<br>&nbsp;<strong>Model:</strong> ${embeddingResult.model}"
        out << "<br>&nbsp;<strong>Embeddings:</strong>"

        embeddingResult.data.eachWithIndex { embedding, index ->
            def vector = embedding.embedding
            out << "<br>&nbsp;  [${index}] Dimension: ${vector.size()}, Sample: ${vector.take(5)}"
        }
    }
}

This example shows how to retrieve batch status and results. Batch status can be: validating, in_progress, finalizing, completed, expired, cancelling, or cancelled. Once completed, you can access the output file and process individual task results using their custom IDs.
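
Because each task carries the custom ID assigned when the batch file was built, the results can be indexed by that ID. A small sketch, building on the results object from the example above:

// Build a map from custom ID to response for easy lookup
def resultsById = results.taskResults.collectEntries { result ->
    [result.customId, result.response]
}

// Look up the response for a specific task by its custom ID
def firstResult = resultsById["embedding_001"]
out << "<br>Dimensions for embedding_001: ${firstResult.data[0].embedding.size()}"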

Example 4

List all batch jobs with pagination:

// List batches with a limit
def page = openai.listBatches(openai.newBatchListParamsBuilder().limit(5).build())

// Display current page results
out << "Batches on current page:"
page.data.each { batch ->
    out << "<br>  - ${batch.id}: ${batch.status}"
}

// Navigate through all pages
while (page.hasNextPage) {
    page = page.nextPage
    out << "<br>Batches on next page:"
    page.data.each { batch ->
        out << "<br>  - ${batch.id}: ${batch.status}"
    }
}

Use pagination to navigate through all batch jobs and monitor their statuses.

Supported Request Types

Batch processing supports these request types:

  1. Chat Completions: Use newBatchTaskForChatCompletionsBuilder() for chat-based interactions
  2. Completions: Use newBatchTaskForCompletionsBuilder() for text completion requests
  3. Embeddings: Use newBatchTaskForEmbeddingsBuilder() for embedding generation
  4. Responses: Use newBatchTaskForResponsesBuilder() for response API requests
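
Each of these builders follows the same pattern shown in Example 1: build a request, wrap it in a task with a custom ID, and add it to a batch file. As a minimal sketch for chat completions (assuming newBatchTaskForChatCompletionsBuilder mirrors the embeddings task builder used above):

// Batch file for chat completion tasks (the name is arbitrary)
def chatBatchFile = openai.newBatchFileRequestBuilder("chat_batch")

// Build a chat completion request (same builder used in the fine-tuning examples below)
def chatRequest = openai.newChatCompletionRequestBuilder()
        .model("gpt-4o")
        .addChatMessage("user", "Summarize the benefits of batch processing.")
        .build()

// Wrap the request in a batch task with a unique custom ID
// (assumes the chat task builder exposes the same task()/build() methods as the embeddings one)
def chatTask = openai.newBatchTaskForChatCompletionsBuilder("chat_001")
        .task(chatRequest)
        .build()

chatBatchFile.addTask(chatTask)
def chatUpload = openai.uploadBatchFile(chatBatchFile.build())
out << "File uploaded with ID: ${chatUpload.id}"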

Batch Status Lifecycle

A batch job goes through several status stages:

  1. validating: The batch file is being validated
  2. in_progress: The batch is being processed
  3. finalizing: The batch is completing and results are being prepared
  4. completed: The batch has finished successfully
  5. expired: The batch expired before completion
  6. cancelling: The batch is being cancelled
  7. cancelled: The batch was cancelled

Best Practices

  1. Use Custom IDs: Always assign meaningful custom IDs to your tasks to easily identify results
  2. File Organization: Keep track of uploaded file IDs and batch IDs for result retrieval
  3. Same Task Type: Ensure all tasks in a batch file are of the same type (all embeddings, all chat completions, etc.)

When to Use Batch Processing

Use batch processing when you have many requests to process, immediate results are not required, and you want to reduce API costs. Ideal for processing large document collections or generating embeddings for knowledge bases.

Completion Window

Batch requests are guaranteed to complete within 24 hours. Ensure your application can handle this delay.

OpenAI Evaluations (Evals)

Introduction

OpenAI Evaluations (Evals) let you test and measure AI model performance against specific criteria. Use them to validate accuracy, compare models, and ensure quality.

What are Evals Used For?

Evals are used to:

  • Measure Model Performance: Evaluate how well a model performs on specific tasks or datasets
  • Compare Models: Test different models against the same criteria to determine which performs better
  • Quality Assurance: Ensure models meet quality standards before deployment
  • Continuous Monitoring: Track model performance over time and detect degradation
  • A/B Testing: Compare different model configurations or prompts

How Do Evals Work?

The process has two steps:

  1. Eval Creation: Define the evaluation framework with data schema, testing criteria (graders), and metadata
  2. Eval Run Creation: Execute the evaluation with a test data file and model specification

You can reuse the eval definition for multiple runs to test different models or datasets.

Testing Criteria (Graders)

Evals support different types of graders to evaluate model outputs:

  • String Check Grader: Compares model output to a reference string using operations like eq (equals), contains, starts_with, or ends_with
  • Text Similarity Grader: Measures semantic similarity using metrics like cosine similarity or Jaccard similarity
  • Python Grader: Custom Python code for complex evaluation logic
  • Label Model Grader: Uses another AI model to classify or label the output
  • Score Model Grader: Uses an AI model to score the output on a scale

Example: Creating and Running an Eval with String Check Grader

This example demonstrates the complete workflow: creating an eval and then running it.

Step 1: Create the Eval

// Define the data schema for your test cases
def itemSchema = [
    "type": "object",
    "properties": [
        "ticket_text": ["type": "string"],
        "correct_label": ["type": "string"]
    ],
    "required": ["ticket_text", "correct_label"]
]

// Create the eval with a String check grader
def evalParams = openai.newEvalCreateParamsBuilder()
    .name("Ticket Classification Evaluation")
    .dataSourceConfigOfCustom(itemSchema)
    .addStringCheckGraderTestingCriterion(
        "Match output to human label",
        '{{ sample.output_text }}',
        '{{ item.correct_label }}',
        "eq"
    )
    .build()

// Create the eval
def eval = openai.createEval(evalParams)
out << "Eval created with ID: ${eval.id}"

This example creates an eval that:

  • Defines a data schema with ticket_text (the test case) and correct_label (the expected answer)
  • Uses a String Check Grader to compare the model's output ({{ sample.output_text }}) with the correct label ({{ item.correct_label }})
  • The eq operation checks for exact equality

The {{ sample.output_text }} and {{ item.correct_label }} placeholders use a template syntax to reference fields from the evaluation data: item refers to a row from your test data file, and sample refers to the output generated by the model being evaluated.

Step 2: Prepare the Eval Data File

Before creating an eval run, you need to prepare a JSONL (JSON Lines) file with your test data. Each line must be a valid JSON object that matches your data schema. The file should be structured as follows:

{ "item": { "ticket_text": "My monitor won't turn on!", "correct_label": "Hardware" } }
{ "item": { "ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software" } }
{ "item": { "ticket_text": "Best restaurants in Cleveland?", "correct_label": "Other" } }

Each line contains an item object with the fields defined in your schema (ticket_text and correct_label in this example).

Step 3: Create an Eval Run

// Use the eval ID from the previous script
def evalId = "eval-xxx"

// First, upload your test data file (JSONL format)
def dataFile = docman.getNodeByPath("TestData:eval_data.jsonl").content
def uploadedFile = openai.uploadFile(
    openai.newFileCreateParamsBuilder("evals", dataFile).build()
)

// Define the input messages for the model
def inputMessages = [
    openai.newEvalInputMessage(
        "developer",
        "You are an expert in categorizing IT support tickets. Given the support ticket below, categorize the request into one of Hardware, Software, or Other. Respond with only one of those words."
    ),
    openai.newEvalInputMessage(
        "user",
        '{{ item.ticket_text }}'  // Reference to the ticket_text field from your data
    )
]

// Create eval run parameters for responses
def runParams = openai.newEvalRunCreateParamsForResponsesBuilder(
    "test_run_001",
    uploadedFile.id,
    inputMessages,
    "gpt-4o"
).build()

// Create and run the eval
def evalRun = openai.createEvalRun(evalId, runParams)
out << "Eval run created with ID: ${evalRun.id}"
out << "<br>Status: ${evalRun.status}"

This example:

  • Uploads a JSONL file containing test cases in the required format (each line is a JSON object with an item containing ticket_text and correct_label fields)
  • Defines the prompt messages that will be sent to the model, using {{ item.ticket_text }} to reference the input field
  • Creates an eval run that tests the gpt-4o model against the eval criteria
  • The model's responses will be automatically graded using the String Check Grader defined in the eval, comparing {{ sample.output_text }} with {{ item.correct_label }}

You can retrieve the eval run results later to see how the model performed.
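
Because the eval definition is separate from any particular run, you can reuse it to compare models against the same criteria. A minimal sketch, reusing uploadedFile and inputMessages from Step 3 (the second model name is only an example):

// Run the same eval against a second model for comparison
def comparisonParams = openai.newEvalRunCreateParamsForResponsesBuilder(
    "test_run_002",      // a new run name
    uploadedFile.id,     // the same test data file
    inputMessages,       // the same prompt messages
    "gpt-4o-mini"        // a different model to compare against
).build()

def comparisonRun = openai.createEvalRun(evalId, comparisonParams)
out << "<br>Comparison run created with ID: ${comparisonRun.id}"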

Step 4: Managing Eval Runs

Once you've created eval runs, you can manage them through various operations:

def evalId = "eval_xxx"
def runId = "evalrun_xxx"

// List all runs for an eval
def page = openai.listEvalRuns(evalId)
page.data.each { run ->
    out << "<br>Run: ${run.id} - Status: ${run.status}"
}

// Retrieve a specific eval run
def run = openai.retrieveEvalRun(
    runId, 
    openai.newEvalRunRetrieveParamsBuilder(evalId).build()
)
out << "Run status: ${run.status}"

// Cancel an eval run
openai.cancelEvalRun(runId, evalId)
out << "<br>Eval run cancelled"

// Delete an eval run
openai.deleteEvalRun(runId, evalId)
out << "<br>Eval run deleted"

This example demonstrates how to:

  • List all runs associated with an eval
  • Retrieve details about a specific run, including its status and progress
  • Cancel a specific eval run
  • Delete an eval run

Eval runs can have different statuses such as queued, in_progress, completed, cancelled, or failed. Use these operations to monitor and control your evaluation runs.

Managing Evals

You can manage your evals through various operations:

// Use the eval ID from the previous script
def evalId = "eval-xxx"
// List all evals
def page = openai.listEvals(openai.newEvalListParamsBuilder().limit(10).build())
page.data.each { eval ->
    out << "<br>Eval: ${eval.name} (ID: ${eval.id})"
}

// Retrieve a specific eval
def eval = openai.retrieveEval(evalId)

// Update an eval
out << openai.updateEval(evalId, openai.newEvalUpdateParamsBuilder().name("Updated Name").build())

// Delete an eval
out << openai.deleteEval(eval.id)

Other Grader Types

In addition to String Check Grader, you can use:

  • Text Similarity Grader: addTextSimilarityTestingCriterion() - Measures semantic similarity using cosine or Jaccard metrics
  • Python Grader: addPythonGraderTestingCriterion() - Custom Python code for complex evaluation logic
  • Label Model Grader: addLabelModelTestingCriterion() - Uses another model to classify outputs
  • Score Model Grader: addTestingCriterion() - Uses a model to score outputs on a scale

Each grader type is suited for different evaluation scenarios. Choose the one that best matches your evaluation needs.

OpenAI Fine-Tuning

Introduction

Cost Warning

Fine-tuning jobs can incur significant costs, with training costs potentially reaching up to $100 per hour depending on the model and method used. Costs vary based on:

  • The base model being fine-tuned
  • The fine-tuning method (DPO, Supervised, or Reinforcement)
  • The size of your training dataset
  • Training duration

For the most current and detailed pricing information, refer to the OpenAI Pricing Documentation. Always monitor your usage and costs when running fine-tuning jobs.

OpenAI Fine-Tuning allows you to customize models for your specific use case by training them on your own data. Fine-tuning improves model performance on specific tasks, enables you to teach the model new behaviors, and can reduce costs by allowing you to use smaller models effectively.

Fine-Tuning Methods

Module Suite supports three fine-tuning methods:

  1. DPO (Direct Preference Optimization): Optimizes models using pairwise preference data, teaching the model to favor certain outputs over others. Ideal for subjective quality improvements like tone, style, or appropriateness.
  2. Supervised Fine-Tuning: Trains models on input-output pairs, teaching them to follow specific patterns or formats.
  3. Reinforcement Learning: Uses reward models to guide training, suitable for complex optimization scenarios.

Example: DPO Fine-Tuning

This example demonstrates how to create a DPO fine-tuning job:

Step 1: Prepare DPO Training Data

DPO requires pairs of responses where one is preferred over the other. Each sample needs:

  • An input (the prompt/request)
  • A preferred output (the better response)
  • A non-preferred output (the less desirable response)

// Create a DPO fine-tuning file builder
def builderFile = openai.newDpoFineTuneFileRequestBuilder("my-dpo-training-data")

// Example: Create a chat completion request as input
def userQuestion = "How is the weather in the north pole?"
def requestBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("user", userQuestion)
    .build()

// Create a preferred response (more desirable answer)
def preferredBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("assistant", "The weather at the North Pole is extreme, with very long, dark, and cold winters and constant daylight with cool summers.")
    .build()
def preferredMessage = preferredBuilder.messages[0]

// Create a non-preferred response (less desirable answer)
def nonPreferredBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("assistant", "The weather is hot and humid")
    .build()
def nonPreferredMessage = nonPreferredBuilder.messages[0]

// Create a DPO sample
def sample = openai.newDpoFineTuneSampleBuilder()
    .input(requestBuilder)
    .preferred(preferredMessage)
    .nonPreferred(nonPreferredMessage)
    .build()

// Add multiple samples to the file (the same sample is repeated here for demonstration;
// in practice each sample should be distinct, and you'll need many samples for effective training)
for (int i = 0; i < 50; i++) {
    builderFile.addSample(sample)
}

// Upload the DPO training file
def uploadedFile = openai.uploadDpoFineTuneFile(builderFile.build())
out << "File uploaded with ID: ${uploadedFile.id}"

Creates a DPO file with multiple samples. Each sample has an input, preferred response, and non-preferred response.
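
For reference, the uploaded file corresponds roughly to OpenAI's DPO preference data format, where each JSONL line pairs an input with a preferred and a non-preferred assistant response:

{ "input": { "messages": [{ "role": "user", "content": "How is the weather in the north pole?" }] }, "preferred_output": [{ "role": "assistant", "content": "The weather at the North Pole is extreme, with very long, dark, and cold winters." }], "non_preferred_output": [{ "role": "assistant", "content": "The weather is hot and humid" }] }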

Training Data Requirements:

  • The minimum number of examples you can provide for fine-tuning is 10
  • Improvements are typically seen when fine-tuning on 50–100 examples, but the right number varies greatly depending on the use case
  • A good starting point is 50 well-crafted demonstrations; evaluate the results before scaling up

Step 2: Create the Fine-Tuning Job

// Use the uploaded file ID
def fileId = uploadedFile.id

// Create fine-tuning job parameters with DPO method
def builder = openai.newFineTuneJobCreateParamsBuilder(
    "gpt-4.1-mini-2025-04-14",  // Base model to fine-tune
    fileId                        // Training file ID
).methodDPO()                     // Use DPO fine-tuning method

// Create the fine-tuning job
def fineTuneJob = openai.createFineTuneJob(builder.build())
out << "Fine-tuning job created with ID: ${fineTuneJob.id}"
out << "<br>Status: ${fineTuneJob.status}"

Creates a DPO fine-tuning job with the base model and training file. Jobs process asynchronously and can take significant time depending on dataset size and model complexity.
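
If a script needs to wait for the job to finish rather than checking manually, a minimal polling sketch could look like this (assuming the script environment allows sleeping between checks, and that the job status is exposed as a string as in the batch examples):

// Poll the fine-tuning job until it reaches a terminal state (sketch only)
def terminalStatuses = ["succeeded", "failed", "cancelled"]
def currentJob = openai.retrieveFineTuneJob(fineTuneJob.id)
while (!(currentJob.status in terminalStatuses)) {
    sleep(300000)  // fine-tuning is slow; wait 5 minutes between checks
    currentJob = openai.retrieveFineTuneJob(fineTuneJob.id)
}
out << "<br>Final status: ${currentJob.status}"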

Step 3: Managing Fine-Tuning Jobs

You can monitor and manage your fine-tuning jobs:

// Use the finetune ID from the previous script
def fineTuneId = "ftjob-xxx"

// List all fine-tuning jobs
def page = openai.listFineTuneJobs()
page.data.each { job ->
    out << "<br>Job: ${job.id} - Status: ${job.status} - Model: ${job.model}"
}

// Navigate through pages
while (page.hasNextPage) {
    page = page.nextPage
    page.data.each { job ->
        out << "<br>Job: ${job.id} - Status: ${job.status}"
    }
}

// Retrieve a specific fine-tuning job
def job = openai.retrieveFineTuneJob(fineTuneId)
out << "Status: ${job.status}"
out << "<br>Trained tokens: ${job.trainedTokens}"

// Only reinforcement fine-tuning jobs can be paused and resumed
// Pause a fine-tuning job
/*def pausedJob = openai.pauseFineTuneJob(job.id)
out << "<br>Job paused: ${pausedJob.status}"*/

// Resume a paused fine-tuning job
/*def resumedJob = openai.resumeFineTuneJob(job.id)
out << "<br>Job resumed: ${resumedJob.status}"*/

This example shows how to list jobs with pagination, retrieve job details, and pause or resume jobs (reinforcement jobs only). Job statuses can be: validating_files, queued, running, succeeded, failed, cancelled, or paused.

Other Fine-Tuning Methods

In addition to DPO, you can use:

  • Supervised Fine-Tuning: Use methodSupervised() with newSupervisedFineTuneFileRequestBuilder() and uploadSupervisedFineTuneFile() - Trains on input-output pairs (data format sketched below)
  • Reinforcement Learning: Use methodReinforcement() with reward models - Uses reinforcement learning with human feedback
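
For the supervised method, the training data ultimately follows OpenAI's chat-format JSONL, where each line is a conversation whose final assistant message is the target output. A minimal illustration, reusing the ticket-classification example from the Evals section:

{ "messages": [{ "role": "system", "content": "You categorize IT support tickets as Hardware, Software, or Other." }, { "role": "user", "content": "My monitor won't turn on!" }, { "role": "assistant", "content": "Hardware" }] }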