OpenAI APIs
OpenAI Batch Processing¶
Introduction¶
OpenAI Batch Processing allows you to submit multiple API requests for asynchronous processing at 50% of the cost of standard requests. All requests in a batch must be of the same type (all embeddings, all chat completions, etc.) and are packaged together in a JSONL (JSON Lines) file. Requests are processed as resources become available and are guaranteed to complete within 24 hours.
How Does Batch Processing Work?¶
The batch processing workflow follows these key steps:
- Create and upload the batch file to OpenAI
- Create a batch job from the uploaded file
- Monitor the batch status
- Retrieve results once processing is complete
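Behind the scenes, each task in the uploaded batch file becomes one line of JSONL in OpenAI's batch request format: a custom ID, an HTTP method, an endpoint URL, and the request body. The builders shown in the examples below assemble these lines for you; a single embeddings task serialized this way looks roughly like the following (illustrative values):
{ "custom_id": "embedding_001", "method": "POST", "url": "/v1/embeddings", "body": { "model": "text-embedding-3-small", "input": "Hello world" } }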
Examples¶
Example 1¶
Create multiple embedding tasks and upload them as a batch file. All tasks must be of the same type:
// List of texts to create embeddings for
def texts = [
    "Hello world",
    "Machine learning is fascinating",
    "Natural language processing enables AI",
    "Batch processing saves costs",
    "OpenAI provides powerful AI models"
]
// Build the batch file with all tasks
// All tasks must be of the same type (all embeddings in this case)
def batchFile = openai.newBatchFileRequestBuilder("embeddings_batch")
// Create embedding requests and tasks using a closure
texts.eachWithIndex { text, i ->
    def embeddingRequest = openai.newEmbeddingRequestBuilder()
        .model("text-embedding-3-small")
        .input(text)
        .user("test-user")
        .build()
    def task = openai.newBatchTaskForEmbeddingsBuilder("embedding_${String.format('%03d', i + 1)}")
        .task(embeddingRequest)
        .build()
    batchFile.addTask(task)
}
// Build and upload the batch file
def uploadedFile = openai.uploadBatchFile(batchFile.build())
out << "File uploaded with ID: ${uploadedFile.id}"
This example shows how to create multiple embedding requests and tasks in a loop, then combine them into a single batch file. Each task is assigned a unique custom ID that you can use to track and identify individual requests in the batch results.
Example 2¶
After uploading a batch file, create a batch job to start processing:
// Use the file ID from the uploaded batch file
def fileId = "file-xxx"
// Create batch parameters for embeddings endpoint
def builder = openai.newBatchCreateParamsForEmbeddingsBuilder(fileId)
// Create the batch job
def batch = openai.createBatch(builder.build())
out << "Batch created with ID: ${batch.id}"
out << "<br>Status: ${batch.status}"
The batch is processed asynchronously; check its status later to see when the results are ready (see Example 3).
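If a script needs to wait for the batch, a simple polling loop can be built on retrieveBatch. This is a minimal sketch, assuming sleep() is available in the scripting environment; the interval and attempt limit are arbitrary, and for long-running batches a scheduled job is usually a better fit than an in-script loop:
// Poll the batch status until it reaches a terminal state
def batchId = batch.id
def terminalStatuses = ["completed", "expired", "cancelled"]
def attempts = 0
while (attempts < 10) {
    def current = openai.retrieveBatch(batchId)
    out << "<br>Check ${attempts + 1}: ${current.status}"
    if (terminalStatuses.contains(current.status)) {
        break
    }
    sleep(30000) // wait 30 seconds between checks (arbitrary interval)
    attempts++
}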
Example 3¶
Once a batch job is created, you can check its status and retrieve results:
// Use the batch ID from the previous script
def batchId = "batch_xxx"
def batch = openai.retrieveBatch(batchId)
out << "Batch ID: ${batch.id}"
out << "<br>Status: ${batch.status}"
out << "<br>Created at: ${batch.createdAt}"
out << "<br>Completed at: ${batch.completedAt}"
// Check if batch is completed
if (batch.status == "completed") {
    // Retrieve batch file results
    def results = openai.retrieveBatchFileResult(batch)
    // Display batch results
    out << "<br><h3>Batch Results:</h3>"
    // Process results
    results.taskResults.each { result ->
        def embeddingResult = result.response
        out << "<br><strong>Task ID:</strong> ${result.customId}"
        out << "<br> <strong>Model:</strong> ${embeddingResult.model}"
        out << "<br> <strong>Embeddings:</strong>"
        embeddingResult.data.eachWithIndex { embedding, index ->
            def vector = embedding.embedding
            out << "<br> [${index}] Dimension: ${vector.size()}, Sample: ${vector.take(5)}"
        }
    }
}
This example shows how to retrieve batch status and results. Batch status can be: validating, in_progress, finalizing, completed, expired, cancelling, or cancelled. Once completed, you can access the output file and process individual task results using their custom IDs.
Example 4¶
List all batch jobs with pagination:
// List batches with a limit
def page = openai.listBatches(openai.newBatchListParamsBuilder().limit(5).build())
// Display current page results
out << "Batches on current page:"
page.data.each { batch ->
    out << "<br> - ${batch.id}: ${batch.status}"
}
// Navigate through all pages
while (page.hasNextPage) {
    page = page.nextPage
    out << "<br>Batches on next page:"
    page.data.each { batch ->
        out << "<br> - ${batch.id}: ${batch.status}"
    }
}
Use pagination to navigate through all batch jobs and monitor their statuses.
Supported Request Types¶
Batch processing supports these request types:
- Chat Completions: Use newBatchTaskForChatCompletionsBuilder() for chat-based interactions
- Completions: Use newBatchTaskForCompletionsBuilder() for text completion requests
- Embeddings: Use newBatchTaskForEmbeddingsBuilder() for embedding generation
- Responses: Use newBatchTaskForResponsesBuilder() for Responses API requests
Batch Status Lifecycle¶
A batch job goes through several status stages:
- validating: The batch file is being validated
- in_progress: The batch is being processed
- finalizing: The batch is completing and results are being prepared
- completed: The batch has finished successfully
- expired: The batch expired before completion
- cancelling: The batch is being cancelled
- cancelled: The batch was cancelled
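A status check can branch on these values, for example (a minimal sketch reusing the retrieveBatch call from Example 3; the batch ID is a placeholder):
def batch = openai.retrieveBatch("batch_xxx")
switch (batch.status) {
    case "completed":
        out << "Results are ready to be retrieved"
        break
    case ["validating", "in_progress", "finalizing"]:
        out << "Still processing, check again later"
        break
    case ["expired", "cancelling", "cancelled"]:
        out << "Batch did not complete normally; results may be partial or unavailable"
        break
    default:
        out << "Unexpected status: ${batch.status}"
}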
Best Practices¶
- Use Custom IDs: Always assign meaningful custom IDs to your tasks to easily identify results
- File Organization: Keep track of uploaded file IDs and batch IDs for result retrieval
- Same Task Type: Ensure all tasks in a batch file are of the same type (all embeddings, all chat completions, etc.)
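Custom IDs also make it easy to join batch results back to the source data. A minimal sketch, assuming a completed batch and the same embedding_NNN custom IDs used in Example 1 (the batch ID is a placeholder):
// Rebuild the custom IDs that were assigned when the tasks were created
def texts = [
    "Hello world",
    "Machine learning is fascinating",
    "Natural language processing enables AI"
]
def textById = [:]
texts.eachWithIndex { text, i ->
    textById["embedding_${String.format('%03d', i + 1)}".toString()] = text
}
// Join each task result back to its source text via the custom ID
def results = openai.retrieveBatchFileResult(openai.retrieveBatch("batch_xxx"))
results.taskResults.each { result ->
    out << "<br>${result.customId}: ${textById[result.customId]}"
}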
When to Use Batch Processing
Use batch processing when you have many requests to process, immediate results are not required, and you want to reduce API costs. Ideal for processing large document collections or generating embeddings for knowledge bases.
Completion Window
Batch requests are guaranteed to complete within 24 hours. Ensure your application can handle this delay.
OpenAI Evaluations (Evals)¶
Introduction¶
OpenAI Evaluations (Evals) let you test and measure AI model performance against specific criteria. Use them to validate accuracy, compare models, and ensure quality.
What are Evals Used For?¶
Evals are used to:
- Measure Model Performance: Evaluate how well a model performs on specific tasks or datasets
- Compare Models: Test different models against the same criteria to determine which performs better
- Quality Assurance: Ensure models meet quality standards before deployment
- Continuous Monitoring: Track model performance over time and detect degradation
- A/B Testing: Compare different model configurations or prompts
How Do Evals Work?¶
The process has two steps:
- Eval Creation: Define the evaluation framework with data schema, testing criteria (graders), and metadata
- Eval Run Creation: Execute the evaluation with a test data file and model specification
You can reuse the eval definition for multiple runs to test different models or datasets.
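For example, the same eval definition and data file can be used to compare two models side by side. This is a minimal sketch, assuming an existing eval ID and an uploaded data file ID (placeholders below) and using the same builders shown in the example that follows; the model names are illustrative:
def evalId = "eval-xxx"
def dataFileId = "file-xxx"
// Prompt messages shared by both runs
def inputMessages = [
    openai.newEvalInputMessage("developer", "Categorize the support ticket as Hardware, Software, or Other. Respond with only one of those words."),
    openai.newEvalInputMessage("user", '{{ item.ticket_text }}')
]
// Create one run per model against the same eval definition and data file
["gpt-4o", "gpt-4o-mini"].each { model ->
    def runParams = openai.newEvalRunCreateParamsForResponsesBuilder(
        "compare_${model}",
        dataFileId,
        inputMessages,
        model
    ).build()
    def run = openai.createEvalRun(evalId, runParams)
    out << "<br>Run for ${model}: ${run.id} (${run.status})"
}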
Testing Criteria (Graders)¶
Evals support different types of graders to evaluate model outputs:
- String Check Grader: Compares model output to a reference string using operations like eq (equals), contains, starts_with, or ends_with
- Text Similarity Grader: Measures semantic similarity using metrics like cosine similarity or Jaccard similarity
- Python Grader: Custom Python code for complex evaluation logic
- Label Model Grader: Uses another AI model to classify or label the output
- Score Model Grader: Uses an AI model to score the output on a scale
Example: Creating and Running an Eval with String Check Grader¶
This example demonstrates the complete workflow: creating an eval and then running it.
Step 1: Create the Eval¶
// Define the data schema for your test cases
def itemSchema = [
    "type": "object",
    "properties": [
        "ticket_text": ["type": "string"],
        "correct_label": ["type": "string"]
    ],
    "required": ["ticket_text", "correct_label"]
]
// Create the eval with a String check grader
def evalParams = openai.newEvalCreateParamsBuilder()
    .name("Ticket Classification Evaluation")
    .dataSourceConfigOfCustom(itemSchema)
    .addStringCheckGraderTestingCriterion(
        "Match output to human label",
        '{{ sample.output_text }}',
        '{{ item.correct_label }}',
        "eq"
    )
    .build()
// Create the eval
def eval = openai.createEval(evalParams)
out << "Eval created with ID: ${eval.id}"
This example creates an eval that:
- Defines a data schema with ticket_text (the test case) and correct_label (the expected answer)
- Uses a String Check Grader to compare the model's output ({{ sample.output_text }}) with the correct label ({{ item.correct_label }})
- The eq operation checks for exact equality
The {{ sample.output_text }} and {{ item.correct_label }} placeholders are template variables that reference fields from the evaluation data: the item namespace refers to your test data and the sample namespace to the model's output.
Step 2: Prepare the Eval Data File¶
Before creating an eval run, you need to prepare a JSONL (JSON Lines) file with your test data. Each line must be a valid JSON object that matches your data schema. The file should be structured as follows:
{ "item": { "ticket_text": "My monitor won't turn on!", "correct_label": "Hardware" } }
{ "item": { "ticket_text": "I'm in vim and I can't quit!", "correct_label": "Software" } }
{ "item": { "ticket_text": "Best restaurants in Cleveland?", "correct_label": "Other" } }
Each line contains an item object with the fields defined in your schema (ticket_text and correct_label in this example).
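The JSONL content can also be generated programmatically, for example from a list of maps using Groovy's standard JsonOutput (a minimal sketch; how the resulting string is saved or uploaded depends on your environment):
import groovy.json.JsonOutput

def testCases = [
    [ticket_text: "My monitor won't turn on!", correct_label: "Hardware"],
    [ticket_text: "I'm in vim and I can't quit!", correct_label: "Software"],
    [ticket_text: "Best restaurants in Cleveland?", correct_label: "Other"]
]
// One JSON object per line, wrapped in the "item" envelope required by the schema
def jsonlContent = testCases.collect { tc ->
    JsonOutput.toJson([item: tc])
}.join("\n")
out << jsonlContent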
Step 3: Create an Eval Run¶
// Use the eval ID from the previous script
def evalId = "eval-xxx"
// First, upload your test data file (JSONL format)
def dataFile = docman.getNodeByPath("TestData:eval_data.jsonl").content
def uploadedFile = openai.uploadFile(
    openai.newFileCreateParamsBuilder("evals", dataFile).build()
)
// Define the input messages for the model
def inputMessages = [
    openai.newEvalInputMessage(
        "developer",
        "You are an expert in categorizing IT support tickets. Given the support ticket below, categorize the request into one of Hardware, Software, or Other. Respond with only one of those words."
    ),
    openai.newEvalInputMessage(
        "user",
        '{{ item.ticket_text }}' // Reference to the ticket_text field from your data
    )
]
// Create eval run parameters for responses
def runParams = openai.newEvalRunCreateParamsForResponsesBuilder(
    "test_run_001",
    uploadedFile.id,
    inputMessages,
    "gpt-4o"
).build()
// Create and run the eval
def evalRun = openai.createEvalRun(evalId, runParams)
out << "Eval run created with ID: ${evalRun.id}"
out << "<br>Status: ${evalRun.status}"
This example:
- Uploads a JSONL file containing test cases in the required format (each line is a JSON object with an item containing ticket_text and correct_label fields)
- Defines the prompt messages that will be sent to the model, using {{ item.ticket_text }} to reference the input field
- Creates an eval run that tests the gpt-4o model against the eval criteria
- The model's responses will be automatically graded using the String Check Grader defined in the eval, comparing {{ sample.output_text }} with {{ item.correct_label }}
You can retrieve the eval run results later to see how the model performed.
Step 4: Managing Eval Runs¶
Once you've created eval runs, you can manage them through various operations:
def evalId = "eval_xxx"
def runId = "evalrun_xxx"
// List all runs for an eval
def page = openai.listEvalRuns(evalId)
page.data.each { run ->
    out << "<br>Run: ${run.id} - Status: ${run.status}"
}
// Retrieve a specific eval run
def run = openai.retrieveEvalRun(
    runId,
    openai.newEvalRunRetrieveParamsBuilder(evalId).build()
)
out << "Run status: ${run.status}"
// Cancel an eval run
openai.cancelEvalRun(runId, evalId)
out << "<br>Eval run cancelled"
// Delete an eval run
openai.deleteEvalRun(runId, evalId)
out << "<br>Eval run deleted"
This example demonstrates how to:
- List all runs associated with an eval
- Retrieve details about a specific run, including its status and progress
- Cancel a specific eval run
- Delete an eval run
Eval runs can have different statuses such as queued, in_progress, completed, cancelled, or failed. Use these operations to monitor and control your evaluation runs.
Managing Evals¶
You can manage your evals through various operations:
// Use the eval ID from the previous script
def evalId = "eval-xxx"
// List all evals
def page = openai.listEvals(openai.newEvalListParamsBuilder().limit(10).build())
page.data.each { eval ->
    out << "<br>Eval: ${eval.name} (ID: ${eval.id})"
}
// Retrieve a specific eval
def eval = openai.retrieveEval(evalId)
// Update an eval
out << openai.updateEval(evalId, openai.newEvalUpdateParamsBuilder().name("Updated Name").build())
// Delete an eval
out << openai.deleteEval(eval.id)
Other Grader Types¶
In addition to String Check Grader, you can use:
- Text Similarity Grader: addTextSimilarityTestingCriterion() - Measures semantic similarity using cosine or Jaccard metrics
- Python Grader: addPythonGraderTestingCriterion() - Custom Python code for complex evaluation logic
- Label Model Grader: addLabelModelTestingCriterion() - Uses another model to classify outputs
- Score Model Grader: addTestingCriterion() - Uses a model to score outputs on a scale
Each grader type is suited for different evaluation scenarios. Choose the one that best matches your evaluation needs.
OpenAI Fine-Tuning¶
Introduction¶
Cost Warning
Fine-tuning jobs can incur significant costs, with training costs potentially reaching up to $100 per hour depending on the model and method used. Costs vary based on:
- The base model being fine-tuned
- The fine-tuning method (DPO, Supervised, or Reinforcement)
- The size of your training dataset
- Training duration
For the most current and detailed pricing information, refer to the OpenAI Pricing Documentation. Always monitor your usage and costs when running fine-tuning jobs.
OpenAI Fine-Tuning allows you to customize models for your specific use case by training them on your own data. Fine-tuning improves model performance on specific tasks, enables you to teach the model new behaviors, and can reduce costs by allowing you to use smaller models effectively.
Fine-Tuning Methods¶
Module Suite supports three fine-tuning methods:
- DPO (Direct Preference Optimization): Optimizes models using pairwise preference data, teaching the model to favor certain outputs over others. Ideal for subjective quality improvements like tone, style, or appropriateness.
- Supervised Fine-Tuning: Trains models on input-output pairs, teaching them to follow specific patterns or formats.
- Reinforcement Learning: Uses reward models to guide training, suitable for complex optimization scenarios.
Example: DPO Fine-Tuning¶
This example demonstrates how to create a DPO fine-tuning job:
Step 1: Prepare DPO Training Data¶
DPO requires pairs of responses where one is preferred over the other. Each sample needs:
- An input (the prompt/request)
- A preferred output (the better response)
- A non-preferred output (the less desirable response)
// Create a DPO fine-tuning file builder
def builderFile = openai.newDpoFineTuneFileRequestBuilder("my-dpo-training-data")
// Example: Create a chat completion request as input
def userQuestion = "How is the weather in the north pole?"
def requestBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("user", userQuestion)
    .build()
// Create a preferred response (more desirable answer)
def preferredBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("assistant", "The weather at the North Pole is extreme, with very long, dark, and cold winters and constant daylight with cool summers.")
    .build()
def preferredMessage = preferredBuilder.messages[0]
// Create a non-preferred response (less desirable answer)
def nonPreferredBuilder = openai.newChatCompletionRequestBuilder()
    .model("gpt-4o")
    .addChatMessage("assistant", "The weather is hot and humid")
    .build()
def nonPreferredMessage = nonPreferredBuilder.messages[0]
// Create a DPO sample
def sample = openai.newDpoFineTuneSampleBuilder()
    .input(requestBuilder)
    .preferred(preferredMessage)
    .nonPreferred(nonPreferredMessage)
    .build()
// Add multiple samples to the file (you'll need many samples for effective training)
for (int i = 0; i < 50; i++) {
    builderFile.addSample(sample)
}
// Upload the DPO training file
def uploadedFile = openai.uploadDpoFineTuneFile(builderFile.build())
out << "File uploaded with ID: ${uploadedFile.id}"
Creates a DPO file with multiple samples. Each sample has an input, preferred response, and non-preferred response.
Training Data Requirements:
- The minimum number of examples you can provide for fine-tuning is 10
- Improvements are typically seen with 50–100 examples, but the right number varies greatly depending on the use case
- Start with around 50 well-crafted demonstrations and evaluate the results
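For effective training each sample should also be distinct, rather than a single sample repeated as in the sketch above. The following is a minimal variation that builds one sample per entry of a prepared list of question / preferred answer / non-preferred answer triples, using the same builders as Step 1 (the example data is illustrative):
def trainingData = [
    ["How is the weather in the north pole?",
     "The weather at the North Pole is extreme, with long, dark, cold winters and cool summers with constant daylight.",
     "The weather is hot and humid"],
    ["What is the boiling point of water at sea level?",
     "Water boils at 100 °C (212 °F) at standard sea-level atmospheric pressure.",
     "Water never boils"]
]
def builderFile = openai.newDpoFineTuneFileRequestBuilder("my-dpo-training-data")
// Build one DPO sample per (question, preferred, nonPreferred) triple
trainingData.each { question, preferred, nonPreferred ->
    def input = openai.newChatCompletionRequestBuilder()
        .model("gpt-4o")
        .addChatMessage("user", question)
        .build()
    def preferredMessage = openai.newChatCompletionRequestBuilder()
        .model("gpt-4o")
        .addChatMessage("assistant", preferred)
        .build().messages[0]
    def nonPreferredMessage = openai.newChatCompletionRequestBuilder()
        .model("gpt-4o")
        .addChatMessage("assistant", nonPreferred)
        .build().messages[0]
    builderFile.addSample(
        openai.newDpoFineTuneSampleBuilder()
            .input(input)
            .preferred(preferredMessage)
            .nonPreferred(nonPreferredMessage)
            .build()
    )
}
def uploadedFile = openai.uploadDpoFineTuneFile(builderFile.build())
out << "File uploaded with ID: ${uploadedFile.id}"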
Step 2: Create the Fine-Tuning Job¶
// Use the uploaded file ID
def fileId = uploadedFile.id
// Create fine-tuning job parameters with DPO method
def builder = openai.newFineTuneJobCreateParamsBuilder(
    "gpt-4.1-mini-2025-04-14", // Base model to fine-tune
    fileId                     // Training file ID
).methodDPO() // Use DPO fine-tuning method
// Create the fine-tuning job
def fineTuneJob = openai.createFineTuneJob(builder.build())
out << "Fine-tuning job created with ID: ${fineTuneJob.id}"
out << "<br>Status: ${fineTuneJob.status}"
Creates a DPO fine-tuning job with the base model and training file. Jobs process asynchronously and can take significant time depending on dataset size and model complexity.
Step 3: Managing Fine-Tuning Jobs¶
You can monitor and manage your fine-tuning jobs:
// Use the finetune ID from the previous script
def fineTuneId = "ftjob-xxx"
// List all fine-tuning jobs
def page = openai.listFineTuneJobs()
page.data.each { job ->
    out << "<br>Job: ${job.id} - Status: ${job.status} - Model: ${job.model}"
}
// Navigate through pages
while (page.hasNextPage) {
    page = page.nextPage
    page.data.each { job ->
        out << "<br>Job: ${job.id} - Status: ${job.status}"
    }
}
// Retrieve a specific fine-tuning job
def job = openai.retrieveFineTuneJob(fineTuneId)
out << "Status: ${job.status}"
out << "<br>Trained tokens: ${job.trainedTokens}"
// Only reinforcement fine-tuning jobs can be paused and resumed
// Pause a fine-tuning job
/*def pausedJob = openai.pauseFineTuneJob(job.id)
out << "<br>Job paused: ${pausedJob.status}"*/
// Resume a paused fine-tuning job
/*def resumedJob = openai.resumeFineTuneJob(job.id)
out << "<br>Job resumed: ${resumedJob.status}"*/
This example lists jobs with pagination, retrieves job details, and shows how to pause or resume a job. A fine-tuning job can have one of the following statuses: validating_files, queued, running, succeeded, failed, cancelled, or paused.
Other Fine-Tuning Methods¶
In addition to DPO, you can use:
- Supervised Fine-Tuning: Use methodSupervised() with newSupervisedFineTuneFileRequestBuilder() and uploadSupervisedFineTuneFile() - Trains on input-output pairs
- Reinforcement Learning: Use methodReinforcement() with reward models - Uses reinforcement learning with human feedback
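For supervised fine-tuning, once a training file has been prepared with newSupervisedFineTuneFileRequestBuilder() and uploaded with uploadSupervisedFineTuneFile(), creating the job mirrors the DPO flow in Step 2 with the method selector swapped (a minimal sketch; the file ID is a placeholder):
// ID of a previously uploaded supervised training file
def fileId = "file-xxx"
// Same job-creation flow as Step 2, but selecting the supervised method
def builder = openai.newFineTuneJobCreateParamsBuilder(
    "gpt-4.1-mini-2025-04-14", // Base model to fine-tune
    fileId                     // Training file ID
).methodSupervised()
def fineTuneJob = openai.createFineTuneJob(builder.build())
out << "Supervised fine-tuning job created with ID: ${fineTuneJob.id}"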