Managing Testsets
This guide covers how to create, list, and retrieve testsets using the Agenta SDK or REST API.
Creating a Testset
Use ag.testsets.acreate() to create a new testset with data:
import agenta as ag
ag.init()
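# Top-level await works in Jupyter or another async-aware REPL. In a plain
# script, put these calls inside an async function and run it with asyncio.run().
# Create a testset with simple data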
testset = await ag.testsets.acreate(
data=[
{"country": "Germany", "capital": "Berlin"},
{"country": "France", "capital": "Paris"},
{"country": "Spain", "capital": "Madrid"}
],
name="Country Capitals",
)
testset_id = testset.testset_id or testset.id
print(f"Testset ID: {testset_id}")
print(f"Revision ID: {testset.id}")
print(f"Name: {testset.name}")
print(f"Slug: {testset.slug}")
curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey YOUR_API_KEY" \
-d '{
"testset": {
"name": "Country Capitals",
"slug": "country-capitals",
"data": {
"testcases": [
{"data": {"country": "Germany", "capital": "Berlin"}},
{"data": {"country": "France", "capital": "Paris"}},
{"data": {"country": "Spain", "capital": "Madrid"}}
]
}
}
}'
Parameters:
- data: A list of dictionaries containing your test data. Each dictionary represents one testcase.
- name: The name of your testset.
Returns: A TestsetRevision object containing:
- id: The UUID of the created testset revision
- testset_id: The parent testset UUID (stable across revisions)
- name: The testset name
- slug: The revision slug
- version: The revision version string (e.g. "1")
- data: The test data (with a testcases structure)
Sample Output:
{
"id": "01963413-3d39-7650-80ce-3ad5d688da6c",
"name": "Country Capitals",
"slug": "3ad5d688da6c",
"data": {
"testcases": [
{"data": {"country": "Germany", "capital": "Berlin"}},
{"data": {"country": "France", "capital": "Paris"}},
{"data": {"country": "Spain", "capital": "Madrid"}}
]
}
}
The data parameter accepts a simple list of dictionaries. The SDK automatically converts this to the structured TestsetRevisionData format internally.
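For reference, the simple list maps onto the same nesting that the REST example sends: a testset object, then data.testcases, then one data object per testcase. Here is a minimal sketch, using only the standard library, that builds that REST request body from a plain list of dictionaries. build_create_payload is a hypothetical helper; it mirrors the payload shape in the curl example and is not the SDK's internal conversion code.

```python
import json
from typing import Any, Optional


def build_create_payload(
    rows: list[dict[str, Any]],
    name: str,
    slug: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: wrap plain rows in the REST request shape.

    Each row becomes {"data": row} inside testset.data.testcases,
    matching the curl example for POST /preview/simple/testsets/.
    """
    testset: dict[str, Any] = {
        "name": name,
        "data": {"testcases": [{"data": dict(row)} for row in rows]},
    }
    if slug is not None:
        # The slug is optional in this sketch; omit it to leave it unset.
        testset["slug"] = slug
    return {"testset": testset}


rows = [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
]
payload = build_create_payload(rows, name="Country Capitals", slug="country-capitals")
print(json.dumps(payload, indent=2))
```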
Upserting a Testset
Use ag.testsets.aupsert() to create a testset or replace an existing one with the same name.
The function first searches for a testset matching the provided name (or testset_id if given). If it finds one, it replaces all testcases with your new data and creates a new revision. If no match exists, it creates a new testset.
Each update creates a new revision while keeping the same testset_id. This allows you to track changes over time and reference specific versions of your test data.
Upsert performs a full replacement. All existing testcases are removed and replaced with the data you provide. The operation does not merge or append testcases.
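The lookup-then-replace behavior can be modeled locally to make the semantics concrete. This is a minimal sketch, not SDK code: InMemoryTestsets and Revision are hypothetical names, lookup is by name only (the testset_id shortcut is left out), and versions are simply counted as "1", "2", and so on. It demonstrates the three documented properties: the testset_id stays the same, every upsert produces a new revision, and testcases are replaced rather than merged.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Revision:
    id: str           # revision UUID: new on every upsert
    testset_id: str   # parent testset UUID: stable across revisions
    name: str
    version: str
    testcases: list[dict[str, Any]]


@dataclass
class InMemoryTestsets:
    """Hypothetical local model of the documented upsert semantics."""

    history: dict[str, list[Revision]] = field(default_factory=dict)

    def upsert(self, name: str, data: list[dict[str, Any]]) -> Revision:
        # Look up an existing testset by the name of its latest revision.
        testset_id = next(
            (tid for tid, revs in self.history.items() if revs[-1].name == name),
            None,
        )
        if testset_id is None:
            # No match: create a new testset with an empty history.
            testset_id = str(uuid.uuid4())
            self.history[testset_id] = []
        revisions = self.history[testset_id]
        # Full replacement: the new revision holds only `data` (no merge, no append).
        revision = Revision(
            id=str(uuid.uuid4()),
            testset_id=testset_id,
            name=name,
            version=str(len(revisions) + 1),
            testcases=[dict(row) for row in data],
        )
        revisions.append(revision)
        return revision


store = InMemoryTestsets()
first = store.upsert("Country Capitals", [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "France", "capital": "Paris"},
])
second = store.upsert("Country Capitals", [
    {"country": "Germany", "capital": "Berlin"},
    {"country": "Spain", "capital": "Madrid"},
    {"country": "Italy", "capital": "Rome"},
])
print(second.testset_id == first.testset_id)  # True: same testset
print(second.id == first.id)  # False: a new revision was created
print([row["country"] for row in second.testcases])  # ['Germany', 'Spain', 'Italy']
```

Because the second call omits France, the latest revision no longer contains it, while the first revision still holds the original two testcases.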
import agenta as ag
ag.init()
# First call creates a testset with 2 testcases
testset = await ag.testsets.aupsert(
name="Country Capitals",
data=[
{"country": "Germany", "capital": "Berlin"},
{"country": "France", "capital": "Paris"},
],
)
# Second call replaces all testcases with these 3
# France is removed because it's not in the new data
testset = await ag.testsets.aupsert(
name="Country Capitals",
data=[
{"country": "Germany", "capital": "Berlin"},
{"country": "Spain", "capital": "Madrid"},
{"country": "Italy", "capital": "Rome"},
],
)
# Result: testset now contains Germany, Spain, Italy (not France)
Use the PUT endpoint with the testset ID to replace all testcases:
curl -X PUT "https://cloud.agenta.ai/api/preview/simple/testsets/{testset_id}" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey YOUR_API_KEY" \
-d '{
"testset": {
"id": "YOUR_TESTSET_ID",
"name": "Country Capitals",
"data": {
"testcases": [
{"data": {"country": "Germany", "capital": "Berlin"}},
{"data": {"country": "Spain", "capital": "Madrid"}},
{"data": {"country": "Italy", "capital": "Rome"}}
]
}
}
}'
Parameters:
- name: The testset name. Used to find an existing testset when testset_id is not provided.
- data (required): The testcases that will replace all existing data.
- testset_id (optional): Updates this specific testset directly, skipping the name lookup.
Returns: A TestsetRevision object containing the created or updated testset.
Use aupsert() when you want to keep a testset synchronized with your data source. This works well in CI/CD pipelines where you regenerate test data on each run. Use acreate() when you need a new testset every time.
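In a CI/CD pipeline, the regenerated test data often lives in a file. A minimal sketch, assuming a CSV file whose header row names the testcase fields: load_rows_from_csv is a hypothetical helper that uses only Python's standard csv module to produce the list of dictionaries that aupsert() expects as data.

```python
import csv
import io
from typing import Any, TextIO


def load_rows_from_csv(source: TextIO) -> list[dict[str, Any]]:
    """Hypothetical helper: read CSV into the list-of-dicts shape used by `data`.

    Each CSV row becomes one testcase keyed by the header columns.
    All values are read as strings.
    """
    reader = csv.DictReader(source)
    return [dict(row) for row in reader]


# io.StringIO stands in for open("capitals.csv", newline="") in a real job.
csv_text = "country,capital\nGermany,Berlin\nSpain,Madrid\nItaly,Rome\n"
rows = load_rows_from_csv(io.StringIO(csv_text))
print(rows[0])  # {'country': 'Germany', 'capital': 'Berlin'}

# In the pipeline you would then pass `rows` to the SDK call documented above:
#     testset = await ag.testsets.aupsert(name="Country Capitals", data=rows)
```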
Listing Testsets
To list all testsets in your project, use ag.testsets.alist():
import agenta as ag
ag.init()
# List all testsets
testsets = await ag.testsets.alist()
print(f"Found {len(testsets)} testsets:")
for testset in testsets:
    testset_id = testset.testset_id or testset.id
    print(f" - {testset.name} (testset_id: {testset_id})")
curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/query" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey YOUR_API_KEY" \
-d '{}'
Parameters: None required.
Returns: A list of TestsetRevision objects. For each item:
- id: The latest revision UUID
- testset_id: The parent testset UUID
- name: The testset name
- slug: The revision slug
- Additional metadata fields
Sample Output:
[
{
"id": "01963413-3d39-7650-80ce-3ad5d688da6c",
"name": "Country Capitals",
"slug": "country-capitals"
},
{
"id": "01963520-4e4a-8761-91df-4be6e799eb7d",
"name": "Math Problems",
"slug": "math-problems"
}
]
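When you work with the raw REST response, a quick name-to-ID lookup can be handy. A minimal sketch, assuming the response is a JSON array of objects like the sample output above; index_by_name is a hypothetical helper that applies the same testset_id-or-id fallback as the SDK example and keeps a list of IDs per name, since this guide does not state whether testset names are unique.

```python
import json
from typing import Any

# The sample output from above, assumed to be a JSON array of testset objects.
response_text = """
[
  {"id": "01963413-3d39-7650-80ce-3ad5d688da6c", "name": "Country Capitals", "slug": "country-capitals"},
  {"id": "01963520-4e4a-8761-91df-4be6e799eb7d", "name": "Math Problems", "slug": "math-problems"}
]
"""


def index_by_name(items: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Hypothetical helper: map each testset name to its testset IDs.

    Uses testset_id when present and falls back to id otherwise,
    like the SDK listing example.
    """
    index: dict[str, list[str]] = {}
    for item in items:
        stable_id = item.get("testset_id") or item["id"]
        index.setdefault(item["name"], []).append(stable_id)
    return index


index = index_by_name(json.loads(response_text))
print(index["Country Capitals"])  # ['01963413-3d39-7650-80ce-3ad5d688da6c']
```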
Retrieving a Testset
Use ag.testsets.aretrieve() to fetch a testset. You can retrieve either the latest revision or a specific historical revision.
Retrieving the Latest Revision
Pass the testset_id to get the most recent version of a testset:
import agenta as ag
ag.init()
# Retrieve the latest revision
testset = await ag.testsets.aretrieve(testset_id=testset_id)
if testset:
    print(f"Testcases: {len(testset.data.testcases)}")
curl -X GET "https://cloud.agenta.ai/api/preview/simple/testsets/{testset_id}" \
-H "Authorization: ApiKey YOUR_API_KEY"
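If you call the REST API from Python without the SDK, the curl examples translate directly to the standard library. A minimal sketch, assuming the Agenta Cloud base URL used in the curl examples and an API key stored in an AGENTA_API_KEY environment variable (an assumption for this sketch); build_request is a hypothetical helper that only constructs urllib.request.Request objects, so nothing is sent until you call urlopen().

```python
import json
import os
import urllib.request
from typing import Any, Optional

# Base URL taken from the curl examples (Agenta Cloud).
BASE_URL = "https://cloud.agenta.ai/api"


def build_request(
    method: str,
    path: str,
    api_key: str,
    body: Optional[dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request mirroring the curl examples."""
    headers = {"Authorization": f"ApiKey {api_key}"}
    data = None
    if body is not None:
        # JSON bodies get the same Content-Type header as the curl calls.
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}{path}", data=data, headers=headers, method=method
    )


api_key = os.environ.get("AGENTA_API_KEY", "YOUR_API_KEY")

# List testsets: POST /preview/simple/testsets/query with an empty body.
listing = build_request("POST", "/preview/simple/testsets/query", api_key, body={})

# Latest revision: GET /preview/simple/testsets/{testset_id}
latest = build_request("GET", "/preview/simple/testsets/YOUR_TESTSET_ID", api_key)

print(listing.get_method(), listing.full_url)
print(latest.get_method(), latest.full_url)

# To send a request and parse the JSON response:
# with urllib.request.urlopen(latest) as resp:
#     result = json.load(resp)
```

The same helper can send the POST bodies from the other curl examples, for instance the revision retrieval body {"testset_revision_ref": {"id": ...}}.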
Retrieving a Specific Revision
Pass the testset_revision_id to get an exact historical version. This is useful when you need to reproduce an evaluation or compare different versions of your test data.
import agenta as ag
ag.init()
# Retrieve a specific revision
testset = await ag.testsets.aretrieve(testset_revision_id=revision_id)
if testset:
    print(f"Version: {testset.version}")
    print(f"Testcases: {len(testset.data.testcases)}")
curl -X POST "https://cloud.agenta.ai/api/preview/testsets/revisions/retrieve" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey YOUR_API_KEY" \
-d '{
"testset_revision_ref": {
"id": "YOUR_REVISION_ID"
}
}'
Parameters:
- testset_id: Retrieves the latest revision of this testset.
- testset_revision_id: Retrieves this exact revision.
Returns: A TestsetRevision object containing:
- id: The revision UUID
- testset_id: The parent testset UUID (stable across revisions)
- version: The revision version number
- data: The testcases for this revision
Each update creates a new revision. The testset_id stays the same, but the revision ID (the id field of the returned TestsetRevision, which you pass as testset_revision_id) changes. Store revision IDs when you need to reference exact versions later (for example, to log which test data was used in an evaluation).
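One way to follow this advice is to append a small record for each evaluation run. A minimal sketch, assuming a local JSON Lines log file; record_testset_revision and the file name eval_runs.jsonl are hypothetical, and the IDs are placeholders.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_testset_revision(
    log_path: Path, run_name: str, testset_id: str, testset_revision_id: str
) -> dict:
    """Hypothetical helper: append which testset revision an evaluation used.

    Each line of the log is one JSON object (JSON Lines format).
    """
    entry = {
        "run": run_name,
        "testset_id": testset_id,
        "testset_revision_id": testset_revision_id,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "eval_runs.jsonl"
    record_testset_revision(log, "nightly-eval", "TESTSET_ID", "REVISION_ID")
    last = json.loads(log.read_text(encoding="utf-8").splitlines()[-1])
    print(last["testset_revision_id"])  # REVISION_ID
```

With the SDK objects from this guide, you would pass testset.testset_id or testset.id as the testset ID and testset.id as the revision ID, then later reload that exact data with ag.testsets.aretrieve(testset_revision_id=...).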
Retrieving a Testset by Name
To find a testset by name, list your testsets and filter the results by name, or pass a name filter to the query endpoint:
import agenta as ag
ag.init()
async def get_testset_by_name(name: str):
    """Helper function to find a testset by name."""
    testsets = await ag.testsets.alist()
    if not testsets:
        return None
    for testset in testsets:
        if testset.name == name:
            return testset
    return None
# Usage
testset = await get_testset_by_name("Country Capitals")
if testset:
    testset_id = testset.testset_id or testset.id
    print(f"Found testset: {testset.name} (testset_id: {testset_id}, revision_id: {testset.id})")
else:
    print("Testset not found")
curl -X POST "https://cloud.agenta.ai/api/preview/simple/testsets/query" \
-H "Content-Type: application/json" \
-H "Authorization: ApiKey YOUR_API_KEY" \
-d '{
"testset": {
"name": "Country Capitals"
}
}'
This pattern shows how you can implement your own helper functions to filter and find testsets based on custom criteria. You can extend this to search by tags or other metadata fields.
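As one way to extend this, the name check can be generalized into a predicate. A minimal sketch: find_testsets and has_tag are hypothetical helpers, and the tags attribute is an assumption (check the actual fields on the objects returned by alist() before relying on it). SimpleNamespace objects stand in for real testsets so the sketch runs without an Agenta project.

```python
from types import SimpleNamespace
from typing import Any, Callable, Iterable, Optional


def find_testsets(
    testsets: Optional[Iterable[Any]],
    predicate: Callable[[Any], bool],
) -> list[Any]:
    """Return every testset for which `predicate` is true.

    An empty or None input yields an empty list.
    """
    return [t for t in (testsets or []) if predicate(t)]


def has_tag(tag: str) -> Callable[[Any], bool]:
    # Hypothetical: assumes each testset may expose a `tags` collection;
    # a missing or None value is treated as "no tags".
    return lambda t: tag in (getattr(t, "tags", None) or {})


# Stand-in objects so the sketch runs without an Agenta project.
testsets = [
    SimpleNamespace(name="Country Capitals", tags={"geo": "true"}),
    SimpleNamespace(name="Math Problems", tags=None),
]
print([t.name for t in find_testsets(testsets, lambda t: t.name.startswith("Country"))])
print([t.name for t in find_testsets(testsets, has_tag("geo"))])  # ['Country Capitals']
```

Against a live project you would pass the result of await ag.testsets.alist() as the first argument.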
Working with Test Data
Once you have a testset, you can access the testcases within it:
import agenta as ag
ag.init()
# Retrieve a testset
testset = await ag.testsets.aretrieve(testset_id=testset_id)
# Access testcases
if testset and testset.data and testset.data.testcases:
    for testcase in testset.data.testcases:
        print(f"Testcase: {testcase.data}")
        # Use testcase.data in your evaluation
When you retrieve a testset via the API, the response includes the testcases in the data.testcases array:
{
"testset": {
"id": "01963413-3d39-7650-80ce-3ad5d688da6c",
"name": "Country Capitals",
"data": {
"testcases": [
{
"id": "bf2de79d-bcd0-569e-92aa-735bbdd0b447",
"data": {"country": "Germany", "capital": "Berlin"}
},
{
"id": "f54345c8-939c-5d03-9950-b62b876b10bd",
"data": {"country": "France", "capital": "Paris"}
}
]
}
}
}
Each testcase contains a data field with the dictionary you provided during creation. You can use these testcases directly in your evaluations.
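As an illustration of using testcases in an evaluation, here is a minimal sketch that parses the REST response shown above and scores a stand-in application with exact-match accuracy. toy_app and exact_match_accuracy are hypothetical; the metric is a simple example, not an Agenta evaluator. With the SDK, you would iterate testset.data.testcases and read testcase.data in the same way.

```python
import json
from typing import Any, Callable

# The sample response from above.
response_text = """
{
  "testset": {
    "id": "01963413-3d39-7650-80ce-3ad5d688da6c",
    "name": "Country Capitals",
    "data": {
      "testcases": [
        {"id": "bf2de79d-bcd0-569e-92aa-735bbdd0b447",
         "data": {"country": "Germany", "capital": "Berlin"}},
        {"id": "f54345c8-939c-5d03-9950-b62b876b10bd",
         "data": {"country": "France", "capital": "Paris"}}
      ]
    }
  }
}
"""


def exact_match_accuracy(
    testcases: list[dict[str, Any]],
    app: Callable[[str], str],
) -> float:
    """Hypothetical evaluation: call `app` with each testcase's country and
    compare the answer to the expected capital (exact string match)."""
    if not testcases:
        return 0.0
    hits = sum(app(tc["data"]["country"]) == tc["data"]["capital"] for tc in testcases)
    return hits / len(testcases)


def toy_app(country: str) -> str:
    # Stand-in for your application; it only knows one answer.
    return {"Germany": "Berlin"}.get(country, "unknown")


testcases = json.loads(response_text)["testset"]["data"]["testcases"]
print(exact_match_accuracy(testcases, toy_app))  # 0.5
```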
