
How This Helps

Creating a dataset from S3 is the recommended approach for production workflows. Point the API at your S3 bucket path and Visual Layer handles ingestion, indexing, and clustering automatically.
Use status_new for all status checks. The status field is being retired. See Retrieve Dataset Status.

Prerequisites

  • A Visual Layer Cloud account with API access.
  • A valid JWT token. See Authentication.
  • An S3 bucket containing your images or videos, accessible to Visual Layer.

Create a Dataset from S3

Send a POST request with your bucket path to create a new dataset.
POST /api/v1/dataset
Authorization: Bearer <jwt>
Content-Type: multipart/form-data

Parameters

Parameter      Type     Required   Description
dataset_name   string   Yes        The display name for the new dataset.
bucket_path    string   Yes        S3 path to the bucket or folder containing your media files.
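
Before sending the request, it can help to check both required parameters locally. The following is a minimal sketch, not part of the Visual Layer API: the helper name build_form_fields is hypothetical, and the s3:// prefix check is an assumption based on the example path in this guide rather than a documented rule.

```python
def build_form_fields(dataset_name: str, bucket_path: str) -> dict:
    """Validate the two required parameters and return multipart form fields.

    Hypothetical helper: the checks below are assumptions, not documented
    server-side validation rules.
    """
    name = dataset_name.strip()
    if not name:
        raise ValueError("dataset_name is required")
    # Assumption: bucket paths use the s3:// scheme, as in this guide's example.
    if not bucket_path.startswith("s3://"):
        raise ValueError("bucket_path should look like s3://bucket/prefix/")
    # A (None, value) tuple makes the requests library encode a plain
    # multipart/form-data field instead of a file upload.
    return {
        "dataset_name": (None, name),
        "bucket_path": (None, bucket_path),
    }
```

The returned dictionary can be passed as the files argument of requests.post, which produces a multipart/form-data body.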

Example

curl -X POST \
  -H "Authorization: Bearer <jwt>" \
  -F "dataset_name=my_dataset" \
  -F "bucket_path=s3://my-bucket/images/" \
  "https://app.visual-layer.com/api/v1/dataset"

Response

{
  "dataset_id": "ad48d250-1232-11f1-bfca-fa39f6ed1f22"
}
Save the dataset_id — you need it for all subsequent operations on this dataset.
Dataset creation is asynchronous. After the initial request, poll GET /api/v1/dataset/{dataset_id} until status_new is READY before running search or export operations.

Monitor Dataset Status

Poll the dataset status endpoint to track progress.
curl -H "Authorization: Bearer <jwt>" \
  "https://app.visual-layer.com/api/v1/dataset/<dataset_id>"
The response includes a status_new field that transitions from INDEXING to READY when complete. See Retrieve Dataset Status for full status documentation.

Python Example

import requests
import time

VL_BASE_URL = "https://app.visual-layer.com"
JWT_TOKEN = "<your-jwt-token>"

headers = {"Authorization": f"Bearer {JWT_TOKEN}"}

# Step 1: Create dataset from S3
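# Send the fields as multipart/form-data, matching the curl -F example.
# A (None, value) tuple encodes a plain form field rather than a file upload.
resp = requests.post(
    f"{VL_BASE_URL}/api/v1/dataset",
    headers=headers,
    files={
        "dataset_name": (None, "my_dataset"),
        "bucket_path": (None, "s3://my-bucket/images/"),
    },
)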
resp.raise_for_status()
dataset_id = resp.json()["dataset_id"]
print(f"Created dataset: {dataset_id}")

# Step 2: Poll until READY
while True:
    resp = requests.get(
        f"{VL_BASE_URL}/api/v1/dataset/{dataset_id}",
        headers=headers,
    )
    resp.raise_for_status()
    data = resp.json()
    status = data.get("status_new")
    progress = data.get("progress", 0)
    print(f"  Status: {status} ({progress}%)")
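    if status in ("READY", "ERROR"):
        break
    time.sleep(30)

# The loop also exits on ERROR, so report the actual outcome.
if status == "READY":
    print(f"Dataset ready: {dataset_id}")
else:
    raise RuntimeError(f"Dataset {dataset_id} ended with status {status}")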
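
The polling loop above runs until the dataset reaches a terminal status, which could be a long time if processing stalls. The following is a minimal sketch of the same loop with an upper bound on the wait. The function name wait_until_ready, its parameters, and the default timeout are hypothetical choices for illustration. The READY and ERROR values come from the example above. fetch_status stands in for any callable that returns the current status_new value.

```python
import time
from typing import Callable


def wait_until_ready(
    fetch_status: Callable[[], str],
    timeout_s: float = 3600.0,   # assumed default; tune for your dataset size
    interval_s: float = 30.0,    # same interval as the example above
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until READY; raise on ERROR or on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status == "READY":
            return status
        if status == "ERROR":
            raise RuntimeError("dataset processing ended with status ERROR")
        if clock() >= deadline:
            raise TimeoutError(f"dataset not READY after {timeout_s} seconds")
        sleep(interval_s)
```

In practice, fetch_status would wrap the GET request from Step 2 and return data.get("status_new"). The sleep and clock parameters are injectable only so the function can be exercised without real waiting.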

Response Codes

See Error Handling for the error response format and Python handling patterns.
HTTP Code   Meaning
200         Dataset created successfully.
400         Bad Request: missing or invalid parameters.
401         Unauthorized: check your JWT token.
500         Internal Server Error: check that the S3 path is accessible.
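
The table above can be turned into readable error messages on the client side. The following is a minimal sketch. The names RESPONSE_MEANINGS and explain_status are hypothetical, and the fallback message for unlisted codes is an assumption. The descriptions themselves are taken from the table.

```python
# Meanings copied from the response code table in this guide.
RESPONSE_MEANINGS = {
    200: "Dataset created successfully.",
    400: "Bad Request: missing or invalid parameters.",
    401: "Unauthorized: check your JWT token.",
    500: "Internal Server Error: check that the S3 path is accessible.",
}


def explain_status(code: int) -> str:
    """Return the documented meaning of an HTTP code, or a generic fallback."""
    # Assumption: codes not listed in the table get a generic message.
    return RESPONSE_MEANINGS.get(code, f"Unexpected HTTP status {code}.")
```

For example, logging explain_status(resp.status_code) before calling resp.raise_for_status() gives a more actionable message than the bare status code.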