How This Helps
Duplicate detection helps you identify redundant images or frames across your dataset. Use it to streamline cleanup, reduce storage, and improve data quality before training or export.
Prerequisites
- A dataset in READY status.
- A dataset ID (visible in the browser URL when viewing a dataset: https://app.visual-layer.com/dataset/<dataset_id>/data).
- A valid JWT token. See Authentication.
Find Duplicates Using VQL
The preferred approach uses the Visual Query Language (VQL) filter on the Explore endpoint. The duplicates filter groups visually similar media into duplicate clusters.
VQL Duplicates Filter
Pass a duplicates filter in the vql array.
The value field sets the similarity threshold (0.0–1.0). A value of 0.95 returns only clusters where images are at least 95% similar to each other. Lower values produce looser groupings that include less similar images.
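The threshold rule above can be sketched in Python. This is a minimal illustration, not the documented schema: the only detail taken from this page is that the duplicates filter sits in the vql array and has a value field. The object layout and the `build_duplicates_vql` helper name are assumptions, and the real filter may require additional fields.

```python
import json


def build_duplicates_vql(threshold: float) -> str:
    """Return a JSON-encoded vql array holding one duplicates filter.

    Assumption: the filter is an object keyed by "duplicates" whose
    "value" field carries the similarity threshold. The real schema
    may require more fields than shown here.
    """
    # The page states the threshold range is 0.0 to 1.0.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    vql = [{"duplicates": {"value": threshold}}]
    return json.dumps(vql)


if __name__ == "__main__":
    # A threshold of 0.95 asks for clusters that are at least 95% similar.
    print(build_duplicates_vql(0.95))
```

Validating the range on the client side avoids a round trip that would otherwise end in a 422 response.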
Example
Response
The previews array shows representative images from the group.
Find Duplicates Using duplicate_threshold
You can also use the duplicate_threshold query parameter directly on the Explore endpoint as a simpler alternative to VQL.
| Parameter | Type | Description |
|---|---|---|
| duplicate_threshold | float | Similarity cutoff (0.0–1.0). Returns only clusters containing near-duplicates at this threshold or higher. |
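As a rough sketch, the parameter can be attached to a request URL as shown here. The `EXPLORE_URL` path, the `explore_request` helper, and the Bearer authorization scheme are assumptions for illustration. Only the duplicate_threshold parameter name and its 0.0 to 1.0 range come from the table above. See Authentication for the header the API actually expects.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint path; substitute the real Explore endpoint.
EXPLORE_URL = "https://app.visual-layer.com/api/v1/explore/{dataset_id}"


def explore_request(dataset_id: str, token: str, duplicate_threshold: float) -> Request:
    """Build (but do not send) a GET request with duplicate_threshold set."""
    if not 0.0 <= duplicate_threshold <= 1.0:
        raise ValueError("duplicate_threshold must be between 0.0 and 1.0")
    query = urlencode({"duplicate_threshold": duplicate_threshold})
    url = f"{EXPLORE_URL.format(dataset_id=dataset_id)}?{query}"
    # Assumption: the JWT is sent as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


if __name__ == "__main__":
    req = explore_request("<dataset_id>", "<jwt_token>", 0.9)
    print(req.full_url)
```

The request object can then be passed to `urllib.request.urlopen`, or the same URL and header can be reused with any other HTTP client.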
Filter by Uniqueness
To find the most unique images (the opposite of duplicates), use the uniqueness VQL filter.
Python Example
The following example retrieves all duplicate clusters and prints a summary.
Response Codes
See Error Handling for the error response format and Python handling patterns.
| HTTP Code | Meaning |
|---|---|
| 200 | Results returned successfully. |
| 401 | Unauthorized — check your JWT token. |
| 404 | Dataset not found. |
| 422 | Invalid query parameters — check VQL syntax or threshold value. |
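One way to act on these codes in client code is sketched below. The `ExploreError` class and `check_status` helper are hypothetical names, not part of any official client. The messages mirror the table above, and the fallback text for other codes is an assumption.

```python
# Meanings taken from the response code table.
STATUS_MEANINGS = {
    200: "Results returned successfully.",
    401: "Unauthorized: check your JWT token.",
    404: "Dataset not found.",
    422: "Invalid query parameters: check VQL syntax or threshold value.",
}


class ExploreError(Exception):
    """Hypothetical exception raised for any non-200 response."""


def check_status(status_code: int) -> None:
    """Return quietly on 200; raise ExploreError with the table's meaning otherwise."""
    if status_code == 200:
        return
    # Codes missing from the table get a generic message.
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected response.")
    raise ExploreError(f"HTTP {status_code}: {meaning}")


if __name__ == "__main__":
    try:
        check_status(422)
    except ExploreError as err:
        print(err)
```

Calling `check_status` right after each request keeps the status handling in one place, so the rest of the script can assume a successful response.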