search_metadata¶
Search video caption metadata.
Overview¶
Searches video analysis metadata based on various filters (UUID, model, config, date).
Key Use Cases:
- Find videos processed by a specific model
- Filter by UUID, config, or time range
- Check the list of registered videos
- Prepare input data for get_captions_batch
Function Signature¶
def search_metadata(
    feature_view: str,
    uuids: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    configs: Optional[List[str]] = None,
    time_after: Optional[str] = None,
    time_before: Optional[str] = None
) -> pd.DataFrame
Description¶
Searches video metadata from the metadata store using various conditions.
Common Parameter
All APIs take the feature_view parameter, although some documentation omits it. It selects which feature view to query:
- 'video_description': Video captions
- 'audio_transcript': ASR results
- 'caption_summary': Caption + ASR summary
Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| uuids | List[str] or None | None | List of video UUIDs to search. If None, all UUIDs are retrieved. |
| models | List[str] or None | None | List of model names to filter by. Example: ["gpt-4o", "claude-3.5"]. If None, all models are retrieved. |
| configs | List[str] or None | None | List of config file paths. Example: ["config_fine.yaml"]. If None, all config_sources are included. |
| time_after | str or None | None | Retrieve only data after this point in time (ISO format). Example: "2024-12-01T00:00:00". If None, there is no start time limit. |
| time_before | str or None | None | Retrieve only data before this point in time (ISO format). Example: "2024-12-31T23:59:59". If None, there is no end time limit. |
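The filter behaviour described in the table can be sketched in plain Python. This is only an illustration, not the actual implementation: filter_records and the RECORDS list are hypothetical, the timestamps are invented, and two things are assumptions the documentation does not confirm, namely that all supplied filters are combined with AND and that the time bounds are inclusive.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional

# Made-up records shaped like the documented columns (timestamps are invented).
RECORDS: List[Dict[str, Any]] = [
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "gpt-4o",
     "config_source": None, "timestamp": datetime(2024, 11, 20, 9, 0)},
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "claude-3.5",
     "config_source": None, "timestamp": datetime(2024, 12, 5, 14, 30)},
    {"uuid": "1d0f4f13-f79b-448b-b176-cbcc4f38e911", "model": "vila-1.5",
     "config_source": "config_X.yaml", "timestamp": datetime(2024, 12, 10, 8, 15)},
]


def _allowed(value: Any, choices: Optional[List[str]]) -> bool:
    """A filter left as None accepts every value."""
    return choices is None or value in choices


def filter_records(
    records: List[Dict[str, Any]],
    uuids: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    configs: Optional[List[str]] = None,
    time_after: Optional[str] = None,
    time_before: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep records that satisfy every supplied filter (AND is an assumption)."""
    after = datetime.fromisoformat(time_after) if time_after else None
    before = datetime.fromisoformat(time_before) if time_before else None
    return [
        r for r in records
        if _allowed(r["uuid"], uuids)
        and _allowed(r["model"], models)
        and _allowed(r["config_source"], configs)
        and (after is None or r["timestamp"] >= after)    # inclusive: assumption
        and (before is None or r["timestamp"] <= before)  # inclusive: assumption
    ]


everything = filter_records(RECORDS)  # no filters: every record is kept
december_claude = filter_records(
    RECORDS,
    models=["claude-3.5"],
    time_after="2024-12-01T00:00:00",
    time_before="2024-12-31T23:59:59",
)
print(len(everything), len(december_claude))  # 3 1
```

Under these assumptions, leaving a parameter as None simply removes that condition, which matches the "If None, all ... are retrieved" wording in the table.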
Returns¶
Type: pd.DataFrame
Columns:
- uuid (str): Unique identifier for the video
- model (str): Name of the caption generation model
- config_source (str): Path to the config file
- timestamp (datetime): Data creation timestamp
- segment_ids (List[str]): List of segment IDs
- segment_count (int): Total number of segments
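Because the same uuid can appear once per model (the sample output in the Examples section shows one UUID under both gpt-4o and claude-3.5), len() on the result counts rows rather than distinct videos. The sketch below builds a small stand-in DataFrame from that sample output instead of calling search_metadata, and uses only standard pandas operations to separate the two counts.

```python
import pandas as pd

# Stand-in for a search_metadata result, copied from the sample output rows.
metadata = pd.DataFrame(
    {
        "uuid": [
            "f2c99e03-8415-4926-bf3d-60ec8c2ddab4",
            "f2c99e03-8415-4926-bf3d-60ec8c2ddab4",
            "1d0f4f13-f79b-448b-b176-cbcc4f38e911",
        ],
        "model": ["gpt-4o", "claude-3.5", "vila-1.5"],
        "config_source": [None, None, "config_X.yaml"],
        "segment_count": [28, 30, 27],
    }
)

row_count = len(metadata)                 # one row per listed entry
video_count = metadata["uuid"].nunique()  # distinct videos only

# Total number of segments produced by each model.
segments_per_model = metadata.groupby("model")["segment_count"].sum()

print(f"Rows: {row_count}, distinct videos: {video_count}")
print(segments_per_model)
```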
Examples¶
Retrieve All¶
# Retrieve all metadata
metadata = search_metadata(
    feature_view='caption_summary'
)
print(f"Total videos: {len(metadata)}")
print(metadata.head())
Output:
Total videos: 140
uuid model config_source segment_count
0 f2c99e03-8415-4926-bf3d-60ec8c2ddab4 gpt-4o None 28
1 f2c99e03-8415-4926-bf3d-60ec8c2ddab4 claude-3.5 None 30
2 1d0f4f13-f79b-448b-b176-cbcc4f38e911 vila-1.5 config_X.yaml 27
Filter by Specific Model¶
# Retrieve only GPT-4o results
metadata = search_metadata(
    feature_view='caption_summary',
    models=['gpt-4o']
)
print(f"GPT-4o videos: {len(metadata)}")
Multi-Condition Search¶
# GPT-4o or Claude results from December 2024
metadata = search_metadata(
    feature_view='caption_summary',
    models=['gpt-4o', 'claude-3.5-sonnet'],
    time_after='2024-12-01',
    time_before='2024-12-31'
)
print(f"Results: {len(metadata)}")
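The parameter table gives full ISO timestamps such as "2024-12-31T23:59:59", while this example passes date-only strings. How search_metadata interprets a date-only value is not stated here. If it were parsed the way Python's datetime.fromisoformat parses it (an assumption), the bound would fall at midnight, so records created later on 31 December would be outside the range. The sample timestamp below is invented for illustration.

```python
from datetime import datetime

date_only = datetime.fromisoformat("2024-12-31")            # midnight at the start of the day
end_of_day = datetime.fromisoformat("2024-12-31T23:59:59")  # last second of the day

# Hypothetical record created on the afternoon of 31 December.
sample = datetime(2024, 12, 31, 15, 30)

print(date_only)             # 2024-12-31 00:00:00
print(sample <= date_only)   # False: excluded by a date-only upper bound
print(sample <= end_of_day)  # True: included by a full end-of-day timestamp
```

Passing an explicit end-of-day timestamp for time_before avoids depending on how date-only strings are handled.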
Search by Specific UUIDs¶
# Retrieve only specific videos
metadata = search_metadata(
    feature_view='caption_summary',
    uuids=[
        'f2c99e03-8415-4926-bf3d-60ec8c2ddab4',
        '1d0f4f13-f79b-448b-b176-cbcc4f38e911'
    ]
)
print(f"Found {len(metadata)} versions")
Related APIs¶
- get_captions - Retrieve captions for a single video
- get_captions_batch - Batch caption retrieval (uses results from this API as input)
- get_all_videos - List all videos