search_metadata

Search video caption metadata.


Overview

Searches video analysis metadata based on various filters (UUID, model, config, date).

Key Use Cases:

  • Find videos processed by a specific model
  • Filter by UUID, config, or time range
  • Check the list of registered videos
  • Prepare input data for get_captions_batch
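One way to prepare a UUID list for a downstream call is to deduplicate the uuid column of the search result, since the same video can appear once per model. The sketch below is only an illustration: the rows are copied from the "Retrieve All" output on this page, the helper name unique_uuids is hypothetical, and the input that get_captions_batch actually expects is not described here, so check its own documentation before passing this list on.

```python
from typing import Dict, List

# Rows shaped like search_metadata results, e.g. metadata.to_dict("records").
# Values are taken from the "Retrieve All" output shown on this page.
rows: List[Dict[str, str]] = [
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "gpt-4o"},
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "claude-3.5"},
    {"uuid": "1d0f4f13-f79b-448b-b176-cbcc4f38e911", "model": "vila-1.5"},
]


def unique_uuids(result_rows: List[Dict[str, str]]) -> List[str]:
    """Return each UUID once, keeping first-seen order (hypothetical helper)."""
    # dict.fromkeys preserves insertion order and drops duplicates.
    return list(dict.fromkeys(row["uuid"] for row in result_rows))


uuid_list = unique_uuids(rows)
print(uuid_list)
```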

Function Signature

def search_metadata(
    feature_view: str,
    uuids: Optional[List[str]] = None,
    models: Optional[List[str]] = None,
    configs: Optional[List[str]] = None,
    time_after: Optional[str] = None,
    time_before: Optional[str] = None
) -> pd.DataFrame

Description

Searches the metadata store for video metadata that matches the given filters. Every filter is optional: a filter left as None places no restriction on the results.

Common Parameter

All APIs take the feature_view parameter (some documentation omits it). Supported values:
- 'video_description': Video captions
- 'audio_transcript': ASR results
- 'caption_summary': Caption + ASR summary


Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| uuids | List[str] or None | None | List of video UUIDs to search. If None, all UUIDs are retrieved. |
| models | List[str] or None | None | List of model names to filter by. Example: ["gpt-4o", "claude-3.5"]. If None, all models are retrieved. |
| configs | List[str] or None | None | List of config file paths. Example: ["config_fine.yaml"]. If None, all config_sources are included. |
| time_after | str or None | None | Retrieve only data after this point in time (ISO format). Example: "2024-12-01T00:00:00". If None, there is no start time limit. |
| time_before | str or None | None | Retrieve only data before this point in time (ISO format). Example: "2024-12-31T23:59:59". If None, there is no end time limit. |
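Since time_after and time_before are ISO-format strings, it can be convenient to build them from datetime objects rather than typing them by hand. The following is a minimal sketch using only the standard library; month_bounds is a hypothetical helper, not part of this API, and whether the bounds themselves are treated as inclusive is not stated on this page.

```python
import calendar
from datetime import datetime
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return (time_after, time_before) ISO strings spanning one calendar month."""
    # monthrange returns (weekday of day 1, number of days in the month).
    last_day = calendar.monthrange(year, month)[1]
    start = datetime(year, month, 1, 0, 0, 0)
    end = datetime(year, month, last_day, 23, 59, 59)
    return start.isoformat(timespec="seconds"), end.isoformat(timespec="seconds")


time_after, time_before = month_bounds(2024, 12)
print(time_after)   # 2024-12-01T00:00:00
print(time_before)  # 2024-12-31T23:59:59
```

The resulting strings match the format of the examples in the parameter table and can be passed as time_after and time_before.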

Returns

Type: pd.DataFrame

Columns:

  • uuid (str): Unique identifier for the video
  • model (str): Name of the caption generation model
  • config_source (str): Path to the config file
  • timestamp (datetime): Data creation timestamp
  • segment_ids (List[str]): List of segment IDs
  • segment_count (int): Total number of segments

Examples

Retrieve All

# Retrieve all metadata
metadata = search_metadata(
    feature_view='caption_summary'
)

print(f"Total videos: {len(metadata)}")
print(metadata.head())

Output:

Total videos: 140
   uuid                                    model           config_source        segment_count
0  f2c99e03-8415-4926-bf3d-60ec8c2ddab4    gpt-4o          None                 28
1  f2c99e03-8415-4926-bf3d-60ec8c2ddab4    claude-3.5      None                 30
2  1d0f4f13-f79b-448b-b176-cbcc4f38e911    vila-1.5        config_X.yaml        27


Filter by Specific Model

# Retrieve only gpt-4o results
metadata = search_metadata(
    feature_view='caption_summary',
    models=['gpt-4o']
)

print(f"GPT-4 videos: {len(metadata)}")

Output:

GPT-4 videos: 52


Combine Model and Date Filters

# gpt-4o + Claude, December 2024 only
metadata = search_metadata(
    feature_view='caption_summary',
    models=['gpt-4o', 'claude-3.5'],
    time_after='2024-12-01T00:00:00',
    time_before='2024-12-31T23:59:59'
)

print(f"Results: {len(metadata)}")

Output:

Results: 12


Search by Specific UUIDs

# Retrieve only specific videos
metadata = search_metadata(
    feature_view='caption_summary',
    uuids=[
        'f2c99e03-8415-4926-bf3d-60ec8c2ddab4',
        '1d0f4f13-f79b-448b-b176-cbcc4f38e911'
    ]
)

print(f"Found {len(metadata)} versions")

Output:

Found 2 versions
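Because one UUID can appear once per model (in the "Retrieve All" output, a single UUID has both a gpt-4o and a claude-3.5 row), it is often useful to group the result by UUID to see which versions exist for each video. A minimal sketch, assuming the rows have been converted to dictionaries (for example with metadata.to_dict("records")); models_by_uuid is a hypothetical helper and the rows are copied from the output shown earlier.

```python
from collections import defaultdict
from typing import Dict, List

# Rows taken from the "Retrieve All" output on this page.
rows: List[Dict[str, str]] = [
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "gpt-4o"},
    {"uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4", "model": "claude-3.5"},
    {"uuid": "1d0f4f13-f79b-448b-b176-cbcc4f38e911", "model": "vila-1.5"},
]


def models_by_uuid(result_rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each UUID to the list of models that produced a row for it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in result_rows:
        grouped[row["uuid"]].append(row["model"])
    return dict(grouped)


for uuid, models in models_by_uuid(rows).items():
    print(f"{uuid}: {len(models)} version(s) -> {models}")
```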