register_captions
Register captions for a single video.
Overview
Manually registers caption data generated by external models (GPT-4, Claude, Gemini, etc.) into the Feature Store.
Function Signature
def register_captions(
    feature_view: str,
    uuid: str,
    mp4_source: str,
    model: str,
    segments: List[Dict[str, Any]],
    config: Optional[str] = None,
    timestamp: Optional[str] = None
) -> Dict[str, Any]
Description
Notes:
- If video information does not exist, it is automatically registered in the video Feature View.
- Supports both local videos and YouTube videos.
- Data is simultaneously saved to the Feature Store and Metadata Store.
- The configuration file is always saved to a fixed, system-designated path, regardless of the config path the caller passes in.
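Because mp4_source accepts either a local path or a YouTube URL, callers may want to know which kind they are passing before registering. The following is a minimal sketch of one way to tell them apart on the caller side. The helper name classify_mp4_source and the host list are assumptions made for illustration; they are not part of the documented API and do not describe how register_captions itself decides.

```python
from urllib.parse import urlparse

# Hypothetical host list (an assumption, not taken from the library).
YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def classify_mp4_source(mp4_source: str) -> str:
    """Return "youtube" for an http(s) URL on a YouTube host, otherwise "local"."""
    parsed = urlparse(mp4_source)
    # A plain filesystem path has an empty scheme and no hostname.
    if parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS:
        return "youtube"
    return "local"


kind = classify_mp4_source("https://www.youtube.com/watch?v=VIDEO_ID")
```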
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| feature_view | str | required | Feature View to register the captions in. Example: "caption_summary" |
| uuid | str | required | Unique identifier for the video. Example: "f2c99e03-8415-4926-bf3d-60ec8c2ddab4" |
| mp4_source | str | required | Local path or YouTube URL. Local: /gpfs/public/artifacts/videos/video-001/content.mp4; YouTube: https://www.youtube.com/watch?v=VIDEO_ID |
| model | str | required | Name of the model that generated the captions. Examples: "gpt-4o", "claude-3.5", "gemini-pro" |
| segments | List[Dict] | required | List of caption segments. Each segment must include four fields: index, start, end, text |
| config | str or None | None | Path to the config file |
| timestamp | str or None | None | Data storage timestamp (ISO format). If None, the current time is used |
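The timestamp parameter expects an ISO-format string. As a hedged sketch, the standard library's datetime.isoformat() produces strings in the same shape as the timestamps shown in this page's example outputs (microsecond precision, no timezone offset). Whether other ISO variants, such as values with a timezone offset, are accepted is not stated here, so treat this shape as an assumption. The helper name make_timestamp is hypothetical.

```python
from datetime import datetime


def make_timestamp(moment: datetime) -> str:
    """Format a datetime as an ISO 8601 string, e.g. 2024-12-26T16:00:00.123456."""
    return moment.isoformat()


# A fixed datetime keeps the example reproducible; use datetime.now() in practice.
ts = make_timestamp(datetime(2024, 12, 26, 16, 0, 0, 123456))
```

The resulting string can then be passed as timestamp=ts.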
segments Format
[
    {"index": 0, "start": 0.0, "end": 10.0, "text": "A man is cooking..."},
    {"index": 1, "start": 10.0, "end": 20.0, "text": "He chops vegetables..."},
    {"index": 2, "start": 20.0, "end": 30.0, "text": "He adds ingredients..."}
]
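Since every segment must carry the four fields index, start, end, and text, a caller-side check before registration can catch malformed model output early. The sketch below is illustrative only: validate_segments is a hypothetical helper, and the rule that start must be less than end is an assumption, not a documented requirement.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = ("index", "start", "end", "text")


def validate_segments(segments: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    for position, segment in enumerate(segments):
        missing = [name for name in REQUIRED_FIELDS if name not in segment]
        if missing:
            problems.append(f"segment {position}: missing {', '.join(missing)}")
            continue
        # Assumption: a segment should start before it ends.
        if segment["start"] >= segment["end"]:
            problems.append(f"segment {position}: start must be before end")
    return problems


good = [{"index": 0, "start": 0.0, "end": 10.0, "text": "A man is cooking..."}]
bad = [{"index": 0, "start": 0.0, "end": 10.0}, {"index": 1, "start": 20.0, "end": 10.0, "text": "x"}]
```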
Returns
Type: Dict[str, Any]
{
    "status": str,             # "success" or "error"
    "uuid": str,               # Input uuid
    "model": str,              # Input model
    "config": str or None,     # Input config (can be None)
    "timestamp": str,          # Saved timestamp
    "segment_count": int,      # Number of segments saved
    "video_registered": bool,  # Whether video was automatically registered
    "file_size": int,          # Video file size (auto-calculated)
    "duration": float          # Video duration (auto-calculated)
}
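Because status can be "success" or "error", callers will usually want to check it before reading the other fields. The following is a minimal sketch of such a check. The helper summarize_result is hypothetical, and since the contents of an error result are not documented here, it reads fields defensively with .get().

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Raise on a non-success status, otherwise return a one-line summary."""
    if result.get("status") != "success":
        raise RuntimeError(f"caption registration failed for uuid={result.get('uuid')}")
    note = "video auto-registered" if result.get("video_registered") else "video not auto-registered"
    return f"{result['segment_count']} segments saved for {result['uuid']} ({note})"


sample = {
    "status": "success",
    "uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4",
    "segment_count": 3,
    "video_registered": True,
}
summary = summarize_result(sample)
```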
Examples
Registering Local Video Captions
# Register captions - video information is automatically registered
result = register_captions(
    feature_view='caption_summary',
    uuid='f2c99e03-8415-4926-bf3d-60ec8c2ddab4',
    mp4_source='/gpfs/public/artifacts/videos/f2c99e03-8415-4926-bf3d-60ec8c2ddab4/content.mp4',
    model='openai-gpt-4o',
    segments=[
        {"index": 0, "start": 0.0, "end": 10.0, "text": "A man is cooking..."},
        {"index": 1, "start": 10.0, "end": 20.0, "text": "He chops vegetables..."},
        {"index": 2, "start": 20.0, "end": 30.0, "text": "He adds ingredients..."}
    ]
)
print(f"Registered {result['segment_count']} segments")
Output:
{
    "status": "success",
    "uuid": "f2c99e03-8415-4926-bf3d-60ec8c2ddab4",
    "model": "openai-gpt-4o",
    "config": None,
    "timestamp": "2024-12-26T16:00:00.123456",
    "segment_count": 3,
    "video_registered": True,   # Video automatically registered
    "file_size": 13355607,      # Auto-calculated
    "duration": 120.5           # Auto-calculated
}
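The segments in the example above are written out by hand. When an external model returns timed captions in another shape, they need to be converted first. The sketch below assumes the captions arrive as (start, end, text) tuples, which is an illustrative assumption rather than a format any particular model guarantees. The helper build_segments is hypothetical and not part of the documented API.

```python
from typing import Any, Dict, List, Tuple


def build_segments(captions: List[Tuple[float, float, str]]) -> List[Dict[str, Any]]:
    """Sort captions by start time and emit dicts with index, start, end, text."""
    ordered = sorted(captions, key=lambda caption: caption[0])
    return [
        {"index": i, "start": float(start), "end": float(end), "text": text.strip()}
        for i, (start, end, text) in enumerate(ordered)
    ]


segments = build_segments([
    (10, 20, "He chops vegetables..."),
    (0, 10, " A man is cooking..."),
])
```

The resulting list can be passed directly as the segments argument.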
Registering YouTube Video Captions
result = register_captions(
    feature_view='caption_summary',
    uuid='youtube-abc123',
    mp4_source='https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    model='google-gemini-2.0-flash-exp',
    segments=[
        {"index": 0, "start": 0.0, "end": 15.0, "text": "Introduction to the topic..."},
        {"index": 1, "start": 15.0, "end": 45.0, "text": "Detailed explanation..."},
        {"index": 2, "start": 45.0, "end": 60.0, "text": "Conclusion and summary..."}
    ]
)
Output:
{
    "status": "success",
    "uuid": "youtube-abc123",
    "model": "google-gemini-2.0-flash-exp",
    "config": None,
    "timestamp": "2024-12-26T17:30:00.456789",
    "segment_count": 3,
    "video_registered": True,
    "file_size": 8234567,       # YouTube metadata
    "duration": 60.0            # YouTube metadata
}
Related APIs
- register_captions_batch - Batch registration
- get_captions - Retrieve registered captions