
VSS Feature Store

A data management system for systematically storing and retrieving video analysis results such as captions, ASR (speech recognition) transcripts, and summaries


System Introduction

VSS Feature Store is a video analysis result management system built on Feast.

Key Features:

  • Integrated management of video analysis results by model, configuration, and generation time
  • Data reproduction at specific points in time through Point-in-Time queries
  • Entity Discovery through metadata search
  • Support for registering results from external models (GPT-4, Claude, Gemini)

System Architecture


Components

| Component | Role | Notes |
| --- | --- | --- |
| VSS Framework | Video analysis execution | VLM (captions), ASR (speech recognition), LLM (summaries) |
| Milvus DB | Vector search system | Dedicated to VSS services (RAG, Video QA) |
| Feature Store | Data version management | Point-in-Time queries; research/analysis use |
| Metadata Store | Entity Discovery | UUID/model/config/time-based search |
| Airflow | Auto-synchronization | Milvus → Feature Store (10-minute cycle) |

Data Flow

Generation

  • VSS Framework receives video input and generates captions (VLM), ASR, summaries (LLM)
  • Results are stored in Milvus DB for real-time service provision

Collection

  • Automatic: Airflow synchronizes Milvus → Feature Store every 10 minutes
  • Manual: Direct registration of external model results (GPT-4, Claude, etc.) via API

Consumption

  • Researchers/engineers query data through Feature Store API
  • Model comparison, dataset curation, experiment reproduction, etc.
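
The collection step above merges automatic Airflow syncs with manual registrations, so duplicate rows for the same video/model/config must not accumulate. The actual Airflow DAG and Feast writer are not shown in this document; the sketch below only illustrates the dedup idea, and the row fields (uuid, model, config, generated_at) follow the keys used throughout this page while the dedup rule itself is an assumption.

```python
def dedup_rows(rows):
    """Keep one row per (uuid, model, config, generated_at) key (illustrative)."""
    seen = {}
    for row in rows:
        key = (row["uuid"], row["model"], row["config"], row["generated_at"])
        seen[key] = row  # later rows with the same key overwrite earlier ones
    return list(seen.values())

rows = [
    {"uuid": "yt-video-001", "model": "gpt-4o", "config": "coarse", "generated_at": "2025-12-01T10:00:00"},
    {"uuid": "yt-video-001", "model": "gpt-4o", "config": "coarse", "generated_at": "2025-12-01T10:00:00"},
    {"uuid": "yt-video-001", "model": "gpt-4o", "config": "fine",   "generated_at": "2025-12-01T10:05:00"},
]
print(len(dedup_rows(rows)))  # the two identical rows collapse, leaving 2
```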

Why Feature Store?

Problem

Different captions are generated for the same video depending on various conditions:

  • Model: GPT-4o, Claude-3.5, Gemini, etc.
  • Configuration (Config): prompt, temperature, chunk_duration, etc.
  • Generation Time: model updates, reprocessing

Importance of Config

Even with the same model and the same video, config settings can produce completely different results!

Example: scene_detection: fine-grained vs. coarse-grained
→ the same 120-second video yields 45 segments vs. 15

Configs are simple to write, and users are free to define their own.
See Configuration for details
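
To see why configs must be tracked alongside the model and video, consider how two configs for the same model can be told apart. This is only a sketch: the field names and the hashing scheme below are illustrative assumptions, not the Feature Store's actual key format.

```python
import hashlib
import json

def config_id(config: dict) -> str:
    """Derive a stable identifier from a config dict (illustrative only)."""
    canonical = json.dumps(config, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

fine   = {"scene_detection": "fine-grained",   "chunk_duration": 2.5, "temperature": 0.2}
coarse = {"scene_detection": "coarse-grained", "chunk_duration": 8.0, "temperature": 0.2}

# Same model, same video: the two configs still get distinct IDs, so their
# caption sets (e.g. 45 vs. 15 segments for a 120-second video) never collide.
print(config_id(fine) != config_id(coarse))  # True
```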

Solution

Through Feature Store:

  1. Prevent duplicate processing: Centrally manage high-cost computation results
  2. Consistent access: All researchers/engineers use the same data
  3. Experiment reproduction: Accurately reproduce past experiments with Point-in-Time queries
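
The point-in-time idea behind item 3 can be illustrated in plain Python. The real system does this through Feast's point-in-time queries; the record shape below is an assumption made only for the sketch.

```python
from datetime import datetime

def as_of(records, query_time):
    """Return the latest record whose event_time is at or before query_time."""
    eligible = [r for r in records if r["event_time"] <= query_time]
    return max(eligible, key=lambda r: r["event_time"]) if eligible else None

records = [
    {"event_time": datetime(2025, 11, 1), "caption": "v1 caption"},
    {"event_time": datetime(2025, 12, 1), "caption": "v2 caption (model updated)"},
]

# An experiment run in mid-November reproduces exactly the data it saw then,
# even though a newer caption exists now:
print(as_of(records, datetime(2025, 11, 15))["caption"])  # v1 caption
```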

Quick Start

Installation

```bash
# 1. Virtual Environment Setup
uv venv
source .venv/bin/activate

# 2. Install Libraries
uv pip install -e "packages/mantis_common" -e "services/svc-vss/data/feast"

# 3. Environment Variable Setup
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_ENDPOINT_URL_S3=http://andrew-minio-2025-12-03-svc.dev.idc.k8s:9000
export FEAST_S3_ENDPOINT_URL=http://andrew-minio-2025-12-03-svc.dev.idc.k8s:9000

# 4. Run MCP Server (Verify)
python services/svc-vss/data/feast/mcp_server.py

# 5. Function Testing
python services/svc-vss/data/feast/test_feast.py
```

Basic Usage

```python
import api
from constants import FEATURE_VIEW_VIDEO_DESCRIPTION

# 0) Target Feature View
fv = FEATURE_VIEW_VIDEO_DESCRIPTION

# 1) Register New Captions (required step)
result = api.register_captions(
    feature_view=fv,
    uuid='yt-video-001',
    mp4_source='https://www.youtube.com/watch?v=wjZofJX0v4M',
    model='gpt-4o',
    segments=[
        {'index': 0, 'start': 0.0,  'end': 10.0, 'text': 'Intro segment...'},
        {'index': 1, 'start': 10.0, 'end': 20.0, 'text': 'Content segment...'},
    ],
)

# 2) Retrieve Captions (verify registration result)
captions = api.get_captions(
    feature_view=fv,
    uuid='yt-video-001',
    model='gpt-4o',
)

# 3) Search Metadata (optional)
metadata = api.search_metadata(
    feature_view=fv,
    models=['gpt-4o'],
)
```

See Getting Started for details


API Reference

Caption Retrieval

| API | Description |
| --- | --- |
| search_metadata | Search video metadata (UUID, model, date filters) |
| get_captions | Retrieve captions for a single video |
| get_captions_batch | Batch retrieval of captions for multiple videos |
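
A batch call like get_captions_batch presumably returns rows for several videos at once, and downstream code often needs them regrouped per video. A minimal sketch of that regrouping follows; the flat row shape is an assumption, modeled on the segment fields used in Basic Usage.

```python
from collections import defaultdict

def group_by_uuid(rows):
    """Group flat caption rows into {uuid: [segments sorted by start]}."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["uuid"]].append(row)
    for segments in grouped.values():
        segments.sort(key=lambda s: s["start"])
    return dict(grouped)

rows = [
    {"uuid": "yt-video-002", "start": 10.0, "text": "..."},
    {"uuid": "yt-video-001", "start": 0.0,  "text": "..."},
    {"uuid": "yt-video-001", "start": 10.0, "text": "..."},
]
print(sorted(group_by_uuid(rows)))  # ['yt-video-001', 'yt-video-002']
```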

Caption Registration

| API | Description |
| --- | --- |
| register_captions | Register captions for a single video (GPT-4, Claude, etc.) |
| register_captions_batch | Batch registration of captions for multiple videos (JSON file) |
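
register_captions_batch takes a JSON file, whose exact schema is not documented on this page. A plausible payload, mirroring the register_captions keyword arguments shown in Basic Usage, might be built like this; every field name beyond that example is a hypothetical assumption.

```python
import json
import tempfile

# Hypothetical batch payload: one entry per video, echoing the
# register_captions arguments (uuid, model, segments) from Basic Usage.
batch = [
    {
        "uuid": "yt-video-001",
        "model": "gpt-4o",
        "segments": [
            {"index": 0, "start": 0.0, "end": 10.0, "text": "Intro segment..."},
        ],
    },
    {
        "uuid": "yt-video-002",
        "model": "claude-3.5",
        "segments": [
            {"index": 0, "start": 0.0, "end": 12.0, "text": "Opening scene..."},
        ],
    },
]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(batch, f)
    path = f.name

# The resulting file path would then be handed to register_captions_batch.
print(len(json.load(open(path))))  # 2 videos in the batch file
```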

Video Management

| API | Description |
| --- | --- |
| get_video | Retrieve single video information |
| get_all_videos | Retrieve all video information |

MCP Integration

Use Feature Store with natural language through MCP (Model Context Protocol):

"Find all GPT-4 captions registered in December"
"Show captions with coarse config for video WuFL2bJm2yo"

See MCP Guide for details


Next Steps


Reference Materials

📄 Detailed Design Document

For complete design documentation of VSS Feature Store, refer to this Notion page:

📚 Feast Learning Materials

Understanding Feast, the core technology behind the Feature Store:


Support