
Architecture

VSS Feature Store system structure


Overall Architecture

[Figure: System Architecture]

Components

| Component | Role | Notes |
|---|---|---|
| VSS Framework | Video analysis execution | VLM (captions), ASR (speech recognition), LLM (summary) |
| Milvus DB | Vector search system | Dedicated to VSS services (RAG, Video QA) |
| Feature Store | Data version management and storage | Point-in-time queries; reference repository for research and analysis |
| Metadata Store | Entity discovery support | Search index over UUID, model, config, and time |
| Airflow | Automatic synchronization pipeline | Milvus → Feature Store (every 10 minutes) |

ER Diagram

[Figure: ER Diagram]

Video (Video Information)

| Field | Type | Description |
|---|---|---|
| uuid | string (PK) | Unique video identifier |
| file_size | int | File size (bytes) |
| duration | float | Video length (seconds) |
| mp4_source | string | File path or YouTube URL |

Video Description / Audio Transcript / Caption Summary

| Field | Type | Description |
|---|---|---|
| uuid | string (FK) | Video UUID |
| segment_id | string (PK) | Segment ID |
| model | string | Model name |
| index | int | Segment order within the video |
| start | float | Start time (seconds) |
| end | float | End time (seconds) |
| text | string | Caption/ASR text |
| config_source | string | Config filename (e.g., gemini_fine.yaml) |

Relationship: VIDEO (1) ──< (N) VIDEO_DESCRIPTION/AUDIO_TRANSCRIPT/CAPTION_SUMMARY
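The 1:N relationship can be made concrete with plain dataclasses. This is a minimal sketch, not the Feature Store's actual model classes: the class names `Video` and `Segment` and the helper `group_segments` are hypothetical, and the fields are copied from the two tables above. One `Segment` shape is reused for all three segment tables because they share the same schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    uuid: str          # primary key
    file_size: int     # bytes
    duration: float    # seconds
    mp4_source: str    # file path or YouTube URL


@dataclass(frozen=True)
class Segment:
    """Row shape shared by VIDEO_DESCRIPTION, AUDIO_TRANSCRIPT and CAPTION_SUMMARY."""
    uuid: str           # foreign key -> Video.uuid
    segment_id: str     # primary key
    model: str
    index: int          # segment order within the video
    start: float        # seconds
    end: float          # seconds
    text: str
    config_source: str  # config filename, e.g. "gemini_fine.yaml"


def group_segments(videos: list[Video], segments: list[Segment]) -> dict[str, list[Segment]]:
    """Resolve VIDEO (1) -< (N) segments: map each video UUID to its segments in index order."""
    known = {v.uuid for v in videos}
    grouped: dict[str, list[Segment]] = defaultdict(list)
    for seg in segments:
        # Enforce the foreign key: every segment must belong to a known video.
        if seg.uuid not in known:
            raise ValueError(f"segment {seg.segment_id} references unknown video {seg.uuid}")
        grouped[seg.uuid].append(seg)
    return {uuid: sorted(segs, key=lambda s: s.index) for uuid, segs in grouped.items()}
```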


Data Flow

Generation

Video Input
VSS Framework Processing
    ├─→ VLM Captioning
    ├─→ ASR Speech Recognition
    └─→ LLM Summary
Milvus DB Storage

Purpose: Real-time services (RAG, Video QA)


Collection

Automatic synchronization (Airflow, every 10 minutes):

  1. Query videos whose processing has completed from Milvus
  2. Copy the results to the Feature Store
  3. Update the Metadata Store
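One synchronization cycle can be sketched in pure Python. This is an illustration under assumptions, not the actual DAG: the Milvus row shape (`status`, `completed_at`), the watermark mechanism, and the in-memory dicts standing in for the Feature Store and Metadata Store are all hypothetical. In production, a callable like `sync_cycle` would run as an Airflow task scheduled every 10 minutes; the DAG definition itself is not shown.

```python
from datetime import datetime


def sync_cycle(milvus_rows, last_synced_at: datetime, feature_store, metadata_index):
    """One hypothetical sync cycle: Milvus -> Feature Store -> Metadata Store.

    milvus_rows:     iterable of dicts (hypothetical shape) with 'status',
                     'completed_at' (datetime) and the segment fields.
    last_synced_at:  watermark from the previous cycle; only newer rows are copied.
    feature_store:   dict mapping partition date 'YYYY-MM-DD' -> list of rows.
    metadata_index:  dict mapping (uuid, model) -> list of segment_ids.
    Returns the new watermark for the next cycle.
    """
    # Step 1: select rows that finished processing since the last cycle.
    new_rows = [
        r for r in milvus_rows
        if r["status"] == "completed" and r["completed_at"] > last_synced_at
    ]
    watermark = last_synced_at
    for row in new_rows:
        # Step 2: append to the date partition; existing rows are never replaced.
        partition = row["completed_at"].strftime("%Y-%m-%d")
        feature_store.setdefault(partition, []).append(row)
        # Step 3: register the segment in the metadata index.
        metadata_index.setdefault((row["uuid"], row["model"]), []).append(row["segment_id"])
        watermark = max(watermark, row["completed_at"])
    return watermark
```

Returning a watermark makes each cycle incremental: rerunning with the returned value copies nothing new, which keeps the 10-minute schedule from duplicating data.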

Manual registration (API):

  1. Run external models (e.g., GPT-4, Claude)
  2. Write the results to a JSON file
  3. Call register_captions_batch()
  4. The captions are saved to the Feature Store and the Metadata Store is updated
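Before calling register_captions_batch(), it can help to validate the JSON file locally. The sketch below assumes, hypothetically, that the file is a JSON array of segment objects using the field names from the segment table in the ER Diagram section; the real file format and the signature of register_captions_batch() may differ, so the call itself is not shown. `load_caption_batch` is a hypothetical helper.

```python
import json

# Field names and types taken from the segment table in the ER Diagram section.
REQUIRED_FIELDS = {
    "uuid": str, "segment_id": str, "model": str, "index": int,
    "start": (int, float), "end": (int, float), "text": str, "config_source": str,
}


def load_caption_batch(raw_json: str) -> list[dict]:
    """Parse and validate a hypothetical JSON batch (an array of segment objects)."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of segment objects")
    seen_ids = set()
    for i, rec in enumerate(records):
        for field, typ in REQUIRED_FIELDS.items():
            if field not in rec:
                raise ValueError(f"record {i}: missing field '{field}'")
            if not isinstance(rec[field], typ):
                raise ValueError(f"record {i}: field '{field}' has the wrong type")
        if rec["start"] >= rec["end"]:
            raise ValueError(f"record {i}: start must be before end")
        # config_source holds only a filename, never a path.
        if "/" in rec["config_source"]:
            raise ValueError(f"record {i}: config_source must be a filename only")
        if rec["segment_id"] in seen_ids:
            raise ValueError(f"record {i}: duplicate segment_id {rec['segment_id']}")
        seen_ids.add(rec["segment_id"])
    return records
```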

Consumption

  1. The user sends a request
  2. search_metadata() is called
  3. The Metadata Store is searched
  4. The matching segment IDs are obtained
  5. The corresponding data is queried from the Feature Store
  6. Results are returned as a DataFrame
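The two-stage lookup (metadata first, then features) can be sketched with lists of dicts. The function names `find_segment_ids` and `fetch_segments` are hypothetical stand-ins, not the real search_metadata() API, and the row shapes are assumptions based on the ER diagram. The actual API returns a DataFrame; a list of dicts like the one returned here can be converted with `pandas.DataFrame(rows)`.

```python
def find_segment_ids(metadata_rows, **filters):
    """Metadata stage: return segment IDs whose metadata matches every filter.

    metadata_rows: list of dicts such as
        {"segment_id": ..., "uuid": ..., "model": ..., "config_source": ...}
    filters:       field=value pairs, e.g. uuid="...", model="gpt-4o".
    """
    return [
        row["segment_id"]
        for row in metadata_rows
        if all(row.get(field) == value for field, value in filters.items())
    ]


def fetch_segments(feature_rows, segment_ids):
    """Feature stage: look up full rows for the given IDs, ordered by (uuid, index)."""
    wanted = set(segment_ids)
    hits = [row for row in feature_rows if row["segment_id"] in wanted]
    return sorted(hits, key=lambda row: (row["uuid"], row["index"]))
```

Keeping the metadata index separate means a search never has to scan the full feature data; only the rows whose IDs matched are read.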

Directory Structure

/gpfs/public/artifacts/feature_store/vss_feature_store
├── data/                          # Actual feature data (date-partitioned)
│   ├── video/                     # Unique/static video information
│   │   ├── 2024-12-01.parquet     # Video metadata created/collected on 2024-12-01
│   │   ├── 2024-12-02.parquet
│   │   └── 2024-12-03.parquet
│   │
│   ├── video_description/         # Video-based text descriptions (LLM/caption)
│   │   ├── 2024-12-01.parquet     # Video descriptions generated on this date
│   │   ├── 2024-12-02.parquet
│   │   └── 2024-12-03.parquet
│   │
│   ├── audio_transcript/          # Speech recognition results (ASR transcript)
│   │   ├── 2024-12-01.parquet     # Transcript results by date
│   │   ├── 2024-12-02.parquet
│   │   └── 2024-12-03.parquet
│   │
│   └── caption_summary/           # Summary results based on caption/transcript
│       ├── 2024-12-01.parquet
│       ├── 2024-12-02.parquet
│       └── 2024-12-03.parquet
├── segment_index/                 # Segment-level metadata (for reference)
│   ├── meta_video_description.parquet   # video_description segment mapping
│   ├── meta_audio_transcript.parquet    # audio_transcript segment mapping
│   └── meta_caption_summary.parquet     # caption_summary segment mapping
│   # ※ Not actual features, but for chunk/segment ↔ original ID connection
└── config/                        # Model and pipeline configuration files
    ├── gemini_coarse.yaml         # Coarse Gemini configuration
    ├── gemini_fine.yaml           # Fine Gemini configuration
    └── custom_config.yaml         # Custom configuration
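The layout above maps each table and date to a predictable file. A small sketch of path helpers, assuming the naming shown in the tree (`data/<table>/<YYYY-MM-DD>.parquet` and `segment_index/meta_<table>.parquet`); `partition_path` and `segment_index_path` are hypothetical names, not part of the documented API.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/gpfs/public/artifacts/feature_store/vss_feature_store")
TABLES = {"video", "video_description", "audio_transcript", "caption_summary"}


def partition_path(table: str, day: date) -> PurePosixPath:
    """Path of the daily Parquet partition for a feature table."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    return ROOT / "data" / table / f"{day.isoformat()}.parquet"


def segment_index_path(table: str) -> PurePosixPath:
    """Path of the segment-mapping file; the video table has no segment index."""
    if table not in TABLES - {"video"}:
        raise ValueError(f"no segment index for table: {table}")
    return ROOT / "segment_index" / f"meta_{table}.parquet"
```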

Config File Management

Config files are stored in /gpfs/public/artifacts/feature_store/vss_feature_store/config/,
and only the filename (e.g., gemini_fine.yaml) is stored in the config_source field.

See Configuration for details.
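Because config_source holds only a filename, readers resolve it against the config directory. A minimal sketch, assuming the directory shown above; `resolve_config` is a hypothetical helper. It rejects anything other than a bare filename so that a record cannot point outside config/.

```python
from pathlib import PurePosixPath

CONFIG_DIR = PurePosixPath("/gpfs/public/artifacts/feature_store/vss_feature_store/config")


def resolve_config(config_source: str) -> PurePosixPath:
    """Turn the filename stored in config_source into an absolute path."""
    # A bare filename is its own final path component; paths and ".." are rejected.
    if PurePosixPath(config_source).name != config_source or config_source in ("", ".", ".."):
        raise ValueError(f"config_source must be a bare filename, got {config_source!r}")
    return CONFIG_DIR / config_source
```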


Point-in-Time Queries

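The Feature Store keeps multiple versions of the same video's features, so data can be queried as it existed at a given time. Example version history for one video: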

| Date | Model | Segment count | Notes |
|---|---|---|---|
| 2024-12-01 | GPT-4o | 28 | Initial processing |
| 2024-12-15 | GPT-4o | 30 | Prompt improvement |
| 2025-01-10 | GPT-4o | 32 | Model update |
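
```python
# `api` is assumed to be an initialized VSS Feature Store API client.

# Query the version that was current at a specific time
captions_v1 = api.get_captions(
    uuid='...',
    model='gpt-4o',
    timestamp='2024-12-01T10:00:00'
)

# Query the latest version (omit timestamp)
captions_latest = api.get_captions(
    uuid='...',
    model='gpt-4o'
)
```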
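The as-of selection behind a timestamped query can be sketched as "newest version created at or before the timestamp; latest if no timestamp is given". This rule is an assumption inferred from the table and the get_captions() example, not a description of the actual implementation. `select_version` is hypothetical, and versions are modeled as dicts with an ISO-8601 `created_at` string.

```python
from datetime import datetime
from typing import Optional


def select_version(versions, timestamp: Optional[str] = None):
    """Pick the version that was current at `timestamp` (ISO-8601), or the latest.

    Returns None if no version existed yet at the requested time.
    """
    ordered = sorted(versions, key=lambda v: datetime.fromisoformat(v["created_at"]))
    if timestamp is None:
        return ordered[-1] if ordered else None
    cutoff = datetime.fromisoformat(timestamp)
    eligible = [v for v in ordered if datetime.fromisoformat(v["created_at"]) <= cutoff]
    return eligible[-1] if eligible else None
```

With the three versions from the table, a query at '2024-12-01T10:00:00' selects the 28-segment version, a query on 2024-12-20 selects the 30-segment version, and omitting the timestamp selects the 32-segment version.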

Security

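Access control: managed through GPFS filesystem permissions

Data protection:

  • Immutable: records are never modified once written
  • Append-only: new results are added as new records instead of overwriting existing ones
  • Version history: all previous versions are preserved, which is what makes Point-in-Time queries possible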
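At the file level, immutability and append-only writes can be enforced by refusing to open an existing file for writing. A minimal sketch using Python's exclusive-create mode `"xb"`, which raises `FileExistsError` if the file already exists. `write_partition` is a hypothetical helper; it assumes every write targets a new file, which simplifies the real storage layer, and GPFS permissions are enforced separately from this code.

```python
import os
import tempfile


def write_partition(path: str, payload: bytes) -> None:
    """Create a new partition file; refuse to touch one that already exists."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # "xb" = exclusive create: fails instead of truncating existing data.
    with open(path, "xb") as f:
        f.write(payload)


if __name__ == "__main__":
    # Demo in a throwaway directory: the second write to the same partition fails.
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "data", "video", "2024-12-01.parquet")
        write_partition(target, b"first version")
        try:
            write_partition(target, b"second version")
        except FileExistsError:
            print("refused to overwrite", os.path.basename(target))
```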