
Configuration

Caption generation settings management


Overview

The same model applied to the same video can produce very different results depending on the config settings.

The config determines:

  • Engine/model to use
  • Scene detection method (fine-grained vs coarse-grained)
  • Summarization settings
  • Other custom parameters

Config Flexibility

Users can define config files freely.
No complex parameters are required; a config can contain only the core settings.


VSS Framework Configuration Example

When using the VSS framework, detailed settings such as the following are available. You do not need to set every item; use only the ones you need.

| Config | Description |
| --- | --- |
| prompt | Default prompt used for caption generation |
| caption_summarization_prompt | Prompt used to create caption summaries from per-segment caption/ASR results |
| max_tokens | Maximum token count for captions (1-1024) |
| temperature | Sampling temperature for captions (0-1) |
| top_p | Top-p sampling mass for captions (0-1) |
| top_k | Top-k candidate token count for captions (1-1000) |
| summarize_top_p | Top-p in the summarization stage (0-1) |
| summarize_temperature | Temperature in the summarization stage (0-1) |
| summarize_max_tokens | Maximum token count in the summarization stage |
| enable_audio | Whether to enable ASR on the audio stream |
| enable_reasoning | Whether to enable reasoning mode |
| chunk_duration | Chunks the video into N-second units |
| chunk_overlap_duration | Overlap between consecutive chunks (seconds) |
| summarize_batch_size | Batch size fed to the summary LLM at once (in caption_summary units) |
| segment_source | Segment generation criteria (start/end specified externally) |
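
The documented ranges lend themselves to a quick check before a config is used. The following is a minimal sketch, not part of VSS or the Feature Store API: `validate_vss_config` and `RANGES` are hypothetical names, and treating the bounds as inclusive is an assumption, since the table does not say.

```python
# Ranges copied from the table above; bounds are assumed to be inclusive.
RANGES = {
    "max_tokens": (1, 1024),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "top_k": (1, 1000),
    "summarize_top_p": (0.0, 1.0),
    "summarize_temperature": (0.0, 1.0),
}


def validate_vss_config(config: dict) -> list[str]:
    """Return one message per out-of-range value; absent keys are skipped."""
    errors = []
    for key, (low, high) in RANGES.items():
        if key not in config:
            continue  # Every item is optional, so a missing key is not an error.
        value = config[key]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{key}: expected a number, got {value!r}")
        elif not low <= value <= high:
            errors.append(f"{key}: {value} is outside [{low}, {high}]")
    return errors


print(validate_vss_config({"max_tokens": 2048, "temperature": 0.2}))
```

Keys without a documented range, such as summarize_max_tokens, are deliberately left unchecked here.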

[!TIP] In practice, a config can be as simple as the Gemini examples under Config Examples.
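
As a rough illustration of how chunk_duration and chunk_overlap_duration from the table above might interact, the sketch below computes chunk windows for a video. It assumes that each chunk starts chunk_duration minus chunk_overlap_duration seconds after the previous one; the source does not spell out the exact rule, and `chunk_windows` is a hypothetical helper, not a VSS function.

```python
def chunk_windows(video_seconds: float, chunk_duration: float,
                  chunk_overlap_duration: float = 0.0) -> list[tuple[float, float]]:
    """Split [0, video_seconds] into (start, end) windows in seconds.

    Assumption: consecutive windows share chunk_overlap_duration seconds.
    """
    if chunk_duration <= 0:
        raise ValueError("chunk_duration must be positive")
    if not 0 <= chunk_overlap_duration < chunk_duration:
        raise ValueError("chunk_overlap_duration must be in [0, chunk_duration)")

    total = float(video_seconds)
    step = chunk_duration - chunk_overlap_duration
    windows = []
    start = 0.0
    while start < total:
        end = min(start + chunk_duration, total)
        windows.append((start, end))
        if end >= total:
            break  # The last window already reaches the end of the video.
        start += step
    return windows


# A 25-second video, 10-second chunks, 2 seconds of overlap.
print(chunk_windows(25, chunk_duration=10, chunk_overlap_duration=2))
# → [(0.0, 10.0), (8.0, 18.0), (16.0, 25.0)]
```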


Config File Management

Storage Location

/gpfs/public/artifacts/feature_store/vss_feature_store/config/
├── gemini_coarse.yaml
├── gemini_fine.yaml
└── custom_config.yaml

Feature Store Integration

When calling register_captions, pass the absolute path of the config file as the config parameter. The system automatically copies the file to the management path (/gpfs/public/artifacts/feature_store/vss_feature_store/config), and Feature Store stores only the filename.

# 1. Registration: pass the absolute path (the file is copied automatically)
api.register_captions(
    uuid='...',
    model='gemini-2.5-pro',
    config='/gpfs/public/my_configs/gemini_fine.yaml',  # ← Absolute path
    ...
)

# 2. Retrieval: query with the stored filename
captions = api.get_captions(
    uuid='...',
    model='gemini-2.5-pro',
    config='gemini_fine.yaml'  # ← Filename only
)

Auto-copy Mechanism

If the management directory does not yet contain a file with the same name as the one at the given absolute path, the file is copied there automatically. If a file with that name already exists, the existing copy is used and nothing is copied. This keeps all configuration files managed in one central location.
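
The copy-if-absent behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the Feature Store implementation: `ensure_managed_config` is a hypothetical helper, and the demo uses a temporary directory in place of the real management path.

```python
import shutil
import tempfile
from pathlib import Path


def ensure_managed_config(config_path: str, managed_dir: str) -> str:
    """Copy config_path into managed_dir unless that filename already exists.

    Returns the bare filename, which is what would be stored and later
    used for retrieval. Hypothetical helper, not the Feature Store code.
    """
    source = Path(config_path)
    if not source.is_absolute():
        raise ValueError("config must be an absolute path at registration")
    target = Path(managed_dir) / source.name
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # First registration: copy the file in.
    return source.name  # Otherwise the existing managed copy is reused.


# Demo in a temporary directory instead of the real /gpfs path.
with tempfile.TemporaryDirectory() as tmp:
    user_file = Path(tmp) / "my_configs" / "gemini_fine.yaml"
    user_file.parent.mkdir()
    user_file.write_text("main_engine:\n  name: gemini\n")
    managed = Path(tmp) / "config"

    stored_name = ensure_managed_config(str(user_file), str(managed))
    print(stored_name)  # → gemini_fine.yaml
```

In this sketch, registering an edited file under an existing name leaves the older managed copy in place, which is one reason to give a changed config a new filename.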


Config Examples

gemini_fine.yaml (Fine-grained)

Fine-grained scene detection

main_engine:
  name: gemini

scene_detection:
  type: fine-grained
  method: gemini-2.5-pro

summarization:
  llm: gemini-2.5-pro

Characteristics:

  • Fine-grained scene detection
  • More segments generated
  • Detailed analysis


gemini_coarse.yaml (Coarse-grained)

Broad range scene detection

main_engine:
  name: gemini

scene_detection:
  type: coarse-grained
  method: gemini-2.5-pro

summarization:
  llm: gemini-2.5-pro

Characteristics:

  • Coarse-grained scene detection
  • Fewer segments
  • High-level summary


gpt4_custom.yaml (Custom Configuration)

Custom configuration example

main_engine:
  name: openai
  model: gpt-4o

scene_detection:
  type: fine-grained
  fps: 1

processing:
  max_segments: 50
  overlap: true

output:
  format: detailed

Characteristics:

  • Users define only the parameters they need
  • Can be as simple or as complex as required


Config Comparison Example

When the same video (uuid=abc123) is processed with the same model (gemini-2.5-pro):

| Config | Scene Detection | Segment Count | Characteristics |
| --- | --- | --- | --- |
| gemini_fine | fine-grained | 45 | Detailed analysis |
| gemini_coarse | coarse-grained | 15 | High-level summary |

Same Time, Different Results

# 2 versions registered at 2024-12-01 10:00:00
metadata = api.search_metadata(
    uuid='abc123',
    model='gemini-2.5-pro',
    time_after='2024-12-01T09:00:00',
    time_before='2024-12-01T11:00:00'
)

# Output:
#   config                segment_count
#   gemini_fine.yaml      45
#   gemini_coarse.yaml    15
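
The output above shows why the config name has to be part of the lookup: uuid, model, and registration time alone cannot tell the two versions apart. The sketch below reproduces this with hypothetical in-memory records. `RECORDS` and `search` are illustrative stand-ins, not the Feature Store API, and the inclusive time window is an assumption.

```python
from datetime import datetime

# Hypothetical in-memory records mirroring the output above.
RECORDS = [
    {"uuid": "abc123", "model": "gemini-2.5-pro", "config": "gemini_fine.yaml",
     "registered_at": datetime(2024, 12, 1, 10, 0, 0), "segment_count": 45},
    {"uuid": "abc123", "model": "gemini-2.5-pro", "config": "gemini_coarse.yaml",
     "registered_at": datetime(2024, 12, 1, 10, 0, 0), "segment_count": 15},
]


def search(records, uuid, model, time_after, time_before, config=None):
    """Filter records by uuid, model and time window, optionally by config."""
    after = datetime.fromisoformat(time_after)
    before = datetime.fromisoformat(time_before)
    return [
        r for r in records
        if r["uuid"] == uuid
        and r["model"] == model
        and after <= r["registered_at"] <= before
        and (config is None or r["config"] == config)
    ]


window = dict(uuid="abc123", model="gemini-2.5-pro",
              time_after="2024-12-01T09:00:00", time_before="2024-12-01T11:00:00")
both = search(RECORDS, **window)                                 # ambiguous: two versions
fine_only = search(RECORDS, config="gemini_fine.yaml", **window) # one version
print(len(both), len(fine_only))  # → 2 1
```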

Config Selection Guide

Fine-grained

Use Cases:

  • Detailed behavior analysis
  • Frame-by-frame change tracking
  • Educational content

Advantages: High precision
Disadvantages: Many segments, long processing time


Coarse-grained

Use Cases:

  • Entire video summary
  • Quick preview
  • Scene transition-focused analysis

Advantages: Fast processing, few segments
Disadvantages: Possible loss of detailed information


Config Registration and Retrieval

During Registration

# Register with Fine config
result = api.register_captions(
    feature_view='caption_summary',
    uuid='abc123',
    model='gemini-2.5-pro',
    config='gemini_fine.yaml',  # ← Specify
    segments=[...]
)

# Register with Coarse config (same video)
result = api.register_captions(
    feature_view='caption_summary',
    uuid='abc123',
    model='gemini-2.5-pro',
    config='gemini_coarse.yaml',  # ← Different config
    segments=[...]
)

During Retrieval

# Retrieve Fine version
captions_fine = api.get_captions(
    feature_view='caption_summary',
    uuid='abc123',
    model='gemini-2.5-pro',
    config='gemini_fine.yaml'
)

# Retrieve Coarse version
captions_coarse = api.get_captions(
    feature_view='caption_summary',
    uuid='abc123',
    model='gemini-2.5-pro',
    config='gemini_coarse.yaml'
)

# Compare
print(f"Fine segments: {len(captions_fine)}")      # 45
print(f"Coarse segments: {len(captions_coarse)}")  # 15
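
Beyond the raw counts, it can help to compare how long the segments are under each config. The sketch below assumes each returned segment is a dict with numeric start and end keys in seconds; the source does not show the segment schema, so treat both the schema and the helper name `summarize_segments` as hypothetical. The sample data is made up for illustration.

```python
def summarize_segments(segments):
    """Return (count, mean segment length in seconds) for a list of segments.

    Assumption: each segment is a dict with numeric 'start' and 'end' keys.
    """
    if not segments:
        return 0, 0.0
    total = sum(seg["end"] - seg["start"] for seg in segments)
    return len(segments), total / len(segments)


# Made-up segments for a 90-second clip, not real Feature Store output.
fine = [{"start": s, "end": s + 10} for s in range(0, 90, 10)]
coarse = [{"start": s, "end": s + 30} for s in range(0, 90, 30)]

print(summarize_segments(fine))    # → (9, 10.0)
print(summarize_segments(coarse))  # → (3, 30.0)
```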

Best Practices

1. Clear Naming

# Good examples
gemini_fine_v2.yaml
gpt4_coarse_2024-12.yaml

# Bad examples
config1.yaml
test.yaml
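
A naming rule like this can be checked automatically. The pattern below is one possible convention inferred from the good examples (engine, then fine or coarse, then an optional version or date); it is an assumption rather than a rule the Feature Store enforces, and `is_descriptive_name` is a hypothetical helper.

```python
import re

# Assumed convention: <engine>_<fine|coarse>[_<version or date>].yaml
NAME_PATTERN = re.compile(r"[a-z0-9]+_(fine|coarse)(_[a-z0-9.-]+)?\.yaml")


def is_descriptive_name(filename: str) -> bool:
    """True if the filename states the engine and the granularity."""
    return NAME_PATTERN.fullmatch(filename) is not None


for name in ["gemini_fine_v2.yaml", "gpt4_coarse_2024-12.yaml",
             "config1.yaml", "test.yaml"]:
    print(name, is_descriptive_name(name))
```

Configs that do not fit the fine/coarse split would need a looser pattern.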

2. Config Version Control

# Manage config files with Git
cd /gpfs/public/artifacts/feature_store/vss_feature_store/config/
git init
git add *.yaml
git commit -m "Initial configs"

3. Add Comments

# Gemini Fine-grained Config
# Created: 2024-12-01
# Purpose: Detailed scene-by-scene analysis

main_engine:
  name: gemini
  # Fine-grained setting for detailed analysis

scene_detection:
  type: fine-grained
  method: gemini-2.5-pro

4. Config Documentation

Keep a description of each config file in a separate README:

config/
├── gemini_fine.yaml
├── gemini_coarse.yaml
└── README.md  # ← Description and usage guide for each config
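
If config files carry header comments like those shown under Add Comments, a README draft can be generated from them. The following is a minimal sketch using only the standard library. `leading_comments` and `build_readme` are hypothetical helpers, the output layout is arbitrary, and the demo runs on a temporary directory rather than the real config path.

```python
import tempfile
from pathlib import Path


def leading_comments(path: Path) -> list[str]:
    """Collect the '#' comment lines at the top of a file, without the '#'."""
    lines = []
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.startswith("#"):
            break  # Stop at the first non-comment line (blank or YAML).
        lines.append(line.lstrip("#").strip())
    return lines


def build_readme(config_dir: Path) -> str:
    """Build README text with one entry per .yaml file in config_dir."""
    parts = ["Config files", ""]
    for path in sorted(config_dir.glob("*.yaml")):
        parts.append(path.name)
        comments = leading_comments(path) or ["(no header comment)"]
        parts.extend(f"  {c}" for c in comments)
        parts.append("")
    return "\n".join(parts)


# Demo on a temporary directory rather than the real config path.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "gemini_fine.yaml").write_text(
        "# Gemini Fine-grained Config\n"
        "# Purpose: Detailed scene-by-scene analysis\n\n"
        "main_engine:\n  name: gemini\n",
        encoding="utf-8",
    )
    readme_text = build_readme(demo_dir)
    print(readme_text)
```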

Key Points

Important

  • Users define configs freely
  • A simple structure is sufficient (even just 3-5 fields)
  • The same video and the same model can yield very different results depending on the config

Next Steps