ai-multimodal
Multimodal AI processing via the Google Gemini API (2M-token context window). Capabilities: audio (transcription, summarization, and music analysis; 9.5 hr max), images (captioning, OCR, object detection, segmentati
Also installable via the skills CLI:
npx skills add brixtonpham/claude-config/skills/ai-multimodal
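The two limits quoted in the description (a 2M-token context and a 9.5-hour audio maximum) can be checked before any request is sent. The sketch below is a hypothetical pre-flight helper, not part of this skill or of any Gemini SDK; the names `check_limits`, `MAX_CONTEXT_TOKENS`, and `MAX_AUDIO_HOURS` are assumptions made for illustration, and the token count is assumed to be an estimate supplied by the caller.

```python
# Hypothetical pre-flight check; not part of the skill or of the Gemini SDK.
# The two constants are taken from the skill description above.
MAX_CONTEXT_TOKENS = 2_000_000  # "2M tokens context"
MAX_AUDIO_HOURS = 9.5           # "9.5hr max" for audio input


def check_limits(audio_seconds: float = 0.0, estimated_tokens: int = 0) -> list[str]:
    """Return a list of problems; an empty list means the input is within limits."""
    problems = []
    max_audio_seconds = MAX_AUDIO_HOURS * 3600
    if audio_seconds > max_audio_seconds:
        problems.append(
            f"audio is {audio_seconds / 3600:.2f} h, above the {MAX_AUDIO_HOURS} h maximum"
        )
    if estimated_tokens > MAX_CONTEXT_TOKENS:
        problems.append(
            f"estimated {estimated_tokens} tokens, above the {MAX_CONTEXT_TOKENS} token context"
        )
    return problems


if __name__ == "__main__":
    # A 10-hour recording exceeds the audio limit, so one problem is reported.
    for problem in check_limits(audio_seconds=10 * 3600):
        print(problem)
```

An input that fails the check would need to be split or shortened before it is passed to the skill; how the skill itself handles oversized input is not stated in the description.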