ai-multimodal

Multimodal AI processing via Google Gemini API (2M tokens context). Capabilities: audio (transcription, 9.5hr max, summarization, music analysis), images (captioning, OCR, object detection, segmentati

by brixtonpham · Repository · other
Also installable via skills CLI
npx skills add brixtonpham/claude-config/skills/ai-multimodal

Source

Path: skills/ai-multimodal/SKILL.md (main)
