ai-llm-inference
Operational patterns for LLM inference: latency budgeting, tail-latency control, caching, batching/scheduling, quantization/compression, parallelism, and reliable serving at scale. Emphasizes producti…
Also installable via the skills CLI:
npx skills add vasilyu1983/AI-Agents-public/frameworks/shared-skills/skills/ai-llm-inference
Source
Path: frameworks/shared-skills/skills/ai-llm-inference (main)