vllm
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GP
Also installable via the skills CLI:
npx skills add Orchestra-Research/AI-research-SKILLs/12-inference-serving/vllm
Source
Path: 12-inference-serving/vllm/SKILL.md (main)