vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GP
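As a rough illustration of the paging idea behind PagedAttention, the sketch below keeps a per-sequence table of fixed-size cache blocks and allocates a new block only when the previous one fills up. It is a toy model under stated assumptions, not vLLM's implementation: `ToyBlockManager`, `BLOCK_SIZE`, and every method name are hypothetical and exist only for this example.

```python
BLOCK_SIZE = 16  # tokens per cache block (illustrative value, not a vLLM default claim)


class ToyBlockManager:
    """Toy allocator mapping each sequence to a list of fixed-size physical blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = list(range(num_blocks))  # pool of unused physical block ids
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored so far

    def append_token(self, seq_id: str) -> bool:
        """Reserve room for one more token; return False if no block is available."""
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:  # sequence is new, or its last block is full
            if not self.free_blocks:
                return False
            self.block_tables.setdefault(seq_id, []).append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return True

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool right away."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


manager = ToyBlockManager(num_blocks=4)
for _ in range(20):
    manager.append_token("a")  # 20 tokens need two 16-token blocks
for _ in range(5):
    manager.append_token("b")  # 5 tokens fit in one block
```

Because blocks are handed out on demand and returned as soon as a sequence finishes, memory is not reserved up front for each request's maximum length, which is the property that lets a scheduler admit new requests between decoding steps.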

by ihatesea69 · Repository · other
Also installable via skills CLI
npx skills add ihatesea69/HieuNghi-AI-Skills/airesearch_skills/12-inference-serving/vllm

Source

Path: airesearch_skills/12-inference-serving/vllm/SKILL.md (main)
