serving-llms-vllm
Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory.
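For orientation, here is a minimal sketch of the kind of serving this skill targets, using vLLM's offline `LLM` API. The model name and sampling values are illustrative assumptions, not part of the skill itself:

```python
# Minimal vLLM sketch (assumes `pip install vllm` and a CUDA-capable GPU).
# The model name below is an example; any HF-compatible model works.
from vllm import LLM, SamplingParams

# gpu_memory_utilization caps how much VRAM the PagedAttention KV cache may use,
# which is the main lever when serving with limited GPU memory.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", gpu_memory_utilization=0.9)
params = SamplingParams(temperature=0.7, max_tokens=128)

# Prompts passed together are scheduled via continuous batching.
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)
```

For serving over HTTP, vLLM also provides an OpenAI-compatible server, started with `vllm serve <model>`.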
Also installable via the skills CLI:
npx skills add zechenzhangAGI/AI-research-SKILLs/devops/serving-llms-vllm-zechenzhangagi-ai-research-skills
Source
Path: devops/serving-llms-vllm-zechenzhangagi-ai-research-skills (branch: main)