llm-serving-patterns

LLM inference infrastructure, serving frameworks (vLLM, TGI, TensorRT-LLM), quantization techniques, batching strategies, and streaming response patterns. Use when designing LLM serving infrastructure.
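As a taste of one batching strategy the skill covers, here is a toy, framework-free simulation of continuous (iteration-level) batching, the scheduling approach popularized by vLLM: finished requests leave the batch after every decode step and waiting requests are admitted immediately, rather than waiting for the whole batch to drain. All request IDs and token counts below are invented for illustration.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    remaining_tokens: int           # tokens left to generate
    output: list = field(default_factory=list)

def continuous_batching(requests, max_batch_size):
    """Simulate iteration-level scheduling: admit/retire requests
    between every decode step instead of between whole batches."""
    waiting = deque(requests)
    running, completed, step = [], [], 0
    while waiting or running:
        # Admit waiting requests up to the batch-size budget.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step produces one token per running request.
        for r in running:
            r.output.append(f"tok{step}")
            r.remaining_tokens -= 1
        # Retire finished requests immediately, freeing their slots.
        completed.extend(r for r in running if r.remaining_tokens == 0)
        running = [r for r in running if r.remaining_tokens > 0]
        step += 1
    return completed, step

reqs = [Request(0, 2), Request(1, 5), Request(2, 1), Request(3, 3)]
done, steps = continuous_batching(reqs, max_batch_size=2)
```

With static batching the same workload would take 8 decode steps (max(2, 5) + max(1, 3)); the continuous scheduler finishes in 6 because short requests free their slots mid-batch.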

by melodic-software · Repository · design
Also installable via skills CLI
npx skills add melodic-software/claude-code-plugins/design/llm-serving-patterns

Source

Path: design/llm-serving-patterns (main)
