llm-inference-batching-scheduler
Guidance for optimizing LLM inference request batching and scheduling. This skill applies when designing batch schedulers that minimize serving cost while meeting latency targets and bounding padding waste.
Also installable via the skills CLI:
npx skills add letta-ai/skills/design/llm-inference-batching-scheduler
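To make the problem concrete, here is a minimal sketch of the kind of scheduler this skill covers: a greedy batcher that groups requests of similar prompt length so padding (to the longest prompt in each batch) stays under a budget, and caps batch size. All names, parameters, and the policy itself are illustrative assumptions, not part of the skill.

```python
# Hypothetical sketch of a length-aware greedy batcher.
# Assumption: padding waste is the fraction of batch tokens that are
# padding when every prompt is padded to the batch's longest prompt.
from dataclasses import dataclass

@dataclass
class Request:
    id: int
    prompt_len: int   # prompt length in tokens
    arrival_ms: float # arrival timestamp (a real scheduler would
                      # also flush batches on a latency timer)

def schedule_batches(requests, max_batch=8, max_pad_frac=0.25):
    """Sort by prompt length, then greedily grow each batch until
    adding the next request would exceed the batch-size cap or push
    the padding fraction over max_pad_frac."""
    batches, batch = [], []
    for r in sorted(requests, key=lambda r: r.prompt_len):
        if batch:
            longest = max(r.prompt_len, max(x.prompt_len for x in batch))
            total = longest * (len(batch) + 1)          # tokens incl. padding
            useful = r.prompt_len + sum(x.prompt_len for x in batch)
            pad_frac = 1.0 - useful / total
            if len(batch) >= max_batch or pad_frac > max_pad_frac:
                batches.append(batch)
                batch = []
        batch.append(r)
    if batch:
        batches.append(batch)
    return batches
```

For example, prompts of lengths [10, 12, 11, 100, 95] split into two batches, one short and one long, since mixing them would waste well over 25% of batch tokens on padding.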