Fast structured generation and serving for LLMs with RadixAttention prefix caching. Use for JSON/regex outputs, constrained decoding, agentic workflows with tool calls, or when you need 5× faster infe
airesearch_skills/12-inference-serving/sglang/SKILL.md(main)
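
The prefix caching named in the description can be pictured with a small toy model. The sketch below is not SGLang's implementation or API: it is a plain token trie, with no compressed radix edges, no eviction, and no KV tensors. The names `ToyPrefixCache`, `match_prefix`, and `insert` are hypothetical and chosen only for illustration. It shows the core idea under those assumptions: a second request that shares a prompt prefix with an earlier one only needs fresh computation for the tokens after the shared part.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps a token ID to the child node reached by that token.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical toy cache that remembers which token prefixes were seen."""

    def __init__(self):
        self.root = TrieNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens of `tokens` are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record a full token sequence so later requests can reuse its prefix."""
        node = self.root
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = TrieNode()
            node = node.children[tok]


cache = ToyPrefixCache()
system_prompt = [1, 2, 3, 4]  # stand-in token IDs for a shared system prompt
request_a = system_prompt + [10, 11]
request_b = system_prompt + [20, 21, 22]

reused_a = cache.match_prefix(request_a)  # nothing is cached yet
cache.insert(request_a)
reused_b = cache.match_prefix(request_b)  # shares the system prompt with request_a

print(reused_a, reused_b)  # prints "0 4"
```

In this toy run the second request reuses the four system-prompt tokens and would only need new work for its three remaining tokens. A real serving system would attach cached attention state to each node and evict entries under memory pressure, which this sketch leaves out.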