Silicon Sonnets

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

January 29, 2024
Show Notes

ServerlessLLM is a serverless inference system for large language models (LLMs) that exploits locality to reduce remote checkpoint downloads and to make checkpoint loading more efficient. It achieves significant latency improvements over existing serverless systems on LLM inference workloads.
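To illustrate the locality idea discussed in the episode, here is a minimal sketch (not the paper's actual algorithm) of a scheduler that prefers servers already holding a model's checkpoint, so a cheap local load can replace an expensive remote download. All names, costs, and the data layout are hypothetical.

```python
# Hypothetical locality-aware placement: pick the server with the lowest
# estimated startup latency, where a cached checkpoint avoids a remote download.

def pick_server(model, servers, remote_download_cost=100.0, local_load_cost=5.0):
    """Return the server with the lowest estimated startup cost (seconds)."""
    def startup_cost(server):
        # Loading from local cache is far cheaper than downloading remotely.
        load = local_load_cost if model in server["cached"] else remote_download_cost
        return server["queue_delay"] + load
    return min(servers, key=startup_cost)

servers = [
    {"name": "gpu-0", "cached": {"llama-7b"}, "queue_delay": 2.0},
    {"name": "gpu-1", "cached": set(), "queue_delay": 0.0},
]

# gpu-0 wins for llama-7b: 2.0s queue + 5.0s local load beats a 100.0s download.
print(pick_server("llama-7b", servers)["name"])  # → gpu-0
```

For a model no server has cached, the same function falls back to the server with the shortest queue, since every choice pays the download cost.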