The delay between sending a request and receiving the first piece of the response. In AI, this is often measured as Time to First Token (TTFT) — how long before the model starts streaming its answer. Affected by model size, server load, network distance, and prompt length.
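Measuring TTFT amounts to timing how long it takes a streaming response to yield its first token. A minimal sketch — `fake_stream` is a hypothetical stand-in for a real provider's streaming API, not any actual client library:

```python
import time

def measure_ttft(stream):
    """Return (time_to_first_token, full_text) for a token iterator."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for token in stream:
        if ttft is None:
            # First token has arrived: record the elapsed time once.
            ttft = time.perf_counter() - start
        parts.append(token)
    return ttft, "".join(parts)

def fake_stream():
    # Simulated server "think time" before the first token,
    # then a short delay between subsequent tokens.
    time.sleep(0.12)
    for tok in ["Hello", ", ", "world", "!"]:
        yield tok
        time.sleep(0.01)

ttft, text = measure_ttft(fake_stream())
# ttft will be at least the simulated 120 ms think time.
```

The same pattern works against any real streaming endpoint: start the clock when the request is sent, stop it at the first streamed chunk.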
Why it matters
Users tend to perceive anything over ~2 seconds as slow. Low latency is why smaller models often win for real-time applications even when larger models are "smarter." It's also a key differentiator between providers.