Every code you'll use, return, or debug — with API-shaped guidance
Every HTTP status code has three digits. The first tells you the kind of response, which is enough to route your error handling before you even parse the body.
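As a rough sketch of that routing in Python, the helper below classifies a code by its first digit alone. The names StatusClass, status_class, and should_retry are illustrative, and the retry policy shown (retry 5xx and 429) is an assumption, not a rule from this guide.

```python
from enum import Enum


class StatusClass(Enum):
    INFORMATIONAL = 1  # 1xx
    SUCCESS = 2        # 2xx
    REDIRECTION = 3    # 3xx
    CLIENT_ERROR = 4   # 4xx
    SERVER_ERROR = 5   # 5xx


def status_class(code: int) -> StatusClass:
    """Classify a status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return StatusClass(code // 100)


def should_retry(code: int) -> bool:
    """Hypothetical policy: retry server errors and 429, nothing else."""
    return status_class(code) is StatusClass.SERVER_ERROR or code == 429
```

Because the decision only needs the number, it can run before the response body is parsed at all.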
"I'm still working on it." Rare outside of WebSockets, protocol upgrades, and streaming hints.
"Got your request. Here's what happened." 200 OK is the default; the others signal how it succeeded.
"The thing moved. Look somewhere else." Common with moved URLs, caching, and auth flows.
"You messed up." The client sent something wrong — bad format, missing auth, nonexistent resource.
"I messed up." The server blew up handling a valid-looking request. These should page someone.
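100 Continue: [...] Expect: 100-continue; server is ready for the body.
200 OK: [...] POST /v1/predict when the inference ran and returned a result, even if confidence is low.
201 Created: [...] Location header. [...] POST /jobs) or a saved prediction record.
202 Accepted: [...] GET /jobs/{id}.
204 No Content: [...] DELETE and some PUT responses.
304 Not Modified: [...] If-None-Match or If-Modified-Since headers.
400 Bad Request: [...] detail field explaining what's wrong.
405 Method Not Allowed: [...] POST on a GET-only endpoint). [...] Allow header listing the methods that are supported.
406 Not Acceptable: [...] Accept header.
415 Unsupported Media Type: [...] Content-Type the endpoint doesn't accept.
422 Unprocessable Entity: [...] POST /v1/predict when fields don't match the schema.
429 Too Many Requests: [...] Retry-After header with seconds (or an HTTP date). The class's own /llm/ endpoint returns this at 20 req/min.
503 Service Unavailable: [...] Retry-After.
Common API scenarios and the code most reviewers will expect.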
422 Unprocessable Entity (schema validation) or 400 Bad Request (general malformed). FastAPI picks 422 by default for Pydantic errors.
413 Payload Too Large. Don't wait until your model chokes.
401 Unauthorized. Include a WWW-Authenticate header; the HTTP spec requires one on a 401.
403 Forbidden. They know who they are; they just can't do this.
200 OK. The inference succeeded; confidence is a signal for the caller, not an error. Returning 4xx confuses monitoring.
503 Service Unavailable. Pair with Retry-After. Monitoring should page on 5xx rates.
202 Accepted with a job ID. Client polls GET /jobs/{id}; each poll returns 200 with the current state.
429 Too Many Requests with Retry-After.
[...] POST /predict to POST /v1/predict and want old clients to be safe.
308 Permanent Redirect (method-preserving); never 301/302 for POSTs.
Returning {"error": "..."} with status 200 breaks every monitoring tool, load balancer, and retry logic. They look at status codes, not your body.
Match the status code to the problem. Put details in detail or an errors array. Clients can still read the body, but the status tells the whole stack what happened.
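A minimal sketch of that fix, using only the Python standard library. The error_response helper and its (status, headers, body) return shape are assumptions for illustration; the detail and errors fields follow the advice above.

```python
import json
from http import HTTPStatus
from typing import Dict, List, Optional, Tuple


def error_response(
    status: int, detail: str, errors: Optional[List[dict]] = None
) -> Tuple[int, Dict[str, str], str]:
    """Build an error response whose status code matches the problem."""
    code = HTTPStatus(status)  # raises ValueError for codes Python does not know
    if code < 400:
        # An error body must never travel with a 1xx, 2xx, or 3xx status.
        raise ValueError(f"{int(code)} is not an error status")
    body = {"detail": detail}
    if errors:
        body["errors"] = errors
    return int(code), {"Content-Type": "application/json"}, json.dumps(body)


# Hypothetical validation failure on a single field.
status, headers, body = error_response(
    422, "validation failed", [{"field": "text", "msg": "field required"}]
)
```

The guard against sub-400 codes is the point: a helper like this makes the "200 with an error body" mistake impossible to produce by accident.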
Production clients get your Python traceback, including file paths and library versions. Attackers love this.
Return a generic body such as {"detail": "internal error", "request_id": "req_01H..."}. Log the traceback server-side. Support searches logs by request ID.
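One way this could look in Python, as a sketch rather than a framework recipe. The safe_call wrapper and the "api" logger name are hypothetical, and the request ID here is just a uuid4 with a req_ prefix, which is an assumption and not the ID format shown above.

```python
import json
import logging
import uuid

logger = logging.getLogger("api")


def safe_call(handler, *args, **kwargs):
    """Run a handler; on any unhandled exception return a generic 500."""
    request_id = f"req_{uuid.uuid4().hex}"
    try:
        return 200, json.dumps(handler(*args, **kwargs))
    except Exception:
        # logger.exception records the full traceback in the server log only.
        logger.exception("unhandled error request_id=%s", request_id)
        body = {"detail": "internal error", "request_id": request_id}
        return 500, json.dumps(body)
```

The client sees only the generic message and the ID; the file paths and library versions stay in the server log, keyed by that same ID.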
Hiding a 401 behind a 404 "for security" confuses legitimate clients and their error handlers. Real attackers probe both anyway.
Return 401 or 403 with a minimal message. If you truly want to hide existence, return 404 only for GET on specific resources — and be consistent.
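The 401 versus 403 split can be sketched as a small decision function. The name auth_failure, its two boolean inputs, and the message strings are assumptions made for illustration.

```python
from typing import Optional, Tuple


def auth_failure(authenticated: bool, permitted: bool) -> Optional[Tuple[int, dict]]:
    """Return (status, body) for an auth failure, or None if the request may proceed."""
    if not authenticated:
        # We don't know who the caller is.
        return 401, {"detail": "authentication required"}
    if not permitted:
        # We know who they are; they just can't do this.
        return 403, {"detail": "forbidden"}
    return None
```

Checking authentication first keeps the two cases distinct, so client error handlers can tell "log in again" apart from "you lack access".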
Status 230? 451 (oh wait, that's real)? Made-up codes break everything that classifies by number range.
The status code is for the HTTP stack. The body is where your error taxonomy lives: {"code": "MODEL_OVERLOADED", "detail": "..."}.
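A short sketch of both halves, assuming Python's http.HTTPStatus enum is an acceptable stand-in for "registered codes" (it covers the codes the standard library knows, which may lag the official registry). The is_registered and overloaded_response names, the detail text, and the 30-second Retry-After are illustrative.

```python
import json
from http import HTTPStatus


def is_registered(code: int) -> bool:
    """True if Python's standard library recognises this status code."""
    try:
        HTTPStatus(code)
    except ValueError:
        return False
    return True


def overloaded_response():
    """Standard status for the HTTP stack, custom taxonomy in the body."""
    body = {"code": "MODEL_OVERLOADED", "detail": "model is at capacity"}
    headers = {"Retry-After": "30"}
    return int(HTTPStatus.SERVICE_UNAVAILABLE), headers, json.dumps(body)
```

Here is_registered(230) is False while is_registered(451) is True, and the application-specific meaning travels in the body's code field instead of in an invented status number.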