The three endpoints every AI MVP should expose
Each endpoint below is a worked example of what your team's api/api_design.md should contain. Every endpoint has four parts (purpose, request schema, response schema, and error conditions) plus the anti-GenAI requirement: a statement of why this endpoint exists.
POST /v1/predict
| Header | Value |
|---|---|
| Content-Type | application/json |
| X-Request-Id | optional client-supplied trace ID |
{
  "text": "string, required, 1-5000 chars",
  "options": {
    "return_probabilities": false
  }
}
{
  "prediction": "positive",
  "confidence": 0.94,
  "request_id": "req_01HXYZ...",
  "model_version": "1.2.0"
}
confidence is a float in the range 0.0–1.0. model_version lets clients invalidate caches when the model changes, even though the contract itself stays the same.
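One way a client might act on model_version is to flush its cached predictions whenever a response reports a different version. The sketch below is an illustration only: the class `PredictionCache` and its methods are hypothetical and not part of the API contract.

```python
from typing import Optional


class PredictionCache:
    """Client-side cache that is flushed whenever model_version changes.

    Hypothetical helper for illustration; not part of the API contract.
    """

    def __init__(self) -> None:
        self._model_version: Optional[str] = None
        self._entries: dict = {}

    def get(self, text: str) -> Optional[dict]:
        """Return the cached response for this text, or None on a miss."""
        return self._entries.get(text)

    def store(self, text: str, response: dict) -> None:
        """Cache a /v1/predict response, flushing first if the model changed."""
        version = response["model_version"]
        if version != self._model_version:
            # A new model may label the same text differently,
            # so every older entry is treated as stale.
            self._entries.clear()
            self._model_version = version
        self._entries[text] = response
```

Because the check relies only on a field already in the response, the client needs no extra endpoint or contract change to notice a model update.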
| Code | When | Body |
|---|---|---|
| 400 | Missing required field | {"error": "text is required"} |
| 413 | Input exceeds 5000 chars | {"error": "text too long"} |
| 422 | Validation failed (wrong type, etc.) | FastAPI validation detail |
| 429 | Rate limit exceeded | {"error": "rate limit", "retry_after": 30} |
| 503 | Model unavailable / overloaded | {"error": "model unavailable"} |
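The input-related rows of the table can be sketched as one validation function. This is a framework-free illustration, not the FastAPI implementation: the function name `validate_predict_request` is hypothetical, and the 422 bodies are placeholders standing in for FastAPI's validation detail. The 429 and 503 cases depend on runtime state and are left out.

```python
import json
from typing import Optional, Tuple

MAX_TEXT_CHARS = 5000  # upper bound from the request schema


def validate_predict_request(raw: bytes) -> Optional[Tuple[int, dict]]:
    """Return (status_code, error_body) for a bad request, or None if valid."""
    try:
        body = json.loads(raw)
    except ValueError:
        # Placeholder body; the table defers to FastAPI's detail format.
        return 422, {"error": "body is not valid JSON"}
    if not isinstance(body, dict):
        return 422, {"error": "body must be a JSON object"}
    if "text" not in body:
        return 400, {"error": "text is required"}
    text = body["text"]
    if not isinstance(text, str):
        return 422, {"error": "text must be a string"}
    if len(text) > MAX_TEXT_CHARS:
        return 413, {"error": "text too long"}
    if len(text) < 1:
        # Assumption: an empty string breaks the 1-5000 char rule and maps to 422.
        return 422, {"error": "text must not be empty"}
    return None
```

Returning `None` for a valid request keeps the happy path free of error handling: the caller proceeds to the model only when no error tuple comes back.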
curl -X POST http://localhost:8000/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I loved this product"}'
A load balancer or orchestrator polls /health so it can take an unhealthy instance out of rotation before users see errors.
No body. No auth. No rate limit.
GET /health
{
  "status": "ok",
  "uptime_seconds": 84231,
  "model_loaded": true
}
Don't let /health call your model or database. A health check that hits every downstream service turns one slow dependency into a cascading outage. Use /readiness for that if you need it.
| Code | When |
|---|---|
| 503 | Service is starting up or shutting down |
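A health handler that follows this rule reads only flags already held in memory. The sketch below makes several assumptions: `HealthState` and `health` are hypothetical names, "model not yet loaded" is treated as "starting up", and the `"unavailable"` status string is invented because the document does not define a 503 body.

```python
import time
from typing import Tuple


class HealthState:
    """In-process flags that the rest of the service updates."""

    def __init__(self) -> None:
        self.started_at = time.monotonic()
        self.model_loaded = False    # set to True once model loading finishes
        self.shutting_down = False   # set to True when shutdown begins


def health(state: HealthState) -> Tuple[int, dict]:
    """Build the /health response from memory only: no model or database call."""
    ready = state.model_loaded and not state.shutting_down
    body = {
        # "unavailable" is an assumed value for the non-ok case.
        "status": "ok" if ready else "unavailable",
        "uptime_seconds": int(time.monotonic() - state.started_at),
        "model_loaded": state.model_loaded,
    }
    return (200 if ready else 503), body
```

Since the handler only compares two booleans and reads a clock, its latency does not depend on the model or on any downstream service.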
GET /metadata
{
  "api_version": "1.2.0",
  "model_version": "sentiment-classifier@2026-04-12",
  "supported_languages": ["en", "es"],
  "max_input_length": 5000,
  "rate_limit_per_minute": 60
}
Keep this endpoint cheap. It's read by every client on startup, and sometimes on every request.
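One way to keep /metadata cheap is to serialize the body once at startup and return the same bytes on every call. This is a minimal sketch under that assumption: the names `_METADATA`, `_METADATA_BYTES`, and `metadata_response` are hypothetical, and the values are copied from the example response rather than read from real configuration.

```python
import json

# Values copied from the example response; a real service would read them
# from its own configuration at startup.
_METADATA = {
    "api_version": "1.2.0",
    "model_version": "sentiment-classifier@2026-04-12",
    "supported_languages": ["en", "es"],
    "max_input_length": 5000,
    "rate_limit_per_minute": 60,
}

# Serialize once at import time so each request only returns existing bytes.
_METADATA_BYTES = json.dumps(_METADATA).encode("utf-8")


def metadata_response() -> bytes:
    """Return the pre-serialized /metadata body; does no per-request work."""
    return _METADATA_BYTES
```

The trade-off is that the values are fixed for the life of the process, which fits fields that only change on deploy.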
Use this in Breakout 2 when reviewing another team's design:
- Can you explain the endpoint in one sentence without using words like "handle" or "process"?
- Are field types, required-ness, and constraints explicit? Could you mock this without asking questions?
- Is the shape the same on success? Are nullable fields marked? Is there a request_id?
- Every 4xx and 5xx you can trigger should be listed with the body shape and a human-readable reason.
- Does the request or response mention model names, token IDs, internal DB fields, or infrastructure? That's a leak. Rename or remove.
- Is the URL prefixed with /v1/? If not, how will you introduce a breaking change?