Endpoint Designer

The three endpoints every AI MVP should expose

How to Read This Page

Each endpoint below is a worked example of what your team's api/api_design.md should contain. Every endpoint has four parts: purpose, request schema, response schema, and error conditions. On top of those comes the anti-GenAI requirement: a stated reason why the endpoint exists.

Your job in Breakout 1: copy this structure for a product of your team's choosing. Don't just rename fields — pick a real capability and justify each endpoint.

POST /v1/predict

Why this endpoint exists: the product capability is "get a prediction from the model given a product-meaningful input." Every client — web, mobile, batch pipelines, other services — calls this same endpoint. The model behind it can change; this contract doesn't.

Request

Headers

Content-Type	application/json
X-Request-Id	optional client-supplied trace ID

Body

{
  "text": "string, required, 1-5000 chars",
  "options": {
    "return_probabilities": false
  }
}

Response

{
  "prediction": "positive",
  "confidence": 0.94,
  "request_id": "req_01HXYZ...",
  "model_version": "1.2.0"
}

confidence is a float between 0.0 and 1.0. model_version lets clients invalidate caches when the model changes, even though the contract itself does not.
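One way a client might act on model_version is sketched below. PredictionCache and its method names are hypothetical and not part of the contract; the sketch assumes the client caches responses by input text and discards all of them as soon as a response reports a different model_version.

```python
from typing import Optional


class PredictionCache:
    """Client-side cache that is flushed whenever model_version changes."""

    def __init__(self) -> None:
        self._model_version: Optional[str] = None
        self._entries: dict[str, dict] = {}

    def get(self, text: str) -> Optional[dict]:
        """Return the cached response for this input, or None on a miss."""
        return self._entries.get(text)

    def store(self, text: str, response: dict) -> None:
        """Cache a /v1/predict response, flushing stale entries first."""
        version = response["model_version"]
        if version != self._model_version:
            # A different model produced this response, so earlier
            # predictions may no longer match what the service returns.
            self._entries.clear()
            self._model_version = version
        self._entries[text] = response
```

The client never needs to know what changed in the model; comparing the version string is enough.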

Errors

Code	When	Body
400	Missing required field	`{"error": "text is required"}`
413	Input exceeds 5000 chars	`{"error": "text too long"}`
422	Validation failed (wrong type, etc.)	FastAPI validation detail
429	Rate limit exceeded	`{"error": "rate limit", "retry_after": 30}`
503	Model unavailable / overloaded	`{"error": "model unavailable"}`
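The 400, 413, and 422 rows can be read as an ordered set of checks on the request body. The sketch below is framework-free so it runs on its own; validate_predict_body is a hypothetical helper, and its 422 bodies are simplified stand-ins for the validation detail FastAPI would generate. Returning 400 for a missing field assumes custom handling, since FastAPI's default for a missing required field is 422.

```python
from typing import Any, Optional

MAX_TEXT_CHARS = 5000  # matches the "1-5000 chars" constraint on text


def validate_predict_body(body: Any) -> Optional[tuple[int, dict]]:
    """Return (status_code, error_body) for a bad request, or None if valid."""
    if not isinstance(body, dict):
        return 422, {"error": "body must be a JSON object"}
    if "text" not in body:
        return 400, {"error": "text is required"}
    text = body["text"]
    if not isinstance(text, str):
        return 422, {"error": "text must be a string"}
    if len(text) == 0:
        # Assumption: an empty string violates "1-5000 chars" and maps to 422.
        return 422, {"error": "text must not be empty"}
    if len(text) > MAX_TEXT_CHARS:
        return 413, {"error": "text too long"}
    options = body.get("options", {})
    if not isinstance(options, dict) or not isinstance(
        options.get("return_probabilities", False), bool
    ):
        return 422, {"error": "options.return_probabilities must be a boolean"}
    return None
```

Keeping the checks in one function makes the order explicit: a missing field is reported before a type error, and a type error before a length error.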

curl

curl -X POST http://localhost:8000/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "I loved this product"}'
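For clients that are not shell scripts, the same call with basic handling of the 429 response might look like the sketch below. It is a minimal sketch and not a reference client: predict_with_retry, http_send, and the send, sleep, and max_attempts parameters are hypothetical names, and the transport is injected so the retry logic can be exercised without a running server.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable

PREDICT_URL = "http://localhost:8000/v1/predict"  # same URL as the curl example


def http_send(payload: dict) -> tuple[int, dict]:
    """POST the payload as JSON and return (status_code, parsed_body)."""
    request = urllib.request.Request(
        PREDICT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; this assumes the error body is JSON.
        return err.code, json.load(err)


def predict_with_retry(
    text: str,
    send: Callable[[dict], tuple[int, dict]] = http_send,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call /v1/predict, waiting retry_after seconds after each 429."""
    status, body = send({"text": text})
    for _ in range(max_attempts - 1):
        if status != 429:
            break
        sleep(body.get("retry_after", 1))  # fall back to 1s if the field is missing
        status, body = send({"text": text})
    return status, body
```

Any status other than 429 is returned to the caller unchanged, so 400, 413, 422, and 503 still surface as the contract describes them.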

GET /health

Why this endpoint exists: load balancers, orchestrators, and monitoring systems need a cheap, side-effect-free way to ask "is this service alive?" A failing /health takes an instance out of rotation before users see errors.

Request

No body. No auth. No rate limit.

GET /health

Response

{
  "status": "ok",
  "uptime_seconds": 84231,
  "model_loaded": true
}

Common mistake: making /health call your model or database. A health check that hits every downstream service turns one slow dependency into a cascading outage. If you need a dependency check, expose it separately as /readiness.

Errors

Code	When
503	Service is starting up or shutting down
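A /health handler that avoids the mistake described above only reads state the process already holds in memory. The sketch below is one possible shape. ServiceState, its phase field, and health_response are hypothetical names, and the body returned alongside 503 is an assumption, since the contract above does not specify one.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServiceState:
    """In-process flags updated by startup and shutdown hooks (hypothetical)."""

    started_at: float = field(default_factory=time.monotonic)
    model_loaded: bool = False
    phase: str = "starting"  # "starting" -> "running" -> "stopping"


def health_response(state: ServiceState, now: float) -> tuple[int, dict]:
    """Build the /health reply from memory only: no model call, no database query."""
    body = {
        "status": "ok" if state.phase == "running" else state.phase,
        "uptime_seconds": int(now - state.started_at),
        "model_loaded": state.model_loaded,
    }
    return (200 if state.phase == "running" else 503), body
```

Because the handler does no I/O, its latency stays flat even when a dependency is slow.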

GET /metadata

Why this endpoint exists: clients and debug tools need to know which version of the contract they're talking to without parsing a URL or reading a changelog. Support engineers check this endpoint first when a user reports a problem.

Request

GET /metadata

Response

{
  "api_version": "1.2.0",
  "model_version": "sentiment-classifier@2026-04-12",
  "supported_languages": ["en", "es"],
  "max_input_length": 5000,
  "rate_limit_per_minute": 60
}

Keep this endpoint cheap. It's read by every client on startup, and sometimes on every request.
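One way to keep it cheap is to build the payload once at startup and serve the same bytes on every call. In the sketch below, build_metadata_payload and metadata_handler are hypothetical names, and the values are copied from the example response above.

```python
import json


def build_metadata_payload() -> bytes:
    """Serialize the metadata once; nothing here depends on the request."""
    metadata = {
        "api_version": "1.2.0",
        "model_version": "sentiment-classifier@2026-04-12",
        "supported_languages": ["en", "es"],
        "max_input_length": 5000,
        "rate_limit_per_minute": 60,
    }
    return json.dumps(metadata).encode("utf-8")


# Computed at import time, so each request only returns an existing object.
_METADATA_BYTES = build_metadata_payload()


def metadata_handler() -> tuple[int, bytes]:
    """Return the precomputed body without touching the model or any store."""
    return 200, _METADATA_BYTES
```

If the model is swapped at runtime, the payload would need to be rebuilt at that point; the sketch assumes the values are fixed for the life of the process.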

Design Review Checklist

Use this in Breakout 2 when reviewing another team's design:

Purpose

Can you explain the endpoint in one sentence without using words like "handle" or "process"?

Request schema

Are field types, required-ness, and constraints explicit? Could you mock this without asking questions?

Response schema

Is the shape the same for every successful response? Are nullable fields marked? Is there a request_id?

Error conditions

Every 4xx and 5xx you can trigger should be listed with the body shape and a human-readable reason.

Leak check

Does the request or response mention model names, token IDs, internal DB fields, or infrastructure? That's a leak. Rename or remove.

Versioning

Is the URL prefixed with /v1/? If not, how will you introduce a breaking change?
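Part of the leak check can be automated before the review. The sketch below walks a JSON example and flags field names that contain suspicious fragments. find_leaky_fields and the LEAK_HINTS list are hypothetical and deliberately incomplete, so a clean result does not replace a human read of the schema.

```python
from typing import Any

# Fragments that suggest an internal detail is exposed (illustrative list).
LEAK_HINTS = ("token_id", "logits", "db_", "gpu", "checkpoint")


def find_leaky_fields(example: Any, path: str = "") -> list[str]:
    """Return dotted paths of field names that contain a leak hint."""
    hits: list[str] = []
    if isinstance(example, dict):
        for key, value in example.items():
            child = f"{path}.{key}" if path else key
            if any(hint in key.lower() for hint in LEAK_HINTS):
                hits.append(child)
            # Recurse so nested objects are checked as well.
            hits.extend(find_leaky_fields(value, child))
    elif isinstance(example, list):
        for item in example:
            hits.extend(find_leaky_fields(item, path))
    return hits
```

Run against the /v1/predict response example, the function returns an empty list; a response carrying a field such as debug.token_ids would be flagged.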