API-First Development

1. What "API-First" Actually Means

API-first means you design and agree on the interface — the URL, inputs, outputs, error shapes — before you build the thing behind it. The API is not a side effect of your implementation. It is the product.

For ML systems this matters even more, because the parts behind the interface churn constantly: models change, training data changes, inference backends get swapped out. If the contract is stable, everything that depends on your model keeps working.

Rule of thumb: if you can't write a one-page description of your endpoint before writing the handler, the endpoint isn't ready to build yet.

2. The Architecture Every AI Product Shares

Client: web app, mobile, other service, curl

↓ HTTP request

API Layer: validates input, routes, handles auth, returns structured response

↓ internal call

Application Logic: business rules, orchestration, feature prep, post-processing

↓ inference

ML Model: prediction, embedding, generation

↑ response travels back up

The API layer is the only layer your users ever touch. Everything below can be rewritten, replaced, or rehosted without notice — as long as the top of this stack keeps its promises.
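To make the stack concrete, here is a minimal sketch of one request travelling down the three server-side layers and back up. Everything in it is hypothetical: `toy_model` is a keyword counter standing in for a real model, and the function names are chosen for illustration only.

```python
import json


def toy_model(features: list[str]) -> float:
    """ML model layer (stand-in): raw positive-sentiment score in [0, 1]."""
    positive = {"love", "loved", "great", "good"}
    hits = sum(1 for word in features if word in positive)
    return min(1.0, 0.5 + 0.25 * hits) if hits else 0.2


def classify(text: str) -> dict:
    """Application logic: feature prep, inference call, post-processing."""
    features = text.lower().split()           # feature prep
    score = toy_model(features)               # inference
    label = "positive" if score >= 0.5 else "negative"
    confidence = score if label == "positive" else 1.0 - score
    return {"prediction": label, "confidence": round(confidence, 2)}


def handle_predict(raw_body: str) -> tuple[int, str]:
    """API layer: parse the request, call down, return a structured response."""
    body = json.loads(raw_body)
    result = classify(body["text"])
    return 200, json.dumps(result)
```

Only `handle_predict` is visible to a client. `classify` and `toy_model` could be rewritten without the client noticing, provided the JSON that comes back keeps the same fields.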

3. Code-First vs API-First

❌ Code-First

Build the logic, then see what comes out.

Endpoints leak implementation details (model names, internal IDs)
Response shapes change every time the model is retrained
Frontend and backend block each other
Documentation is written last, if at all

✅ API-First

Design the contract, then build both sides against it.

Frontend can mock responses and develop in parallel
Response shape is stable across model versions
OpenAPI spec is the source of truth
Breaking changes require a version bump — and are visible
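As a rough illustration of "build both sides against the contract", the sketch below writes the response contract down as data, gives the frontend a mock to develop against, and provides one check that both the mock and the eventual real handler can be run through. In a real project the OpenAPI spec plays the role of `PREDICT_RESPONSE_CONTRACT`; the dictionary, `conforms`, and `mock_predict` are hypothetical stand-ins.

```python
# Hypothetical contract for the predict response: field name -> expected type.
PREDICT_RESPONSE_CONTRACT = {
    "prediction": str,
    "confidence": float,
    "request_id": str,
}


def conforms(payload: dict, contract: dict) -> bool:
    """True if payload has exactly the contract's fields, each with the right type."""
    if set(payload) != set(contract):
        return False
    return all(isinstance(payload[name], typ) for name, typ in contract.items())


def mock_predict(text: str) -> dict:
    """Canned response a frontend can build against before any model exists."""
    return {"prediction": "positive", "confidence": 0.5, "request_id": "req_mock"}
```

Running the same `conforms` check against the mock and against the real handler in CI is one way to make a breaking change visible the moment it happens.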

4. The Five Principles

Principle 1: Clear interfaces

A developer should understand what an endpoint does from its URL and method alone. POST /predict is clear. POST /doStuff is not.

Principle 2: Predictable responses

Every success response shares one shape, and every error response shares another. Fields that can be null should be typed as nullable, not silently omitted. Clients that can predict what they'll get need far less defensive code.
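One possible way to pin those shapes down is with plain dataclasses, sketched below. The class names and the choice of `confidence` as the nullable field are assumptions made for illustration; the point is that a nullable field is serialized as an explicit `null` instead of disappearing.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PredictResponse:
    prediction: str
    confidence: Optional[float]  # nullable: always present, may be null
    request_id: str


@dataclass
class ErrorResponse:
    code: str        # stable, machine-readable
    message: str     # human-readable
    request_id: str


def to_json(response) -> str:
    """Serialize every declared field, so None becomes an explicit null."""
    return json.dumps(asdict(response))
```

A client can then read `confidence` unconditionally and handle `null`, with no need to check first whether the key exists.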

Principle 3: Minimal coupling

The API doesn't expose which model served the prediction, where the database lives, or how the feature store is shaped. The client asks a product-shaped question and gets a product-shaped answer.

Principle 4: Validation at the edge

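Reject bad input at the API layer before it reaches your model: return 422 for a request that parses but fails validation (wrong type, missing field, out-of-range value) and, conventionally, 400 for a body that can't be parsed at all. Every layer below should then be able to trust its inputs.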
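A hand-rolled sketch of edge validation for a text-classification request might look like the following. It uses 422 for bodies that parse but fail validation and assumes 400 for bodies that are not JSON at all. The function name, the error codes, and the `MAX_TEXT_CHARS` limit are all hypothetical; a framework validator would normally do this work for you.

```python
import json

MAX_TEXT_CHARS = 5000  # hypothetical limit, chosen for illustration


def validate_predict_request(raw_body: str) -> tuple[int, dict]:
    """Return (200, clean_input) on success or (4xx, error_body) on rejection."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, {"code": "invalid_json", "message": "Body is not valid JSON."}
    if not isinstance(body, dict):
        return 422, {"code": "invalid_body", "message": "Body must be a JSON object."}
    text = body.get("text")
    if not isinstance(text, str) or not text.strip():
        return 422, {"code": "invalid_text", "message": "'text' must be a non-empty string."}
    if len(text) > MAX_TEXT_CHARS:
        return 422, {"code": "text_too_long", "message": "'text' exceeds the length limit."}
    # Everything below the API layer receives only this cleaned input.
    return 200, {"text": text.strip()}
```

The model-facing code is only ever called with the dictionary from the 200 branch, which is why it needs no checks of its own.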

Principle 5: Observable by default

Every response should be inspectable. Health endpoints, request IDs in responses, and stable error codes turn "it's broken" into "it's broken because X."
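Request IDs and stable error codes can be attached in one place instead of in every handler. The decorator below is a sketch of that idea, not a prescribed design: the `ERRORS` table, its codes, and the status choices are assumptions for illustration.

```python
import uuid

# Hypothetical mapping: exception type -> (HTTP status, stable error code).
ERRORS = {
    ValueError: (422, "invalid_input"),
    TimeoutError: (503, "model_unavailable"),
}


def observable(handler):
    """Wrap a handler so every response, success or error, carries a request_id."""
    def wrapped(payload: dict) -> tuple[int, dict]:
        request_id = f"req_{uuid.uuid4().hex[:12]}"
        try:
            status, body = 200, dict(handler(payload))
        except Exception as exc:
            status, code = next(
                (entry for exc_type, entry in ERRORS.items() if isinstance(exc, exc_type)),
                (500, "internal_error"),
            )
            # Do not echo server-side exception text back to the client.
            message = str(exc) if status < 500 else "The service could not complete the request."
            body = {"code": code, "message": message}
        body["request_id"] = request_id
        return status, body
    return wrapped
```

Logging the same `request_id` on the server side is what lets a bug report that quotes the ID lead straight to the failing request.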

5. A Small Example: The Same Endpoint, Two Ways

Prediction endpoint for a sentiment classifier. Same model, same inputs. Which one would you rather integrate with?

❌ Leaky — exposes internal detail

# Request
POST /api/run_bert_sentiment_v3
{
  "input_tokens": [101, 2023, 2003, 2204, 102],
  "temperature": 0.7,
  "model_checkpoint": "bert-base-uncased-ft-2025-03-14"
}

# Response
{
  "logits": [2.34, -1.87],
  "model_hash": "sha256:abc...",
  "gpu_ms": 42
}

✅ Contract-shaped — product-level

# Request
POST /v1/predict
{
  "text": "I loved this product"
}

# Response
{
  "prediction": "positive",
  "confidence": 0.94,
  "request_id": "req_01H..."
}

Notice: the second version can swap BERT for an LLM, for logistic regression, for a rules engine — without a single client having to change a line of code.
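The swap can be sketched in a few lines. The two backends below are toy stand-ins (a keyword rule and a constant baseline, not BERT or an LLM), and `SentimentBackend` is a hypothetical interface; what matters is that `predict` returns the same three fields whichever backend is plugged in.

```python
from typing import Protocol


class SentimentBackend(Protocol):
    def score(self, text: str) -> float:
        """Probability in [0, 1] that the text is positive."""
        ...


class KeywordRules:
    """Toy rules engine."""
    def score(self, text: str) -> float:
        return 0.9 if "love" in text.lower() else 0.1


class ConstantBaseline:
    """Toy stand-in for any other model."""
    def score(self, text: str) -> float:
        return 0.5


def predict(text: str, backend: SentimentBackend, request_id: str) -> dict:
    """Turn any backend's raw score into the contract-shaped response."""
    p = backend.score(text)
    label = "positive" if p >= 0.5 else "negative"
    return {
        "prediction": label,
        "confidence": round(max(p, 1.0 - p), 2),
        "request_id": request_id,
    }
```

Changing the backend changes the values in the response, never its keys, so client code written against one backend keeps working with the other.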

6. How This Shows Up in Tonight's Lab

In Breakout 1 you'll design POST /predict, GET /health, and GET /metadata. Apply these principles:

URL describes a capability, not a model name
Request schema takes product-meaningful inputs (text, image, user_id — not token IDs)
Response schema stays stable if the model behind it is replaced
/health answers one question: is this service alive enough to take traffic?
/metadata tells clients which version of the contract they're talking to
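As a starting point for the last two bullets, here is one possible shape for the /health and /metadata handlers. The field names, the `model_loaded` flag, and the 503 status for an unready service are assumptions, not requirements of the lab.

```python
CONTRACT_VERSION = "v1"  # hypothetical: bump only on a breaking change


def health(model_loaded: bool) -> tuple[int, dict]:
    """Answer one question: can this service take traffic right now?"""
    if model_loaded:
        return 200, {"status": "ok"}
    return 503, {"status": "unavailable"}


def metadata() -> dict:
    """Describe the contract, not the model behind it."""
    return {
        "contract_version": CONTRACT_VERSION,
        "endpoints": ["POST /predict", "GET /health", "GET /metadata"],
    }
```

Note that `metadata` reports the contract version only; a model name or checkpoint would reintroduce the coupling the contract is meant to hide.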