Evaluation Protocol Builder

Define your experiment's evaluation methodology


📊 Why Evaluation Protocols Matter

A well-defined evaluation protocol makes experiments reproducible and lets you compare them fairly. It specifies which metrics to track, how many random seeds to run, and what threshold a model must reach to count as "trained."
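
As a minimal sketch, the three decisions a protocol pins down (metrics, seeds, and the "trained" threshold) could be captured in a single frozen record so that every run reads identical settings. The class name `EvalProtocol` and all of its field names are hypothetical, not part of this builder or any library; the defaults simply mirror the CartPole-v1 tips on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalProtocol:
    """Hypothetical container for an experiment's evaluation settings."""

    env_id: str = "CartPole-v1"
    primary_metric: str = "mean_episode_reward"
    eval_episodes: int = 100          # episodes averaged per evaluation
    target: float = 475.0             # score at which the model counts as "trained"
    seeds: tuple = (0, 1, 2)          # at least 3 distinct random seeds
    total_timesteps: int = 100_000    # training budget per seed
    extra_metrics: tuple = ("episode_length",)

    def __post_init__(self):
        # Reject settings that would make cross-experiment comparisons meaningless.
        if len(self.seeds) < 3:
            raise ValueError("use at least 3 random seeds")
        if len(set(self.seeds)) != len(self.seeds):
            raise ValueError("seeds must be distinct")
        if self.eval_episodes < 1 or self.total_timesteps < 1:
            raise ValueError("episode count and budget must be positive")
```

Freezing the record is one way to keep a protocol from drifting between runs; a changed setting then means a new protocol object rather than a silent edit.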

📝 Project Information

🎯 Environment Configuration

📊 Primary Metric

Tip: CartPole-v1 is considered "solved" when the mean reward reaches at least 475 over 100 consecutive episodes. For faster iteration, you can evaluate over 10 episodes with the same target of 475.
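
The "consecutive episodes" criterion amounts to a sliding-window mean. The sketch below assumes you already have a list of per-episode rewards; the function name `first_solved_episode` is hypothetical. It returns the first episode at which the trailing window's mean meets the target.

```python
from collections import deque


def first_solved_episode(rewards, window=100, threshold=475.0):
    """Return the 1-based episode index at which the mean of the last
    `window` episodes first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    for i, r in enumerate(rewards, start=1):
        if len(recent) == window:
            running_sum -= recent[0]   # this episode is about to leave the window
        recent.append(r)               # deque evicts the oldest entry automatically
        running_sum += r
        # Only a full window counts as `window` consecutive episodes.
        if len(recent) == window and running_sum / window >= threshold:
            return i
    return None
```

For the faster 10-episode check, call `first_solved_episode(rewards, window=10)`; the shorter window is noisier, so treat a pass as a signal to run the full 100-episode evaluation rather than as a final result.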

🎲 Random Seeds

Running multiple seeds ensures that results aren't due to a lucky initialization. As a rule of thumb, use at least 3-5 seeds.
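
Once each seed has finished, the per-seed scores need to be reduced to a summary that exposes spread as well as the average. A minimal sketch using only the standard library follows; the input format (a dict mapping seed to final mean reward) and the name `summarize_seeds` are assumptions for illustration.

```python
import statistics


def summarize_seeds(final_scores):
    """Summarize per-seed final evaluation scores.

    `final_scores` maps seed -> final mean reward (a hypothetical format).
    Returns the seed count, mean, sample standard deviation, and worst score.
    """
    if len(final_scores) < 2:
        raise ValueError("need at least 2 seeds to estimate spread")
    values = list(final_scores.values())
    return {
        "n_seeds": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "min": min(values),                 # worst seed guards against cherry-picking
    }
```

Reporting the minimum alongside the mean is one simple way to show that a good average isn't carried by a single lucky seed.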

⏱ Training Budget

Tip: With PPO, CartPole is typically solved within 20k-50k timesteps. A 100k-timestep budget leaves enough margin for suboptimal hyperparameters.
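
To check whether a budget actually leaves that margin, you can look at when periodic evaluations first hit the target. The sketch assumes a hypothetical log of `(timestep, mean_eval_reward)` pairs recorded during training; the helper names are illustrative only.

```python
def timesteps_to_target(eval_log, target=475.0):
    """Return the first timestep whose evaluation score meets `target`.

    `eval_log` is a hypothetical list of (timestep, mean_eval_reward) pairs
    in increasing timestep order.
    """
    for timestep, score in eval_log:
        if score >= target:
            return timestep
    return None


def budget_headroom(eval_log, budget=100_000, target=475.0):
    """Fraction of the budget remaining when the target is first reached,
    or None if the run never reached it within the budget."""
    hit = timesteps_to_target(eval_log, target)
    if hit is None or hit > budget:
        return None
    return (budget - hit) / budget
```

A headroom close to 0, or `None`, on several seeds suggests that either the budget or the hyperparameters need revisiting.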

📈 Additional Metrics to Track

✅ Success Criteria

📄 eval_protocol.md Preview
