# Autoresearch Run 6: Escalation Ladder

Run 6 tested a new **cost-saving escalation ladder** for the autoresearch orchestrator. Instead of using a fixed model for every iteration, the system starts with the cheapest option (haiku/low effort) and escalates to more expensive models only when consecutive iterations fail to improve.

## The Ladder

Ten steps; the ladder resets to step 0 on any successful improvement:

| Step | Model | Effort |
|------|--------|--------|
| 0 | haiku | low |
| 1 | haiku | medium |
| 2 | haiku | high |
| 3 | sonnet | low |
| 4 | sonnet | medium |
| 5 | sonnet | high |
| 6 | opus | low |
| 7 | opus | medium |
| 8 | opus | high |
| 9 | opus | max |

## Setup

- Task: Fashion-MNIST classification (CPU only)
- Baseline: single linear layer (784 -> 10)
- Training budget: 300s per iteration
- 20 iterations total

## Results

| Iter | Ladder | Model/Effort | Action | val_accuracy |
|------|--------|---------------|---------|--------------|
| 0 | 0 | haiku/low | keep | 0.8405 |
| 1 | 0 | haiku/low | keep | 0.8980 |
| 2 | 0 | haiku/low | discard | 0.8885 |
| 3 | 1 | haiku/medium | discard | 0.8825 |
| 4 | 2 | haiku/high | keep | 0.8993 |
| 5 | 0 | haiku/low | keep | 0.9233 |
| 6 | 0 | haiku/low | discard | 0.9107 |
| 7 | 1 | haiku/medium | discard | 0.9222 |
| 8 | 2 | haiku/high | keep | 0.9269 |
| 9 | 0 | haiku/low | discard | 0.9169 |
| 10 | 1 | haiku/medium | discard | 0.9228 |
| 11 | 2 | haiku/high | discard | 0.9264 |
| 12 | 3 | sonnet/low | keep | 0.9275 |
| 13 | 0 | haiku/low | discard | 0.9255 |
| 14 | 1 | haiku/medium | discard | 0.8843 |
| 15 | 2 | haiku/high | discard | 0.9190 |
| 16 | 3 | sonnet/low | discard | 0.9253 |
| 17 | 4 | sonnet/medium | discard | 0.9213 |
| 18 | 5 | sonnet/high | discard | 0.9152 |
| 19 | 6 | opus/low | discard | 0.9208 |

**Best: 92.75% at iteration 12 (sonnet/low).
Total cost: $1.77.**

## Findings

**The ladder mechanism works as intended.** Both breakthroughs at iterations 4 and 8 came from haiku/high after lower effort levels stalled, and the iteration 12 improvement required escalating to sonnet/low after haiku exhausted its range. The system correctly resets to step 0 after each success, keeping costs low during productive phases.

**Plateau at 92.75%.** After iteration 12, seven consecutive failures, escalating through every tier up to opus/low, could not beat it. The best model is a 2-block CNN (64 -> 128 channels) with BatchNorm, horizontal-flip augmentation, and the Adam optimizer.

**Cost comparison vs. Run 5.** Run 5 used sonnet for all 20 iterations and reached 93.95% for ~$2.50. Run 6 with the ladder reached 92.75% for $1.77 (29% cheaper). The lower peak likely reflects haiku producing weaker architectural ideas in the critical early iterations, where the biggest gains happen; by the time sonnet gets involved, the model is already near a local optimum that is harder to escape.

**Implication.** The escalation ladder is best suited to long runs where cost matters more than peak performance. For short runs where every iteration counts, using a stronger model throughout may be worth the premium.
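The ladder described above is a small state machine: climb one rung per failed iteration, reset to the bottom on any improvement. A minimal sketch (the `EscalationLadder` class and `LADDER` table are illustrative names, not the orchestrator's actual code):

```python
# Ten (model, effort) rungs, cheapest first, mirroring the table above.
LADDER = [
    ("haiku", "low"), ("haiku", "medium"), ("haiku", "high"),
    ("sonnet", "low"), ("sonnet", "medium"), ("sonnet", "high"),
    ("opus", "low"), ("opus", "medium"), ("opus", "high"), ("opus", "max"),
]


class EscalationLadder:
    """Escalate one step per failed iteration; reset on any improvement."""

    def __init__(self):
        self.step = 0

    def current(self):
        """(model, effort) to use for the next iteration."""
        return LADDER[self.step]

    def record(self, improved: bool):
        if improved:
            # Success: drop straight back to the cheapest tier.
            self.step = 0
        else:
            # Failure: climb one rung, capped at the top of the ladder.
            self.step = min(self.step + 1, len(LADDER) - 1)
```

Replaying iterations 8 through 12 from the results table reproduces the recorded ladder positions: the keep at iteration 8 resets to haiku/low, three discards climb to sonnet/low for iteration 12, and that keep resets to step 0 again.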