# Autoresearch Run 6: Escalation Ladder

Run 6 tested a new **cost-saving escalation ladder** for the autoresearch orchestrator. Instead of using a fixed model for every iteration, the system starts with the cheapest option (haiku/low effort) and escalates to more expensive models only when consecutive iterations fail to improve.

## The Ladder

Ten steps; the ladder resets to step 0 on any successful improvement:

| Step | Model | Effort |
|------|--------|--------|
| 0 | haiku | low |
| 1 | haiku | medium |
| 2 | haiku | high |
| 3 | sonnet | low |
| 4 | sonnet | medium |
| 5 | sonnet | high |
| 6 | opus | low |
| 7 | opus | medium |
| 8 | opus | high |
| 9 | opus | max |

## Setup

- Task: Fashion-MNIST classification (CPU only)
- Baseline: single linear layer (784 -> 10)
- Training budget: 300s per iteration
- 20 iterations total

## Results

| Iter | Ladder | Model/Effort | Action | val_accuracy |
|------|--------|---------------|---------|--------------|
| 0 | 0 | haiku/low | keep | 0.8405 |
| 1 | 0 | haiku/low | keep | 0.8980 |
| 2 | 0 | haiku/low | discard | 0.8885 |
| 3 | 1 | haiku/medium | discard | 0.8825 |
| 4 | 2 | haiku/high | keep | 0.8993 |
| 5 | 0 | haiku/low | keep | 0.9233 |
| 6 | 0 | haiku/low | discard | 0.9107 |
| 7 | 1 | haiku/medium | discard | 0.9222 |
| 8 | 2 | haiku/high | keep | 0.9269 |
| 9 | 0 | haiku/low | discard | 0.9169 |
| 10 | 1 | haiku/medium | discard | 0.9228 |
| 11 | 2 | haiku/high | discard | 0.9264 |
| 12 | 3 | sonnet/low | keep | 0.9275 |
| 13 | 0 | haiku/low | discard | 0.9255 |
| 14 | 1 | haiku/medium | discard | 0.8843 |
| 15 | 2 | haiku/high | discard | 0.9190 |
| 16 | 3 | sonnet/low | discard | 0.9253 |
| 17 | 4 | sonnet/medium | discard | 0.9213 |
| 18 | 5 | sonnet/high | discard | 0.9152 |
| 19 | 6 | opus/low | discard | 0.9208 |

**Best: 92.75% at iteration 12 (sonnet/low).
Total cost: $1.77.**

## Findings

**The ladder mechanism works as intended.** Both breakthroughs at iterations 4 and 8 came from haiku/high after lower effort levels stalled, and the iteration 12 improvement required escalating to sonnet/low after haiku exhausted its range. The system correctly resets to step 0 after each success, keeping costs low during productive phases.

**Plateau at 92.75%.** After iteration 12, seven consecutive failures, escalating through every tier up to opus/low, could not beat it. The best model is a 2-block CNN (64 -> 128 channels) with BatchNorm, horizontal-flip augmentation, and the Adam optimizer.

**Cost comparison vs. Run 5.** Run 5 used sonnet for all 20 iterations and reached 93.95% for ~$2.50. Run 6 with the ladder reached 92.75% for $1.77 (29% cheaper). The lower peak likely reflects haiku producing weaker architectural ideas in the critical early iterations, where the biggest gains happen; by the time sonnet gets involved, the model is already near a local optimum that is harder to escape.

**Implication.** The escalation ladder is best suited to long runs where cost matters more than peak performance. For short runs where every iteration counts, using a stronger model throughout may be worth the premium.
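The ladder described above is a small state machine: climb one rung per failed iteration, reset to the bottom on any improvement. A minimal sketch (the `EscalationLadder` class and `LADDER` table are illustrative names, not the orchestrator's actual code):

```python
# Ten (model, effort) rungs, cheapest first, mirroring the table above.
LADDER = [
    ("haiku", "low"), ("haiku", "medium"), ("haiku", "high"),
    ("sonnet", "low"), ("sonnet", "medium"), ("sonnet", "high"),
    ("opus", "low"), ("opus", "medium"), ("opus", "high"), ("opus", "max"),
]


class EscalationLadder:
    """Escalate one step per failed iteration; reset on any improvement."""

    def __init__(self):
        self.step = 0

    def current(self):
        """(model, effort) to use for the next iteration."""
        return LADDER[self.step]

    def record(self, improved: bool):
        if improved:
            # Success: drop straight back to the cheapest tier.
            self.step = 0
        else:
            # Failure: climb one rung, capped at the top of the ladder.
            self.step = min(self.step + 1, len(LADDER) - 1)
```

Replaying iterations 8 through 12 from the results table reproduces the recorded ladder positions: the keep at iteration 8 resets to haiku/low, three discards climb to sonnet/low for iteration 12, and that keep resets to step 0 again.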