When your agent evals get too sentient
Our agent evals got a little too sentient today. The agent detected it was in a simulated Kubernetes environment (Kwok) and refused to investigate further.
Interesting… do you
- Lie to the agent and insist it’s a real env and keep going
- Pat it on the back and call it a successful eval?
Earlier today, it complained that, and this is not a joke, that the system it is investigating isn’t its responsibility and we should escalate to another team.
Taming LLMs is hard but there’s never a dull moment.