Exploiting the most prominent AI agent benchmarks