Microsoft researchers find AI models and agents can't handle long-running tasks

An intern who failed this much would be shown the door