
Anthropic has to keep revising its interview skills test so you can’t cheat on it with Claude


Since 2024, Anthropic’s performance engineering team has given job applicants a take-home exam to make sure they know what they’re doing. But as AI coding tools have improved, the test has had to change dramatically to stay ahead of AI-assisted cheating.

Team lead Tristan Hume explained the history of the challenge in a blog post published Wednesday. “Every new version of Claude has forced us to revise the test,” Hume writes. “Given the same time limit, Claude Opus 4 outperformed the majority of applicants. That still allowed us to distinguish strong candidates, but then Claude Opus 4.5 matched them as well.”

The results pose a real problem for candidate evaluation. Without a human proctor, there’s no way to make sure someone isn’t using AI to cheat on the exam, and if they are, they’re likely to score highly. “Under the constraints of the take-home test, we no longer had a way of distinguishing the candidates we wanted to hire from the ones we didn’t,” Hume writes.

AI-enabled cheating is already wreaking havoc in schools and universities around the world, so it is striking that AI labs now have to contend with it too. But Anthropic is also unusually well equipped to tackle the problem.

In the end, Hume devised a new test built around an obscure hardware-optimization problem, unusual enough to stump current AI tools. But as part of the post, he also shared the earlier test to see whether any reader could come up with a better answer.

“If you manage to beat Opus 4.5,” the post says, “we’d love to hear from you.”



