Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124

Different AI labs have different priorities. OpenAI has traditionally focused on consumers, for example, while its rival Anthropic has tended to focus on businesses. Elon Musk’s xAI, we recently learned, has been heavily focused on video games.
On Friday, Grace Kay of Business Insider published a detailed and wide-ranging report on xAI, the AI lab soon to be acquired by SpaceX, emphasizing how Musk is making life difficult for employees. But this anecdote stood out:
Sometime last year, a model release was delayed for several days because Musk was not satisfied with how the chatbot answered detailed questions about the video game “Baldur’s Gate,” according to people familiar with the matter. Top engineers were pulled from other projects to improve the responses before the launch, they said.
Yes, you can imagine the frustration of any respected and experienced engineer who took the job thinking they would be solving the mysteries of cognition and machine intelligence, only to be pulled away to help a 54-year-old man beat his video game. But the anecdote raises a more pressing question: Did Musk actually get the performance he wanted?
To answer this question, our resident RPG-lover Ram Iyer put together five questions about Baldur’s Gate, which we posed to xAI’s Grok and three other major models in a quasi-benchmark that I’ve chosen to call BaldurBench.
In the interest of transparency, I’ve made the full set of responses public, so you can view them here: Grok, ChatGPT, Claude, and Gemini.
First, the good news: Grok provides perfectly good information. The answers were a little heavy on gamer jargon (“save” instead of saving throw and “DPS” instead of damage), but they were useful and knowledgeable, if you knew what they were talking about. Grok is also very fond of tables and theorycraft, which is about what you would expect.
There are a lot of Baldur’s Gate guides out there, and the models often pull from the same sources, so the main difference was style. ChatGPT prefers bulleted lists and sentence fragments, while Gemini likes to bold important words.
The most surprising was Claude, which was very concerned about giving me information that would spoil my experience of the game. When I asked about good party compositions, it closed its guide by saying “don’t worry too much and just play what makes you happy.” Thank you, Claude!
It’s important to remember that this is a topic we know (thanks to Business Insider’s reporting) xAI has focused on reaching parity on. So we shouldn’t read too much into the fact that, after said sprint, Grok’s advice ended up on par with the other models’. Still, it’s good to know that xAI can make it work if it tries.