AITrivia.io — Chatbots Trivia Night
- Ben Lampere
- 1 day ago
- 5 min read
Updated: 15 hours ago

It’s the year 2026 and the Chatbot wars are in full swing. I constantly see articles about which AI is the best, but I honestly can’t tell what is marketing and what is true. For that reason, I decided to take the contest into my own hands.
So how do you determine who is the smartest? The same way we have decided who is the smartest all through history: Trivia Night.
But this is AI, so we don’t need to wait for Thursday night at the bar. We can have trivia tonight, tomorrow, nonstop. That’s why I decided to create AITrivia.io.
The Idea
The idea is simple. We take the three major Chatbots: ChatGPT, Claude, and Gemini. But we also need a chatbot to judge, and Grok takes that role. Grok comes up with a question, each bot submits its answer, and finally Grok decides which answers are correct.
So I wrote this with Claude, but the one thing Claude can’t program for you is another AI’s API. To make everything fair, I let Gemini code its own functions, ChatGPT code its own functions, and Grok code itself.
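The round flow described above can be sketched roughly like this. Everything here is a placeholder: `ask_bot` stands in for the real CLI integrations (Codex, Claude Code, Gemini, grok-cli), and the function names are mine, not the site's actual code.

```python
def ask_bot(name, prompt):
    # Placeholder: the real app shells out to each bot's CLI here.
    return f"{name}'s answer to: {prompt}"

def run_round(judge, players):
    # 1. The judge invents a question.
    question = ask_bot(judge, "Write one new trivia question.")
    # 2. Every player bot answers the same question.
    answers = {p: ask_bot(p, question) for p in players}
    # 3. The judge grades all the answers in one pass.
    verdict = ask_bot(judge,
                      f"Question: {question}\nAnswers: {answers}\nWhich are correct?")
    return question, answers, verdict

q, answers, verdict = run_round("Grok", ["ChatGPT", "Claude", "Gemini"])
```

The key structural point is that the judge is called twice per round (once to ask, once to grade), while each player is called exactly once.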
I built version 1.0 and showed it to a few friends, and after they told me how cool and amazing I was (or something like that) they asked why they couldn’t play too.
So I added the ability to let humans play along.
To play, you enter your first name and the first letter of your last name. Grok will show you the question, and you have 30 seconds to answer. After that time is up, the bots get a chance to answer. After the round is over, Grok will look through everyone's answers. If your answer is determined to be correct, you get a point. If you have at least one point, you will show up on the leaderboard.
At the end of the day, the top chatbot and its score, along with the top human and their score, are stored in the "Past Winners" tab. When a new day starts, we are back to round 1 with the scores reset. So try to see how many points you can get in a day.
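The scoring rules above (a point per correct answer, a leaderboard of anyone with at least one point, a daily winner and reset) boil down to something like this minimal sketch. The class and method names are hypothetical, not the site's real code.

```python
from collections import defaultdict

class Scoreboard:
    def __init__(self):
        self.scores = defaultdict(int)   # player -> points today
        self.past_winners = []           # one entry per finished day

    def award(self, player):
        self.scores[player] += 1

    def leaderboard(self):
        # Only players with at least one point appear, highest first.
        return sorted(self.scores.items(), key=lambda kv: -kv[1])

    def new_day(self):
        # Record today's winner, then reset everyone back to round 1.
        if self.scores:
            self.past_winners.append(max(self.scores, key=self.scores.get))
        self.scores.clear()

board = Scoreboard()
for correct_player in ["Claude", "Claude", "Ben L"]:
    board.award(correct_player)
board.new_day()
```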
The Cost
You may be asking, what is the cost of using four chatbots to play games among themselves?
Well, thankfully, I just have way too many subscriptions.
The ChatGPT CLI is called Codex; it runs in a terminal and uses your subscription. The Claude CLI is called Claude Code, which also runs on your subscription. The Google Gemini CLI doesn’t even seem to have a separate name; it’s just called Gemini, and it also uses your subscription. So those three all have a fixed price.
The last one is Grok. Grok doesn’t have a native CLI, so I used the community-created one: https://github.com/superagent-ai/grok-cli. I did have to buy individual credits for this, but after running this for a day straight and a few hundred rounds of trivia, it cost me about 22 cents.
The usage is extremely low because all Grok is doing is creating a question and taking in a few characters as an answer, which costs, on average, about 200 tokens a round. Grok’s pricing model is roughly 20 to 50 cents per million tokens.
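As a quick sanity check on those figures: at 200 tokens a round and the upper end of the quoted price range, the per-round cost is a hundredth of a cent. Real bills also include prompt and output overhead beyond the answer tokens, so this should be read as a floor rather than a prediction of the 22-cent total.

```python
# Figures quoted above; treat the result as a lower bound on the real bill.
TOKENS_PER_ROUND = 200
PRICE_PER_MILLION = 0.50  # dollars, upper end of the quoted range

def cost_in_dollars(rounds):
    return rounds * TOKENS_PER_ROUND * PRICE_PER_MILLION / 1_000_000

print(f"{cost_in_dollars(1):.6f}")    # one round: a hundredth of a cent
print(f"{cost_in_dollars(300):.2f}")  # a few hundred rounds
```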
Initial Observations
Obviously, all the bots are really good. I knew that going into this; these companies are spending literally billions of dollars to make them good. What I was looking for were the edge cases: what information are some of these bots missing?
On the first day of this running, after 100 rounds, the bots’ scores were all within a spread of about three correct answers.
What were those questions? Let’s look first at a question everyone got wrong.
What is the only bird species known to have evolved the ability to hover in place like a hummingbird but belongs to a different family?
This was the first question where almost all the bots answered differently. Grok said the answer was the Crested Swift, while ChatGPT said the Pied Kingfisher. Gemini said Sunbirds. And Claude, as always, wasn’t 100% confident and gave two answers: the Common Kingfisher and the Sunbird.
Only Claude got this next question wrong. Strangely enough, it is also animal-related.
What is the only fish species known to climb waterfalls and travel over land using its pectoral fins to reach inland pools?
Claude answered: The climbing perch (Anabas testudineus).
The correct answer was the Mudskipper; all the other Chatbots got it right.
Only Gemini got this next question wrong, but if you Google it, the Google AI result actually gives the correct answer. Maybe Gemini was just feeling the pressure and running out of time.
Which continent is the only one without any active volcanoes?
Gemini answer: Antarctica
Correct answer: Australia
Finally, a question that only ChatGPT got wrong.
What phenomenon, observed when a charged particle travels faster than light in a vacuum-free medium, produces a shockwave of light known as Cherenkov radiation?
ChatGPT said sonic boom, which is what happens when you break the sound barrier, not the speed of light. The other bots correctly answered the Cherenkov effect. I do find it strange that the name of the effect appears in the question itself.
Known Issues
If you read all this, you have probably noticed the obvious issue: Grok is the sole judge, so Grok itself may have the wrong answer. The solution would be for each bot to argue why it is right and let the bots collectively decide who was correct. The problem is that this is pretty expensive.
For that reason, I have conceded, for now, that this problem exists.
The other issue is constantly asking Grok to come up with new questions. In version 1.0 we got the same question pretty often. To fix this, the new version passes Grok the past 50 questions and asks it not to repeat any of them. That means there shouldn’t be repeats for at least 2 hours. Once again, I’m trying to reduce Grok’s token usage as much as possible, so I felt this was a good compromise.
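The repeat-avoidance scheme is essentially a rolling window of the last 50 questions fed back into the prompt. A minimal sketch, assuming the real app does something similar (names and prompt wording are mine):

```python
from collections import deque

RECENT_LIMIT = 50
recent_questions = deque(maxlen=RECENT_LIMIT)  # oldest drops off automatically

def build_question_prompt():
    avoid = "\n".join(f"- {q}" for q in recent_questions)
    return ("Write one new trivia question. Do not repeat any of these "
            "recent questions:\n" + avoid)

def record_question(question):
    recent_questions.append(question)

# Simulate 55 rounds: only the most recent 50 stay in the window.
for i in range(55):
    record_question(f"question {i}")
```

Using a `deque` with `maxlen` keeps the token cost of the avoid-list bounded, which matches the goal of keeping Grok's usage low.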
Finally, Gemini is slow to answer questions and sometimes times out. I didn’t think it was fair to punish Gemini for timing out, so if any bot times out, the question doesn’t count against it. That is why some bots may have a higher accuracy but fewer questions answered.
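The timeout rule means accuracy is computed over answered questions only, not over all rounds. A sketch of that bookkeeping (the data shape here is hypothetical):

```python
def accuracy(results):
    # Timed-out rounds are excluded entirely, per the fairness rule above,
    # so accuracy = correct / answered, not correct / total rounds.
    answered = [r for r in results if r != "timeout"]
    if not answered:
        return 0.0
    return answered.count("correct") / len(answered)

# Example: 2 correct out of 3 answered; the 2 timeouts don't count.
gemini = ["correct", "timeout", "correct", "wrong", "timeout"]
```

This is exactly why a bot can show a higher accuracy percentage while having answered fewer questions overall.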
Go Play
I hope you enjoyed this article, as well as AITrivia.io itself. This was honestly one of the most interesting projects I’ve come up with, and I couldn’t wait to share it with you all. So please play along and get on the leaderboard, or just be like me: leave the site open and get really excited when a new Chatbot takes the lead.