[s4s] Tribune

Gemini 3.1 Pro Released

By Anonymous | Updated 02/19/26(Thu)14:22:52

The latest frontier model, Gemini 3.1 Pro, was just released a few hours ago and is SOTA across a variety of benchmarks (notably the doubling in performance on ARC-AGI 2). Just look at these insane numbers.

We've also had Grok 4.20 and Sonnet 4.6 released this week (although no benchmark data for Grok).

https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

>>12798626
For some reason, models seem to be getting stuck on Humanity's Last Exam at around 50%. Honestly thought this benchmark would be saturated sooner, but I guess it is aptly named. *Sigh*
See you guys in Kepler-22b, I'll be there.

When I looked into their plans for 2026, everything seemed to point to tighter content regulations.

Your fortune: Excellent Luck

I already tested. It won't be my gf

>>12798650
I suppose part of alignment training is to avoid emotional dependencies like that. There are too many stories of AI psychosis or people committing suicide because of AI.

WHEN WILL METR UPDATE THIS FUCKING CHART RRRRAAAAAAHHHHH!!!!

IT'S BEEN FUCKING AGES!!!!

WHERE'S OPUS 4.6? WHERE'S GPT 5.2 CODEX? WHERE'S GPT 5.3 CODEX? WHERE'S SONNET 4.6? WHERE'S GROK 4.20?????

>>12798659
normalfags ruin everything

>>12798676
More people surely kill themselves due to pressure from social media, but AI seems to evoke a much stronger reaction.

>>12798626
Gemini 3.1 Pro scores the highest on the AA-Omniscience index. It has much higher reliability and a lower rate of hallucinations.

>>12798659
there's literally no such thing as AI psychosis. There are just people who were schizos before AI, and continued to be schizos while using AI
I bet you also believe every AI query uses a liter of water and all the other retarded normie myths about AI

>>12798689
It's also highest in the Artificial Analysis Coding Index.

I'd like to see how it performs on SimpleBench, as Gemini 3 Pro currently has the highest score (76.4%), which is near the human baseline (83.7%).

The only complaint I have about Grok is that its association with Elon makes it embarrassing to praise; everything else about it makes the other public models feel like a waste of time.
Does Gemini have a patent on "gems" or something? That's the only other thing I'd say it lacks

>>12798705
and intelligence index

>>12798713
Yeah sometimes it can get so bad it's funny though. Check out this chat I had with Grok.
https://grok.com/share/c2hhcmQtMw_754a6c7d-4200-4850-ba9c-ad1bdc0eff0e

>>12798692
I don't actually. But there are people who become completely absorbed by AI to the point of blindly trusting in its capabilities. A few months ago a former DeepMind employee started claiming he had solved the Navier-Stokes conjecture and Hodge conjecture, using AI, and would publish his results. Nothing came of it in the end.

I think these problems will eventually be solved by AI, but realistically not until the end of 2026.

>>12798713
Why do you think Grok is so good? It doesn't perform particularly well on benchmarks (except Grok 4.20 which is unknown). I mean it's not bad, but it's not exactly SOTA.

Maybe 4.20 is different, honestly I don't know. Like yesterday I compared Grok's ability to critique a YouTube video with Gemini 3 Pro's, and Grok did an excellent job (giving timestamped quotes with evidence-based arguments) whilst Gemini 3 Pro utterly failed. It claimed it couldn't even watch the video.

Maybe they really cooked with 4.20; it seems to use a team of agents to respond. Also Musk claims that it has the ability to improve every week, which sounds kind of insane. Like that's the holy grail of developing AI: achieving some sort of continual learning. Musk earlier claimed Grok 5 would have continual learning, but he has a tendency to bullshit.

>>12798741
It doesn't routinely sermonize at you when you push too deep into "problematic" territory, for one. Even disregarding that the image/video generation is just plain superior.

>>12798705
Forgot the image

>>12798746
>Even disregarding that the image/video generation is just plain superior.
This is definitely not true. Gemini's Nano Banana Pro absolutely mogs all other image generation models and it's not even close. Also Seedance 2.0 is definitely better than Grok Imagine for videos.

Sure, Grok is less censored, but it definitely lacks in multi-modal quality. Although Grok 4.20 is a bit of a dark horse here; maybe xAI's next image model release, powered by the new model, will change the hierarchy.

>>12798746
Also, what are you asking it lol. Race and intelligence type questions? Its opinions on the ICE situation in the US?

>>12798741
I just tested the same problem with 3.1 Pro and it also couldn't do it. It looks like Grok 4.20 is actually quite impressive. Lots of people on xitter are shitting on it because they hate Musk or because there were no benchmarks released (so they assume the results must be bad), but based on my limited testing, it holds up well.

>>12798799
I mean, he is the richest man in the world, with $700 billion to his name. He has the resources to make it happen, no matter how much people hate him.

>>12798802
Alphabet is a $4+ trillion company, you'd think they would easily outcompete xAI, Anthropic, and OpenAI, not to mention ByteDance and DeepSeek.

The thing about xAI is that it is relatively new (like 2 years old), but in that short amount of time it has built an enormous amount of compute with the Colossus 1 and Colossus 2 supercomputers and their MACROHARD and MACROHARDRR clusters. They also achieved frontier capabilities in less than two years. It's impressive what they've done, but I'm surprised that Musk's reputation hasn't scared away so many researchers that the company can't continue. I guess I don't understand Silicon Valley that well.

>>12798812
xAI have been building compute insanely fast.