FlowersBench
The most ethical benchmarking of AI models under evaluation
A dictatorial reflection of @flowersslop's current views and evaluation of the latest AI models
2D water simulation: o1 (black screen) vs. o3-mini-medium (actually works and looks really cool) vs. 4o (when you click, the window closes immediately with a console error) vs. r1 (works, but didn't look as cool as o3-mini and had some weird black borders).
Scores: o3-mini 10/10, r1 8/10, o1 0/10.