Nous Research's open-source Nomos 1 AI model scored 87/120 on the notoriously difficult Putnam math competition, ranking ...
Testing AI systems is hard. Responses are non-deterministic, you need to validate tool usage, and semantic meaning matters more than exact text matching.