Nvidia's Nemotron-Cascade 2 is a 30B MoE model that activates only 3B parameters at inference time, yet achieved gold-medal-level performance at the 2025 IMO, IOI, and ICPC World Finals. Nvidia has ...
Lakkaraju, Himabindu, Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, and Shichang Zhang. "How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, ...