Nvidia's Nemotron-Cascade 2 is a 30B MoE model that activates only 3B parameters at inference time, yet achieved gold medal-level performance at the 2025 IMO, IOI, and ICPC World Finals. Nvidia has ...
Lakkaraju, Himabindu, Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, and Shichang Zhang. "How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, ...