Deepseek’s models rely on a process called distillation (i.e.) using foundational models like Llama a to train a smaller more ...