# NVIDIA Megatron Core 0.6.0 (`core_v0.6.0`)
- MoE (Mixture of Experts)
  - Performance optimizations
    - Communication optimizations for both multi-GPU and single-GPU settings
    - 23% throughput improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with BF16 on Hopper
    - GroupedMLP enhancements for Hopper
    - DP overlapping: supports overlapping computation with gradient reduction and parameter gathering
  - All-to-All based token dispatcher (see the configuration sketch after this list)
  - Layer-wise logging of the load-balancing loss
  - Improved expert parallelism support, including the distributed optimizer
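As a rough illustration, here is how these MoE knobs surface on Megatron Core's `TransformerConfig`. The field names (`num_moe_experts`, `moe_grouped_gemm`, `moe_token_dispatcher_type`, `moe_router_load_balancing_type`, `moe_aux_loss_coeff`) reflect the 0.6-era API but should be checked against the installed version; the model sizes are placeholder values.

```python
# Rough sketch: configuring an MoE transformer in Megatron Core.
# Field names follow the MCore 0.6-era TransformerConfig; verify them
# against your installed version, since MoE options have moved between
# releases. Sizes are placeholder values.
from megatron.core.transformer.transformer_config import TransformerConfig

config = TransformerConfig(
    num_layers=24,
    hidden_size=4096,
    num_attention_heads=32,
    # MoE settings
    num_moe_experts=8,                          # Mixtral-style 8-expert layer
    moe_grouped_gemm=True,                      # GroupedMLP path, fast on Hopper
    moe_token_dispatcher_type="alltoall",       # All-to-All based token dispatcher
    moe_router_load_balancing_type="aux_loss",  # load-balancing loss, logged per layer
    moe_aux_loss_coeff=1e-2,
)
```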
- Distributed optimizer (see the overlap sketch below)
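A rough sketch of enabling the distributed optimizer together with the data-parallel overlapping listed above. `DistributedDataParallelConfig` and its field names are an assumption based on recent MCore releases and may not match 0.6.0 exactly; the Megatron-LM training scripts expose the same switches as command-line flags.

```python
# Rough sketch: distributed optimizer plus data-parallel overlap.
# DistributedDataParallelConfig and these field names are assumptions
# based on recent MCore releases; the Megatron-LM training scripts expose
# the same switches as the --use-distributed-optimizer,
# --overlap-grad-reduce, and --overlap-param-gather flags.
from megatron.core.distributed import DistributedDataParallelConfig

ddp_config = DistributedDataParallelConfig(
    use_distributed_optimizer=True,  # shard optimizer state across DP ranks
    overlap_grad_reduce=True,        # overlap backprop with gradient reduce-scatter
    overlap_param_gather=True,       # overlap forward with parameter all-gather
)
```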
- RETRO
- BERT
- Distributed checkpointing
  - PyTorch native distributed backend
  - Improved saving/loading speed (a save/load sketch follows this list)
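A minimal save/load sketch with `megatron.core.dist_checkpointing`. The two-step pattern (build a sharded state dict, then save or load it) is the MCore API; the `model` variable and the checkpoint path are placeholders.

```python
# Minimal sketch: saving and loading with megatron.core.dist_checkpointing.
# `model` stands in for an MCore module that provides sharded_state_dict();
# the checkpoint directory is a hypothetical path.
from megatron.core import dist_checkpointing

ckpt_dir = "/checkpoints/iter_0001000"  # hypothetical path

# Save: each rank writes its own shards; the storage backend (e.g. the
# PyTorch native distributed backend noted above) is picked by the save
# strategy in use.
dist_checkpointing.save(model.sharded_state_dict(), ckpt_dir)

# Load: describe the shards this rank needs via a sharded state dict and
# get back a regular state dict for load_state_dict().
state_dict = dist_checkpointing.load(model.sharded_state_dict(), ckpt_dir)
model.load_state_dict(state_dict)
```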
- TensorRT-LLM export
  - Integration with TensorRT Model Optimizer post-training quantization (PTQ)
  - Text-generation driver to perform PTQ in Megatron-LM
  - Llama2 and Nemotron3-8b examples that use the TensorRT-LLM unified build API to build engines after training (a PTQ sketch follows this list)
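The PTQ integration builds on TensorRT Model Optimizer. Below is a rough sketch of that library's quantize-and-calibrate flow using its `modelopt.torch.quantization` API; the model, the calibration loop, and the choice of `FP8_DEFAULT_CFG` are illustrative, and the Megatron-LM examples drive this through the text-generation driver rather than calling it directly.

```python
# Rough sketch of PTQ with TensorRT Model Optimizer, which the integration
# builds on. mtq.quantize and FP8_DEFAULT_CFG come from
# modelopt.torch.quantization; the model and calibration loop are
# placeholders.
import modelopt.torch.quantization as mtq

def calibrate(model):
    # Placeholder calibration: run a few representative batches through
    # the model so activation ranges can be collected.
    for batch in calibration_dataloader:  # hypothetical dataloader
        model(batch)

# Quantize in place; the quantized model can then be exported and built
# into an engine with the TensorRT-LLM unified build API.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate)
```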
- Several minor enhancements, bug fixes, and documentation updates