
NVIDIA Megatron Core 0.6.0

Released by @ericharper on 19 Apr 23:46
  • MoE (Mixture of Experts)
    • Performance optimization
      • Communication optimizations for multi-GPU and single-GPU settings
      • 23% improvement (323 TFLOPS/GPU) over MCore 0.5.0 on Mixtral with Hopper BF16
      • GroupedMLP enhancement for Hopper
      • Data-parallel (DP) overlapping: supports overlapping computation with gradient reduction and parameter gathering.
    • All-to-All based Token Dispatcher
    • Layer-wise logging for load balancing loss.
    • Improved expert parallelism support, including compatibility with the distributed optimizer (see the configuration sketch after this list).
  • Distributed optimizer
  • RETRO
    • Data processing
  • BERT
    • Distributed checkpointing
  • Dist checkpointing
    • PyTorch native distributed backend
    • Improved saving/loading speed (see the save/load sketch after this list)
  • TensorRT-LLM Export
    • Integration with TensorRT Model Optimizer for post-training quantization (PTQ)
    • Text generation driver to perform PTQ in Megatron-LM
    • Llama2 and Nemotron3-8B examples that use the TensorRT-LLM unified build API to build engines after training (see the PTQ sketch after this list).
  • Several minor enhancements, bug fixes, and documentation updates
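
The MoE, distributed-optimizer, and DP-overlap items above map onto training launch flags. The sketch below is illustrative only: the flag names (e.g. `--num-experts`, `--moe-token-dispatcher-type`, `--overlap-grad-reduce`) are assumptions based on Megatron-LM's argument parser, and the script name and parallel sizes are placeholders; verify them against your checkout before use.

```python
# Minimal sketch (not an official example): launch flags exercising the MoE,
# distributed-optimizer, and DP-overlap features listed in these notes.
# Flag names are assumptions; check Megatron-LM's argument parser for your version.
import shlex

moe_flags = [
    "--num-experts", "8",                        # Mixtral-style 8-expert MoE
    "--expert-model-parallel-size", "8",         # expert parallelism
    "--moe-token-dispatcher-type", "alltoall",   # All-to-All based token dispatcher
    "--moe-router-load-balancing-type", "aux_loss",
    "--use-distributed-optimizer",
    "--overlap-grad-reduce",                     # DP overlap: gradient reduction
    "--overlap-param-gather",                    # DP overlap: parameter all-gather
    "--bf16",
]

# Print the full command this sketch would launch (placeholder script name).
print(shlex.join(["torchrun", "pretrain_gpt.py", *moe_flags]))
```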
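
For the distributed checkpointing items, the sketch below shows the shape of the `megatron.core.dist_checkpointing` save/load calls. It assumes a Megatron Core model exposing `sharded_state_dict()` and an already-initialized process group; the helper name, checkpoint path, and exact keyword arguments are illustrative and may differ between versions.

```python
# Minimal sketch, assuming a Megatron Core model with sharded_state_dict()
# and an initialized torch.distributed process group.
from megatron.core import dist_checkpointing


def save_and_reload(model, ckpt_dir: str) -> None:
    """Save and reload a model with Megatron Core distributed checkpointing."""
    # Save: every rank contributes its shards of the sharded state dict; with
    # the PyTorch native backend the result is a torch.distributed.checkpoint
    # style directory.
    dist_checkpointing.save(model.sharded_state_dict(), ckpt_dir)

    # Load: the sharded state dict describes which shards this rank needs.
    state_dict = dist_checkpointing.load(model.sharded_state_dict(), ckpt_dir)
    model.load_state_dict(state_dict)
```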
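
For the TensorRT-LLM export items, the sketch below outlines a post-training quantization step with TensorRT Model Optimizer (the `modelopt` package). The config constant, calibration loop, and helper name are assumptions rather than the exact Megatron-LM example code; the real examples also run a text generation driver for calibration and then call the TensorRT-LLM unified build API to build the engine.

```python
# Rough PTQ sketch using TensorRT Model Optimizer; the config constant and
# calibration loop are assumptions, not the exact Megatron-LM example code.
import modelopt.torch.quantization as mtq


def quantize_for_export(model, calibration_batches):
    """Apply FP8 post-training quantization before a TensorRT-LLM engine build."""

    def forward_loop(m):
        # Run calibration batches through the model so activation ranges can
        # be collected for quantization.
        for batch in calibration_batches:
            m(**batch)

    # Other presets (e.g. INT8 SmoothQuant) exist; FP8 is shown for brevity.
    return mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=forward_loop)
```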