adaptive-gradient-clipping
Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".
NFNets and Adaptive Gradient Clipping for SGD, implemented in PyTorch. An explanation is available at tourdeml.github.io/blog/
:dart: Gradient Accumulation for TensorFlow 2