Also, the performance gains are for 4-bit training and inference versus FP16 or FP32. Quantized models will be built that way…
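For a sense of what "built that way" means at the weight level, here is a minimal sketch of symmetric 4-bit (int4) weight quantization in NumPy. This is an illustration under simple assumptions (per-tensor symmetric scaling, round-to-nearest), not the specific scheme any particular 4-bit model uses:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(8).astype(np.float32)  # stand-in FP32 weights

# Symmetric 4-bit quantization: 16 integer levels in [-8, 7].
# One scale per tensor maps the largest weight magnitude onto 7.
scale = np.abs(w).max() / 7
q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)  # int4 values stored in int8

# Dequantize to see the reconstruction error the model must tolerate.
w_hat = q * scale
max_err = float(np.abs(w - w_hat).max())

# 4-bit storage is 1/4 the size of FP16 and 1/8 the size of FP32,
# which is where the memory and bandwidth gains come from.
print(q)
print(max_err)
```

Rounding error per weight is bounded by half the scale, which is why models trained or fine-tuned with quantization in the loop tend to hold up better at 4 bits than models quantized after the fact.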