fp8
E4M3
1 sign bit + 4 exponent bits + 3 mantissa bits
E5M2
1 sign bit + 5 exponent bits + 2 mantissa bits
- E4M3: Often used for model weights where you need
reasonable precision
- E5M2: Often used for gradients/activations where
extreme values need to be represented