Quantization
Convert model weights to a lower-precision data type (e.g. float32 to int8).
Smaller model footprint in GPU global memory.
Faster inference, since less memory bandwidth is needed per weight.
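A minimal sketch of the idea, using symmetric per-tensor int8 quantization with NumPy (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Map float32 weights to int8 with one shared scale (symmetric,
    # per-tensor). Assumes the tensor is not all zeros.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 values for computation.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# q uses 1 byte per weight instead of 4 (4x smaller in memory);
# w_hat approximates w with rounding error at most scale / 2.
```

Real deployments typically use finer-grained schemes (per-channel or per-group scales) to reduce the rounding error, but the storage saving is the same: int8 weights take a quarter of the float32 footprint.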