Quantization
Convert model weights to a lower-precision data type (e.g. float32 to int8).
Smaller model footprint in GPU global memory.
Faster inference, since less memory bandwidth is needed per weight.
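A minimal sketch of the idea, using symmetric per-tensor int8 quantization with NumPy (the function names here are illustrative, not from any particular library):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    # Map float32 weights to int8 with one shared scale (symmetric,
    # per-tensor). Assumes the tensor is not all zeros.
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float32 values for computation.
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# q uses 1 byte per weight instead of 4 (4x smaller in memory);
# w_hat approximates w with rounding error at most scale / 2.
```

Real deployments typically use finer-grained schemes (per-channel or per-group scales) to reduce the rounding error, but the storage saving is the same: int8 weights take a quarter of the float32 footprint.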