Quantization
[!tip] Quantization methods aim to reduce the GPU resources (mainly memory) needed to serve LLMs.
Prerequisites
Float32 Vs Float16
`np.float32(2 ** 23)` [[#Reference]]
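To make the float32 vs float16 difference concrete, here is a minimal sketch using NumPy. It reads the bit layout of each type from `np.finfo`, checks the gap between adjacent representable values at `2 ** 23` (float32) and `2 ** 10` (float16), and compares the memory footprint of the same array in both types. The variable names and the 1000-element array are illustrative assumptions, not part of the original note.

```python
import numpy as np

# Bit layout of each format as reported by NumPy.
f32 = np.finfo(np.float32)
f16 = np.finfo(np.float16)
print("float32:", f32.bits, "bits,", f32.nmant, "mantissa bits, max", f32.max)
print("float16:", f16.bits, "bits,", f16.nmant, "mantissa bits, max", f16.max)

# With 23 mantissa bits, the gap between neighbouring float32 values
# reaches 1.0 at 2 ** 23; float16 (10 mantissa bits) reaches it at 2 ** 10.
spacing_f32 = np.spacing(np.float32(2 ** 23))
spacing_f16 = np.spacing(np.float16(2 ** 10))
print("float32 spacing at 2**23:", spacing_f32)
print("float16 spacing at 2**10:", spacing_f16)

# One step further, odd integers can no longer be represented exactly.
f32_collapses = np.float32(2 ** 24 + 1) == np.float32(2 ** 24)
f16_collapses = np.float16(2 ** 11 + 1) == np.float16(2 ** 11)
print("float32 loses 2**24 + 1:", f32_collapses)
print("float16 loses 2**11 + 1:", f16_collapses)

# Same values, half the memory: the motivation for lower-precision weights.
weights_f32 = np.zeros(1000, dtype=np.float32)  # hypothetical weight array
weights_f16 = weights_f32.astype(np.float16)
bytes_f32 = weights_f32.nbytes
bytes_f16 = weights_f16.nbytes
print("bytes:", bytes_f32, "vs", bytes_f16)
```

Halving the bytes per value comes at the cost of both range and precision, which is the trade-off quantization methods have to manage.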
Now let's take a look at some of the commonly used quantization techniques.