MFU (Model Flops Utilization)
\[\text{MFU} = \frac{\text{measured throughput}}{\frac{P}{6N + 12LHQT}}\]
implication:
measured throughput
----------------------------------------------------------------------------
P
--------------------------------------------------------
3 * (gemm_flops_per_token + attention_flops_per_token)
\(\times 3\) in training, ignored in inference
Example:
one perf with following results:
MBU(Model Bandwidth Utilization)
achieved memory bandwidth
-----------------------------------
peak memory bandwidth
N + KV cache size
----------------------------
tpot
-----------------------------------
peak memory bandwidth
KV cache and Model Parameters are saved in GPU global memory.
1 / tpot: how many tokens are generated in a second.