The task of a language model is to model the probability of a sequence of tokens:
\(P(x) = P(x_1) \cdot P(x_2 | x_1) \cdots P(x_n | x_1, \dots, x_{n-1})\)
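As a small illustration of the factorization above, the sketch below multiplies per-token conditional probabilities to recover the probability of the whole sequence. The probability values are made up for the example, and `sequence_probability` is a hypothetical helper, not part of any library.

```python
import math

def sequence_probability(conditionals):
    """Chain rule: P(x) is the product of P(x_i | x_1, ..., x_{i-1}).

    Summing logs instead of multiplying directly avoids underflow
    on long sequences.
    """
    log_p = sum(math.log(p) for p in conditionals)
    return math.exp(log_p)

# Made-up conditionals for a 3-token sequence:
# P(x_1) = 0.5, P(x_2 | x_1) = 0.4, P(x_3 | x_1, x_2) = 0.25
conditionals = [0.5, 0.4, 0.25]
p = sequence_probability(conditionals)
print(p)
```

In practice a model produces these conditionals one token at a time, which is why the keys and values of earlier tokens are worth caching.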
Every Sequence tracks its logical blocks.
BlockSpaceManager uses Dict[seq_id, BlockTable] to track each sequence's physical blocks.
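The mapping from sequence to physical blocks can be sketched roughly as follows. This is a simplified toy, not vLLM's actual implementation: `ToyBlockSpaceManager`, its methods, and the `block_number` field are assumptions made for illustration. Only the names `PhysicalTokenBlock` and `BlockTable` and the Dict[seq_id, BlockTable] shape come from the notes.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class PhysicalTokenBlock:
    block_number: int  # assumed field: index of the block in the cache pool

# A block table is a list of physical blocks, one per logical block.
BlockTable = List[PhysicalTokenBlock]

class ToyBlockSpaceManager:
    """Hypothetical, simplified manager: maps seq_id -> BlockTable."""

    def __init__(self, num_blocks: int) -> None:
        self.free_blocks = [PhysicalTokenBlock(i) for i in range(num_blocks)]
        self.block_tables: Dict[int, BlockTable] = {}

    def allocate(self, seq_id: int, num_logical_blocks: int) -> None:
        # One physical block is taken from the free pool per logical block.
        if num_logical_blocks > len(self.free_blocks):
            raise RuntimeError("out of free physical blocks")
        self.block_tables[seq_id] = [
            self.free_blocks.pop() for _ in range(num_logical_blocks)
        ]

    def free(self, seq_id: int) -> None:
        # Return the sequence's physical blocks to the free pool.
        self.free_blocks.extend(self.block_tables.pop(seq_id))

manager = ToyBlockSpaceManager(num_blocks=8)
manager.allocate(seq_id=0, num_logical_blocks=3)
table = manager.block_tables[0]
print([block.block_number for block in table])
```

The indirection is the point of the design: logical blocks are contiguous from the sequence's view, while the physical blocks backing them can sit anywhere in the pool.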
cache block size = block_size * num_layers * num_heads * dim_head * 2 * dtype_size
The factor of 2 is there to store both the key and the value.
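To make the formula concrete, here is a worked calculation. The configuration values (16 tokens per block, 32 layers, 32 heads, head dimension 128, 2-byte fp16 elements) are assumed for illustration and do not come from the notes. `cache_block_bytes` is a hypothetical helper.

```python
def cache_block_bytes(block_size, num_layers, num_heads, dim_head, dtype_size):
    """Bytes needed for one cache block across all layers.

    The factor of 2 covers the key tensor and the value tensor.
    """
    return block_size * num_layers * num_heads * dim_head * 2 * dtype_size

# Assumed example configuration (not taken from the notes).
size = cache_block_bytes(
    block_size=16,   # tokens per block
    num_layers=32,
    num_heads=32,
    dim_head=128,
    dtype_size=2,    # bytes per element for fp16
)
print(size, "bytes =", size / 2**20, "MiB")
```

With these assumed values one block takes 8,388,608 bytes, that is 8 MiB, so the number of blocks that fit on a device is roughly the free memory divided by this figure.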
BlockTable: a list of PhysicalTokenBlock