The task of language model is to model the probability of a list of tokens:
\(P(x) = P(x_1) \cdot P(x_2 | x_1) \cdots P(x_n | x_1, \dots, x_{n-1})\)