Characteristics of LLM Models
LLM Serving
agnostic to ML models, execution engines, and computing hardware
Batching
previous batching restriction (request-level batching):
Batching is only applicable when the two selected requests are in the same phase, with the same number of input tokens (in case of the initiation phase) or with the same token index (in case of the increment phase).
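This restriction is ultimately a tensor-shape constraint: inputs can only be stacked into one batch when their shapes align. A minimal sketch of the check (the request fields here are hypothetical, not ORCA's actual data structures):

```python
def can_batch(req_a, req_b):
    """Request-level batching restriction in pre-ORCA systems:
    two requests batch only if they are in the same phase AND
    their tensors have matching shapes."""
    if req_a["phase"] != req_b["phase"]:
        return False                         # mixed prefill/decode cannot batch
    if req_a["phase"] == "initiation":
        # initiation (prefill): all prompt tokens processed at once,
        # so prompt lengths must match to stack into one tensor
        return len(req_a["tokens"]) == len(req_b["tokens"])
    # increment (decode): one token per step, but positions must align
    return req_a["token_index"] == req_b["token_index"]

a = {"phase": "initiation", "tokens": [1, 2, 3], "token_index": 0}
b = {"phase": "initiation", "tokens": [4, 5],    "token_index": 0}
c = {"phase": "initiation", "tokens": [6, 7, 8], "token_index": 0}

print(can_batch(a, b))  # False: prompt lengths differ (3 vs 2)
print(can_batch(a, c))  # True: same phase, same length
```

Under this rule, a request arriving mid-stream must wait for the entire current batch to finish, which is the latency problem ORCA attacks.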
ORCA’s scheduler (iteration-level scheduling) can change which requests are processed at every iteration, instead of committing to one fixed batch until all of its requests finish.
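The scheduling loop can be sketched as follows. This is a toy simulation under assumed names (`remaining` stands in for tokens left to generate), not ORCA's implementation; the point is that the batch is re-formed after every iteration, so a finished request leaves immediately rather than waiting for the slowest member:

```python
def iteration_level_schedule(pool, max_batch=4):
    """Toy sketch of iteration-level scheduling: re-select the batch
    at every iteration; requests exit as soon as they finish."""
    finished_order = []
    iterations = 0
    while pool:
        batch = pool[:max_batch]          # scheduler picks requests anew each iteration
        for req in batch:
            req["remaining"] -= 1         # one iteration generates one token per request
        iterations += 1
        finished_order.extend(r["id"] for r in pool if r["remaining"] == 0)
        pool = [r for r in pool if r["remaining"] > 0]   # finished requests leave now
    return finished_order, iterations

reqs = [{"id": "A", "remaining": 2},
        {"id": "B", "remaining": 5},
        {"id": "C", "remaining": 1}]
order, iters = iteration_level_schedule(reqs)
print(order, iters)  # ['C', 'A', 'B'] 5
```

Here "C" returns after a single iteration while "B" keeps running; with request-level batching, "C" would have been held until "B" finished.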