knowledge structure

artificial intelligence tree

concepts
- transformers
  - phases
    - tokenize
    - embed
    - attention
      - paged attention
      - flash attention
    - mlp
    - unembed
practice
- inference framework
  - vllm
    - scheduling
    - paged attention operator (cuda)
      - kv cache allocation
    - model executor (torch)
    - misc
      - sampling params
      - api server
- neural network
  - torch arch and usage
- parallel computation
  - cuda

learning

cuda
- mat mul, add
- raw attention cuda operator
torch
- mini torch
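
reference sketch for the raw attention operator item: a minimal plain-Python version of single-head scaled dot-product attention, useful as a correctness baseline before writing the cuda kernel. the helper names (`matmul`, `transpose`, `softmax`, `attention`) and the `causal` flag are assumptions for illustration, not taken from vllm or torch.

```python
import math


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, causal=False):
    """softmax(q k^T / sqrt(d)) v for a single head.

    q, k: (seq x d) lists of lists; v: (seq x d_v).
    With causal=True, position i only attends to positions j <= i.
    """
    d = len(q[0])
    scores = matmul(q, transpose(k))
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d) for s in row]
        if causal:
            # Mask out future positions before the softmax.
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    return matmul(weights, v)


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(attention(q, k, v, causal=True))
```

a cuda version would typically map one thread block to a query row (or tile) and fuse the scale, mask, softmax, and weighted sum; this sketch only fixes the expected numbers to compare against.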