artificial intelligence tree

  • concepts
    • transformers
      • phases
        • tokenize
        • embed
        • attention
          • paged attention
          • flash attention
        • mlp
        • unembed
  • practice
    • inference framework
      • vllm
        • scheduling
        • paged attention operator (cuda)
          • kv cache allocation
        • model executor (torch)
        • misc
          • sampling params
          • api server
    • neural network
      • torch architecture and usage
    • parallel computation
      • cuda

learning

  • cuda
  • torch