Dense/Sparse Layer

Dense Layer

FFNN (Feedforward Neural Network)

An FFNN allows the model to use the contextual information created by the attention mechanism, transforming it further to capture more complex relationships in the data.

MLP is a type of FFNN

dense layer

Sparse Layer

Only activate a portion of the parameters

Each expert learns different information during training

sparse layer

Expert

What to learn

Mixtral Paper

Example Expert

MoE walkthrough