An alternative to Transformers that processes sequences by maintaining a compressed, fixed-size "state" instead of computing attention over all tokens. Mamba is the best-known SSM architecture. SSMs scale linearly with sequence length (vs. quadratically for attention), making them potentially much more efficient for very long contexts.
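The core idea can be sketched as a simple linear recurrence: each token updates a fixed-size state vector, so per-token cost does not grow with context length. This is an illustrative toy (a time-invariant linear SSM), not the actual Mamba implementation, which uses input-dependent (selective) parameters and a hardware-aware scan; the matrix names A, B, C follow standard state-space notation.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A @ h_{t-1} + B * x_t, y_t = C @ h_t over a 1-D input.

    The state h is a fixed-size vector, so each step costs O(1) with
    respect to sequence length -- linear overall, unlike attention's
    quadratic pairwise comparisons.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # single pass over the sequence
        h = A @ h + B * x_t      # fold the new token into the state
        ys.append(C @ h)         # read the output from the state
    return np.array(ys)

rng = np.random.default_rng(0)
d = 4                            # toy state size, chosen arbitrarily
A = 0.9 * np.eye(d)              # stable dynamics so the state decays
B = rng.standard_normal(d)
C = rng.standard_normal(d)
x = rng.standard_normal(1000)    # a 1000-token input sequence
y = ssm_scan(x, A, B, C)
print(y.shape)                   # one output per token: (1000,)
```

Note the contrast with attention: the model never looks back at earlier tokens directly, so everything it remembers must fit in the compressed state.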
Why it matters
SSMs are the main challenger to Transformer dominance. They're faster for long sequences and use less memory, but the research is still maturing. Hybrid architectures (mixing SSM layers with attention) may end up being the best of both worlds.