Settings you choose before training begins that control how the model learns — as opposed to parameters, which the model learns on its own. Hyperparameters include learning rate (how big each update step is), batch size (how many examples to process at once), number of epochs (how many times to go through the data), optimizer choice (Adam, SGD, AdamW), weight decay, dropout rate, and architecture decisions like number of layers and hidden dimensions. Getting hyperparameters right is often the difference between a model that converges beautifully and one that diverges into nonsense.
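The hyperparameters listed above are often gathered into a single configuration object before training starts. A minimal sketch, assuming a plain Python dataclass (the field names and default values here are illustrative, not any library's API):

```python
from dataclasses import dataclass

@dataclass
class Hyperparams:
    # Fixed before training begins; the model never updates these.
    learning_rate: float = 3e-4   # size of each update step
    batch_size: int = 32          # examples processed per gradient step
    num_epochs: int = 10          # full passes over the training data
    optimizer: str = "AdamW"      # e.g. "Adam", "SGD", "AdamW"
    weight_decay: float = 0.01    # L2-style regularization strength
    dropout: float = 0.1          # fraction of activations zeroed during training
    num_layers: int = 12          # architecture decision
    hidden_dim: int = 768         # architecture decision

# Override only what differs from the defaults for a given run.
hp = Hyperparams(learning_rate=1e-3, batch_size=64)
```

Keeping every hyperparameter in one place like this makes runs reproducible and makes sweeps easy: each trial is just a different `Hyperparams` instance.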
Why it matters
Hyperparameter tuning is where ML engineering becomes part science, part craft. You can have the perfect dataset and architecture, but a learning rate that's too high will blow up training, and one that's too low will never converge. Understanding hyperparameters is essential for anyone training or fine-tuning models — and knowing which ones matter most saves enormous amounts of compute.
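The learning-rate failure modes are easy to see on a toy problem. A minimal sketch, assuming plain gradient descent on f(x) = x² (whose gradient is 2x), with no libraries involved:

```python
def gradient_descent(lr, steps=50, x=1.0):
    # Repeatedly step against the gradient of f(x) = x^2, which is 2x.
    for _ in range(steps):
        x -= lr * 2 * x
    return x

# Each step multiplies x by (1 - 2*lr), so behavior depends entirely on lr:
good     = gradient_descent(lr=0.1)    # shrinks toward the minimum at 0
too_low  = gradient_descent(lr=1e-4)   # barely moves after 50 steps
too_high = gradient_descent(lr=1.5)    # |x| doubles every step: divergence
```

With lr = 0.1 the iterate decays to nearly zero; with lr = 1e-4 it is still around 0.99 after 50 steps; with lr = 1.5 each update overshoots the minimum and the iterate explodes. The same three regimes appear in real training, just with noisier symptoms.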