The initial, large-scale training phase in which a model learns language (or other modalities) from a huge corpus. This is the expensive part: thousands of GPUs running for weeks or months, at a cost of millions of dollars. The result is a foundation model that has a general command of language but hasn't been specialized for any task yet.
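To make the "weeks or months" and "millions of dollars" figures concrete, here is a rough back-of-the-envelope sketch. It assumes the widely used rule of thumb that training a dense transformer takes about 6 × N × D floating-point operations (N parameters, D training tokens). That rule is not stated in this entry, and the function names and all numeric inputs (model size, token count, GPU count, per-GPU throughput, utilization, hourly price) are hypothetical values chosen only for illustration.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training compute using the 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def estimate_run(
    n_params: float,
    n_tokens: float,
    n_gpus: int,
    peak_flops_per_gpu: float,
    utilization: float,
    dollars_per_gpu_hour: float,
) -> dict:
    """Estimate wall-clock time and cost for one pre-training run.

    All inputs are assumptions supplied by the caller; real runs also pay
    for failed experiments, restarts, storage, and networking, which this
    sketch ignores.
    """
    total_flops = training_flops(n_params, n_tokens)
    # Sustained cluster throughput is only a fraction of the peak rating.
    cluster_flops_per_sec = n_gpus * peak_flops_per_gpu * utilization
    seconds = total_flops / cluster_flops_per_sec
    gpu_hours = n_gpus * seconds / 3600.0
    return {
        "total_flops": total_flops,
        "days": seconds / 86400.0,
        "gpu_hours": gpu_hours,
        "cost_usd": gpu_hours * dollars_per_gpu_hour,
    }


if __name__ == "__main__":
    # Hypothetical run: 70B parameters, 1.4T tokens, 2048 GPUs.
    estimate = estimate_run(
        n_params=70e9,
        n_tokens=1.4e12,
        n_gpus=2048,
        peak_flops_per_gpu=3e14,   # assumed peak throughput per GPU
        utilization=0.4,           # assumed fraction of peak actually achieved
        dollars_per_gpu_hour=2.0,  # assumed rental price
    )
    print(f"Total compute: {estimate['total_flops']:.2e} FLOPs")
    print(f"Wall-clock:    {estimate['days']:.1f} days")
    print(f"GPU-hours:     {estimate['gpu_hours']:,.0f}")
    print(f"Cost:          ${estimate['cost_usd']:,.0f}")
```

With these made-up inputs the estimate comes to roughly four weeks of wall-clock time and a few million dollars for a single run. Scaling the parameter count, the token count, or both pushes the total up multiplicatively.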
Why it matters
Pre-training is what makes foundation models possible. It's also why only a handful of companies can create frontier models: a single training run can cost millions of dollars in compute. Everything else (fine-tuning, RLHF, prompting) builds on this base.