AI Model Release Timeline 2023–2026 — Every Major Model Launch
Comprehensive timeline of 40+ major AI model releases from GPT-4 to Llama 4, covering every significant launch with parameters, context window, key capability, and open/closed status.
By Michael Lip · Updated April 2026
Methodology
Release dates are sourced from official provider announcements, blog posts, and GitHub release pages (queried via GitHub API on April 11, 2026 for meta-llama/llama-models, mistralai/mistral-inference, google/gemma, and deepseek-ai/DeepSeek-V3). Parameter counts are from official technical reports where available; estimated counts are marked with ~. Context windows from official documentation. "Open" means weights are publicly downloadable; "Closed" means API-only access.
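As a rough illustration of the release-date lookup described above, the sketch below parses a GitHub `/repos/{owner}/{repo}/releases` API response and extracts the newest publish month. The helper name and the inline sample payload are illustrative only; a real query would fetch the JSON over HTTPS with an authenticated request.

```python
import json
from datetime import datetime

def latest_release_month(releases_json: str) -> str:
    """Return the YYYY-MM publish month of the newest release in a
    GitHub /releases API response (a JSON array of release objects)."""
    releases = json.loads(releases_json)
    # GitHub timestamps are ISO 8601 with a trailing "Z", so string
    # comparison orders them correctly; parse only the winner.
    newest = max(releases, key=lambda r: r["published_at"])
    ts = datetime.fromisoformat(newest["published_at"].replace("Z", "+00:00"))
    return ts.strftime("%Y-%m")

# Sample payload shaped like the GitHub API response (fields abbreviated,
# values hypothetical).
sample = json.dumps([
    {"tag_name": "v0.1.0", "published_at": "2024-07-23T00:00:00Z"},
    {"tag_name": "v0.2.0", "published_at": "2024-12-06T00:00:00Z"},
])
print(latest_release_month(sample))  # → 2024-12
```

In practice the endpoint is paginated, so a complete history requires following the `Link` header across pages.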
| Date | Model | Provider | Parameters | Context | Open | Key Capability |
|---|---|---|---|---|---|---|
| 2023-02 | LLaMA 1 (7B-65B) | Meta | 7B-65B | 2K | Yes | First competitive open foundation model |
| 2023-03 | GPT-4 | OpenAI | ~1.8T MoE | 8K/32K | No | Multimodal, reasoning leap over GPT-3.5 |
| 2023-03 | Claude 1 | Anthropic | ~52B | 9K | No | Constitutional AI; 100K context added in May 2023 |
| 2023-07 | Claude 2 | Anthropic | ~70B | 100K | No | Improved coding and reasoning |
| 2023-07 | Llama 2 (7B-70B) | Meta | 7B-70B | 4K | Yes | First open-weights model licensed for commercial use |
| 2023-09 | Mistral 7B | Mistral | 7B | 8K | Yes | Strongest 7B model at release, Apache 2.0 license |
| 2023-11 | GPT-4 Turbo | OpenAI | ~1.8T MoE | 128K | No | 128K context, 3x cheaper than GPT-4 |
| 2023-12 | Gemini 1.0 Pro | Google | ~50B | 32K | No | Google's first Gemini model |
| 2023-12 | Mixtral 8x7B | Mistral | 46.7B MoE | 32K | Yes | First major open MoE model |
| 2024-02 | Gemini 1.5 Pro | Google | ~300B MoE | 1M | No | 1 million token context window |
| 2024-02 | Gemma 1 (2B/7B) | Google | 2B/7B | 8K | Yes | Google's first open model family |
| 2024-03 | Claude 3 Opus | Anthropic | ~200B | 200K | No | New frontier, matched GPT-4 |
| 2024-03 | Claude 3 Sonnet | Anthropic | ~70B | 200K | No | Best mid-tier quality/cost ratio |
| 2024-03 | Claude 3 Haiku | Anthropic | ~20B | 200K | No | Fast, cheap, 200K context |
| 2024-03 | DBRX | Databricks | 132B MoE | 32K | Yes | Enterprise open MoE model |
| 2024-04 | Llama 3 (8B/70B) | Meta | 8B/70B | 8K | Yes | Significant quality jump, 15T training tokens |
| 2024-04 | Phi-3 (mini/medium) | Microsoft | 3.8B/14B | 4K/128K | Yes | Small model achieving GPT-3.5 quality |
| 2024-05 | GPT-4o | OpenAI | ~200B | 128K | No | Omni: native audio, vision, text. 2x faster, 50% cheaper |
| 2024-06 | Gemma 2 (9B/27B) | Google | 9B/27B | 8K | Yes | Knowledge distillation from larger models |
| 2024-06 | Qwen 2 (0.5B-72B) | Alibaba | 0.5B-72B | 128K | Yes | Multilingual, strong coding performance |
| 2024-07 | GPT-4o mini | OpenAI | ~8B | 128K | No | 97% cheaper than GPT-4, replaces 3.5 Turbo |
| 2024-07 | Llama 3.1 (8B-405B) | Meta | 8B-405B | 128K | Yes | 405B: largest open model, tool use, 128K context |
| 2024-07 | Mistral Large 2 | Mistral | 123B | 128K | No | Competitive frontier from European startup |
| 2024-09 | o1-preview | OpenAI | ~200B | 128K | No | Reasoning model: chain-of-thought at inference time |
| 2024-09 | Qwen 2.5 (0.5B-72B) | Alibaba | 0.5B-72B | 128K | Yes | Coder variant, improved math and code |
| 2024-10 | Claude 3.5 Sonnet | Anthropic | ~70B | 200K | No | Matched Opus quality at Sonnet price |
| 2024-11 | Grok-2 | xAI | ~300B MoE | 128K | No | Real-time information access via X/Twitter |
| 2024-12 | Llama 3.3 70B | Meta | 70B | 128K | Yes | 405B quality distilled into 70B |
| 2024-12 | DeepSeek V3 | DeepSeek | 671B MoE | 128K | Yes | Frontier quality at $5.5M training cost |
| 2024-12 | Phi-4 | Microsoft | 14B | 16K | Yes | STEM and reasoning focus, punches above weight |
| 2025-01 | DeepSeek R1 | DeepSeek | 671B MoE | 128K | Yes | Open reasoning model matching o1 |
| 2025-01 | Codestral 25.01 | Mistral | 22B | 256K | Yes | Code-specialized, 256K context |
| 2025-02 | GPT-4.5 | OpenAI | ~1.8T | 128K | No | OpenAI's largest model to date, premium research preview |
| 2025-03 | Gemini 2.5 Pro | Google | ~300B MoE | 1M | No | Thinking model with 1M context |
| 2025-03 | Gemini 2.5 Flash | Google | ~50B MoE | 1M | No | Fast + thinking mode, 1M context |
| 2025-03 | Mistral Small 3.1 | Mistral | 24B | 128K | Yes | Vision + speed, $0.10/1M tokens |
| 2025-04 | Llama 4 Scout/Maverick | Meta | 109B/400B MoE (17B active) | 10M | Yes | 10M context (Scout), native multimodality |
| 2025-04 | o3 | OpenAI | ~200B | 200K | No | Advanced reasoning, tool use, 200K context |
| 2025-04 | o4-mini | OpenAI | ~8B | 200K | No | Budget reasoning model |
| 2025-04 | Qwen 3 (0.6B-235B) | Alibaba | 0.6B-235B | 128K | Yes | Thinking + non-thinking modes, MoE |
| 2025-05 | Claude Opus 4 | Anthropic | ~300B | 200K | No | Best coding model at launch, extended thinking, agents |
| 2025-05 | Claude Sonnet 4 | Anthropic | ~70B | 200K | No | Near-Opus quality at Sonnet price |
| 2024-11 | Claude 3.5 Haiku | Anthropic | ~20B | 200K | No | Speed tier approaching Claude 3 Opus on many benchmarks |
Parameter counts with ~ are estimated. MoE = Mixture of Experts (active parameters are lower than total). Context shown in tokens unless otherwise noted.
Frequently Asked Questions
How many major AI models were released in 2024?
2024 saw more than 20 major model releases, making it the most prolific year in this timeline. Key releases included GPT-4o and GPT-4o mini (OpenAI), the Claude 3 family and Claude 3.5 Sonnet (Anthropic), Gemini 1.5 Pro and Flash (Google), Llama 3 and 3.1 (Meta), Mistral Large 2 and Codestral (Mistral), and DeepSeek V3. Multiple frontier models launched each quarter.
What was the most significant AI model release?
Several were transformative: GPT-4 (March 2023) proved multimodal AI was viable; Llama 2 (July 2023) democratized open models; GPT-4o (May 2024) brought frontier quality at 6x lower cost; DeepSeek V3 (December 2024) showed frontier quality at dramatically lower training cost; o1 (September 2024) introduced reasoning-time compute scaling.
Are open-source models catching up to proprietary ones?
Yes, significantly. In early 2023, GPT-4 led open models by an estimated 15-20% on common benchmarks. By late 2024, Llama 3.1 405B and DeepSeek V3 matched GPT-4 Turbo, and DeepSeek R1 matched o1 on reasoning. The remaining gap is concentrated in the largest frontier models (GPT-4.5, Claude Opus 4), which hold a 3-5% edge on the hardest benchmarks.
What is the trend in AI model sizes?
Sizes scaled up through 2023, then shifted toward efficiency in 2024-2025. MoE became dominant: DeepSeek V3 (671B total, ~37B active) achieves high quality with lower inference costs. Small models also improved: Phi-4 (14B) and Mistral Small 3.1 (24B) match 2023-era 70B models on many tasks.
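The MoE efficiency point above can be made concrete with the DeepSeek V3 figures quoted in the table. This is simple arithmetic on published parameter counts, not an official cost model:

```python
# Active-parameter ratio for a Mixture-of-Experts model, using the
# DeepSeek V3 figures cited above: 671B total, ~37B active per token.
total_params = 671e9
active_params = 37e9

# Only this fraction of the weights participates in any single
# forward pass, which is what keeps inference cost low.
active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per token")  # → 5.5%
```

By this measure, each token is processed by roughly the compute of a 37B dense model, while the full 671B of weights still must be held in memory.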
Which company has released the most AI models?
Meta leads in volume with the Llama series and its many variants. Google follows with the Gemini and Gemma families. OpenAI ships fewer but higher-impact releases. Alibaba's Qwen team has been prolific with Qwen 2, 2.5, and 3, and Mistral has released the most models relative to its size.