Artificial Intelligence Timeline
An interactive timeline tracing artificial intelligence from its mythological roots in antiquity through the computing revolution, neural network breakthroughs, and into projected futures.
How to use
Pan & zoom — scroll to zoom, drag to pan. The time axis is adaptive: antiquity is compressed while the recent deep learning era is stretched for detail (see the sketch after this list).
Click any event dot for details. Use Search or Surprise me to discover events.
Layers — toggle AI eras, computing eras, the year grid, and the present line.
Trends — choose a companion data curve to overlay on the right axis: Moore's Law, METR Task Horizon, AI Index Benchmarks, Context Window, or Inference Cost. Training Compute remains the always-on backbone on the left.
Categories — filter events by type: mythology, neural networks, game-playing AI, ethics, models, policy, and more.
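
The adaptive axis can be pictured as a piecewise-linear mapping from year to horizontal position, with each era assigned a fixed share of the screen. Below is a minimal sketch; the breakpoints and width shares are illustrative assumptions, not the app's actual values.

```ts
// Illustrative piecewise-linear time scale: each era gets a fixed share of
// the horizontal space regardless of how many calendar years it spans.
// Breakpoints and width shares are assumptions, not the app's real values.
type Segment = { fromYear: number; toYear: number; widthShare: number };

const segments: Segment[] = [
  { fromYear: -800, toYear: 1900, widthShare: 0.15 }, // antiquity, compressed
  { fromYear: 1900, toYear: 2000, widthShare: 0.25 }, // computing revolution
  { fromYear: 2000, toYear: 2012, widthShare: 0.15 },
  { fromYear: 2012, toYear: 2030, widthShare: 0.45 }, // deep learning era, stretched
];

/** Map a year to a 0..1 horizontal position across the segments. */
function yearToX(year: number): number {
  let offset = 0;
  for (const s of segments) {
    if (year <= s.toYear) {
      const t = (year - s.fromYear) / (s.toYear - s.fromYear);
      return offset + Math.min(1, Math.max(0, t)) * s.widthShare;
    }
    offset += s.widthShare;
  }
  return 1; // beyond the last segment
}
```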
Data curves
Training Compute anchors the chart on the left Y-axis and is on by default. One companion curve can be added on the right Y-axis, chosen from a radio group. Each curve has its own unit; hovering any data point reveals model, year, exact value and a short description. Red hotspots for Key Events are always anchored to the Training Compute curve.
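
Conceptually, each curve is a small spec: which axis it lives on, its unit, and whether it behaves exclusively. A hypothetical shape, assuming names like CurveSpec that are not taken from the app's source:

```ts
// Hypothetical curve registration; field names are assumptions for
// illustration, not the app's actual API.
interface CurveSpec {
  id: string;
  label: string;
  axis: 'left' | 'right'; // Training Compute owns the left axis
  unit: string;           // e.g. 'FLOP', 'minutes', 'USD per 1M output tokens'
  exclusive: boolean;     // right-axis companions behave like a radio group
}

const trainingCompute: CurveSpec = {
  id: 'training-compute',
  label: 'Training Compute',
  axis: 'left',
  unit: 'FLOP',
  exclusive: false, // always on
};
```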
Training Compute (cyan, left axis) charts the exponential growth of compute used to train AI systems, measured in floating-point operations (FLOP). The cyan line is the frontier envelope — at any moment it tracks the most compute-intensive model published to date, jumping upward each time a new record is set. Faint cyan dots in the background show every notable model in the dataset (521 systems), not only the record-holders.
Show by Lab (sub-toggle under Training Compute) recolors the curve to reveal which research labs have driven the trend. Each lab's own monotonic record progression appears as a colored line, the frontier envelope shows colored segments for the labs you've selected, and the background scatter dots tint to their lab color. Solo a lab with the S button to isolate its trajectory; the Defaults button restores the five major frontier labs (Google/DeepMind, OpenAI, Anthropic, Meta, xAI).
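
Both the overall frontier and each lab's progression reduce to the same computation: a running maximum over models visited in date order. A minimal sketch, assuming a hypothetical Model shape rather than the app's real data types:

```ts
// A hypothetical model record; the real dataset (Epoch AI) has more fields.
interface Model { name: string; date: Date; trainingFlop: number; lab?: string }

// "Record so far" envelope: keep each model that sets a new compute maximum
// when models are visited in chronological order.
function frontierEnvelope(models: Model[]): Model[] {
  const sorted = [...models].sort((a, b) => a.date.getTime() - b.date.getTime());
  const records: Model[] = [];
  let best = -Infinity;
  for (const m of sorted) {
    if (m.trainingFlop > best) {
      best = m.trainingFlop;
      records.push(m); // a new record: the envelope jumps upward here
    }
  }
  return records;
}

// "Show by Lab" is the same envelope applied to each lab's subset.
function perLabEnvelopes(models: Model[]): Map<string, Model[]> {
  const byLab = new Map<string, Model[]>();
  for (const m of models) {
    if (!m.lab) continue;
    const list = byLab.get(m.lab) ?? [];
    list.push(m);
    byLab.set(m.lab, list);
  }
  const envelopes = new Map<string, Model[]>();
  for (const [lab, list] of byLab) envelopes.set(lab, frontierEnvelope(list));
  return envelopes;
}
```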
Companion curves (right axis, one at a time)
Moore's Law (blue) shows transistor counts in microprocessors from the Intel 4004 to modern AI accelerators.
METR Task Horizon (amber) plots the length of real software-engineering tasks — in minutes of expert human time — that each frontier AI model can complete with 50% success. Horizon has roughly doubled every seven months since 2019, rising from ~3 seconds for GPT-2 to ~12 hours for Claude Opus 4.6 (a quick extrapolation check follows this list). Scatter dots show every evaluated model; the amber line is the monotonic envelope.
Benchmarks vs Human (multi-color) reproduces Figure 2.1.1 of the Stanford HAI AI Index 2026 report. Eleven reference benchmarks — ImageNet, SuperGLUE, MMLU, GPQA Diamond, OSWorld, SWE-bench, VQA, SQuAD 2.0, MATH, MMMU and AIME — are scaled so that the human baseline = 100% (each score is divided by the human score for that benchmark and multiplied by 100). Solid lines track active benchmarks; dashed lines mark benchmarks that have saturated. The dashed grey line at 100% is the human baseline.
Context Window (green) shows the maximum number of input tokens frontier language models can reason over in a single call, from GPT-2's 1,024 tokens to today's 1–10 million-token systems. The green line is the record-so-far envelope; faint dots are non-record models.
Inference Cost (magenta) shows the falling price — in USD per million output tokens — of frontier-class model inference. The cheapest-so-far envelope has dropped roughly 10× per year since 2022, sometimes called the 'other Moore's Law' of AI (see the trend sketch below).
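
The task-horizon and inference-cost trends are both simple exponentials, so one helper covers doubling and decay alike. A quick sanity check of the figures quoted above, treating the start values as rough assumptions taken from the text:

```ts
// value(t) = start * factor^(elapsedMonths / periodMonths)
function trend(start: number, factor: number, periodMonths: number, elapsedMonths: number): number {
  return start * Math.pow(factor, elapsedMonths / periodMonths);
}

// METR horizon: ~3 s for GPT-2 (early 2019), doubling every ~7 months.
// About 96 months later, that is ~13.7 doublings:
const horizonSeconds = trend(3, 2, 7, 96); // ≈ 40,000 s ≈ 11 h, close to the ~12 h quoted
console.log((horizonSeconds / 3600).toFixed(1), 'hours');

// Inference cost: falling ~10x per year. From an assumed starting point of
// $60 per 1M output tokens, 24 months later:
const costUsd = trend(60, 1 / 10, 12, 24); // ≈ $0.60 per 1M output tokens
console.log('$' + costUsd.toFixed(2), 'per 1M tokens');
```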
The future zone
Beyond 2026, the dashed boundary marks projected territory. Scenario fans for training compute show baseline, accelerated, and constrained trajectories.
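
One way to draw such a fan is to apply named annual growth multipliers to the last confirmed compute record beyond the boundary year. A sketch with placeholder factors, not the app's actual projections:

```ts
// Placeholder scenario fan for projected training compute.
const BOUNDARY_YEAR = 2026;

const scenarios = {
  baseline:    { annualGrowth: 4.0 }, // roughly continue the recent trend
  accelerated: { annualGrowth: 8.0 }, // faster scaling
  constrained: { annualGrowth: 1.5 }, // hardware, energy, or policy limits
} as const;

function projectFlop(
  lastRecordFlop: number,
  year: number,
  scenario: keyof typeof scenarios,
): number {
  const years = Math.max(0, year - BOUNDARY_YEAR);
  return lastRecordFlop * Math.pow(scenarios[scenario].annualGrowth, years);
}

// e.g. projectFlop(5e26, 2030, 'baseline') → 5e26 * 4^4 = 1.28e29 FLOP
```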
Sources
Training Compute — Epoch AI · Data on AI Models. Frontier envelope and per-lab progressions derived at load time from Epoch's dataset (CC BY 4.0).
Moore's Law — Karl Rupp's microprocessor trend dataset, drawing on data collected by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, C. Batten and others.
METR Task Horizon — METR (Model Evaluation & Threat Research), Measuring AI Ability to Complete Long Tasks v1.1 benchmark. Paper: Kwa, West, Becker et al. (2025), arXiv:2503.14499. Released under CC BY 4.0. p50 horizon length drawn from METR's public benchmark_results_1_1.yaml.
Benchmarks vs Human — Stanford HAI, AI Index 2026 Annual Report, Chapter 2 "Technical Performance", Figure 2.1.1 (page 76). Values digitized from the published figure. Report licensed CC BY-ND 4.0; numeric benchmark values are factual data and the figure is redrawn independently here.
Context Window — compiled from OpenAI, Anthropic, Google DeepMind and Meta release announcements (2019–2026). Cross-referenced with taylorwilsdon/llm-context-limits (MIT).
Inference Cost — Epoch AI, LLM Inference Price Trends (CC BY 4.0), cross-referenced with Artificial Analysis and provider pricing pages.
Events — compiled from peer-reviewed literature, historical records, arXiv preprints, official press releases, and specialist AI history references.