BestiAIry
A Typology of Neural Network Architectures
What is a neural network?
A neural network is not a brain. It is a mathematical system made of many small computational units, often called nodes, cells, or neurons. Each one receives numbers, transforms them, and passes new numbers onward.
At the beginning of the network, input cells receive data — the pixels of an image, the words in a sentence, the sound waves of a voice recording, or measurements from the world. At the end, output cells produce the network's answer: a label, a prediction, a generated word, an image fragment, a probability, or a decision.
Between input and output are usually many hidden cells. They are called "hidden" not because they are mysterious, but because their internal activity is not the final answer. They form intermediate representations: edges before objects, syllables before words, patterns before meanings. In this middle space, the network gradually turns raw data into something more useful.
The connections between cells are controlled by adjustable numbers called weights. A weight determines how strongly one cell influences another. If it is large, the signal matters more. If it is small, it matters less. If it is negative, it can suppress rather than reinforce. Training a neural network is, mostly, the process of adjusting these weights.
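To make the idea concrete, here is a minimal sketch of a single artificial neuron: each input is multiplied by its weight, the results are summed, and the total is passed through a simple activation. The numbers are hand-picked for illustration, not taken from any real trained network.

```python
def neuron(inputs, weights, bias=0.0):
    """Weighted sum of inputs, passed through a simple activation (ReLU)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # negative totals are suppressed to zero

# The first weight amplifies its input; the second, being negative,
# suppresses its input rather than reinforcing it.
result = neuron([1.0, 2.0], [0.8, -0.3])  # 0.8 - 0.6 = 0.2
```

Changing the weights changes what the neuron responds to, which is why training is mostly a matter of adjusting them.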
A network learns by example. During training, it produces an output, compares it with the expected answer, and slightly changes its weights to reduce the error. Repeated millions or billions of times, this process allows the network to recognise patterns, make predictions, or generate new content.
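The learning loop described above can be sketched in a few lines. This toy example trains a single weight to match a made-up target rule (output = 3 × input); the examples and learning rate are illustrative, but the pattern — produce an output, measure the error, nudge the weights to shrink it, repeat — is the same one real networks follow at vastly larger scale.

```python
weight = 0.0
examples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # (input, expected output)

for _ in range(100):  # repeat the same small correction many times
    for x, target in examples:
        output = weight * x
        error = output - target
        weight -= 0.05 * error * x  # adjust the weight to reduce the error

# After many repetitions, weight converges toward 3.0 -- the hidden rule.
```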
But not all neural networks are wired in the same way. The architectures in this bestiary differ because their cells and connections are arranged for different tasks — some pass information in one direction, some send signals back into themselves, some preserve memory over time, some look only at small patches of an image, some introduce randomness, some fire in pulses. This bestiary is a visual field guide to those differences.
Reading the cells
A simple network begins with three basic roles. An input cell receives the data. A hidden cell transforms it. An output cell returns the result. From there, architectures become more specialised.
A backfed input cell receives information that has looped back from a later stage of the network. This lets the system reconsider new input in light of what it has already produced or remembered.
A noisy input cell deliberately adds variation or randomness to the input. This can make a model more robust, help it generalise, or allow it to generate more varied results.
A probabilistic hidden cell does not simply pass along a fixed value. It represents uncertainty, often by working with probabilities rather than single deterministic signals.
A spiking hidden cell communicates through brief pulses, closer in spirit to the way biological neurons fire. These are used in spiking neural networks, which explore more event-based and energy-efficient forms of computation.
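A common way to model such a cell is the "leaky integrate-and-fire" neuron: input accumulates as an internal potential that slowly leaks away, and the cell emits a brief pulse only when a threshold is crossed. The threshold and leak values below are illustrative, not drawn from any particular spiking network.

```python
def spiking_cell(inputs, threshold=1.0, leak=0.9):
    """A leaky integrate-and-fire sketch: returns a 0/1 pulse per time step."""
    potential = 0.0
    spikes = []
    for x in inputs:
        potential = potential * leak + x  # leaky accumulation over time
        if potential >= threshold:
            spikes.append(1)   # fire a pulse...
            potential = 0.0    # ...and reset
        else:
            spikes.append(0)
    return spikes

# A steady weak input only produces a spike once enough has accumulated.
pulses = spiking_cell([0.4, 0.4, 0.4, 0.4, 0.4])  # [0, 0, 1, 0, 0]
```

Because the cell is silent most of the time, computation happens only at events — the intuition behind the energy-efficiency claims for spiking networks.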
A capsule cell represents not just whether a feature exists, but also some of its properties — orientation, position, relationship to other features. Capsules were designed to help networks understand parts and wholes more structurally.
A matching input-output cell is used when the network compares or reconstructs data, so the input and output have the same form. This is common in architectures that compress, denoise, translate, or regenerate information.
Cells for time and sequence
A recurrent cell sends information back into the network, allowing earlier activity to influence later activity. This makes it useful for sequences such as text, speech, music, or time-series data.
A memory cell can preserve information over time instead of immediately replacing it. It gives the network a way to carry context forward across many steps.
A gated memory cell adds control mechanisms — gates — that decide what to keep, what to forget, and what to reveal. These cells were crucial in networks such as LSTMs and GRUs, which helped neural networks handle longer sequences before the rise of transformers.
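The effect of a gate can be shown with a stripped-down, LSTM-style memory update: a forget gate (a number squashed between 0 and 1) decides how much of the old memory survives each step, and an input gate decides how much of the new signal is written in. The gate weights here are hand-picked, not learned.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))  # squashes any number into (0, 1)

def gated_step(memory, new_input, forget_w, input_w):
    forget_gate = sigmoid(forget_w)  # near 1: keep the old memory
    input_gate = sigmoid(input_w)    # near 1: admit the new input
    return forget_gate * memory + input_gate * new_input

memory = 1.0
# With the forget gate held wide open and the input gate shut,
# the stored value persists across many steps instead of fading.
for _ in range(10):
    memory = gated_step(memory, 0.0, forget_w=5.0, input_w=-5.0)
# memory is still close to 1.0 after ten steps
```

An ungated recurrent cell, by contrast, tends to overwrite or dilute old information at every step — which is why gates mattered so much for long sequences.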
Cells for images and spatial patterns
A kernel is a small set of weights that slides across data, usually an image or grid. Instead of looking at the whole input at once, it detects local patterns: edges, textures, shapes.
A convolution or pooling cell holds the result of one of these operations. Convolution applies a kernel across the input, allowing the same pattern detector to be reused in many places — one of the reasons convolutional networks became so powerful for image recognition. Pooling summarises nearby values to keep the most important features while shrinking the representation.
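Both operations can be sketched on a one-dimensional signal. The kernel `[1, -1]` below is a hand-picked difference detector, not a learned one: it responds wherever neighbouring values change, i.e. at "edges".

```python
def convolve(signal, kernel):
    """Slide the kernel across the signal, taking a weighted sum at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(values, size=2):
    """Summarise each window of values by keeping only its strongest response."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]

signal = [0, 0, 1, 1, 0, 0]        # a step "edge" in the middle
edges = convolve(signal, [1, -1])  # [0, -1, 0, 1, 0]: responds at the edges
pooled = max_pool(edges)           # [0, 1, 0]: smaller, strongest responses kept
```

The same kernel is reused at every position — that weight-sharing is what lets a convolutional network detect a pattern anywhere in an image without learning it separately for each location.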
Seen this way, the icons are not decorative. They are a visual grammar. Each architecture is built from a particular combination of cells, weights, loops, memories, gates, kernels, and outputs. The history of neural networks is, in large part, the history of people discovering new ways to arrange these elements so that machines can learn different kinds of patterns.
You do not need to be an engineer to begin reading these diagrams. You only need the basic idea: information enters, is transformed through weighted connections, and exits as an answer — and different wirings give neural networks different abilities.
The frontier today
For most of the 2020s the frontier has been dominated by one design: the Transformer and its descendants, scaled up into ever-larger foundation models such as GPT-4, Claude, Gemini, Llama, and DeepSeek. These systems have proven extraordinarily capable, but they are increasingly deployed as agents — systems that take actions in the world, call tools, write code, place orders, and run for long stretches without supervision. That trajectory raises questions about safety and control that the field has only begun to answer.
A second direction is now emerging in parallel. Yoshua Bengio and others have argued for what they call a non-agentic or "Scientist AI" approach: powerful models built not to act on the world but to explain it — to act as scientific instruments rather than autonomous decision-makers. In this picture, networks like world models that predict consequences, GFlowNets that map diverse hypotheses, and modern recurrent revivals (xLSTM, RWKV, Titans, Mamba) that handle long context with less compute, all become candidates for a different kind of frontier — one where AI amplifies human reasoning without replacing human judgement.
The architectures collected here belong to both branches and to everything that came before. The bestiary is a field guide, not a verdict: read it as a record of how a young, contested, fast-moving field has arrived at its present crossroads.
About
This page is an educational resource — a field guide to the neural network architectures behind the AI revolution. It is meant to help a curious general reader make sense of what these systems actually do, where they came from, and how their wirings differ. Cell-type icons and colours are inspired by Fjodor van Veen and Stefan Leijnen's 2019 chart, A mostly complete chart of Neural Networks (Asimov Institute), a popular pedagogical visualisation of the field; the glossary is adapted from the International AI Safety Report 2026.
Spotted a missing architecture, a misplaced date, or a cell-type that ought to be a square instead of a circle? Suggestions and corrections are very welcome — please get in touch.
Suggested citation
GLOBAÏA (2026). Bestiary of AI — A Typology of Neural Network Architectures [interactive resource]. globaia.org/bestiAIry/. Accessed .