---
title: Neural Networks
description: Explore neural networks from fundamentals to advanced concepts, including network architecture, backpropagation, and multi-class classification techniques. Learn about activation functions, training processes, and the motivation behind deep learning. Essential reading for AI researchers, machine learning engineers, and developers working with deep neural networks.
canonical_url: https://superlinked.com/glossary/neural-networks
last_updated: 2026-06-11
---

# What is a Neural Network?

A neural network is a machine learning model composed of layers of interconnected nodes (neurons) that learn to map inputs to outputs by adjusting connection weights through training. Inspired loosely by the brain, neural networks can approximate complex non-linear functions. They are the foundation of deep learning and underpin every embedding model, language model, and reranker used in modern search and AI systems.

---

## Why do neural networks matter for search and inference?

Every component in a modern semantic search or RAG pipeline (the embedding model, the reranker, the language model) is a neural network. Understanding how they work helps you:

- Choose the right model architecture for your task
- Reason about trade-offs between model size, latency, and accuracy
- Understand what fine-tuning and LoRA adaptation actually change
- Debug retrieval failures

---

## How does a neural network work?

A neural network is organised into **layers**:

```
Input layer → Hidden layer(s) → Output layer
```

Each neuron applies a weighted sum of its inputs, adds a bias, and passes the result through an **activation function**:

```
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias)
```
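
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `neuron`, the example weights, and the choice of `tanh` as the activation are all assumptions made for the example.

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (not taken from any real model).
x = [1.0, 2.0]
w = [0.5, -0.25]
b = 0.1

# Pre-activation: 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1
out = neuron(x, w, b)
```

A full layer is simply many such neurons sharing the same inputs, each with its own weights and bias.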

The network learns by adjusting weights to minimise a loss function via [backpropagation](/glossary/backpropagation) and gradient descent.
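
As a rough sketch of what "adjusting weights to minimise a loss" means, the example below runs gradient descent on a single linear neuron with a mean squared error loss. The gradients are written out by hand here; in a multi-layer network, backpropagation is the procedure that computes them via the chain rule. The function name, learning rate, epoch count, and toy data are assumptions chosen for illustration.

```python
def train_linear_neuron(data, lr=0.1, epochs=500):
    """Fit pred = w*x + b to (x, y) pairs by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            # d/dw (err^2) = 2*err*x,  d/db (err^2) = 2*err
            grad_w += 2 * err * x
            grad_b += 2 * err
        # Step against the average gradient.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Toy data generated from y = 2x + 1 (made up for this example).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_linear_neuron(data)  # w approaches 2, b approaches 1
```

Real training loops typically use mini-batches and an optimiser such as Adam rather than this full-batch update, but the core idea of stepping against the gradient is the same.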

---

## What are activation functions?

Activation functions introduce non-linearity. Without them, a stack of linear layers collapses into a single linear layer, so the network can only represent linear mappings of its input, no matter how many layers it has.
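
The collapse of stacked linear layers can be checked numerically. The sketch below uses small hand-picked matrices (assumptions for illustration, with biases omitted for brevity) and shows that applying two linear layers in sequence gives the same result as one layer whose weight matrix is their product, while inserting a ReLU between them does not.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # first layer: 2 inputs -> 3 units
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]    # second layer: 3 units -> 2 outputs
x = [1.0, -2.0]

# Two linear layers in sequence...
two_layers = matvec(W2, matvec(W1, x))
# ...equal one linear layer with weights W2 @ W1.
one_layer = matvec(matmul(W2, W1), x)

# With a ReLU in between, the result can no longer be written as a single matrix.
hidden = [max(0.0, h) for h in matvec(W1, x)]
with_relu = matvec(W2, hidden)
```

Here `two_layers` and `one_layer` are identical, whereas `with_relu` differs because the ReLU zeroes out the negative hidden unit.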

| Activation | Formula | Properties |
|---|---|---|
| ReLU | max(0, x) | Cheap to compute, mitigates vanishing gradients for positive inputs, but units can "die" (output zero for every input) |
| Sigmoid | 1/(1+e⁻ˣ) | Outputs (0,1), used in binary output layers |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | Outputs (-1,1), centred |
| GELU | x·Φ(x), where Φ is the standard normal CDF | Smooth alternative to ReLU, used in transformers |
| Softmax | exp(xᵢ) / Σⱼ exp(xⱼ) | Multi-class output probabilities |

GELU is the activation used in BERT-style transformer encoders, which many modern embedding models are built on.
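
For reference, the formulas in the table can be written directly with Python's standard `math` module. This is a minimal scalar sketch for illustration; deep learning frameworks provide vectorised versions, and some use a tanh-based approximation of GELU rather than the exact form shown here.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def gelu(x):
    # x * Phi(x), with Phi (the standard normal CDF) expressed through erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs):
    # Subtracting the max before exponentiating avoids overflow
    # and does not change the result.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # three probabilities that sum to 1
```

Note that softmax differs from the others: it operates on a whole vector of scores rather than on one value at a time.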

---

## Key neural network architectures for inference

| Architecture | Primary use in inference |
|---|---|
| MLP (Feedforward) | Classification heads, tabular tasks |
| CNN | Image encoding, OCR, visual document processing |
| RNN / LSTM | Sequence modelling (largely superseded by transformers) |
| Transformer | Text embedding, reranking, language generation |
| Vision Transformer (ViT) | Image embedding, multimodal models |

Transformer-based models dominate modern embedding and reranking. SIE hosts 85+ models built on these architectures.

---

## Shallow vs deep networks

**Shallow networks** (1-2 hidden layers) can in principle approximate a very wide range of functions, but for some complex functions they need exponentially more neurons than a deeper network would to reach the same accuracy.

**Deep networks** (many layers) learn hierarchical representations: early layers detect simple patterns, later layers compose them into complex concepts. This hierarchy is part of what makes deep models such as transformers so effective at language.

---

## What does "parameters" mean in a model?

Parameters are a model's learnable weights and biases, and the parameter count is their total number. More parameters give a model more capacity and therefore higher accuracy potential, at the cost of more memory, slower inference, and more training data.

| Model | Parameters | Use case |
|---|---|---|
| BAAI/bge-small-en | 33M | Fast, lightweight embedding |
| BAAI/bge-m3 | 570M | Multilingual, high accuracy |
| BAAI/bge-reranker-v2-gemma | 2.5B | Highest-accuracy reranking |

SIE's GPU batching makes serving larger models at production scale practical. See the [model hub](/models) for full specs.
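
To make parameter counts and their memory cost more concrete, the sketch below counts the weights and biases of a small fully connected network and estimates the memory needed just to hold a model's weights. The helper names and the example layer sizes are assumptions for illustration, and the estimate ignores activations, batching buffers, and runtime overhead, so real memory use will be higher.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network.

    Each layer with n_in inputs and n_out outputs has
    n_in * n_out weights and n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def weight_memory_mb(num_params, bytes_per_param=4):
    """Approximate memory for the weights alone, in megabytes (10^6 bytes)."""
    return num_params * bytes_per_param / 1e6

# A hypothetical 784 -> 128 -> 10 classifier:
# (784*128 + 128) + (128*10 + 10) = 101,770 parameters.
small_mlp = mlp_param_count([784, 128, 10])

# Rough weight memory for the approximate counts in the table above.
bge_small_fp32 = weight_memory_mb(33_000_000)           # 4 bytes per parameter
bge_small_fp16 = weight_memory_mb(33_000_000, 2)        # 2 bytes per parameter
reranker_fp16 = weight_memory_mb(2_500_000_000, 2)
```

Under these assumptions a 33M-parameter model needs roughly 132 MB in 32-bit floats or 66 MB in 16-bit floats, while a 2.5B-parameter model needs about 5 GB in 16-bit floats before any runtime overhead.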

---

## Frequently asked questions

**How is a neural network different from a traditional ML model?**
Traditional models (linear regression, decision trees, SVMs) typically rely on hand-crafted features. Neural networks learn their own feature representations directly from raw data.

**What is overfitting in a neural network?**
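Overfitting happens when a model memorises its training data instead of generalising. It shows up as a growing gap between training loss and validation loss: training loss keeps falling while validation loss stalls or rises. It is addressed with dropout, regularisation, early stopping, and more data.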
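
As a sketch of how the training/validation gap and early stopping fit together, the example below tracks validation loss and stops once it has failed to improve for a set number of epochs. The function name, the `patience` value, and the loss curves are all made up for illustration and do not come from a real training run.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch index at which training would stop, or None.

    Training stops once validation loss has not improved on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Illustrative loss curves: training loss keeps falling,
# validation loss bottoms out at epoch 2 and then rises.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_losses = [1.05, 0.80, 0.65, 0.66, 0.70, 0.78]

gaps = [v - t for t, v in zip(train_losses, val_losses)]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
```

In practice you would also keep a checkpoint of the weights from the best validation epoch and restore it after stopping.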

**What does self-hosted inference mean for neural networks?**
Running the neural network's forward pass (prediction step) on your own hardware rather than sending inputs to a cloud API. SIE provides this for embedding and reranking models.

---

## Related resources

- [What is a transformer?](/glossary/transformers)
- [What is backpropagation?](/glossary/backpropagation)
- [What is a loss function?](/glossary/loss-function)
- [What is an optimizer?](/glossary/optimizer)
- [Browse models on SIE](/models)