Deep Learning

What is a Neural Network?

A neural network is a machine learning model composed of layers of interconnected nodes (neurons) that learns to map inputs to outputs by adjusting its connection weights during training. Loosely inspired by the brain, neural networks can approximate complex non-linear functions. They are the foundation of deep learning and underpin the embedding models, language models, and rerankers used in modern search and AI systems.

Why do neural networks matter for search and inference?

The learned components of a modern semantic search or RAG pipeline (the embedding model, the reranker, the language model) are all neural networks. Understanding how they work helps you:

- Choose the right model architecture for your task
- Reason about trade-offs between model size, latency, and accuracy
- Understand what fine-tuning and LoRA adaptation actually change
- Debug retrieval failures

How does a neural network work?

A neural network is organised into layers:

Input layer → Hidden layer(s) → Output layer

Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function:

output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias)
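The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not how production frameworks implement it: the function names `neuron` and `layer`, the choice of ReLU as the activation, and the example weights are all assumptions made for the sketch.

```python
def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def layer(inputs, weight_rows, biases, activation=relu):
    """A layer is several neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


x = [1.0, 2.0, 3.0]

# One neuron: 0.5*1 - 0.25*2 + 0.25*3 + 0.5 = 1.25
print(neuron(x, [0.5, -0.25, 0.25], 0.5))  # 1.25

# A two-neuron layer; the second neuron's pre-activation is negative, so ReLU clips it to 0.
print(layer(x, [[0.5, -0.25, 0.25], [-1.0, 0.0, 0.0]], [0.5, 0.0]))  # [1.25, 0.0]
```

Feeding the output of one `layer` call into the next is the forward pass through the input, hidden, and output layers.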

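The network learns by adjusting its weights and biases to minimise a loss function: backpropagation computes the gradient of the loss with respect to each parameter, and gradient descent moves each parameter a small step in the direction that reduces the loss.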
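As a rough illustration of this training loop, the sketch below fits a single linear neuron (no activation) to toy data with gradient descent on a mean squared error loss. The data, the learning rate, and the step count are assumptions chosen for the example. With only one neuron, backpropagation reduces to applying the chain rule by hand; real networks use automatic differentiation to do the same across many layers.

```python
# Toy data generated from y = 2x + 1 (an assumption for illustration).
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def mse(w, b):
    """Mean squared error of the prediction w*x + b over the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradients(w, b):
    """Partial derivatives of the MSE with respect to w and b (chain rule)."""
    n = len(data)
    dw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    db = sum(2 * (w * x + b - y) for x, y in data) / n
    return dw, db


w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.05     # step size (hypothetical value)
initial_loss = mse(w, b)

for step in range(2000):
    dw, db = gradients(w, b)
    w -= learning_rate * dw   # step against the gradient
    b -= learning_rate * db

final_loss = mse(w, b)
print(round(w, 4), round(b, 4))  # close to 2.0 and 1.0
```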

What are activation functions?

Activation functions introduce non-linearity. Without them, stacking linear layers is equivalent to a single linear layer, limiting what the network can learn.
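A small numeric check makes the collapse visible. The sketch below uses two hypothetical 2×2 weight matrices and omits biases for brevity: applying both layers in sequence gives the same result as one layer whose weights are their product, while inserting a ReLU between them breaks the equivalence.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Multiply matrix A by matrix B (both as lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu_vec(v):
    """Apply ReLU to every element of a vector."""
    return [max(0.0, z) for z in v]


# Hypothetical weights for two stacked layers.
W1 = [[1.0, -2.0], [0.5, 1.0]]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
x = [1.0, 1.0]

two_linear = matvec(W2, matvec(W1, x))          # layer 2 applied to layer 1
one_linear = matvec(matmul(W2, W1), x)          # a single layer with weights W2·W1
with_relu = matvec(W2, relu_vec(matvec(W1, x))) # non-linearity in between

print(two_linear)  # [-0.5, 5.5]
print(one_linear)  # [-0.5, 5.5]
print(with_relu)   # [1.5, 4.5]
```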

Activation	Formula	Properties
ReLU	max(0, x)	Fast; mitigates vanishing gradients for positive inputs; units can “die” (output zero for every input)
Sigmoid	1/(1+e⁻ˣ)	Outputs in (0, 1); used in binary output layers
Tanh	(eˣ-e⁻ˣ)/(eˣ+e⁻ˣ)	Outputs in (-1, 1); zero-centred
GELU	x·Φ(x), where Φ is the standard normal CDF	Smooth alternative to ReLU; used in transformers
Softmax	e^(xᵢ)/Σⱼ e^(xⱼ)	Multi-class output probabilities that sum to 1

Many transformer-based embedding models, including BERT-style encoders, use GELU.
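For reference, the formulas in the table can be written with Python's standard `math` module. This is a sketch for scalar inputs (and a list for softmax), not a vectorised implementation. The split sigmoid and the max-subtraction in softmax are standard numerical-stability tricks added here, not something the table specifies, and GELU is written in its exact erf form.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    # Split by sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    return math.tanh(x)


def gelu(x):
    # x * Phi(x), with Phi (the standard normal CDF) expressed via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def softmax(xs):
    # Subtracting the maximum leaves the result unchanged but avoids overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


print(relu(-3.0), sigmoid(0.0), tanh(0.0), gelu(0.0))  # 0.0 0.5 0.0 0.0
probs = softmax([1.0, 2.0, 3.0])
print(probs.index(max(probs)))  # 2 (the largest input gets the largest probability)
```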

Key neural network architectures for inference

Architecture	Primary use in inference
MLP (Feedforward)	Classification heads, tabular tasks
CNN	Image encoding, OCR, visual document processing
RNN / LSTM	Sequence modelling (largely superseded)
Transformer	Text embedding, reranking, language generation
Vision Transformer (ViT)	Image embedding, multimodal models

Transformer-based models dominate modern embedding and reranking. SIE hosts 85+ models built on these architectures.

Shallow vs deep networks

Shallow networks (1-2 hidden layers) can in principle approximate a very wide range of functions, but on complex tasks they may need exponentially more neurons than a deeper network to reach the same accuracy.

Deep networks (many layers) learn hierarchical representations: early layers detect simple patterns, later layers compose them into complex concepts. This hierarchy is part of what makes deep models such as transformers so effective at language.

What does “parameters” mean in a model?

A model’s parameters are its learnable weights and biases, and its parameter count is the total number of them. More parameters → more capacity → higher accuracy potential, but also more memory, slower inference, and more data needed to train.

Model	Parameters	Use case
BAAI/bge-small-en	33M	Fast, lightweight embedding
BAAI/bge-m3	570M	Multilingual, high accuracy
BAAI/bge-reranker-v2-gemma	2.5B	Highest-accuracy reranking
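To make parameter counts concrete, the sketch below counts the weights and biases of a small fully connected network and estimates the memory needed to hold a model's weights. The helper names and the 784-128-10 layer sizes are hypothetical. The estimate assumes 2 bytes per parameter (16-bit floats) and covers weights only, so real serving memory will be higher once activations and runtime overhead are included.

```python
def mlp_parameter_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    Each pair of adjacent layers contributes fan_in * fan_out weights
    plus fan_out biases.
    """
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total


def weight_memory_gib(num_parameters, bytes_per_parameter=2):
    """Approximate memory for the weights alone, in GiB."""
    return num_parameters * bytes_per_parameter / 2**30


# A small MLP: 784 inputs, one hidden layer of 128 neurons, 10 outputs.
print(mlp_parameter_count([784, 128, 10]))  # 101770

# Rough weight-only footprints for the parameter counts listed in the table.
for name, params in [("33M", 33_000_000), ("570M", 570_000_000), ("2.5B", 2_500_000_000)]:
    print(name, round(weight_memory_gib(params), 2), "GiB")
```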

SIE’s GPU batching makes serving larger models at production scale practical. See the model hub for full specs.

Frequently asked questions

How is a neural network different from a traditional ML model? Traditional models (linear regression, decision trees, SVMs) typically rely on hand-crafted features. Neural networks learn their own feature representations directly from raw data.

What is overfitting in a neural network? Overfitting is when a model memorises its training data instead of generalising. It shows up as a growing gap between training and validation loss, and is addressed with dropout, regularisation, early stopping, and more data.
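Early stopping, one of the remedies named above, can be sketched as a check on the validation loss after each epoch. The function name, the `patience` and `min_delta` parameters, and the loss values below are illustrative assumptions, not output from a real training run.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. stop_epoch is None if
    that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Made-up curves: training loss keeps falling while validation loss turns upward.
train_losses = [1.00, 0.70, 0.50, 0.35, 0.25, 0.18]
val_losses = [1.10, 0.80, 0.65, 0.66, 0.70, 0.78]

# The widening gap between the two curves is the overfitting signal.
gaps = [round(v - t, 2) for t, v in zip(train_losses, val_losses)]
print(gaps)  # [0.1, 0.1, 0.15, 0.31, 0.45, 0.6]

print(early_stopping_epoch(val_losses, patience=2))  # (4, 2)
```

In this example training would stop at epoch 4, and the weights saved at epoch 2 (the lowest validation loss) would be the ones to keep.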

What does self-hosted inference mean for neural networks? It means running the neural network’s forward pass (the prediction step) on your own hardware rather than sending inputs to a cloud API. SIE provides this for embedding and reranking models.