
What is a Neural Network?

A neural network is a machine learning model composed of layers of interconnected nodes (neurons) that learn to map inputs to outputs by adjusting connection weights through training. Inspired loosely by the brain, neural networks can approximate complex non-linear functions. They are the foundation of deep learning and underpin every embedding model, language model, and reranker used in modern search and AI systems.


Why do neural networks matter for search and inference?

Every component in a modern semantic search or RAG pipeline (the embedding model, the reranker, the language model) is a neural network. Understanding how they work helps you:

  • Choose the right model architecture for your task
  • Reason about trade-offs between model size, latency, and accuracy
  • Understand what fine-tuning and LoRA adaptation actually change
  • Debug retrieval failures

How does a neural network work?

A neural network is organised into layers:

Input layer → Hidden layer(s) → Output layer

Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function:

output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias)
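As a minimal sketch of the formula above, the snippet below computes the output of one neuron in plain Python. The function names (`relu`, `neuron`) and the input, weight, and bias values are illustrative assumptions, not taken from any particular model.

```python
import math

def relu(z: float) -> float:
    """ReLU activation: max(0, z)."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation=relu):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Illustrative values only (hypothetical, not from a trained network).
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.5

# Pre-activation: 0.5 - 2.0 + 0.75 + 0.5 = -0.25, so ReLU clips it to 0.0.
print(neuron(x, w, b))
# The same neuron with tanh keeps the negative value.
print(neuron(x, w, b, activation=math.tanh))
```

A full layer is many such neurons sharing the same inputs, each with its own weights and bias.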

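The network learns by adjusting its weights and biases to minimise a loss function: backpropagation computes the gradient of the loss with respect to each parameter, and gradient descent moves each parameter a small step against that gradient.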
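To make the training loop concrete, here is a small sketch that fits a single linear neuron (no activation) with gradient descent on a mean squared error loss. The toy target `y = 2x + 1`, the learning rate, and the step count are assumptions chosen for illustration. With one neuron the gradients can be written by hand; backpropagation in a multi-layer network applies the same chain rule layer by layer.

```python
# Toy data generated from the hypothetical target y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

def mse(w: float, b: float) -> float:
    """Mean squared error of the prediction w*x + b over the toy data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(steps: int = 200, lr: float = 0.1):
    """Plain (full-batch) gradient descent on w and b."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the MSE with respect to w and b.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in data) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train()
print(round(w, 3), round(b, 3), mse(w, b))
```

On this toy problem the learned `w` and `b` approach 2 and 1. Real networks use the same update rule, usually on mini-batches and with an optimiser such as Adam.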


What are activation functions?

Activation functions introduce non-linearity. Without them, stacking linear layers is equivalent to a single linear layer, limiting what the network can learn.
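The collapse of stacked linear layers can be checked numerically. The sketch below uses two small weight matrices (hypothetical values, biases omitted for brevity) and shows that applying them in sequence gives the same result as applying their product once, while inserting a ReLU between them does not.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu_vec(v):
    """Apply ReLU element-wise."""
    return [max(0.0, t) for t in v]

# Illustrative weights and input (not from a real model).
W1 = [[1.0, 2.0], [0.0, -1.0]]
W2 = [[3.0, 1.0], [-2.0, 0.5]]
x = [1.0, -3.0]

stacked = matvec(W2, matvec(W1, x))               # two linear layers in sequence
collapsed = matvec(matmul(W2, W1), x)             # one layer with weights W2·W1
nonlinear = matvec(W2, relu_vec(matvec(W1, x)))   # ReLU between the layers

print(stacked, collapsed, nonlinear)
```

The first two results match, so the second linear layer added no expressive power. The third differs because the ReLU zeroes the negative intermediate value before the second layer sees it.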

Activation | Formula | Properties
ReLU | max(0, x) | Fast, mitigates vanishing gradients for positive inputs, can “die”
Sigmoid | 1/(1+e⁻ˣ) | Outputs (0,1), used in binary output layers
Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | Outputs (-1,1), zero-centred
GeLU | x·Φ(x), where Φ is the standard normal CDF | Smooth ReLU, used in transformers
Softmax | e^(xᵢ) / Σⱼ e^(xⱼ) | Multi-class output probabilities

GeLU is the activation used in most modern transformer-based embedding models.
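The activation functions listed above can be written in a few lines of standard-library Python. This is a sketch for intuition rather than how a deep learning framework implements them. The GeLU here is the exact form x·Φ(x) computed with `math.erf`, and the sigmoid and softmax use common numerical-stability rearrangements that are an addition to the table's formulas.

```python
import math

def relu(x: float) -> float:
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def tanh(x: float) -> float:
    return math.tanh(x)

def gelu(x: float) -> float:
    # x * Phi(x), with Phi the standard normal CDF expressed through erf.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def softmax(xs):
    # Subtracting the max leaves the result unchanged but avoids overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for name, fn in [("relu", relu), ("sigmoid", sigmoid), ("tanh", tanh), ("gelu", gelu)]:
    print(name, [round(fn(v), 4) for v in (-2.0, 0.0, 2.0)])
print("softmax", [round(p, 4) for p in softmax([1.0, 2.0, 3.0])])
```

Unlike ReLU, GeLU returns small negative values for small negative inputs instead of a hard zero, which is the "smooth ReLU" behaviour noted in the table.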


Key neural network architectures for inference

Architecture | Primary use in inference
MLP (Feedforward) | Classification heads, tabular tasks
CNN | Image encoding, OCR, visual document processing
RNN / LSTM | Sequence modelling (largely superseded)
Transformer | Text embedding, reranking, language generation
Vision Transformer (ViT) | Image embedding, multimodal models

Transformer-based models dominate modern embedding and reranking. SIE hosts 85+ models built on these architectures.


Shallow vs deep networks

Shallow networks (1-2 hidden layers) can in principle approximate a very wide class of functions, but for some complex functions they need exponentially more neurons than a deeper network to reach the same accuracy.

Deep networks (many layers) learn hierarchical representations: early layers detect simple patterns, later layers compose them into complex concepts. This hierarchy is a large part of what makes deep models such as transformers effective at language.


What does “parameters” mean in a model?

Parameters = the total count of learnable weights and biases. More parameters → more capacity → higher accuracy potential, but also more memory, slower inference, and more data needed to train.

Model | Parameters | Use case
BAAI/bge-small-en | 33M | Fast, lightweight embedding
BAAI/bge-m3 | 570M | Multilingual, high accuracy
BAAI/bge-reranker-v2-gemma | 2.5B | Highest-accuracy reranking
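A rough way to reason about these numbers: in a fully connected layer, each pair of connected neurons contributes one weight and each output neuron contributes one bias. The sketch below counts parameters for a small MLP and estimates the memory needed just to hold the weights. The layer sizes are illustrative, and the 2 bytes per parameter assumes 16-bit weights; activations, batching buffers, and framework overhead are not included.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases for a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory for the weights alone (assumes 16-bit parameters by default)."""
    return num_params * bytes_per_param / 1e9

# Hypothetical 768 -> 3072 -> 768 feedforward block.
print(mlp_param_count([768, 3072, 768]))

# Parameter counts from the table above, at 2 bytes per parameter.
for name, params in [("BAAI/bge-small-en", 33_000_000),
                     ("BAAI/bge-m3", 570_000_000),
                     ("BAAI/bge-reranker-v2-gemma", 2_500_000_000)]:
    print(name, round(weight_memory_gb(params), 3), "GB")
```

Under these assumptions the three models need roughly 0.07 GB, 1.1 GB, and 5 GB for weights alone, which is one reason larger models put more pressure on GPU memory and latency.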

SIE’s GPU batching makes serving larger models at production scale practical. See the model hub for full specs.


Frequently asked questions

How is a neural network different from a traditional ML model? Traditional models (linear regression, decision trees, SVMs) typically rely on hand-crafted features. Neural networks learn their own feature representations directly from raw or lightly processed data.

What is overfitting in a neural network? Overfitting occurs when a model memorises its training data instead of generalising to new data. It shows up as a growing gap between training loss and validation loss, and is addressed with dropout, regularisation, early stopping, and more training data.
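Early stopping, one of the remedies named in the overfitting answer, can be sketched as a simple rule on the validation loss: keep the best epoch seen so far and stop once the loss has failed to improve for a set number of epochs. The loss values below are invented for illustration, and the function name and `patience` parameter are hypothetical rather than taken from any specific library.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the best epoch, stopping once validation loss
    has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss has stopped improving
    return best_epoch

# Made-up curves: training loss keeps falling while validation loss turns upward.
train_losses = [0.90, 0.60, 0.40, 0.30, 0.20, 0.12]
val_losses = [0.95, 0.70, 0.55, 0.58, 0.63, 0.71]

# The widening train/validation gap is the overfitting signal.
gaps = [v - t for t, v in zip(train_losses, val_losses)]
print([round(g, 2) for g in gaps])
print(early_stopping_epoch(val_losses))
```

Here the rule keeps the checkpoint from epoch index 2, the last point before the validation loss starts rising.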

What does self-hosted inference mean for neural networks? It means running the neural network’s forward pass (the prediction step) on your own hardware rather than sending inputs to a cloud API. SIE provides this for embedding and reranking models.

