Glossary

Learn about key machine learning and AI concepts, algorithms, and techniques.

Model Training
Backpropagation

Master the backpropagation algorithm: how neural networks learn through gradient descent, chain rule, and error propagation. Complete guide with examples and implementations.

Read article
Traditional ML
Bag of Words

Learn how to implement and use the Bag-of-Words model in Natural Language Processing using Python. Discover practical applications in text classification, sentiment analysis, and topic modeling, with code examples using scikit-learn and TF-IDF techniques.

Read article
Supervised Learning
Classification

A comprehensive guide to binary and multi-class classification in machine learning. Learn about sigmoid and softmax functions, essential evaluation metrics like accuracy, precision, recall, and F1 score, and practical considerations for model performance. Discover how to handle challenges such as class imbalance, feature engineering, and model selection. Perfect for data scientists and ML practitioners looking to master classification techniques for real-world applications.

Read article
Unsupervised Learning
Clustering

A comprehensive guide to clustering in machine learning and data analysis. Explore essential techniques including K-means, hierarchical, density-based, and model-based clustering algorithms. Learn about similarity measures, real-world applications in market segmentation, anomaly detection, and bioinformatics.

Read article
Deep Learning
CNNs

Explore Convolutional Neural Networks (CNNs) and how they reshaped computer vision. Learn about network architecture, including convolutional, pooling, and fully connected layers, image recognition processes, and real-world applications in object detection, semantic segmentation, and medical imaging.

Read article
Pre-processing
Data Augmentation

Learn data augmentation techniques to expand datasets artificially. Discover image, audio, text, and time series augmentation methods using GANs, transformations, and synthetic data generation.

Read article
Pre-processing
Data Cleaning

Master data cleaning techniques for accurate analysis. Learn data preprocessing methods, outlier detection, missing value imputation, and machine learning approaches to ensure data quality and reliability.

Read article
Supervised Learning
Decision Trees

An in-depth guide to decision forests in machine learning, covering random forests and gradient boosted trees. Learn about tree construction, ensemble methods, hyperparameter tuning, and real-world applications in finance, healthcare, and marketing. Perfect for data scientists and ML engineers working with tabular data and seeking interpretable, high-performance models.

Read article
Unsupervised Learning
Dimensionality Reduction

Explore dimensionality reduction techniques including PCA, t-SNE, and autoencoders. Learn how to combat the curse of dimensionality, improve model performance, and efficiently handle high-dimensional data.

Read article
Pre-processing
Feature Engineering

Learn feature engineering techniques for machine learning success. Discover data preprocessing methods, dimensionality reduction, automated feature selection, and real-world applications to boost model performance.

Read article
Pre-processing
Feature Scaling

Master feature scaling and normalization techniques for ML. Learn min-max scaling vs standardization with real-world applications and implementation tips.
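
As a rough illustration of the two techniques named above, the sketch below applies min-max scaling and standardization to the same small list. The function names and the sample `ages` values are made up for this example, and standardization here uses the population standard deviation, which is one common convention rather than the only one.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are identical; map them all to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical sample feature, used only for illustration.
ages = [20, 30, 40, 50, 60]
scaled = min_max_scale(ages)
z_scores = standardize(ages)

print(scaled)
print([round(z, 3) for z in z_scores])
```

Min-max scaling keeps the output bounded but is sensitive to extreme values, while standardization is unbounded but centres the data, which is often the deciding factor between the two.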

Read article
Pre-processing
Feature Selection

Learn essential feature selection techniques for machine learning. Discover filter, wrapper, and embedded methods to improve model performance and reduce overfitting.

Read article
Deep Learning
GANs

Explore Generative Adversarial Networks (GANs), their architecture, training process, and applications. Learn about generator and discriminator networks, loss functions, and recent advancements in image synthesis, data augmentation, and text generation. Essential reading for AI researchers, machine learning practitioners, and deep learning enthusiasts interested in generative models.

Read article
Traditional ML
Gradient Boosting

Explore how boosting algorithms like AdaBoost and Gradient Boosting transform weak learners into powerful predictive models. Discover practical applications in fraud detection, medical diagnosis, and credit risk assessment, with insights on implementation and best practices.

Read article
Deep Learning
Graph Networks

Explore Graph Neural Networks (GNNs) and their applications in analyzing graph-structured data. Learn about key architectures including GCNs, GATs, and GINs, message passing mechanisms, and applications in computer vision, drug discovery, and physics.

Read article
Inference
How Do You Deploy an Embedding Model on AWS?

Deploying an embedding model on AWS with SIE requires three steps: configure your GPU cluster using the SIE Terraform module, deploy the inference server with Helm, and connect using the SIE SDK. The process takes under 30 minutes and produces a production-grade inference cluster in your own AWS account with automat...

Read article
Inference
How Do You Deploy an Embedding Model on GCP?

Deploying an embedding model on GCP with SIE requires configuring a GPU cluster using the SIE Terraform module for Google Cloud, deploying the inference server with Helm, and connecting via the SIE SDK. Your data stays within your GCP project, costs are up to 50× lower than managed API pricing, and the full deployme...

Read article
Search & Retrieval
How Does Chroma Work with Embedding Models?

Chroma is an open-source vector database designed for simplicity. It runs in-process with Python, requires no infrastructure setup for development, and persists to disk or runs as a server for production. It works with embedding models by accepting vectors at insert time, generated by SIE, and enables cosine simila...

Read article
Search & Retrieval
How Does Haystack Work with Embedding Models?

Haystack is an open-source framework for building NLP and RAG pipelines by connecting modular components (document stores, retrievers, readers, and generators) into composable directed acyclic graphs. It works with embedding models by providing retriever components that call the embedding model at query and index ...

Read article
Search & Retrieval
How Does Qdrant Work with Embedding Models?

Qdrant is an open-source vector database that stores embedding vectors alongside payload metadata and enables fast approximate nearest neighbour (ANN) search, filtered search, and hybrid (dense + sparse) search. It works with embedding models by receiving the vectors they produce (generated by SIE) and indexing th...

Read article
Inference
How Does SIE Compare to Infinity?

SIE (Superlinked Inference Engine) and Infinity are both open-source servers for self-hosting text embedding and reranking models. Infinity is a lightweight, fast single-model server with a focus on OpenAI-compatible API endpoints. SIE is a broader inference platform with multi-model support, LoRA hot-loading, GPU c...

Read article
Search & Retrieval
How Does Weaviate Work with Embedding Models?

Weaviate is an open-source vector database that stores objects with vector representations and enables semantic search, hybrid search, and filtered retrieval via a GraphQL or REST API. It works with embedding models by accepting vectors at insert time (generated by SIE) and indexing them in HNSW graphs for fast AN...

Read article
Model Training
Loss Functions

Master loss functions in machine learning: MSE, cross-entropy, MAE & more. Complete guide to choosing the right loss function for neural network training and model optimization.

Read article
Deep Learning
Neural Networks

Explore neural networks from fundamentals to advanced concepts, including network architecture, backpropagation, and multi-class classification techniques. Learn about activation functions, training processes, and the motivation behind deep learning. Essential reading for AI researchers, machine learning engineers, and developers working with deep neural networks.

Read article
Model Training
Optimizer

Complete guide to machine learning optimizers: SGD, Adam, RMSprop & gradient descent. Learn neural network training algorithms with real-world examples.

Read article
Recommendation System
Recommendation Systems

A guide to recommendation systems, covering content-based filtering, collaborative filtering, and hybrid approaches. Learn about embeddings, candidate generation techniques, and the role of neural networks in personalization. Essential reading for data scientists, ML engineers, and developers building modern recommendation engines and personalized user experiences.

Read article
Supervised Learning
Regression

Master the fundamentals of linear and logistic regression in machine learning. Explore key concepts including features, weights, bias, and optimization techniques. Learn about cost functions, regularization methods, evaluation metrics, and practical considerations for model performance. Essential reading for data scientists and analysts working with predictive modeling and statistical analysis.

Read article
Deep Learning
RNNs

Explore Recurrent Neural Networks (RNNs) and their role in processing sequential data. Learn about LSTM architecture, variants like GRUs and Bidirectional LSTMs, and applications in NLP, speech recognition, and time series analysis.

Read article
Inference
SIE vs TEI: How Do They Compare?

SIE (Superlinked Inference Engine) and TEI (Text Embeddings Inference by Hugging Face) are both open-source servers for self-hosting text embedding models. TEI is a lightweight, single-model server focused on embeddings. SIE is a broader inference platform supporting multiple simultaneous models, rerankers, extracti...

Read article
Supervised Learning
SVMs

A detailed exploration of Support Vector Machines (SVMs), covering their mathematical principles, types, and real-world applications. Learn about linear and nonlinear classifications, kernel functions, and how SVMs compare to other machine learning algorithms in text classification, image recognition, and financial prediction tasks.

Read article
Deep Learning
Transformers

A comprehensive guide to Transformer neural networks, exploring the architecture that reshaped natural language processing. Learn about self-attention mechanisms, encoder-decoder structures, and how Transformers overcome traditional RNN limitations. Discover their applications in language modeling, machine translation, and emerging limitations in computational complexity and interpretability.

Read article
Search & Retrieval
What is a Chunking Strategy for RAG?

A chunking strategy is the approach used to split source documents into smaller segments before encoding them into vectors for a RAG (Retrieval-Augmented Generation) pipeline. The size, overlap, and boundary logic of chunks directly affect retrieval quality. Chunks that are too large compress too much information ...
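
To make the size and overlap parameters concrete, here is a minimal sketch of one simple strategy: fixed-size word windows with overlap. The function name `chunk_words` and the default values are assumptions for illustration; real pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into windows of `chunk_size` words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            # This window already reached the end of the text.
            break
    return chunks


# Tiny hypothetical document so the windows are easy to inspect.
sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some words twice.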

Read article
Pre-processing
What is a Document Extraction Pipeline?

A document extraction pipeline is an automated system that ingests raw documents (PDFs, Word files, scanned images, HTML pages), extracts structured information from them (text, tables, entities, metadata), and prepares the output for downstream use in search indexes, databases, or RAG systems. It combines OCR, layo...

Read article
Model Adaptation
What is a LoRA Adapter?

A LoRA (Low-Rank Adaptation) adapter is a lightweight set of trainable weight matrices added to specific layers of a pre-trained neural network. During fine-tuning, only the LoRA weights are updated; the base model weights remain frozen. This reduces the number of trainable parameters by 100-1000x compared to full ...
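
The parameter saving can be checked with simple arithmetic. In the standard LoRA formulation, a frozen weight matrix of shape `d_out x d_in` gets two small trainable matrices, `B` (`d_out x r`) and `A` (`r x d_in`). The sketch below counts trainable parameters for a single layer; the dimensions (4096) and rank (8) are illustrative assumptions, and the ratio for a whole model depends on which layers receive adapters.

```python
def full_finetune_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters for one LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, chosen only for illustration.
d_in = d_out = 4096
rank = 8

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
reduction = full / lora

print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {rank}): {lora:,} trainable parameters")
print(f"reduction: {reduction:.0f}x")
```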

Read article
Search & Retrieval
What is a Reranker?

A reranker is a model that takes a query and a set of candidate results from an initial retrieval step, and re-scores them for relevance. Unlike embedding models that encode query and documents independently, rerankers compare them jointly, producing more accurate relevance scores at the cost of higher latency.

Read article
Search & Retrieval
What is a Reranking Pipeline?

A reranking pipeline is a two-stage retrieval architecture where a fast first-stage retriever (embedding model + vector DB) fetches a broad set of candidates, and a slower but more accurate second-stage reranker (cross-encoder) re-scores and reorders them. The result is retrieval that combines the scalability of ANN...

Read article
Models
What is a Text Embedding Model?

A text embedding model is a neural network that converts text (a word, sentence, paragraph, or document) into a dense numerical vector that captures its semantic meaning. Texts with similar meanings produce vectors that are close together in the embedding space, enabling similarity search, clustering, classificati...

Read article
Search & Retrieval
What is a Vector Database?

A vector database is a database purpose-built for storing, indexing, and querying high-dimensional numerical vectors. Unlike traditional databases that query by exact value or keyword, vector databases find the nearest vectors to a query vector using approximate nearest neighbour (ANN) algorithms, enabling semantic...

Read article
Search & Retrieval
What is a Vector Index?

A vector index is a data structure that organises high-dimensional vectors to enable fast approximate nearest neighbour (ANN) search. Instead of comparing a query vector to every stored vector, the index groups or graphs vectors in ways that allow search to skip irrelevant regions of the vector space, reducing quer...

Read article
Deep Learning
What is an OCR Model?

An OCR (Optical Character Recognition) model is a machine learning system that extracts text from images, scanned documents, or PDFs. Modern OCR models use deep learning (typically a CNN encoder to detect text regions and a sequence model to decode characters) and can handle handwriting, complex layouts, tables, a...

Read article
Search & Retrieval
What is Approximate Nearest Neighbour (ANN) Search?

Approximate Nearest Neighbour (ANN) search finds vectors that are close to a query vector in high-dimensional space (not necessarily the exact closest, but close enough) in milliseconds rather than the seconds or minutes exact search would require. It is the core retrieval algorithm inside every vector database an...

Read article
Models
What is BGE-M3?

BGE-M3 is an open-source text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence) that supports three retrieval modes simultaneously: dense retrieval, sparse retrieval, and multi-vector (ColBERT-style) retrieval. It supports 100+ languages and is one of the highest-performing general-purpo...

Read article
Search & Retrieval
What is ColBERT?

ColBERT (Contextualized Late Interaction over BERT) is a neural retrieval model that encodes queries and documents into token-level vector representations and scores their relevance using a MaxSim operation that finds the best-matching document token for each query token, then sums those scores. It achieves near c...
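
The MaxSim scoring described above can be sketched in a few lines. This is a toy version with hand-written 2-dimensional vectors standing in for token embeddings; the function names are illustrative, and a real system would take the vectors from a trained encoder and use batched tensor operations.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """For each query token, take its best-matching document token, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical token vectors, not produced by any real model.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_tokens = [[1.0, 0.0], [0.5, 0.5]]

score = maxsim_score(query_tokens, doc_tokens)
print(score)
```

Because each query token only needs its single best match, document token vectors can be computed and stored ahead of time, and only the cheap max-and-sum step runs at query time.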

Read article
Inference
What is GPU Utilisation in Inference?

GPU utilisation in inference refers to the percentage of a GPU's compute capacity actively used when serving model predictions. High GPU utilisation (70-95%) means you're getting maximum throughput per dollar. Low utilisation (under 30%) means you're paying for idle hardware. Efficient batching, model loading strate...

Read article
Search & Retrieval
What is Hybrid Search?

Hybrid search combines dense vector (semantic) search with sparse keyword (BM25-style) search, merging results from both to produce a single ranked list. It captures the strengths of each approach (semantic understanding from dense retrieval and exact-term precision from sparse retrieval), making it more accurate t...
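
One common way to merge the two result lists is reciprocal rank fusion (RRF), sketched below. RRF is used here as an assumed example of a fusion method, not necessarily the one the article describes; the document ids are made up, and `k=60` is a conventional default rather than a required value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids: each list contributes 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of a dense retriever and a sparse (keyword) retriever.
dense_results = ["a", "b", "c"]
sparse_results = ["b", "d", "a"]

fused = reciprocal_rank_fusion([dense_results, sparse_results])
print(fused)
```

RRF only looks at ranks, so it avoids having to put cosine similarities and BM25 scores on a comparable scale.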

Read article
Models
What is Instruction-Following Embedding?

Instruction-following embedding is a technique where a natural language instruction is prepended to the input text before encoding, telling the embedding model what type of representation to produce. Instead of always generating the same vector for a given text, the model adapts its output based on the task describ...

Read article
Search & Retrieval
What is Late Interaction Retrieval?

Late interaction retrieval is a neural search architecture where the query and document are encoded independently into token-level vectors, and their similarity is computed at retrieval time using a fine-grained token matching function (typically MaxSim). It sits between bi-encoders (fast but less accurate) and cros...

Read article
Search & Retrieval
What is Multi-Vector Search?

Multi-vector search is a retrieval technique where each document is represented by multiple vectors (one per token or passage) rather than a single fixed-size vector. At query time, the query's token vectors are compared against all document token vectors, enabling fine-grained token-level matching that captures n...

Read article
Search & Retrieval
What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is an AI architecture that combines a retrieval system with a large language model (LLM). Instead of relying solely on the LLM's training data, RAG first retrieves relevant documents from a knowledge base and passes them as context to the LLM, producing answers that are grounded...

Read article
Inference
What is Self-Hosted Inference?

Self-hosted inference is the practice of running AI model inference on your own infrastructure (your own cloud account such as AWS, GCP, Azure, or on-premises hardware) rather than sending requests to a third-party managed API. You control the hardware, the models, the configuration, and crucially, where your data goes.

Read article
Search & Retrieval
What is Semantic Search?

Semantic search is a retrieval technique that finds results based on the meaning of a query rather than exact keyword matches. Instead of matching words character-by-character, it converts text into vector embeddings and retrieves items whose embeddings are closest to the query's embedding in high-dimensional space.
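
The retrieval step can be sketched as a brute-force nearest-neighbour lookup using cosine similarity. The 3-dimensional vectors below are hand-written stand-ins for real embeddings, and the function names are illustrative; a production system would obtain vectors from an embedding model and usually query them through a vector index rather than scanning every document.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, similarity) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings, not produced by any real model.
docs = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.1],
    "tax law": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, docs, k=2)
for doc_id, similarity in results:
    print(doc_id, round(similarity, 3))
```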

Read article
