Implementing Hybrid Semantic-Lexical Search in RAG

In this article, you'll learn how to implement a hybrid search strategy for RAG systems by combining BM25 lexical search with semantic search, fused together using Reciprocal Rank Fusion.

Topics we'll cover include:

Why hybrid search outperforms either lexical or semantic search alone in retrieval-augmented generation systems.
How to implement BM25 lexical search and dense vector semantic search as independent retrieval engines in Python.
How to merge both rankings using Reciprocal Rank Fusion (RRF) to produce a final, balanced retrieval result.

Let’s get straight to it.

Introduction

Implementing hybrid search strategies is an important step in building modern RAG (Retrieval-Augmented Generation) systems, especially when moving from prototype to production-ready solutions.

There is little argument against semantic search being extremely useful for understanding semantics, synonyms, and context. It is powered by dense vectors, or embeddings, which are numerical representations of text. However, lexical, keyword-based search with approaches like BM25 covers a blind spot that semantic search neglects: exact matches on specific terms. Combining the best of both worlds is therefore a good recipe for taking your RAG system's retrieval mechanism the extra mile.

Let's explore how to implement such a hybrid search strategy through a gentle coding example that guides you through every step of the process.

Note: If you are unfamiliar with RAG systems, you may find the "Understanding RAG" article series helpful for getting the most out of this read. In particular, I recommend acquiring an understanding of vector databases first through this article.

Step-by-Step Implementation

The first step is to make sure the necessary external Python libraries are installed, specifically these three:

!pip install rank_bm25 sentence-transformers requests

rank_bm25: an implementation of the BM25 lexical search algorithm for information retrieval (BM stands for "Best Matching").
sentence-transformers: provides pre-trained language models for generating text embeddings. In a real setting, you may already have your own vector database containing many document embeddings and not need this, but we'll use it here to simulate the construction of a toy vector database and illustrate hybrid search on it.
requests: used to fetch the raw dataset bundle from a public GitHub datasets repository prepared for this example.

With these ingredients at hand, we start by loading the dataset and storing the raw texts in a list (we can do so because it is a small dataset).

import requests
import zipfile
import io
import os

# Downloading and extracting the dataset from the compressed file
url = "https://github.com/gakudo-ai/open-datasets/raw/refs/heads/main/asia_documents.zip"
response = requests.get(url)
with zipfile.ZipFile(io.BytesIO(response.content)) as z:
    z.extractall("asia_data")

# Loading documents and getting their filenames
documents = []
doc_names = []
for file in os.listdir("asia_data"):
    if file.endswith(".txt"):
        with open(f"asia_data/{file}", "r", encoding="utf-8") as f:
            documents.append(f.read())
            doc_names.append(file)

print(f"Loaded {len(documents)} documents for the knowledge base.")

The hybrid search process is divided into three stages. Two of them, lexical search and semantic search, run independently of each other and could run in parallel. The third is where the fusion of both approaches happens, using a merging strategy called Reciprocal Rank Fusion (RRF).

Let's cover lexical search with BM25 first:

from rank_bm25 import BM25Okapi

# BM25 requires that each text is tokenized as a list of words
tokenized_corpus = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_corpus)

def search_bm25(query, top_k=3):
    tokenized_query = query.lower().split()

    # Getting scores (lexical relevance to the query) for all documents
    scores = bm25.get_scores(tokenized_query)

    # Ranking documents by score, highest first
    ranked_indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked_indices[:top_k], scores

The lexical search process is encapsulated in a function called search_bm25(). It takes two arguments: a string containing the user's query to the RAG system, and the number of top results to retrieve. The rank_bm25 library provides a get_scores() method that computes a lexical relevance score for each document, treated as a collection of tokens. We then rank the documents by decreasing score, select the top k, and return their indices along with the full list of scores.
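To give a feel for what a BM25 relevance score is made of, here is a minimal pure-Python sketch of the textbook Okapi BM25 formula. It is an illustration under assumptions, not the rank_bm25 implementation: the function names, the toy documents, the regex tokenizer (which strips punctuation, unlike the plain split() used above), and the smoothed IDF variant are all choices made for this example, and libraries differ in their exact IDF formulation, so the numbers need not match what get_scores() returns.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep only alphanumeric runs, so "paddies?" becomes "paddies"
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with textbook Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Smoothed IDF (always positive): rarer terms weigh more
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            # Term-frequency saturation with document-length normalization
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, unrelated to the downloaded dataset
toy_docs = [
    "Rice paddies cover the river delta.",
    "The capital city has many temples.",
    "Terraced rice fields and rice paddies attract visitors.",
]
toy_scores = bm25_scores("rice paddies?", toy_docs)
print(toy_scores)
```

The second toy document shares no term with the query and scores zero, while the third outranks the first because it mentions "rice" twice. This also shows the weakness that motivates hybrid search: a document about the same topic that uses different words gets no lexical credit at all.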

Meanwhile, the semantic search engine first uses a sentence transformer model to obtain embedding vectors for the texts and the user query, then applies a vector similarity metric such as cosine similarity to rank texts by semantic relevance and retrieve the k most relevant ones:

from sentence_transformers import SentenceTransformer, util
import torch

# Loading the pre-trained embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Pre-compute embeddings for our corpus (our "Vector DB")
# You don't need this step if you already have an external vector database:
# you could read and import your document vectors instead
doc_embeddings = model.encode(documents, convert_to_tensor=True)

def search_semantic(query, top_k=3):
    # Embedding the user's query into a vector
    query_embedding = model.encode(query, convert_to_tensor=True)

    # Calculating cosine similarity between the query and all documents
    cosine_scores = util.cos_sim(query_embedding, doc_embeddings)[0]

    # Ranking documents by similarity, highest first
    ranked_indices = torch.argsort(cosine_scores, descending=True).tolist()
    return ranked_indices[:top_k], cosine_scores.tolist()
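As a side illustration of the similarity metric itself, the sketch below computes cosine similarity by hand on tiny made-up vectors. The three-dimensional "embeddings" and the document names are hypothetical; real embedding models produce vectors with hundreds of dimensions, but the arithmetic is the same.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", chosen only for illustration
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc_a": [0.8, 0.2, 0.1],  # points in nearly the same direction as the query
    "doc_b": [0.0, 0.1, 0.9],  # points in a very different direction
    "doc_c": [1.8, 0.2, 0.0],  # same direction as the query, twice the length
}

similarities = {name: cosine_similarity(query_vec, vec) for name, vec in doc_vecs.items()}
ranking = sorted(similarities, key=similarities.get, reverse=True)
print(ranking)
```

Note that doc_c ranks first even though it is twice as long as the query vector: cosine similarity depends only on direction, not magnitude. Its values always fall between -1 and 1, whereas BM25 scores are unbounded, which is one reason the two kinds of score cannot be compared directly.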

Time to put it all together. The two scores calculated for each document cannot simply be added, because they operate on very different numeric scales. Instead, we perform the fusion based on ranks rather than raw similarity or relevance scores. For this, RRF is a widely adopted standard for fusing ranking information: it calculates an overall score for each document by rewarding those that appear in high positions across both lists. The underlying logic is somewhat similar to that of the harmonic mean in statistics.

The overall hybrid search process is implemented as follows:

def hybrid_search(query, top_k=3):
    # 1. Obtaining the two standalone search rankings over the full corpus
    bm25_ranks, _ = search_bm25(query, top_k=len(documents))
    semantic_ranks, _ = search_semantic(query, top_k=len(documents))

    # 2. Applying the RRF formula: RRF_score = 1 / (k + rank), with rank starting at 1
    rrf_scores = {i: 0.0 for i in range(len(documents))}
    k_constant = 60  # The value of 60 is a standard academic convention

    # Adding RRF scores from BM25
    for rank, doc_idx in enumerate(bm25_ranks, start=1):
        rrf_scores[doc_idx] += 1.0 / (k_constant + rank)

    # Adding RRF scores from semantic search
    for rank, doc_idx in enumerate(semantic_ranks, start=1):
        rrf_scores[doc_idx] += 1.0 / (k_constant + rank)

    # 3. Sorting documents by their final fused RRF score
    final_ranked_indices = sorted(rrf_scores.keys(), key=lambda idx: rrf_scores[idx], reverse=True)
    return final_ranked_indices[:top_k], rrf_scores
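To make the rank-based arithmetic concrete, here is a small standalone sketch that fuses two hand-written rankings. The rrf_fuse() helper and the five document labels are hypothetical and exist only for this illustration; the formula and the constant of 60 are the same as in hybrid_search() above.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    fused = {}
    for ranking in rankings:
        # Positions start at 1, so the top document contributes 1 / (k + 1)
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + position)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings over five documents, best first
lexical_ranking = ["A", "C", "B", "D", "E"]
semantic_ranking = ["D", "C", "B", "E", "A"]

fused_ranking = rrf_fuse([lexical_ranking, semantic_ranking])
for doc_id, score in fused_ranking:
    print(f"{doc_id}: {score:.5f}")
```

Document C, ranked second by both engines, ends up first with a score of 2/62. It beats D (first and fourth) and A (first and fifth), each of which tops one list but sits lower in the other. This is the behavior described earlier: RRF rewards consistent agreement between the two rankings over a single strong placement.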

Now it's time to try it all out. Let's formulate a user query and see what results we get.

query = "Which country is best known for rice fields and paddies?"
print(f"--- Query: '{query}' ---")

# Testing semantic search (good at capturing meaning and context, such as country-specific nuances)
print("\nTop Semantic Results:")
sem_indices, _ = search_semantic(query)
for idx in sem_indices:
    print(f"- {doc_names[idx]}")

# Testing BM25 (good at finding exact keyword-based matches like "rice", "field", "paddy")
print("\nTop BM25 Results:")
bm25_indices, _ = search_bm25(query)
for idx in bm25_indices:
    print(f"- {doc_names[idx]}")

# Testing hybrid search (balances both)
print("\nTop Hybrid (RRF) Results:")
hybrid_indices, _ = hybrid_search(query)
for idx in hybrid_indices:
    print(f"- {doc_names[idx]}")

The results are not amazing compared to a production RAG system, but remember that we tested this on a tiny, nine-document dataset. With that context, the outcome is quite reasonable.

--- Query: 'Which country is best known for rice fields and paddies?' ---

Top Semantic Results:
- Vietnam.txt
- South_Korea.txt
- Thailand.txt

Top BM25 Results:
- Indonesia.txt
- Japan.txt
- Philippines.txt

Top Hybrid (RRF) Results:
- Vietnam.txt
- Thailand.txt
- Indonesia.txt

Try modifying the query and replacing it with others related to temples, beaches, mountains, or anything else that comes to mind when thinking about destinations in Asia. Can you find a scenario in which the semantic results and the BM25 results are highly consistent with each other?
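If you want to put a number on how consistent two result lists are, one simple option is the Jaccard overlap between them. The helper below is a hypothetical addition for this exercise, not part of the pipeline above; the three lists are copied from the example output shown earlier.

```python
def jaccard_overlap(results_a, results_b):
    """Share of distinct documents that two result lists have in common (0.0 to 1.0)."""
    set_a, set_b = set(results_a), set(results_b)
    if not set_a and not set_b:
        return 1.0  # Two empty lists are treated as fully consistent
    return len(set_a & set_b) / len(set_a | set_b)


# Top-3 lists taken from the example output for the rice fields query
semantic_top = ["Vietnam.txt", "South_Korea.txt", "Thailand.txt"]
bm25_top = ["Indonesia.txt", "Japan.txt", "Philippines.txt"]
hybrid_top = ["Vietnam.txt", "Thailand.txt", "Indonesia.txt"]

print(jaccard_overlap(semantic_top, bm25_top))    # 0.0: no document in common
print(jaccard_overlap(semantic_top, hybrid_top))  # 0.5: two shared out of four distinct
print(jaccard_overlap(bm25_top, hybrid_top))      # 0.2: one shared out of five distinct
```

For the example query, the two standalone engines do not agree on a single document, so a query that pushes the first value toward 1.0 would be the scenario the exercise asks for. Keep in mind that this measure ignores the order of results within each list.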

Wrapping Up

This article guided you through implementing a hybrid search mechanism for the retrieval stage of RAG systems. Choosing not to rely solely on semantic search is an important consideration when scaling RAG solutions to production environments.
