Hybrid Search: Combining Keyword and Vector Search for Better Retrieval
Introduction
Search is deceptively hard. When a user types a query, they might be searching by meaning ("how do I fix a login problem?") or by exact identifier ("SKU-12345"). These two cases need fundamentally different approaches, and most search systems only do one of them well.
Vector search (also called semantic search) is excellent at understanding intent and meaning. It can match "password reset" with "account recovery" because the two phrases mean similar things, even though they share no words. But vector search is poor at exact matching: it may fail to find the one document that contains "SKU-12345" if the embedding model blurs that specific token into a generic representation.
Keyword search (most famously, BM25) is the opposite: it excels at exact matching but has no understanding of synonyms, context, or meaning.
Hybrid search runs both in parallel and combines their results. This approach is widely used in production RAG (retrieval-augmented generation) systems. This article explains how both methods work, why combining them is better, and how to implement hybrid search using Reciprocal Rank Fusion (RRF).
The Limits of Pure Vector Search
Vector search works by converting text into numerical vectors called embeddings. Similar texts produce vectors that are close together in space, so "find the most similar documents" becomes a fast nearest-neighbor lookup.
This approach excels at capturing semantic similarity. A query like "How do I reset my password?" can match documents containing "account recovery" or "login issues" even without exact word overlap.
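To make the nearest-neighbor idea concrete, here is a minimal brute-force sketch in plain Python. The three-dimensional vectors are hand-made stand-ins for real embeddings (an assumption for illustration; real models produce hundreds of dimensions), and `nearest_neighbors` is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_vec, doc_vecs, top_k=2):
    """Brute-force search: score every document, return the top_k (index, score) pairs."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hand-made 3-dimensional "embeddings" (hypothetical values for illustration only).
doc_vecs = [
    [0.9, 0.1, 0.0],  # e.g. a password-reset article
    [0.8, 0.3, 0.1],  # e.g. an account-recovery guide
    [0.0, 0.2, 0.9],  # e.g. a product spec sheet
]
query_vec = [1.0, 0.2, 0.0]  # e.g. "How do I reset my password?"

top = nearest_neighbors(query_vec, doc_vecs)
print(top)
```

The two documents pointing in roughly the same direction as the query come back first, regardless of which words they contain. Production systems replace the brute-force loop with an approximate nearest-neighbor index, but the scoring idea is the same.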
However, vector search struggles in several real-world scenarios:
- Exact keyword matches: Searching for "SKU-12345" or "RFC-8446" may not return the correct document if the embedding model does not treat those specific tokens as important.
- Rare or technical terms: Domain-specific jargon, abbreviations, or recently coined terms may not be well-represented in the embedding space.
- Named entities: Product names, person names, or location-specific queries can be missed if embeddings generalize too aggressively.
- Short queries: Single-word or very short queries often lack enough semantic context for embeddings to work effectively.
These failures are not hypothetical. In production systems, they lead to user frustration and degraded retrieval quality.
The Strengths of Keyword Search (BM25)
BM25 (Best Matching 25) is a probabilistic ranking function that search engines have used for decades. It does not use neural networks or embeddings; it is a statistical method based on counting words.
BM25 scores documents based on three factors:
- Term frequency (TF): How often the query terms appear in the document. More occurrences = higher score.
- Inverse document frequency (IDF): How rare the query terms are across the entire corpus. Rare terms are more informative.
- Document length normalization: Prevents long documents from automatically outscoring short ones just because they contain more words overall.
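The BM25 formula is:
$$\text{score}(D, Q) = \sum_{i=1}^{n} \text{IDF}(q_i) \cdot \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)}$$
In plain English: this formula adds up a relevance score for each query word, rewarding documents where that word appears often (with diminishing returns for repeated occurrences) and is rare across the whole collection.
Where:
- f(q_i, D) is the term frequency of query term q_i in document D.
- IDF(q_i) is the inverse document frequency of q_i, the rarity factor described above.
- |D| is the document length (in words).
- avgdl is the average document length across the corpus.
- k_1 and b are tuning parameters. Default values of k_1 = 1.5 and b = 0.75 work well in most cases.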
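The three factors (term frequency, inverse document frequency, and length normalization) can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code of any particular library: the smoothed IDF variant `log(1 + (N - n + 0.5) / (n + 0.5))` is an assumption, since implementations differ on the exact IDF form, and `bm25_scores` is a hypothetical function name.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, tokenized_docs, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi BM25 variant."""
    n_docs = len(tokenized_docs)
    avgdl = sum(len(doc) for doc in tokenized_docs) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenized_docs for term in set(doc))

    scores = []
    for doc in tokenized_docs:
        term_freq = Counter(doc)
        # Length normalization: above 1 for long documents, below 1 for short ones.
        length_norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in query_tokens:
            f = term_freq[term]
            if f == 0:
                continue  # a term absent from the document contributes nothing
            n = doc_freq[term]
            # Smoothed IDF (assumed variant): rare terms get larger weights.
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "how to reset your password",
    "account recovery guide",
    "product sku-12345 specifications",
]
tokenized = [d.split() for d in docs]
scores = bm25_scores("sku-12345".split(), tokenized)
print(scores)
```

Only the third document scores above zero, because it is the only one containing the exact token `sku-12345`. That is the exact-match behavior that embeddings can lose.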
Where BM25 falls short
BM25 has no notion of meaning. To BM25, "password reset" and "account recovery" are completely unrelated because they share no words, and it cannot handle typos or synonyms. This is precisely where vector search excels, which is why the two methods complement each other.
Why Hybrid Search Works Better
Hybrid search runs both vector search and BM25 in parallel, then merges the two ranked lists into a single final ranking. This ensures:
- Exact keyword matches are captured when they matter.
- Semantic similarity is captured for conceptual queries.
- When one method fails, the other can cover for it.
Empirical benchmarks often show hybrid search outperforming either method alone, especially when the query mix is diverse. This is the main reason production RAG systems use it.
Reciprocal Rank Fusion (RRF)
A common and effective way to merge BM25 and vector search results is Reciprocal Rank Fusion (RRF). It is simple and does not require training a separate model.
How RRF works
Each retrieval method returns a ranked list of documents. RRF assigns a score to each document based on its position in each list:
$$\text{RRF}(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}$$
In plain English: a document earns a contribution from every ranked list it appears in, and the higher it ranks in a list, the larger that contribution. Documents that rank highly in both BM25 and vector search get the largest combined score.
Where:
- R is the set of ranking methods (e.g., BM25 and vector search).
- rank_r(d) is the position of document d in ranking method r (starting at 1 for the top result).
- k is a constant, typically 60, that prevents documents at the very top from getting excessively large scores over those just below them.
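As a worked example (the ranks are made up for illustration), take k = 60 and two lists, BM25 and vector search, where each list contributes 1 / (k + rank). Document A ranks 1st in BM25 and 3rd in vector search, document B ranks 2nd in both, and document C ranks 1st in vector search but does not appear in the BM25 list at all:

```latex
\begin{align*}
\text{RRF}(A) &= \frac{1}{60 + 1} + \frac{1}{60 + 3} \approx 0.016393 + 0.015873 = 0.032266 \\
\text{RRF}(B) &= \frac{1}{60 + 2} + \frac{1}{60 + 2} \approx 0.016129 + 0.016129 = 0.032258 \\
\text{RRF}(C) &= \frac{1}{60 + 1} \approx 0.016393
\end{align*}
```

A and B score almost identically because both lists agree they are strong, while C, backed by only one list, gets roughly half as much. With k = 60, appearing in both lists matters far more than the exact position near the top of either one.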
Why RRF is robust
RRF only uses ranks, not the raw scores from each method. This is crucial because BM25 scores and cosine similarity scores are on completely different scales — you cannot simply add them. By working with ranks, RRF is immune to this scale mismatch, and it naturally balances both methods without any manual weight tuning.
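The scale problem is easy to see with a few numbers. The scores below are hypothetical, chosen only to resemble typical magnitudes (BM25 is unbounded, cosine similarity is at most 1).

```python
def ranks_from_scores(scores):
    """Return document indices ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

bm25_scores = [12.4, 0.0, 7.9]      # hypothetical BM25 scores (unbounded)
cosine_scores = [0.31, 0.82, 0.35]  # hypothetical cosine similarities (at most 1)

# Naive sum: BM25's larger scale decides the order almost by itself,
# and the vector search favorite (document 1) ends up last.
naive = [b + c for b, c in zip(bm25_scores, cosine_scores)]
naive_order = ranks_from_scores(naive)
print("naive sum order:", naive_order)
print("BM25-only order:", ranks_from_scores(bm25_scores))

# Rank order does not change when scores are rescaled, so any method that
# consumes only ranks (such as RRF) is unaffected by the scale of each scorer.
rescaled = [s * 100 for s in bm25_scores]
assert ranks_from_scores(rescaled) == ranks_from_scores(bm25_scores)
```

Here the naive sum reproduces the BM25 ordering exactly, which means the vector scores had no say in the result.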
Implementing Hybrid Search
Here is a practical implementation of hybrid search using BM25 and vector embeddings. First, install the required libraries: pip install rank_bm25 sentence-transformers numpy.
Step 1: Index documents with both methods
You need two indexes built over the same documents:
- A BM25 index (e.g., using the rank_bm25 library, Elasticsearch, or OpenSearch).
- A vector index (e.g., Pinecone, Weaviate, Qdrant, or FAISS for local use).
Step 2: Query both indexes
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

# Sample documents
documents = [
    "How to reset your password",
    "Account recovery guide",
    "Product SKU-12345 specifications",
    "Understanding RFC-8446 TLS 1.3",
]

# BM25 indexing: lowercase and split on whitespace (a deliberately simple tokenizer)
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)

# Vector indexing: unit-length embeddings make the dot product equal to cosine similarity
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, normalize_embeddings=True)

# Query
query = "password reset"

# BM25 search: document indices sorted from highest to lowest score
tokenized_query = query.lower().split()
bm25_scores = bm25.get_scores(tokenized_query)
bm25_ranks = np.argsort(bm25_scores)[::-1]

# Vector search: cosine similarity against every document
query_embedding = model.encode([query], normalize_embeddings=True)[0]
cosine_scores = np.dot(doc_embeddings, query_embedding)
vector_ranks = np.argsort(cosine_scores)[::-1]

print("BM25 ranks:", bm25_ranks)
print("Vector ranks:", vector_ranks)
Step 3: Apply Reciprocal Rank Fusion
The function below takes the two ranked lists and produces a single merged ranking. The positions produced by enumerate are 0-indexed, so we add 1 to each to match the 1-based ranks in the RRF formula.
def reciprocal_rank_fusion(bm25_ranks, vector_ranks, k=60):
    """Merge two ranked lists of document indices into one RRF-scored list."""
    rrf_scores = {}
    # Add BM25 contributions (enumerate is 0-indexed, hence the + 1)
    for rank, doc_id in enumerate(bm25_ranks):
        doc_id = int(doc_id)  # plain int keys instead of NumPy integers
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    # Add vector contributions
    for rank, doc_id in enumerate(vector_ranks):
        doc_id = int(doc_id)
        rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
    # Sort by RRF score, best first
    sorted_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
    return sorted_docs

# Combine results
final_ranking = reciprocal_rank_fusion(bm25_ranks, vector_ranks)
print("Hybrid search results:", final_ranking)
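In the example above, argsort ranks the entire corpus, so every document appears in both lists, including documents whose BM25 score is zero. In practice each retriever usually returns only its own top results, and the two lists rarely contain the same documents. A sketch that handles this case, and any number of ranked lists, might look as follows; `rrf_fuse` and the document IDs are hypothetical names for illustration.

```python
def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse any number of ranked lists of document IDs with RRF.

    rankings: iterable of lists, each ordered best-first. Lists may have
    different lengths and need not contain the same documents.
    """
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n] if top_n is not None else fused

bm25_top = ["doc_sku", "doc_specs"]                 # BM25 matched only two documents
vector_top = ["doc_specs", "doc_guide", "doc_sku"]  # vector search returned three

fused = rrf_fuse([bm25_top, vector_top], top_n=3)
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

A document missing from a list simply receives no contribution from it, so `doc_guide`, which only vector search found, lands below the two documents that both retrievers returned.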
Alternative Fusion Methods
RRF is the most popular approach, but there are alternatives worth knowing:
Weighted score fusion
Combine normalized scores from BM25 and vector search using a weighted average:
$$\text{score}(d) = \alpha \cdot \text{BM25}_{\text{norm}}(d) + (1 - \alpha) \cdot \text{vector}_{\text{norm}}(d)$$
In plain English: α controls how much you trust BM25 versus vector search. Setting α = 0.3 means 30% BM25 and 70% vector search. Both scores must be normalized to the same scale first; otherwise the weighted sum is meaningless.
This requires score normalization and manual tuning of the weight α, which is why RRF is usually preferred.
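A minimal sketch of weighted score fusion follows. It assumes min-max normalization, which is one common choice among several (the text above does not prescribe a method), and the raw scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; if all scores are equal, return zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_fusion(bm25_scores, vector_scores, alpha=0.3):
    """alpha is the BM25 weight; (1 - alpha) is the vector weight."""
    bm25_norm = min_max_normalize(bm25_scores)
    vector_norm = min_max_normalize(vector_scores)
    return [alpha * b + (1 - alpha) * v for b, v in zip(bm25_norm, vector_norm)]

bm25_raw = [12.4, 0.0, 7.9]      # hypothetical BM25 scores, one per document
vector_raw = [0.31, 0.82, 0.35]  # hypothetical cosine similarities

fused = weighted_fusion(bm25_raw, vector_raw, alpha=0.3)
best = max(range(len(fused)), key=lambda i: fused[i])
print(fused, "best document:", best)
```

With α = 0.3 the vector search favorite (document 1) wins; raising α to 0.8 hands first place to the BM25 favorite (document 0). That sensitivity to α is the tuning burden that RRF avoids.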
Learned fusion
Train a reranker model that takes both BM25 and vector scores as features and learns optimal weights from labeled data. This is more complex but can achieve higher accuracy if you have enough labeled query-document pairs.
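As a rough illustration of the idea, the sketch below fits a tiny logistic regression on two features, the normalized BM25 score and the normalized vector score. It is a simple stand-in for a real reranker, not a recommended model: the training data is made up, and the hyperparameters (`lr`, `epochs`) are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion_weights(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on (bm25_norm, vector_norm) pairs by gradient descent."""
    w = [0.0, 0.0]
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in zip(features, labels):
            error = sigmoid(w[0] * x1 + w[1] * x2 + bias) - y
            grad_w[0] += error * x1
            grad_w[1] += error * x2
            grad_b += error
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        bias -= lr * grad_b / n
    return w, bias

# Toy labeled data (made up for illustration): normalized (BM25, vector) scores, 1 = relevant.
features = [(0.9, 0.2), (0.1, 0.9), (0.8, 0.8), (0.1, 0.1), (0.2, 0.3), (0.0, 0.4)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_fusion_weights(features, labels)

def relevance(bm25_norm, vector_norm):
    """Predicted probability that a query-document pair is relevant."""
    return sigmoid(weights[0] * bm25_norm + weights[1] * vector_norm + bias)

print("learned weights:", weights, "bias:", bias)
print("strong on both:", relevance(0.9, 0.9), "weak on both:", relevance(0.1, 0.1))
```

The learned weights play the role that α plays in weighted score fusion, except that they come from labeled data rather than manual tuning. A production reranker would use far more data and richer features.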
Query-adaptive fusion
Dynamically adjust the fusion weights based on the query. Short, precise queries rely more on BM25; long, conceptual queries favor vector search.
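One simple way to do this is a rule-based heuristic, sketched below. The identifier pattern, the thresholds, and the returned weights are all assumptions for illustration and would need tuning against real queries.

```python
import re

# Matches tokens that look like identifiers, e.g. "SKU-12345" or "RFC-8446" (assumed pattern).
IDENTIFIER_PATTERN = re.compile(r"\b[A-Za-z]+-?\d+\b")

def choose_bm25_weight(query):
    """Return a BM25 weight in [0, 1]; the vector weight is 1 minus this value."""
    tokens = query.split()
    if IDENTIFIER_PATTERN.search(query):
        return 0.8  # exact identifier present: favor keyword matching
    if len(tokens) <= 2:
        return 0.6  # short keyword-style query
    return 0.3      # longer natural-language query: favor vector search

for q in ["SKU-12345", "password reset", "how do I fix a login problem?"]:
    print(q, "->", choose_bm25_weight(q))
```

The returned value can be used as the α of a weighted score fusion, where α is the BM25 weight.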
When to Use Hybrid Search
Hybrid search is ideal when:
- Your corpus contains both conceptual content and exact identifiers (SKUs, error codes, names).
- Users submit diverse query types — questions, keywords, and technical terms.
- You want retrieval that degrades gracefully across edge cases.
- Your domain includes rare or specialized terminology not well-covered by embedding models.
Real-world examples where hybrid search shines:
- E-commerce product search (combining semantic search with exact SKU matching).
- Technical documentation (semantic intent + exact API names or error codes).
- Legal or regulatory text (concept search + precise citation matching).
- Customer support (understanding intent + finding exact error codes).
Production Considerations
Latency
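Hybrid search queries two indexes instead of one. To minimize latency, run both retrievals in parallel so that total retrieval time is close to that of the slower index rather than the sum of the two. Most vector databases and search engines can serve such concurrent requests without difficulty.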
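A minimal sketch of parallel retrieval using a thread pool from the Python standard library. The two search functions are placeholders (hypothetical names and return values) that simulate index latency with a short sleep; in a real system they would call your keyword and vector backends.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def bm25_search(query):
    """Placeholder for a keyword index call (hypothetical)."""
    time.sleep(0.2)  # simulate network / index latency
    return ["doc_sku", "doc_specs"]

def vector_search(query):
    """Placeholder for a vector index call (hypothetical)."""
    time.sleep(0.2)  # simulate network / index latency
    return ["doc_specs", "doc_guide"]

def hybrid_retrieve(query):
    """Issue both searches concurrently and wait for both result lists."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query)
        vector_future = pool.submit(vector_search, query)
        return bm25_future.result(), vector_future.result()

start = time.perf_counter()
bm25_hits, vector_hits = hybrid_retrieve("SKU-12345 specifications")
elapsed = time.perf_counter() - start
print(f"both result lists ready after {elapsed:.2f}s")
```

Because both calls spend their time waiting on I/O, threads are sufficient here; the elapsed time should be close to the slower call rather than the sum of both. An asyncio-based client would follow the same pattern.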
Index synchronization
Ensure both indexes are updated together when documents are added, modified, or deleted. Inconsistent indexes lead to missing or duplicate results.
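One way to keep the two indexes in step is to route every write through a single wrapper, so no code path can update one index without the other. The sketch below uses in-memory dictionaries as stand-ins for real indexes; `HybridIndex` and `fake_embed` are hypothetical names.

```python
class HybridIndex:
    """Keeps a keyword index and a vector index in step (in-memory stand-ins)."""

    def __init__(self, embed):
        self.embed = embed       # function: text -> vector, supplied by the caller
        self.keyword_index = {}  # doc_id -> list of tokens
        self.vector_index = {}   # doc_id -> embedding

    def upsert(self, doc_id, text):
        # Compute everything that can fail before touching either index,
        # so a failed embedding call leaves both indexes unchanged.
        tokens = text.lower().split()
        vector = self.embed(text)
        self.keyword_index[doc_id] = tokens
        self.vector_index[doc_id] = vector

    def delete(self, doc_id):
        self.keyword_index.pop(doc_id, None)
        self.vector_index.pop(doc_id, None)

    def is_consistent(self):
        """True when both indexes hold exactly the same document IDs."""
        return self.keyword_index.keys() == self.vector_index.keys()

def fake_embed(text):
    """Stand-in embedding (hypothetical): character count and word count."""
    return [float(len(text)), float(len(text.split()))]

index = HybridIndex(embed=fake_embed)
index.upsert("doc1", "How to reset your password")
index.upsert("doc2", "Product SKU-12345 specifications")
index.delete("doc1")
print(index.is_consistent(), sorted(index.keyword_index))
```

With external services the two writes cannot be made atomic this easily, so a periodic consistency check along the lines of `is_consistent` is a useful safety net.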
Cost
Running two indexes increases storage and compute costs. However, the improvement in retrieval quality typically justifies the expense — poor retrieval degrades everything downstream, including the quality of LLM answers.
Tuning the RRF constant
The RRF constant k = 60 works well in most cases and was validated in the original RRF paper. Adjust it only if you have strong evidence from offline evaluation metrics.
Hybrid Search in Vector Databases
Many modern vector databases now support hybrid search natively, which simplifies implementation:
- Weaviate: Supports hybrid search with configurable BM25 and vector weights.
- Qdrant: Supports hybrid queries that fuse sparse and dense results, for example with RRF.
- Elasticsearch: Combines keyword search with dense vector retrieval natively.
- Pinecone: Supports sparse-dense hybrid search.
Example: Hybrid Search with Weaviate
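The following example uses the Weaviate v4 Python client. The alpha parameter controls the blend: alpha=0 means pure BM25, alpha=1 means pure vector search, and values in between blend both. Note that this is the reverse of the α used for weighted score fusion earlier, where α is the BM25 weight. Install with: pip install "weaviate-client>=4.0" (the quotes stop the shell from treating > as a redirect).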
import weaviate
from weaviate.classes.query import HybridFusion, MetadataQuery

# v4 connection (replaces weaviate.Client())
client = weaviate.connect_to_local()  # or connect_to_wcs(), connect_to_custom()
try:
    collection = client.collections.get("Articles")
    results = collection.query.hybrid(
        query="what is retrieval augmented generation",
        alpha=0.75,  # 0=pure BM25, 1=pure vector
        fusion_type=HybridFusion.RELATIVE_SCORE,
        limit=5,
        return_metadata=MetadataQuery(score=True),
    )
    for obj in results.objects:
        print(obj.properties, obj.metadata.score)
finally:
    client.close()  # release the connection even if the query fails
Evaluation Metrics for Hybrid Search
To confirm that hybrid search actually improves over each method alone, measure these metrics on a held-out evaluation set with known relevant documents:
- Precision@K: Of the top K results returned, what fraction are actually relevant?
- Recall@K: Of all relevant documents in the corpus, what fraction appear in the top K results?
- Mean Reciprocal Rank (MRR): On average, how high does the first relevant result appear? Higher is better.
- NDCG (Normalized Discounted Cumulative Gain): Accounts for both ranking position and relevance grade — the most comprehensive single metric.
Always compare hybrid search to pure BM25 and pure vector search as baselines before deploying.
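The four metrics are short enough to compute by hand. The sketch below uses plain Python; the ranking, the set of relevant documents, and the relevance grades are made-up evaluation data, and the function names are hypothetical.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / len(relevant_ids)

def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / position of the first relevant result, or 0 if none is returned."""
    for position, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def ndcg_at_k(ranked_ids, relevance_grades, k):
    """relevance_grades maps doc_id -> graded relevance (missing IDs count as 0)."""
    dcg = sum(relevance_grades.get(d, 0) / math.log2(i + 1)
              for i, d in enumerate(ranked_ids[:k], start=1))
    ideal = sorted(relevance_grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Made-up evaluation data for a single query.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
grades = {"d1": 2, "d2": 1, "d9": 1}

print("P@3:", precision_at_k(ranked, relevant, 3))
print("R@3:", recall_at_k(ranked, relevant, 3))
print("RR:", reciprocal_rank(ranked, relevant))
print("NDCG@4:", ndcg_at_k(ranked, grades, 4))
```

Run the same functions over the BM25-only, vector-only, and hybrid rankings for every query in the evaluation set, then compare the averages.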
Common Pitfalls
Over-relying on one method
If one method dominates the fusion — say, BM25 always pushes its top result to first place — you lose the benefit of hybrid search. Use evaluation metrics to confirm both methods are contributing.
Ignoring query types
Analyze your query distribution. If 90% of your users type exact product codes, lean heavier on BM25. If 90% ask natural-language questions, lean heavier on vector search.
Poor chunking strategy
Both BM25 and vector search are only as good as the document chunks they index. If chunks are too large, important keywords are diluted. If too small, context is lost. Invest time in your chunking strategy before optimizing fusion.
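As a starting point, a fixed-size word window with overlap is one of the simplest chunking schemes. The sketch below is only a baseline under assumed sizes (`chunk_size` and `overlap` are hypothetical defaults); many systems split on sentence or section boundaries instead.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words; consecutive chunks share overlap words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9 w10 w11"
chunks = chunk_words(sample, chunk_size=5, overlap=2)
for c in chunks:
    print(c)
```

The overlap keeps a keyword or sentence that straddles a boundary intact in at least one chunk, which helps both BM25 and vector search. Index the same chunks in both systems so their results refer to the same units.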
Comparison: Pure vs Hybrid Search
| Aspect | Pure Vector Search | Pure BM25 | Hybrid Search |
|---|---|---|---|
| Semantic Understanding | Excellent | Poor | Excellent |
| Exact Keyword Matching | Weak | Excellent | Excellent |
| Rare Terms / Identifiers | Weak | Excellent | Excellent |
| Short Queries | Moderate | Good | Good |
| Implementation Complexity | Low | Low | Medium |
| Cost | Medium | Low | Medium–High |
| Overall Robustness | Good | Good | Excellent |
Future Directions
Hybrid search continues to evolve:
- Learned sparse representations: Neural models (like SPLADE) generate sparse, keyword-like vectors that bridge BM25 and dense retrieval.
- Contextual re-ranking: Using a cross-encoder or an LLM-based reranker to re-rank hybrid results based on query context for even higher precision.
- Multi-vector search: Combining multiple embedding models with BM25 for even more robust retrieval coverage.
Conclusion
Hybrid search is not a luxury — it is a practical necessity for production RAG systems. Pure vector search excels at semantic understanding but misses exact matches. BM25 is fast and precise for keywords but lacks semantic depth.
By combining both methods with Reciprocal Rank Fusion, you get retrieval that is robust, accurate, and handles the full range of real-world queries. If your RAG system currently uses only vector search, adding BM25 and RRF fusion is one of the highest-impact improvements you can make.
Key Takeaways
- Pure vector search struggles with exact keyword matches and rare terms; BM25 lacks semantic understanding — hybrid search covers both failure modes.
- Reciprocal Rank Fusion (RRF) is the simplest and most effective way to merge BM25 and vector rankings — it uses only ranks, not raw scores, making it immune to scale mismatches between the two methods.
- Many vector databases (Weaviate, Qdrant, Elasticsearch, Pinecone) now support hybrid search natively, simplifying implementation significantly.
- Always evaluate retrieval quality using metrics like Precision@K, Recall@K, and MRR to confirm that hybrid search improves over each method individually before deploying.
References
- Robertson, S., & Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), 333–389.
- Cormack, G. V., Clarke, C. L. A., & Buettcher, S. (2009). Reciprocal rank fusion outperforms condorcet and individual rank learning methods. SIGIR 2009.
- Weaviate — Hybrid Search Explained
- Qdrant — Hybrid Search Documentation
- Thakur, N., et al. (2021). BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS 2021 Datasets Track.