Hybrid Search: Combining Keyword and Vector Search for Better Retrieval
Introduction
When teams first build retrieval-augmented generation (RAG) systems, they often start with pure vector search. The logic seems simple: encode queries and documents into embeddings, then retrieve the most similar results.
This works for many cases, especially when queries are semantic and conceptual. However, pure vector search has blind spots.
If a user searches for a specific product code, technical identifier, or exact phrase, vector search can fail completely. Embeddings capture semantic meaning but often miss exact keyword matches.
Hybrid search solves this by combining vector search with traditional keyword-based retrieval like BM25.
The result is a system that captures both semantic meaning and exact matches, covering each method's blind spots.
This post explains how hybrid search works, when to use it, and how to implement it correctly in production RAG systems.
The Limits of Pure Vector Search
Vector search relies on embedding models to convert text into high-dimensional vectors. Similarity is then measured using cosine similarity or dot product.
This approach excels at capturing semantic similarity. For example, a query like "How do I reset my password?" can match documents containing "account recovery" or "login issues" even without exact word overlap.
However, vector search struggles in several scenarios:
- Exact keyword matches: Searching for "SKU-12345" or "RFC-8446" may not return the correct document if the embedding model does not prioritize exact tokens.
- Rare or technical terms: Domain-specific jargon, abbreviations, or newly coined terms may not be well-represented in the embedding space.
- Named entities: Product names, person names, or location-specific queries can be missed if embeddings generalize too much.
- Short queries: Single-word or very short queries often lack enough semantic context for embeddings to work effectively.
These failures are not hypothetical. In production systems, they lead to user frustration and degraded retrieval quality.
The Strengths of Keyword Search (BM25)
BM25 (Best Match 25) is a probabilistic ranking function that scores documents based on term frequency and inverse document frequency.
Unlike vector search, BM25 does not rely on embeddings or neural networks. It is a statistical method that has been the backbone of search engines for decades.
How BM25 Works
BM25 assigns a score to each document based on:
- Term frequency (TF): How often the query terms appear in the document.
- Inverse document frequency (IDF): How rare the query terms are across the entire corpus.
- Document length normalization: Adjusts scores to avoid bias toward longer documents.
The BM25 formula is:

score(D, Q) = Σ_{q_i ∈ Q} IDF(q_i) · [ f(q_i, D) · (k_1 + 1) ] / [ f(q_i, D) + k_1 · (1 − b + b · |D| / avgdl) ]

Where:
- f(q_i, D) is the term frequency of query term q_i in document D.
- |D| is the document length.
- avgdl is the average document length in the corpus.
- k_1 and b are tuning parameters (typically k_1 = 1.5, b = 0.75).
BM25 is fast, interpretable, and excels at exact keyword matching.
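The components above can be sketched as a minimal, self-contained BM25 scorer. This is a simplified illustration of the formula, not a production implementation (real engines precompute statistics and handle edge cases such as empty corpora):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query using the BM25 formula.

    query_terms: list of tokens; doc: list of tokens; corpus: list of token lists.
    """
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        # Inverse document frequency: rarer terms contribute more.
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
        # Term frequency with saturation (k1) and length normalization (b).
        tf = doc.count(term)
        score += idf * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["how", "to", "reset", "your", "password"],
    ["account", "recovery", "guide"],
    ["product", "sku-12345", "specifications"],
]
print(bm25_score(["password", "reset"], corpus[0], corpus))
```

A document sharing no terms with the query scores exactly zero, which is precisely the semantic blindness discussed next.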
Where BM25 Falls Short
BM25 does not understand semantics. It treats "password reset" and "account recovery" as completely unrelated unless they share keywords.
This is where vector search shines. The two approaches are complementary.
Why Hybrid Search Works Better
Hybrid search combines vector search and BM25 to leverage their respective strengths.
The system runs both retrieval methods in parallel, then merges the results using a fusion algorithm.
This ensures that:
- Exact keyword matches are prioritized when relevant.
- Semantic similarity is captured for conceptual queries.
- Edge cases where one method fails are covered by the other.
Empirical evaluations consistently show hybrid search outperforming either method alone, especially across diverse query types.
Reciprocal Rank Fusion (RRF)
The most common way to combine BM25 and vector search results is Reciprocal Rank Fusion (RRF).
RRF is simple, effective, and does not require training.
How RRF Works
Each retrieval method produces a ranked list of documents. RRF computes a score for each document based on its rank in each list.
The formula is:

RRF(d) = Σ_{r ∈ R} 1 / (k + rank_r(d))

Where:
- R is the set of ranking methods (e.g., BM25 and vector search).
- rank_r(d) is the rank of document d in ranking r.
- k is a constant (typically 60) that controls the influence of lower-ranked results.
Documents that rank highly in both methods receive the highest RRF scores.
Why RRF Works
RRF does not rely on the absolute scores from each method, only the ranks. This makes it robust to score scale differences.
It also naturally balances contributions from both methods without manual weight tuning.
Implementing Hybrid Search
Here is a practical implementation of hybrid search using BM25 and vector embeddings.
Step 1: Index Documents
You need two indexes:
- A BM25 index (e.g., Elasticsearch, OpenSearch, or a simple inverted index).
- A vector index (e.g., Pinecone, Weaviate, Qdrant, or FAISS).
Both indexes should contain the same documents but optimized for their respective retrieval methods.
Step 2: Query Both Indexes
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer
# Sample documents
documents = [
"How to reset your password",
"Account recovery guide",
"Product SKU-12345 specifications",
"Understanding RFC-8446 TLS 1.3"
]
# BM25 indexing
tokenized_docs = [doc.lower().split() for doc in documents]
bm25 = BM25Okapi(tokenized_docs)
# Vector indexing
model = SentenceTransformer('all-MiniLM-L6-v2')
# Normalize embeddings so the dot product below equals cosine similarity
doc_embeddings = model.encode(documents, normalize_embeddings=True)
# Query
query = "password reset"
# BM25 search
tokenized_query = query.lower().split()
bm25_scores = bm25.get_scores(tokenized_query)
bm25_ranks = np.argsort(bm25_scores)[::-1]
# Vector search
query_embedding = model.encode([query], normalize_embeddings=True)[0]
cosine_scores = np.dot(doc_embeddings, query_embedding)
vector_ranks = np.argsort(cosine_scores)[::-1]
print("BM25 ranks:", bm25_ranks)
print("Vector ranks:", vector_ranks)
Step 3: Apply Reciprocal Rank Fusion
def reciprocal_rank_fusion(bm25_ranks, vector_ranks, k=60):
rrf_scores = {}
# Add BM25 scores
for rank, doc_id in enumerate(bm25_ranks):
rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
# Add vector scores
for rank, doc_id in enumerate(vector_ranks):
rrf_scores[doc_id] = rrf_scores.get(doc_id, 0) + 1 / (k + rank + 1)
# Sort by RRF score
sorted_docs = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
return sorted_docs
# Combine results
final_ranking = reciprocal_rank_fusion(bm25_ranks, vector_ranks)
print("Hybrid search results:", final_ranking)
This simple implementation shows how RRF merges the two ranking lists.
Alternative Fusion Methods
While RRF is the most popular fusion method, there are alternatives:
Weighted Score Fusion
Combine normalized scores from BM25 and vector search using a weighted average:

score(d) = α · score_vector(d) + (1 − α) · score_BM25(d)

This requires normalizing the two score distributions to a common scale (e.g., min-max) and tuning the weight α.
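A minimal sketch of weighted score fusion, assuming min-max normalization of each score list (other normalizations, such as z-scores, are also common):

```python
def minmax(scores):
    """Rescale a list of scores into [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_fusion(bm25_scores, vector_scores, alpha=0.5):
    """Fuse per-document score lists; alpha weights the vector side."""
    b = minmax(bm25_scores)
    v = minmax(vector_scores)
    return [alpha * vs + (1 - alpha) * bs for bs, vs in zip(b, v)]

fused = weighted_fusion([12.0, 3.0, 0.0], [0.2, 0.9, 0.4], alpha=0.5)
print(fused)
```

Unlike RRF, the result here is sensitive to how scores are normalized, which is why α usually needs tuning per corpus.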
Learned Fusion
Train a reranker model that takes both BM25 and vector scores as features and learns optimal weights.
This is more complex but can achieve higher accuracy if you have labeled data.
Query-Adaptive Fusion
Dynamically adjust the fusion weights based on query characteristics. For example, short queries rely more on BM25, while long semantic queries favor vector search.
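One simple way to sketch this idea is a length-based heuristic; the thresholds and alpha values below are illustrative assumptions, not recommended settings:

```python
def adaptive_alpha(query, short_threshold=2, base_alpha=0.5):
    """Heuristic fusion weight: short keyword-like queries lean on BM25
    (low alpha), longer natural-language queries lean on vector search."""
    n_terms = len(query.split())
    if n_terms <= short_threshold:
        return 0.2   # mostly BM25: likely an identifier or exact term
    if n_terms >= 6:
        return 0.8   # mostly vector: likely a natural-language question
    return base_alpha

print(adaptive_alpha("SKU-12345"))
print(adaptive_alpha("how do I recover access to my account"))
```

Production systems often replace this heuristic with richer query classifiers (e.g., detecting digits, code-like tokens, or question words).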
When to Use Hybrid Search
Hybrid search is ideal when:
- Your corpus contains both conceptual content and exact identifiers (SKUs, codes, names).
- Users submit diverse query types (questions, keywords, technical terms).
- You want robust retrieval that degrades gracefully across edge cases.
- Your domain includes rare or specialized terminology.
Examples include:
- E-commerce product search (combining semantic search with exact SKU matching).
- Technical documentation (semantic similarity + exact API names).
- Legal or regulatory text (concept search + citation matching).
- Customer support (understanding intent + finding exact error codes).
Production Considerations
Latency
Hybrid search requires querying two indexes. To minimize latency, run both retrievals in parallel.
Most vector databases and search engines support parallel execution efficiently.
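The parallel pattern can be sketched with a thread pool; the two search functions below are hypothetical stand-ins for real index clients:

```python
from concurrent.futures import ThreadPoolExecutor

def bm25_search(query):
    # Placeholder for a real BM25 index lookup (e.g., Elasticsearch).
    return ["doc-3", "doc-1"]

def vector_search(query):
    # Placeholder for a real vector index lookup (e.g., FAISS, Qdrant).
    return ["doc-1", "doc-2"]

def hybrid_retrieve(query):
    """Issue both lookups concurrently: total latency is roughly the
    slower of the two calls rather than their sum."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query)
        vector_future = pool.submit(vector_search, query)
        return bm25_future.result(), vector_future.result()

bm25_results, vector_results = hybrid_retrieve("password reset")
```

Since both calls are I/O-bound network requests in practice, threads (or async I/O) are sufficient; no process pool is needed.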
Index Synchronization
Ensure that both BM25 and vector indexes are updated together when documents are added, modified, or deleted.
Inconsistent indexes can lead to missing or duplicate results.
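A common safeguard is to route all writes through one code path that updates both indexes together. This sketch uses plain dicts as stand-ins for real index clients and a toy embedding function, purely for illustration:

```python
def upsert_document(doc_id, text, bm25_index, vector_index, embed):
    """Write to both indexes in one place so they cannot drift apart."""
    bm25_index[doc_id] = text.lower().split()   # tokens for BM25
    vector_index[doc_id] = embed(text)          # embedding for vector search

def delete_document(doc_id, bm25_index, vector_index):
    """Remove from both indexes together."""
    bm25_index.pop(doc_id, None)
    vector_index.pop(doc_id, None)

# Toy stand-ins: dicts as indexes, string length as a fake "embedding".
bm25_index, vector_index = {}, {}
embed = lambda text: [float(len(text))]

upsert_document("doc-1", "Password reset guide", bm25_index, vector_index, embed)
```

In a real deployment, the same idea is usually enforced with an outbox pattern or a shared ingestion pipeline rather than direct dual writes.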
Cost
Running two indexes increases storage and compute costs. However, the improvement in retrieval quality often justifies the expense.
Tuning
The RRF constant k can be tuned based on your corpus and query distribution. Start with k = 60 and adjust if needed.
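The effect of k is easy to see numerically: it controls how sharply RRF favors top-ranked documents. A small illustration:

```python
def rrf_weight(rank, k):
    # Contribution of a document at a given 0-based rank.
    return 1 / (k + rank + 1)

# Ratio of the top result's weight to the 10th result's weight:
# a small k sharply favors top ranks; a large k flattens the curve.
ratios = {k: rrf_weight(0, k) / rrf_weight(9, k) for k in (1, 60, 1000)}
print(ratios)
```

With k = 1 the top result carries several times the weight of the 10th; with k = 1000 the two are nearly equal, so lower-ranked documents from either method contribute more to the fused ranking.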
Hybrid Search in Vector Databases
Many modern vector databases now support hybrid search natively:
- Weaviate: Supports hybrid search with configurable BM25 and vector weights.
- Qdrant: Offers hybrid search using score fusion.
- Elasticsearch: Combines keyword search with dense vector retrieval.
- Pinecone: Supports sparse-dense hybrid search (in beta).
Using a database with built-in hybrid search simplifies implementation and ensures optimized performance.
Example: Hybrid Search with Weaviate
import weaviate

# Uses the Weaviate v3 Python client; newer client versions have a different API
client = weaviate.Client("http://localhost:8080")
# Hybrid search query
result = client.query.get(
"Document",
["content", "title"]
).with_hybrid(
query="password reset",
alpha=0.5 # 0.5 = equal weight to BM25 and vector
).with_limit(10).do()
print(result)
The alpha parameter controls the balance between keyword and semantic search: alpha = 0 is pure BM25, alpha = 1 is pure vector search, and 0.5 weights them equally.
Evaluation Metrics for Hybrid Search
To measure the effectiveness of hybrid search, use:
- Precision@K: Percentage of top-K results that are relevant.
- Recall@K: Percentage of relevant documents found in top-K results.
- Mean Reciprocal Rank (MRR): Measures how quickly the first relevant result appears.
- Normalized Discounted Cumulative Gain (NDCG): Accounts for ranking position and relevance.
Compare these metrics across pure BM25, pure vector search, and hybrid search to quantify improvements.
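The first three metrics are straightforward to compute from a ranked result list and a set of known-relevant documents, as in this minimal sketch:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mrr(runs):
    """Mean reciprocal rank over (retrieved_list, relevant_set) pairs,
    one pair per query; ranks are 1-based."""
    total = 0.0
    for retrieved, relevant in runs:
        for pos, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1 / pos
                break
    return total / len(runs)

retrieved = ["d3", "d1", "d7"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3), mrr([(retrieved, relevant)]))
```

Running the same evaluation set through pure BM25, pure vector search, and the hybrid pipeline makes the comparison in the next paragraph concrete.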
Common Pitfalls
Over-Reliance on One Method
If BM25 or vector search dominates the fusion, you lose the benefits of hybrid search. Monitor the contribution of each method.
Ignoring Query Types
Not all queries benefit equally from hybrid search. Analyze your query distribution to identify where hybrid search adds value.
Poor Chunking
If your documents are chunked poorly, both BM25 and vector search will suffer. Invest in good chunking strategies.
Comparison: Pure vs Hybrid Search
| Aspect | Pure Vector Search | Pure BM25 | Hybrid Search |
|---|---|---|---|
| Semantic Understanding | Excellent | Poor | Excellent |
| Exact Keyword Matching | Weak | Excellent | Excellent |
| Rare Terms / Identifiers | Weak | Excellent | Excellent |
| Short Queries | Moderate | Good | Good |
| Implementation Complexity | Low | Low | Medium |
| Cost | Medium | Low | Medium-High |
| Overall Robustness | Good | Good | Excellent |
Future Directions
Hybrid search continues to evolve. Emerging trends include:
- Learned sparse representations: Neural models that generate sparse keyword-like vectors for better fusion.
- Contextual re-ranking: Using LLMs to rerank hybrid search results based on query context.
- Multi-vector search: Combining multiple embedding models with BM25 for even more robust retrieval.
Conclusion
Hybrid search is not a luxury; it is a practical necessity for production RAG systems.
Pure vector search excels at semantic understanding but misses exact matches. BM25 is fast and precise for keywords but lacks semantic depth.
By combining both methods with reciprocal rank fusion, you get retrieval that is robust, accurate, and adaptable to diverse query types.
If your RAG system is currently using only vector search, adding BM25 and hybrid fusion is one of the highest-impact improvements you can make.
Key Takeaways
- Pure vector search struggles with exact keyword matches and rare terms.
- BM25 excels at keyword matching but lacks semantic understanding.
- Hybrid search combines both methods for superior retrieval accuracy.
- Reciprocal rank fusion (RRF) is the simplest and most effective fusion method.
- Many vector databases now support hybrid search natively.
- Hybrid search is especially valuable for diverse query types and technical domains.
- Always evaluate retrieval quality using metrics like Precision@K, Recall@K, and MRR.