The Problem: Why Retail Needs Intelligent Product Search
Modern retail generates massive amounts of data: product catalogs with millions of SKUs, customer reviews, sales transactions, inventory movements, and pricing information. When a business user asks "What products are trending among millennials in the electronics category?", traditional search systems fall short:
- Keyword Limitations — Traditional search relies on exact matches. A query for "trending electronics" misses products described as "popular gadgets" or "best-selling tech"
- Context Blindness — Conventional systems cannot understand nuanced queries like "products similar to what sold well last quarter"
- Data Silos — Sales data, customer feedback, and inventory information exist in separate systems, making unified insights impossible
- Scale Challenges — Searching through millions of product vectors in real-time requires specialized indexing that most databases cannot provide
Retailers need a system that understands natural language, retrieves semantically relevant information from multiple data sources, and generates coherent, contextual responses. This is where the Smart Retail Navigator comes in.
The Solution: RAG Architecture with Annoy Indexing
The Smart Retail Navigator combines three technologies that each solve a piece of the puzzle. Together, they create a system that understands queries like humans do, retrieves relevant data at scale, and generates actionable insights.
1. Query Understanding — LLM parses natural language intent and context
2. Vector Embedding — Query converted to dense vector representation
3. Annoy Search — Find nearest neighbors in product vector space
4. Context Retrieval — Fetch relevant documents and metadata
5. Response Generation — LLM synthesizes contextual answer
Why These Three Technologies?
Each component addresses a specific challenge:
- RAG (Retrieval-Augmented Generation) — Grounds LLM responses in actual data, preventing hallucination and ensuring accuracy
- LLM (Large Language Models) — Provides natural language understanding and human-like response generation
- Annoy (Approximate Nearest Neighbors) — Enables sub-millisecond similarity search across millions of vectors
How It Works: Vector Similarity and Retrieval-Augmented Generation
Vector Embeddings: The Foundation
Every piece of retail data — product descriptions, customer reviews, sales summaries — gets converted into a dense vector representation. These vectors capture semantic meaning, so "wireless headphones" and "Bluetooth earbuds" end up close together in vector space.
```python
# Convert text to vector representation
from sentence_transformers import SentenceTransformer

# Model choice is illustrative; any sentence-transformer with
# 768-dimensional output fits the pipeline described here
model = SentenceTransformer('all-mpnet-base-v2')

def embed_product(product_description):
    # Use the pre-trained model to generate embeddings
    embedding = model.encode(product_description)
    return embedding  # Returns a 768-dimensional vector

# Example: similar products cluster together
headphones_vec = embed_product("Wireless noise-canceling headphones")
earbuds_vec = embed_product("Bluetooth earbuds with ANC")
# cosine_similarity(headphones_vec, earbuds_vec) ≈ 0.87
```
Annoy: Fast Approximate Nearest Neighbor Search
Annoy (Approximate Nearest Neighbors Oh Yeah) uses random projection trees to partition the vector space. Instead of comparing a query against every product vector (O(n) complexity), it traverses trees to find approximate nearest neighbors in O(log n) time.
```python
# Build Annoy index from product embeddings
from annoy import AnnoyIndex

dimension = 768  # Embedding dimension
index = AnnoyIndex(dimension, 'angular')  # Angular distance ≈ cosine similarity

# Add all product vectors
for product_id, embedding in product_embeddings.items():
    index.add_item(product_id, embedding)

# Build index with 10 trees (more trees = better accuracy, larger index)
index.build(n_trees=10)

# Search: find the 10 most similar products in ~1ms
similar_ids = index.get_nns_by_vector(query_vector, n=10)
```
RAG: Grounding Generation in Reality
Retrieval-Augmented Generation solves the hallucination problem. Instead of relying solely on the LLM's training data, we retrieve relevant documents and include them in the prompt. The LLM generates responses based on actual, current information.
```python
# RAG: retrieve relevant context, then generate
def answer_retail_query(user_query):
    # Step 1: Embed the query
    query_embedding = embed_query(user_query)

    # Step 2: Find relevant documents via Annoy
    relevant_ids = annoy_index.get_nns_by_vector(query_embedding, n=5)
    context_docs = fetch_documents(relevant_ids)

    # Step 3: Build the augmented prompt
    augmented_prompt = f"""
    Based on the following retail data:
    {context_docs}

    Answer this question: {user_query}
    """

    # Step 4: Generate the response with the LLM
    response = llm.generate(augmented_prompt)
    return response
```
Dual LLM Strategy
The system employs two specialized models for optimal performance:
| Model | Specialization | Use Case |
|---|---|---|
| eCeLLM | E-commerce domain expertise | Complex product queries, category analysis, trend detection |
| DistilGPT-2 | Real-time processing | Quick responses, simple queries, high-throughput scenarios |
eCeLLM, trained specifically on e-commerce data, excels at understanding retail terminology and product relationships. DistilGPT-2, a distilled version of GPT-2, provides faster inference for time-sensitive queries.
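The switch between the two models can be sketched as a simple router. The token-count threshold, latency budget, and model labels below are illustrative assumptions, not values from the project:

```python
def choose_model(query, latency_budget_ms=200):
    """Route a query to the fast model or the domain-expert model.

    Thresholds here are illustrative assumptions, not tuned values.
    """
    n_tokens = len(query.split())
    # Tight latency budgets or short, simple queries go to DistilGPT-2
    if latency_budget_ms < 100 or n_tokens <= 6:
        return 'distilgpt2'
    # Longer, analytical queries go to the e-commerce specialist
    return 'ecellm'

print(choose_model("top sellers"))  # distilgpt2
print(choose_model("Which electronics categories are trending with Gen-Z shoppers?"))  # ecellm
```

In production the routing signal could also come from a lightweight intent classifier rather than a token count.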
The Mathematics: Cosine Similarity and Random Projections
Cosine Similarity for Semantic Matching
The system uses cosine similarity to measure how semantically related two pieces of text are. This metric is ideal for comparing embeddings because it focuses on direction (meaning) rather than magnitude.
```
// Cosine similarity between query (q) and document (d) vectors
cos(θ) = (q · d) / (||q|| × ||d||)

// Range: [-1, 1], where 1 = identical meaning
// In practice, retail vectors typically range [0.3, 0.95]

// Example similarity scores:
query: "wireless earbuds"
  "Bluetooth headphones" → 0.89
  "USB charging cable"   → 0.31
  "Running shoes"        → 0.12
```
Random Projection Trees in Annoy
Annoy builds a forest of random projection trees. Each tree recursively splits the vector space using random hyperplanes until each leaf contains a small number of items.
```
// Building a random projection tree
1. Select a random hyperplane through the data points
2. Split points into "left" and "right" based on which side they fall
3. Recursively split until each leaf holds fewer items than a threshold

// Search traversal
1. For each tree, descend to the leaf containing the query point
2. Collect candidate neighbors from all trees
3. Compute exact distances for the candidates
4. Return the top-k closest

// Complexity
Build:  O(n × t × log n), where t = number of trees
Search: O(t × log n), logarithmic in n, so fast even for millions of items
```
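A single split step can be sketched in plain Python. Annoy chooses each hyperplane as the perpendicular bisector of two randomly sampled points; the helper below mirrors that idea and is illustrative, not Annoy's actual implementation:

```python
import random

def split_by_hyperplane(points, rng):
    """One split of a random projection tree: separate points by the
    perpendicular bisector of two randomly sampled points."""
    a, b = rng.sample(points, 2)
    # Hyperplane normal points from b toward a; it passes through the midpoint
    normal = [ai - bi for ai, bi in zip(a, b)]
    midpoint = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    offset = sum(n * m for n, m in zip(normal, midpoint))
    left, right = [], []
    for p in points:
        side = sum(n * pi for n, pi in zip(normal, p)) - offset
        (left if side <= 0 else right).append(p)
    return left, right

rng = random.Random(42)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
left, right = split_by_hyperplane(points, rng)
# Every point lands on exactly one side of the hyperplane
assert len(left) + len(right) == len(points)
```

Recursing on `left` and `right` until each partition is small yields one tree; building the forest repeats the process with fresh random choices.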
System Architecture
| Layer | Technologies | Purpose |
|---|---|---|
| Data Layer | Mock Data Generators | Sales, customer feedback, inventory simulation |
| Embedding Layer | Sentence Transformers | Convert text to 768-dimensional vectors |
| Index Layer | Annoy | Fast approximate nearest neighbor search |
| Intelligence Layer | eCeLLM, DistilGPT-2 | Query understanding and response generation |
| Orchestration | Jupyter Notebook | Pipeline coordination and experimentation |
Data Flow
The system processes three primary data streams that feed into the unified search index:
- Sales Data — Transaction records, revenue metrics, seasonal patterns
- Customer Feedback — Reviews, ratings, sentiment signals
- Inventory Information — Stock levels, supplier data, availability status
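Before indexing, each product's streams can be flattened into a single text document for embedding. The field names below are illustrative assumptions about the mock data schema, not the project's actual fields:

```python
def build_document(product, sales, reviews):
    """Merge the three data streams into one text document per product.

    Field names ('name', 'category', 'units') are illustrative assumptions.
    """
    parts = [
        f"Product: {product['name']} ({product['category']})",
        f"Units sold last quarter: {sales['units']}",
        "Recent reviews: " + " | ".join(reviews),
    ]
    return "\n".join(parts)

doc = build_document(
    {'name': 'Earbuds', 'category': 'Electronics'},
    {'units': 120},
    ['Great sound', 'Battery could be better'],
)
```

Embedding the merged document, rather than the product description alone, lets a single Annoy lookup surface sales and sentiment context together.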
Retail Intelligence Use Cases
Product Discovery
"Find products similar to our top sellers from last quarter" — semantic search across catalog
Trend Analysis
"What categories are gaining momentum with Gen-Z customers?" — cross-reference sales and demographics
Inventory Insights
"Which products need restocking based on current sales velocity?" — predictive inventory queries
Customer Sentiment
"Summarize recent feedback for our electronics category" — aggregate review analysis
Performance Characteristics
The combination of Annoy indexing and the dual-LLM architecture delivers both speed and accuracy.
Why Annoy Over Alternatives?
- Memory Efficiency — Index can be memory-mapped, allowing indexes larger than RAM
- Static Index — Once built, index is immutable and thread-safe for concurrent reads
- Simple API — Minimal setup compared to distributed solutions like Milvus or Pinecone
- Proven Scale — Used in production at Spotify for music recommendations
Implementation Highlights
Mock Data Generation
The project includes data generators that simulate realistic retail scenarios, enabling experimentation without requiring production data access.
```python
import random

# Generate synthetic retail data
def generate_retail_dataset(n_products=10000):
    products = []
    for i in range(n_products):
        product = {
            'id': i,
            'name': generate_product_name(),
            'description': generate_description(),
            'category': random.choice(CATEGORIES),
            'price': generate_price(),
            'reviews': generate_reviews(n=random.randint(5, 50)),
        }
        products.append(product)
    return products

# Index all products
dataset = generate_retail_dataset()
for product in dataset:
    embedding = embed_product(product['description'])
    annoy_index.add_item(product['id'], embedding)
```
Query Processing Pipeline
The orchestration layer coordinates the flow from user query to final response:
```python
# End-to-end query processing
class RetailNavigator:
    def __init__(self):
        self.annoy_index = load_annoy_index()
        self.ecellm = load_ecellm()
        self.distilgpt = load_distilgpt()

    def process_query(self, query, mode='accurate'):
        # Select the LLM based on mode
        llm = self.ecellm if mode == 'accurate' else self.distilgpt

        # Embed and search
        query_vec = embed_query(query)
        relevant_ids = self.annoy_index.get_nns_by_vector(query_vec, 10)
        context = self.fetch_context(relevant_ids)

        # Generate the response with RAG
        prompt = self.build_rag_prompt(query, context)
        response = llm.generate(prompt)
        return response
```
Explore the Code
The complete implementation is available on GitHub as a Jupyter notebook with documentation and examples for building your own retail intelligence system.