E-commerce AI Capability Building
Measurable Outcomes
The Challenge
A 4-person engineering team with strong backend skills but no ML experience wanted to add semantic search and product recommendations without becoming dependent on external consultants.
The Outcome
Team shipped 3 AI features in 6 months and has independently built 2 more since our engagement ended. They no longer need external AI help for product-level features.
The Context
An online retailer with a 4-person engineering team wanted to modernize their search and discovery experience. Their existing search was keyword-based—customers searching for “cozy sweater” wouldn’t find products described as “warm knit pullover.”
They had the budget to hire consultants to build features for them. But they didn’t want that. They wanted to:
- Actually understand what they were shipping
- Be able to maintain and extend the features themselves
- Not call us every time they needed a change
This is exactly the scenario our ENABLE service is designed for.
How We Worked
Embedded, Not External
I joined their team 2 days per week for 6 months. I attended standups, participated in code reviews, and used their tools. This wasn’t a consultant dropping in to present slides—it was pair programming and real-time problem solving.
The rhythm worked like this:
| Feature | Ownership Model |
|---|---|
| Semantic Search | Built together (me driving) |
| Recommendations | Built together (them driving) |
| Review Summarization | Built by them (my review) |
By the third feature, they were doing the work. I was just there to catch edge cases and validate decisions.
Feature 1: Semantic Search (Months 1-2)
Problem: Keyword search was failing on conceptual queries. “Summer dress for wedding” returned nothing if products were described as “elegant formal dress.”
Solution: Vector embeddings + hybrid search.
We embedded all product titles and descriptions using OpenAI’s embedding model, stored them in pgvector, and built a hybrid search that combined:
- Vector similarity for semantic matching
- Keyword matching for exact terms (SKUs, brand names)
- Category boosting for relevance
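To make the blend concrete, here’s a simplified sketch of the scoring logic. The function name, weights, and field names are illustrative, not their production values, and the real system runs this ranking inside Postgres rather than in application code:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def hybrid_score(query_vec, query_terms, product,
                 boosted_category=None, w_vec=0.7, w_kw=0.2, w_cat=0.1):
    # Semantic signal: embedding similarity between query and product
    vec_score = cosine(query_vec, product["embedding"])
    # Exact-term signal: fraction of query terms found verbatim (SKUs, brands)
    text = (product["title"] + " " + product["description"]).lower()
    kw_score = sum(t.lower() in text for t in query_terms) / max(len(query_terms), 1)
    # Business-rule signal: flat boost for the category being promoted
    cat_score = 1.0 if product["category"] == boosted_category else 0.0
    return w_vec * vec_score + w_kw * kw_score + w_cat * cat_score
```

The key insight for the team was that each signal is just a number, and ranking is a weighted sum—which makes the trade-offs tunable rather than mysterious.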
What the team learned:
- How embeddings work and why they capture semantic meaning
- Practical considerations: batch processing, incremental updates, index management
- Query tuning: balancing semantic similarity with business rules
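The incremental-update lesson boils down to one idea: fingerprint the text you embed, and only re-embed products whose fingerprint changed. A minimal sketch (the helper names are mine, not theirs):

```python
import hashlib

def content_hash(product):
    """Stable fingerprint of the text we embed (title + description)."""
    text = product["title"] + "\n" + product["description"]
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_products(products, stored_hashes):
    """Return only products whose text changed since the last embedding run,
    so the nightly batch re-embeds the delta instead of the whole catalog."""
    return [p for p in products if stored_hashes.get(p["id"]) != content_hash(p)]
```

On a catalog of tens of thousands of SKUs, this turns a daily full re-embed into a handful of API calls.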
Result: 18% improvement in search-to-purchase conversion.
Feature 2: Product Recommendations (Months 3-4)
Problem: Their “Related Products” section was based on category matching—show other products in the same category. It worked, but it wasn’t personalized.
Solution: Collaborative filtering + embedding similarity.
For logged-in users: recommend products that similar users purchased. For anonymous users: recommend products with similar embeddings to what they’re viewing.
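Both paths can be sketched in a few lines each. This is a toy version to show the shape of the logic—real collaborative filtering needs similarity weighting and scale considerations the toy ignores:

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def recommend_collaborative(user_id, purchases, k=3):
    """Logged-in path: users who share a purchase vote for their other items."""
    mine = purchases.get(user_id, set())
    votes = Counter()
    for other, theirs in purchases.items():
        if other == user_id or not (mine & theirs):
            continue
        for item in theirs - mine:  # don't recommend what they already own
            votes[item] += 1
    return [item for item, _ in votes.most_common(k)]

def recommend_content_based(viewing_id, embeddings, k=3):
    """Anonymous path: nearest products by embedding similarity."""
    target = embeddings[viewing_id]
    scored = [(cosine(target, vec), pid)
              for pid, vec in embeddings.items() if pid != viewing_id]
    return [pid for _, pid in sorted(scored, reverse=True)[:k]]
```

The embedding path doubles as the cold-start fallback: it needs no purchase history at all, only the catalog embeddings already built for search.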
What the team learned:
- Cold start problem and how to handle it
- When to use collaborative filtering vs. content-based approaches
- A/B testing methodology for ML features
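One small but load-bearing piece of the A/B methodology is deterministic bucketing—the same user must always see the same variant. A common hash-based approach (a sketch, not their exact implementation):

```python
import hashlib

def ab_bucket(user_id, experiment, variants=("control", "treatment")):
    """Hash-based assignment: deterministic per user, roughly uniform overall,
    and independent across experiments because the experiment name is salted in."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]
```

No assignment table to store, no race conditions, and buckets are stable across sessions and servers.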
Result: 12% increase in average order value from recommendation clicks.
Feature 3: Review Summarization (Months 5-6)
Problem: Products with 100+ reviews were overwhelming. Customers wanted quick takeaways without reading everything.
Solution: LLM-generated summaries with aspect extraction.
For each product, generate a 3-4 sentence summary highlighting:
- Key positives mentioned across reviews
- Common concerns or complaints
- Who the product is best for
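The prompt-assembly side of this can be sketched as follows. The exact wording and JSON keys here are illustrative (their production prompt went through several iterations), but the pattern—pin the model to a fixed output shape—is the one that made parsing reliable:

```python
def build_summary_prompt(product_name, reviews, max_reviews=50):
    """Assemble an aspect-extraction prompt. Pinning the output to a fixed
    JSON shape keeps parsing consistent across thousands of products."""
    sample = "\n".join(f"- {r}" for r in reviews[:max_reviews])
    return (
        f"Summarize the customer reviews for '{product_name}' in 3-4 sentences.\n"
        "Respond with JSON using exactly these keys:\n"
        '  "positives": key strengths mentioned across reviews,\n'
        '  "concerns": common complaints,\n'
        '  "best_for": who the product suits.\n'
        f"Reviews:\n{sample}"
    )
```

Capping the review sample keeps token costs bounded for products with thousands of reviews.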
What the team learned:
- Prompt engineering for consistent output format
- Caching strategies (reviews don’t change often, don’t regenerate unnecessarily)
- Content moderation considerations
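The caching lesson in miniature: derive the cache key from the review set itself, so a summary regenerates only when reviews actually change. A dict stands in for Redis here, and the key scheme is one we discussed rather than a quote of their code:

```python
def summary_cache_key(product_id, reviews):
    """Reviews are effectively append-only, so count + newest review id
    identifies the set; the key changes only when reviews change."""
    newest = max(r["id"] for r in reviews)
    return f"summary:{product_id}:{len(reviews)}:{newest}"

def get_summary(product_id, reviews, cache, generate):
    """Look up before generating; the expensive LLM call runs only on a miss."""
    key = summary_cache_key(product_id, reviews)
    if key not in cache:
        cache[key] = generate(product_id, reviews)
    return cache[key]
```

Since most products go days between new reviews, the hit rate is high and LLM spend stays nearly flat as traffic grows.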
This feature was built almost entirely by their team. My role was reviewing code and discussing edge cases.
Result: Lower return rate for products with review summaries (customers went in with better-calibrated expectations).
Knowledge Transfer Approach
The goal was never to write code for them—it was to teach them to fish. Here’s how we structured the learning:
Week 1 of each feature: Concept deep-dive. What problem are we solving? What approaches exist? Why are we choosing this one? What are the trade-offs?
Weeks 2-3: Implementation together. I drove for Feature 1, they drove for Feature 2, they soloed Feature 3.
Week 4: Production hardening. Error handling, monitoring, documentation. The boring stuff that makes systems production-ready.
Throughout: Async support on Slack for questions that came up between sessions.
Documentation We Created Together
- Architecture decision records: Why we chose pgvector over Pinecone, why OpenAI embeddings instead of open-source models
- Runbooks: How to reindex products, how to debug search quality issues, how to update prompts
- Cost tracking: How to monitor API costs and optimize when needed
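The cost-tracking runbook reduces to simple arithmetic over logged token counts. The prices below are placeholders—always check the provider’s current pricing page—and the endpoint names are examples:

```python
from collections import defaultdict

# Placeholder prices in USD per 1M tokens; verify against current provider pricing.
PRICE_PER_MILLION = {"embedding": 0.02, "summary_in": 0.50, "summary_out": 1.50}

def daily_cost(usage_log):
    """Aggregate API spend per endpoint from per-request token counts."""
    totals = defaultdict(int)
    for entry in usage_log:
        totals[entry["endpoint"]] += entry["tokens"]
    return {ep: tokens / 1_000_000 * PRICE_PER_MILLION[ep]
            for ep, tokens in totals.items()}
```

Emitting these totals as metrics to their existing Datadog setup made cost regressions visible the same day, not at invoice time.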
After We Left
The real test of ENABLE is what happens after the engagement ends. Three months later:
Feature 4 (built independently): Size recommendation based on purchase history and returns. Uses a similar embedding approach to find “fit twins”—customers with similar body types, inferred from their return patterns.
Feature 5 (built independently): Smart search autocomplete. Suggests products as users type, combining prefix matching with semantic similarity.
Both features shipped without any input from us. The team had the skills, confidence, and architectural patterns to execute independently.
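For a sense of the autocomplete pattern, here’s a minimal sketch of prefix + semantic blending. This is my reconstruction of the idea, not the team’s code; `prefix_vec` stands in for an embedding of the typed text:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def autocomplete(prefix, catalog, prefix_vec, k=5):
    """Exact prefix matches rank first; remaining slots are backfilled with
    the semantically closest products so related items surface early."""
    prefix = prefix.lower()
    hits = [p for p in catalog if p["title"].lower().startswith(prefix)]
    rest = [p for p in catalog if p not in hits]
    rest.sort(key=lambda p: cosine(prefix_vec, p["embedding"]), reverse=True)
    return [p["title"] for p in (hits + rest)[:k]]
```

The appeal of this design is reuse: the semantic half rides on the same product embeddings built for Feature 1.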
What Made This Work
Right team profile: Strong engineers who wanted to learn. If the team had no interest in AI, we would have recommended hiring an ML engineer instead.
Realistic scope: We didn’t try to boil the ocean. Three features in six months, each building on the last.
Working code over documentation: The team learned by building, not by reading whitepapers. Every concept was tied to something they could run and debug.
Honest assessment: We told them what they could realistically build themselves vs. what would require specialized expertise. (Fine-tuning models, building custom recommendation algorithms from scratch—those need dedicated ML engineers. Using off-the-shelf embeddings and APIs—their team could absolutely do that.)
Technical Stack
For teams considering similar work:
| Component | Choice | Why |
|---|---|---|
| Embeddings | OpenAI text-embedding-3-small | Good quality, easy API, reasonable cost |
| Vector store | PostgreSQL + pgvector | They already ran Postgres, minimal ops burden |
| Caching | Redis | Review summaries, recommendation results |
| API | FastAPI | Their existing stack, good async support |
| Monitoring | Datadog | Their existing observability stack |
Total incremental infrastructure: pgvector extension (free) + Redis instance (they already had one for sessions).
Cost Reality
This engagement wasn’t cheap—6 months of embedded work. But compare it to:
- Hiring an ML engineer (salary + benefits + ramp time + risk of bad hire)
- Building dependency on a vendor for every feature change
- Building features that nobody knows how to maintain
The team now moves faster on AI features than before we joined. That’s the ROI of capability building over feature delivery.