We're moving our RAG pipeline from prototype to production. Currently using LangChain + Pinecone but hitting latency issues at scale. What architectures have worked well for you? Particularly interested in chunking strategies and hybrid search approaches.
Great question! We switched from naive chunking to semantic chunking using sentence-transformers and saw a 40% improvement in retrieval quality. Also, don't sleep on hybrid search — combining BM25 with vector search gave us the best results.
3/8/2026
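A minimal sketch of one common way to combine BM25 and vector rankings, as the answer above suggests: Reciprocal Rank Fusion (RRF). The corpus-free toy rankings and the `rrf_fuse` helper are illustrative assumptions, not the answerer's actual pipeline; in production the lexical list would come from a BM25 index and the dense list from a vector store such as Pinecone.

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into one ranking.

    rankings: list of lists of doc IDs, each ordered best-first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    Returns doc IDs sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); docs that appear
            # high in several lists accumulate the largest scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings (hypothetical doc IDs). Doc "b" appears near the top of
# both lists, so fusion ranks it above docs strong in only one list.
bm25_ranking = ["a", "b", "c"]    # lexical (keyword) retrieval
vector_ranking = ["b", "d", "a"]  # dense (embedding) retrieval
fused = rrf_fuse([bm25_ranking, vector_ranking])
# fused[0] == "b"
```

RRF needs only ranks, not raw scores, which sidesteps the problem of normalizing BM25 scores against cosine similarities before mixing them.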
We use LlamaIndex with a custom retrieval pipeline. Key insight: chunk size matters less than chunk overlap and metadata enrichment. We attach section headers, page numbers, and document type as metadata to every chunk.
3/8/2026
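A minimal sketch of the metadata-enrichment idea above, assuming a simple in-house chunker (the `make_chunks` helper and its parameters are illustrative, not a LlamaIndex API): every chunk carries its section header, page number, and document type so the retriever can filter or boost on them later.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def make_chunks(pages, doc_type, size=200, overlap=50):
    """Split page text into overlapping character windows,
    attaching section/page/doc-type metadata to every chunk.

    pages: list of (section_header, page_text) tuples, in page order.
    """
    chunks = []
    step = size - overlap
    for page_num, (section, text) in enumerate(pages, start=1):
        for start in range(0, max(len(text) - overlap, 1), step):
            chunks.append(Chunk(
                text=text[start:start + size],
                metadata={
                    "section": section,   # e.g. nearest heading
                    "page": page_num,
                    "doc_type": doc_type, # e.g. "whitepaper", "faq"
                },
            ))
    return chunks

# Hypothetical single-page document for illustration.
pages = [("Introduction", "RAG systems retrieve context before generation. " * 10)]
chunks = make_chunks(pages, doc_type="whitepaper")
```

Consecutive chunks share `overlap` characters, so a sentence split by a window boundary still appears whole in one of the two neighbors, and the attached metadata survives no matter how the text is windowed.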