We're moving our RAG pipeline from prototype to production. Currently using LangChain + Pinecone but hitting latency issues at scale. What architectures have worked well for you? Particularly interested in chunking strategies and hybrid search approaches.
Great question! We switched from naive chunking to semantic chunking using sentence-transformers and saw a 40% improvement in retrieval quality. Also, don't sleep on hybrid search — combining BM25 with vector search gave us the best results.
3/8/2026
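A minimal sketch of one common way to combine BM25 and vector rankings, as the answer above suggests: Reciprocal Rank Fusion (RRF). The corpus-free toy rankings and the `rrf_fuse` helper are illustrative assumptions, not the answerer's actual pipeline; in production the lexical list would come from a BM25 index and the dense list from a vector store such as Pinecone.

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of doc IDs into one ranking.

    rankings: list of lists of doc IDs, each ordered best-first.
    k: smoothing constant; 60 is the value from the original RRF paper.
    Returns doc IDs sorted by fused score, best first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1 / (k + rank); docs that appear
            # high in several lists accumulate the largest scores.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings (hypothetical doc IDs). Doc "b" appears near the top of
# both lists, so fusion ranks it above docs strong in only one list.
bm25_ranking = ["a", "b", "c"]    # lexical (keyword) retrieval
vector_ranking = ["b", "d", "a"]  # dense (embedding) retrieval
fused = rrf_fuse([bm25_ranking, vector_ranking])
# fused[0] == "b"
```

RRF needs only ranks, not raw scores, which sidesteps the problem of normalizing BM25 scores against cosine similarities before mixing them.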
We use LlamaIndex with a custom retrieval pipeline. Key insight: chunk size matters less than chunk overlap and metadata enrichment. We attach section headers, page numbers, and document type as metadata to every chunk.
3/8/2026
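A minimal sketch of the metadata-enrichment idea above, assuming a simple in-house chunker (the `make_chunks` helper and its parameters are illustrative, not a LlamaIndex API): every chunk carries its section header, page number, and document type so the retriever can filter or boost on them later.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)

def make_chunks(pages, doc_type, size=200, overlap=50):
    """Split page text into overlapping character windows,
    attaching section/page/doc-type metadata to every chunk.

    pages: list of (section_header, page_text) tuples, in page order.
    """
    chunks = []
    step = size - overlap
    for page_num, (section, text) in enumerate(pages, start=1):
        for start in range(0, max(len(text) - overlap, 1), step):
            chunks.append(Chunk(
                text=text[start:start + size],
                metadata={
                    "section": section,   # e.g. nearest heading
                    "page": page_num,
                    "doc_type": doc_type, # e.g. "whitepaper", "faq"
                },
            ))
    return chunks

# Hypothetical single-page document for illustration.
pages = [("Introduction", "RAG systems retrieve context before generation. " * 10)]
chunks = make_chunks(pages, doc_type="whitepaper")
```

Consecutive chunks share `overlap` characters, so a sentence split by a window boundary still appears whole in one of the two neighbors, and the attached metadata survives no matter how the text is windowed.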