    47 votes

    Best practices for RAG pipelines in production?

    We're moving our RAG pipeline from prototype to production. Currently using LangChain + Pinecone but hitting latency issues at scale. What architectures have worked well for you? Particularly interested in chunking strategies and hybrid search approaches.

    tech trends
    rag
    llm
    production
    3/8/2026

    2 Answers

    23 votes

    Great question! We switched from naive chunking to semantic chunking using sentence-transformers and saw a 40% improvement in retrieval quality. Also, don't sleep on hybrid search — combining BM25 with vector search gave us the best results.
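To make the fusion step concrete, here's a minimal sketch of reciprocal rank fusion, one common way to combine a BM25 ranking with a vector-search ranking. The doc ids, the two ranked lists, and the constant `k=60` (a widely used default) are illustrative, not from any particular library:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists into one via RRF.

    ranked_lists: lists of doc ids, each ordered best-first.
    k: smoothing constant; larger k flattens the rank contribution.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more score the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative rankings from the two retrievers:
bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
# doc1 and doc3 appear in both lists, so they float to the top.
```

RRF is handy because it only needs ranks, not scores, so you avoid calibrating BM25 scores against cosine similarities.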

    3/8/2026

    18 votes

    We use LlamaIndex with a custom retrieval pipeline. Key insight: chunk size matters less than chunk overlap and metadata enrichment. We attach section headers, page numbers, and document type as metadata to every chunk.
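As a sketch of that enrichment step, here's a minimal, framework-agnostic version (a plain dataclass standing in for whatever node/chunk type your pipeline uses; the `Chunk` class, `enrich_chunks` helper, and metadata keys are all illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """Stand-in for a retrieval chunk: text plus arbitrary metadata."""
    text: str
    metadata: dict = field(default_factory=dict)

def enrich_chunks(chunks, section_header, page_number, doc_type):
    """Attach shared document context to every chunk before indexing.

    At query time this metadata can drive filters (e.g. doc_type)
    or be prepended to the chunk text for the LLM.
    """
    for chunk in chunks:
        chunk.metadata.update({
            "section": section_header,
            "page": page_number,
            "doc_type": doc_type,
        })
    return chunks

chunks = enrich_chunks(
    [Chunk("Overview of the setup."), Chunk("Details of step one.")],
    section_header="Introduction",
    page_number=1,
    doc_type="manual",
)
```

The payoff is at retrieval time: metadata filters narrow the candidate set cheaply, and headers/page numbers give the LLM provenance it can cite.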

    3/8/2026
