Text Chunking for RAG: Sizes, Overlap and Strategies
How to split documents into chunks that retrieve well — choosing a unit, sizing chunks to your embedding model, using overlap to preserve context, and exporting for a pipeline.
Open the AI Text Chunker →
What this tool does
The AI Text Chunker splits text into overlapping pieces ready to embed for retrieval-augmented generation (RAG). You control the unit (approximate tokens, characters, words or sentences), the chunk size and the overlap, and can export the result as a JSON array or JSONL for your embedding pipeline.
Why chunk at all
RAG works by embedding pieces of your documents into vectors, then at query time fetching the pieces whose vectors are closest to the question. The size of those pieces is a real lever: too large and a chunk's embedding averages several topics together and retrieves imprecisely; too small and it loses the context needed to be useful. Chunking is how you tune that trade-off.
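To make the retrieval step concrete, here is a minimal sketch of "fetch the pieces whose vectors are closest to the question". It assumes the embeddings already exist as plain lists of floats and uses cosine similarity as the closeness measure; the function names `cosine_similarity` and `top_k` are hypothetical, and a real pipeline would normally use a vector store rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy 2-dimensional "embeddings", for illustration only.
chunk_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
best = top_k([1.0, 0.0], chunk_vecs, k=2)
```

A chunk that mixes several topics ends up with a vector that sits between them, which is why it tends to score lower against any single focused question.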
Picking a unit
- Tokens map most directly to what embedding models measure, so sizing in tokens helps keep chunks within a model's input limit. The count here is an estimate (~4 characters per token, a rough average for English text); use your model's tokenizer for exact figures.
- Sentences avoid cutting a fact in half — often more important than hitting an exact size, because a chunk that ends mid-clause embeds poorly.
- Characters / words are simple and predictable when your text is uniform.
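The units above can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's actual code: `estimate_tokens` applies the ~4 characters per token heuristic and rounds up (the rounding is an assumption), and `split_sentences` uses a naive regex that breaks after `.`, `!` or `?`, so it will mis-split abbreviations such as "e.g.".

```python
import math
import re

CHARS_PER_TOKEN = 4  # rough heuristic from the text; real tokenizers vary

def estimate_tokens(text):
    """Approximate token count from character length (rounding up is an assumption)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def split_sentences(text):
    """Naive sentence splitter: break on whitespace that follows ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

sample = "Chunks should end on a sentence. Otherwise a fact gets cut! Does that matter?"
sentences = split_sentences(sample)
counts = [estimate_tokens(s) for s in sentences]
```

For exact figures, replace `estimate_tokens` with a call to the tokenizer that matches your embedding model.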
Size and overlap
A few hundred tokens per chunk is a common starting point, with 10–20% overlap so context carries across the seam and a retrieved chunk is not missing the sentence that set it up. Denser reference text often prefers smaller chunks; flowing narrative, larger ones. There is no universal best setting: change one variable, re-run a couple of real queries, and compare what comes back. The token counter and cosine similarity tools help you measure both sides of that trade-off: how large each chunk is, and how closely it matches a query.
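Size and overlap amount to a sliding window: each chunk starts `size - overlap` units after the previous one. The sketch below assumes the text has already been split into units (words here, but tokens or sentences work the same way); `chunk_with_overlap` is a hypothetical name, not the tool's implementation.

```python
def chunk_with_overlap(units, size, overlap):
    """Split a list of units into windows of `size` that share `overlap` units."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(units), step):
        chunks.append(units[start:start + size])
        if start + size >= len(units):
            break  # this window already reached the end of the input
    return chunks

words = "one two three four five six seven eight nine ten".split()
# Size 4 with an overlap of 1 word: each chunk repeats the last word of the previous one.
chunks = [" ".join(c) for c in chunk_with_overlap(words, size=4, overlap=1)]
```

With a 300-unit chunk, a 10–20% overlap corresponds to `overlap` values of 30 to 60.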
Exporting
Copy the chunks as a JSON array, as JSONL (one object per line), or download a .jsonl file. Each record carries an id, the chunk text and an estimated token count — the shape most embedding scripts expect. JSONL is the usual format for feeding an embedding or fine-tuning job; the JSONL converter can validate it line by line.
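As a sketch of the export shape, the snippet below builds one record per chunk and serialises the list to JSONL. The field names `id`, `text` and `tokens` are assumptions for illustration; the text only says each record carries an id, the chunk text and an estimated token count, so check the tool's actual output before relying on these keys.

```python
import json
import math

def to_records(chunks):
    """Build one record per chunk; the key names here are hypothetical."""
    return [
        {"id": i, "text": chunk, "tokens": math.ceil(len(chunk) / 4)}  # ~4 chars/token estimate
        for i, chunk in enumerate(chunks)
    ]

def to_jsonl(records):
    """Serialise records as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

records = to_records(["hello world", "abc"])
jsonl = to_jsonl(records)

# Reading it back line by line is also a simple validity check.
parsed = [json.loads(line) for line in jsonl.split("\n")]
```

Parsing each line separately, as in the last statement, is the same line-by-line validation a JSONL checker performs.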
Privacy
All chunking happens in your browser. Your document is never uploaded, so you can safely chunk internal or confidential material before embedding it.
Ready to try it? Open the AI Text Chunker →