Check your understanding of the retrieval building blocks behind modern RAG systems. From indexing to reranking, verify that you can balance quality, speed, and cost.
In a vector database, the embedding dimension must ______.
match the index’s expected dimensionality
be prime
equal the number of documents
change at query time
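For reference, here is a minimal sketch of the dimensionality constraint, assuming the faiss-cpu package and an illustrative 384-dimensional embedder; the explicit assert stands in for whatever validation a real system would do.

```python
# Minimal sketch (assumes the faiss-cpu package; the 384-dim size is illustrative).
import numpy as np
import faiss

dim = 384                       # dimensionality the index was built for
index = faiss.IndexFlatIP(dim)  # flat inner-product index

vectors = np.random.rand(100, dim).astype("float32")  # stand-in for real embeddings

# Every vector added or queried must have exactly `dim` components;
# guard explicitly rather than relying on the library's internal checks.
assert vectors.shape[1] == dim, "embedding dimension must match the index"
index.add(vectors)

query = np.random.rand(1, dim).astype("float32")
scores, ids = index.search(query, 5)
```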
HNSW is a popular ANN index because it provides ______.
high recall with sublinear query time via navigable small-world graphs
only CPU-bound training
exact k-NN with linear scans
lossless compression of text
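A minimal HNSW sketch, assuming the hnswlib package; the build and search parameters shown (M, ef_construction, ef) are illustrative values, and tuning them trades recall against query latency.

```python
# Minimal sketch (assumes the hnswlib package; parameter values are illustrative).
import numpy as np
import hnswlib

dim, n = 384, 10_000
data = np.random.rand(n, dim).astype("float32")

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=n, ef_construction=200, M=16)  # graph build parameters
index.add_items(data, np.arange(n))

index.set_ef(64)  # search-time breadth: higher ef -> higher recall, slower queries
labels, distances = index.knn_query(data[:1], k=10)
```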
Hybrid retrieval that combines BM25 and vector similarity often ______.
improves recall and grounding over either method alone
requires identical embeddings and tokens
breaks tokenization entirely
prevents use of filters
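One common way to combine the two ranked lists is reciprocal rank fusion; this is a minimal sketch in plain Python, where `bm25_ranked` and `vector_ranked` are hypothetical lists of document IDs ordered best first.

```python
# Minimal sketch of reciprocal rank fusion (RRF) over two ranked lists;
# `bm25_ranked` and `vector_ranked` are hypothetical doc-ID lists, best first.
def rrf_fuse(bm25_ranked, vector_ranked, k=60):
    scores = {}
    for ranked in (bm25_ranked, vector_ranked):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Documents surfaced by both retrievers accumulate score from each list.
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["d3", "d1", "d7"], ["d1", "d9", "d3"])
```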
Reranking top-k candidates with a cross-encoder can ______.
boost precision by scoring query–document pairs more deeply
lower latency in all cases
eliminate hallucinations entirely
remove the need for chunking
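A minimal reranking sketch, assuming the sentence-transformers package; the model name and the candidate texts are illustrative, and the extra pairwise scoring adds latency in exchange for precision.

```python
# Minimal sketch (assumes the sentence-transformers package; model name is illustrative).
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I rotate API keys?"
candidates = [
    "Rotate keys from the security console under Settings.",
    "Our pricing tiers are Starter, Team, and Enterprise.",
    "Key rotation policy: credentials expire every 90 days.",
]

# Score each (query, document) pair jointly, then keep the highest-scoring candidates.
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
```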
Chunking with moderate overlap helps because it ______.
preserves context that spans chunk boundaries
forces shorter indexes
improves OCR quality automatically
guarantees no duplicates
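A minimal chunking sketch with a sliding window and overlap; sizes are in characters and the values are illustrative, not recommendations.

```python
# Minimal sketch of fixed-size chunking with overlap (sizes in characters; values are illustrative).
def chunk_text(text, size=800, overlap=200):
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append((start, chunk))  # keep the offset so chunks can be traced back
        if start + size >= len(text):
            break
    return chunks
```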
For cost-efficient storage at scale, systems often use ______.
product quantization or scalar quantization on vectors
sorting chunks alphabetically
gzip on token IDs only
PNG compression on embeddings
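As a rough intuition for the storage savings, here is a minimal per-vector scalar quantization sketch in NumPy; it is a simplification of what vector stores do internally, not any particular engine's implementation.

```python
# Minimal sketch of per-vector scalar quantization to int8 (simplified illustration).
import numpy as np

def quantize_int8(vectors):
    scale = np.abs(vectors).max(axis=1, keepdims=True) / 127.0  # one scale per vector
    scale = np.maximum(scale, 1e-12)                            # avoid divide-by-zero
    q = np.round(vectors / scale).astype(np.int8)               # 4x smaller than float32
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

vecs = np.random.randn(1000, 384).astype(np.float32)
q, scale = quantize_int8(vecs)
approx = dequantize(q, scale)  # lossy: small reconstruction error, large storage savings
```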
Maximal Marginal Relevance (MMR) is used to ______.
force chronological ordering
select diverse yet relevant results to reduce redundancy
learn embeddings end-to-end
deduplicate documents offline only
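A minimal MMR sketch over unit-normalized vectors; the lambda parameter trades relevance against diversity, and the values here are illustrative.

```python
# Minimal sketch of MMR selection over unit-normalized vectors.
import numpy as np

def mmr(query_vec, doc_vecs, k=5, lam=0.7):
    relevance = doc_vecs @ query_vec                 # similarity to the query
    selected, remaining = [], list(range(len(doc_vecs)))
    while remaining and len(selected) < k:
        if selected:
            # Redundancy: similarity to anything already selected.
            redundancy = np.max(doc_vecs[remaining] @ doc_vecs[selected].T, axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        scores = lam * relevance[remaining] - (1 - lam) * redundancy
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected
```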
Metadata filters (e.g., by source or date) are valuable in RAG because they ______.
remove the need for evaluation
replace vector search entirely
guarantee perfect answers
constrain retrieval to trusted, timely content
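A minimal pre-filtering sketch in plain Python; `chunks` is a hypothetical list of dicts with "source", "date", and "embedding" keys, and real systems usually push this filter into the vector store itself.

```python
# Minimal sketch: filter candidates by metadata before vector scoring.
import numpy as np

def filtered_search(query_vec, chunks, allowed_sources, min_date, k=5):
    candidates = [c for c in chunks
                  if c["source"] in allowed_sources and c["date"] >= min_date]
    if not candidates:
        return []
    sims = np.array([np.dot(query_vec, c["embedding"]) for c in candidates])
    order = np.argsort(-sims)[:k]
    return [candidates[i] for i in order]
```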
A good grounding practice is to store ______ with each chunk.
source identifiers and offsets for citation
GPU model name
only the cosine score
window size used at index time
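A minimal sketch of a chunk record that keeps enough provenance for citations; the field names are illustrative, not a fixed schema.

```python
# Minimal sketch of a chunk record with provenance fields (names are illustrative).
from dataclasses import dataclass

@dataclass
class ChunkRecord:
    chunk_id: str
    source_id: str     # document or URL the text came from
    start_offset: int  # character offsets into the source, so answers can cite exact spans
    end_offset: int
    text: str

record = ChunkRecord("c-0042", "handbook.pdf", 1200, 2000,
                     "Rotate API keys every 90 days.")
citation = f"[{record.source_id}:{record.start_offset}-{record.end_offset}]"
```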
To reduce embedding costs in production, teams commonly ______.
re-embed on every query
push embeddings into cookies
cache vectors and avoid re-embedding unchanged text
compute vectors with random seeds only
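A minimal content-hash caching sketch; `embed` is a hypothetical stand-in for whatever embedding call the system uses, and the in-memory dict stands in for a persistent cache.

```python
# Minimal sketch of a content-hash embedding cache; `embed` is a hypothetical placeholder.
import hashlib

cache = {}  # in production this would be a persistent store (e.g. Redis or a table)

def embed(text):
    ...  # placeholder for the real embedding call

def embed_cached(text):
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = embed(text)  # pay the embedding cost only for unseen text
    return cache[key]
```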
Starter
Great beginning—keep exploring the core ideas and key trade-offs.
Solid
Strong grasp—practice applying these choices to real data and workloads.
Expert!
Excellent—your decisions reflect production-grade mastery.