- Traditional Full-Text Search: TF-IDF (Term Frequency-Inverse Document Frequency)
- Neural Network Embeddings: rank documents based on their similarity in the vector space / HNSW algorithm
- Hybrid Search / RRF (Reciprocal Rank Fusion)
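As an illustration of the hybrid step, here is a minimal sketch of Reciprocal Rank Fusion, which merges a keyword ranking and a vector ranking by summing 1 / (k + rank) for each document. The function name, the document IDs, and the constant k = 60 (a commonly used default) are assumptions for this example, not something the slides specify.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a full-text search and a vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, and documents found by only one retriever are still kept.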
limit.
- Helps rank sections of documents.
- Each vector can embed only a limited amount of data, depending on the model.
- Embedding a long passage with multiple topics into a single vector can cause important nuance to get lost.
- Overlapping text between sections might be helpful.
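One way to picture overlapping sections is a sliding window over the words of a document. The sketch below is only illustrative: it splits on whitespace rather than on model tokens, and the function name and the chunk_size and overlap parameters are assumptions for this example.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks that share `overlap` words.

    Words stand in for tokens here (an assumption); a real pipeline would
    count tokens with the embedding model's own tokenizer.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary is still seen whole by at least one vector.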
3272, 262, 4675, 780, 340, 373, 1165, 10032, 13]
60 chars
(76 chars, 17 tokens)
(55 chars, 24 tokens)
[0.653249, -0.211342, 0.000436 … -0.532995, 0.900358, 0.345422]
13 tokens
N-dimensional embedding vector per token
…a continuous space representation we can use as model input
Embeddings for similar concepts will be close to each other in N-dimensional space (e.g., vectors for “dog” and “hound” will have a cosine similarity closer to 1 than “dog” and “chair”).
Less common words will tend to split into multiple tokens:
There’s a bias towards English in the BPE corpus:
Figure labels: dog, chair, hound
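The cosine similarity comparison can be sketched in a few lines. The three-dimensional vectors below are made up for illustration and are not real model embeddings; only the relative ordering of the two similarities is the point.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, not output from any embedding model).
toy_embeddings = {
    "dog": [0.90, 0.80, 0.10],
    "hound": [0.85, 0.75, 0.20],
    "chair": [0.10, 0.20, 0.95],
}

sim_dog_hound = cosine_similarity(toy_embeddings["dog"], toy_embeddings["hound"])
sim_dog_chair = cosine_similarity(toy_embeddings["dog"], toy_embeddings["chair"])
print(f"dog vs hound: {sim_dog_hound:.3f}")
print(f"dog vs chair: {sim_dog_chair:.3f}")
```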
between a token and all other tokens in the context
• Multiple heads in a layer focus on learning different relationships, including grammar and semantic
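Assuming this passage describes self-attention, the weights between one token and every token in the context can be sketched as a softmax over scaled dot products. This shows a single head with hand-picked two-dimensional vectors; the function names and values are assumptions for illustration, and a real layer would run several such heads with learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical key vectors for a three-token context.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights([1.0, 0.0], keys)
print([round(w, 3) for w in weights])
```

The query puts more weight on the first and third tokens, whose keys point in a similar direction, than on the second.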
ways:
• Via model weights (i.e., fine-tune the model on a training set): teaches specialized tasks, less reliable for factual recall. It is not base training; the salt is in the water.
• Via model inputs (i.e., insert the knowledge into an input message): short-term memory, bound by token limits.
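The second option, inserting knowledge into the input message, can be sketched as a prompt builder that adds passages until a token budget runs out. Everything named here is hypothetical: the build_prompt function, the prompt wording, the max_tokens value, and the whitespace-based token estimate, which is only a rough stand-in for a real tokenizer.

```python
def estimate_tokens(text):
    """Crude token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_prompt(question, passages, max_tokens=50):
    """Insert as many passages as fit within max_tokens, in the given order."""
    header = "Answer using only the context below."
    budget = max_tokens - estimate_tokens(header) - estimate_tokens(question)
    selected = []
    for passage in passages:
        cost = estimate_tokens(passage)
        if cost > budget:
            break  # stop at the first passage that no longer fits
        selected.append(passage)
        budget -= cost
    context = "\n".join(selected)
    return f"{header}\n\nContext:\n{context}\n\nQuestion: {question}"


# Hypothetical passages, assumed to be ordered by relevance already.
passages = ["alpha beta gamma", "delta epsilon", "zeta " * 100]
prompt = build_prompt("What is alpha?", passages, max_tokens=50)
print(prompt)
```

The third passage is dropped because it would exceed the budget, which is the practical meaning of "bound by token limits".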
teaching a new task, not new information or knowledge.
• It is not a reliable way to store knowledge as part of the model.
• Fine-tuning does not eliminate hallucination (confabulation).
• It is slow, difficult and expensive.
• Fine-tuning is 1000x more difficult than prompt engineering.