Rag training - Intro - news articles

arch-diagram-news-article Diagramming done via DiagramAsCode

Rag Training - Intro

It takes about 21 news articles in text files, loads them into a simulated pipeline, splits text into chunks (specified chunk size and overlap as an array) into chunks, passing each chunk through an embedding model (text-embedding-3-small via OpenAI), which vectorizes it and then indexes it and stores it into a vector database.

The subsequent queries will query the vector database.

It is a simple exercise to code up a rag retrieval to query (using an LLM) a specific set of documents.