Microsoft JDConf 2025 - RAG & Tools for developers with LangChain4j

Are you utterly confused about RAG, what it is, how it works, and what things you need to consider when doing it? I know I was when I started learning about it!

If you feel the same, join me as I take a technology-agnostic walk through exactly what RAG is and then demonstrate various Java implementations using LangChain4j.

Then we will look at tools and agents and break that down as well, explaining everything from a technology-agnostic point of view, then demonstrating various implementations using LangChain4j.

Eric Deandrea

April 09, 2025

Transcript

  1. RAG & Tools with LangChain4j

    Eric Deandrea, Red Hat (@edeandrea)
    Java Champion | Senior Principal Developer Advocate
  2. About Me

    • Java Champion
    • 26+ years software development experience
    • Contributor to Open Source projects: Quarkus, Spring Boot, Spring Framework, Spring Security, LangChain4j (& Quarkus LangChain4j), WireMock, Microcks
    • Boston Java Users ACM Chapter Board Member
    • Published Author
  3. • Showcase & explain Quarkus, how it enables modern Java development & the Kubernetes-native experience
     • Introduce familiar Spring concepts, constructs, & conventions and how they map to Quarkus
     • Equivalent code examples between Quarkus and Spring as well as emphasis on testing patterns & practices

    https://red.ht/quarkus-spring-devs
  4. Retrieval Augmented Generation (RAG)

    Enhance LLM knowledge by providing relevant information in real-time from other sources
    - Dynamic data that changes frequently
    - Fine-tuning is expensive!

    2 stages:
    - Indexing / Ingestion
    - Retrieval / Augmentation
  5. Indexing / Ingestion

    What do I need to think about?
    - What is the representation of the data?
    - How do I want to split? Per document? Chapter? Sentence?
    - How many tokens do I want to end up with?
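    The splitting questions above can be made concrete with a small sketch. It is plain Java, not LangChain4j's DocumentSplitters: whitespace-separated words stand in for tokens, and the class and method names (OverlapSplitter, split) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapSplitter {

    // Splits text into segments of at most maxWords words.
    // Consecutive segments share `overlap` words so context is not cut mid-thought.
    static List<String> split(String text, int maxWords, int overlap) {
        if (maxWords <= 0 || overlap < 0 || overlap >= maxWords) {
            throw new IllegalArgumentException("need 0 <= overlap < maxWords");
        }
        List<String> segments = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return segments;
        }
        String[] words = trimmed.split("\\s+");
        int step = maxWords - overlap; // how far the window advances each time
        for (int start = 0; start < words.length; start += step) {
            int end = Math.min(start + maxWords, words.length);
            segments.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
            if (end == words.length) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        split("a b c d e f g", 4, 2).forEach(System.out::println);
    }
}
```

    A larger overlap keeps more shared context between neighbouring segments but produces more segments to embed and store.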
  6. Indexing / Ingestion

    Compute an embedding (numerical vector) representing semantic meaning of each segment.

    Requires an embedding model: In-process/ONNX, Amazon Bedrock, Azure OpenAI, Cohere, DashScope, Google Vertex AI, Hugging Face, Jina, Jlama, LocalAI, Mistral, Nomic, Ollama, OpenAI, OVHcloud, Voyage AI, Cloudflare Workers AI, Zhipu AI
  7. Indexing / Ingestion

    Store embedding alone or together with segment.

    Requires a vector store: In-memory, Chroma, Elasticsearch, Milvus, Neo4j, OpenSearch, Pinecone, PGVector, Redis, Vespa, Weaviate, Qdrant
  8. Indexing / Ingestion

    var ingestor = EmbeddingStoreIngestor.builder()
        .embeddingModel(embeddingModel)
        .embeddingStore(embeddingStore)
        // Add userId metadata entry to each Document to be able to filter by it later
        .documentTransformer(document -> {
            document.metadata().put("userId", "12345");
            return document;
        })
        // Split each Document into TextSegments of up to 1000 characters with a 200-character overlap
        // (sizes are counted in tokens only when a tokenizer is passed as a third argument)
        .documentSplitter(DocumentSplitters.recursive(1000, 200))
        // Add the name of the Document to each TextSegment to improve the quality of search
        .textSegmentTransformer(textSegment -> TextSegment.from(
            textSegment.metadata().getString("file_name") + "\n" + textSegment.text(),
            textSegment.metadata()))
        .build();

    // Get the path of where the documents are and load them recursively
    Path path = Path.of(...);
    List<Document> documents = FileSystemDocumentLoader.loadDocumentsRecursively(path);

    // Ingest the documents into the embedding store
    ingestor.ingest(documents);
  9. Retrieval / Augmentation

    Compute an embedding (numerical vector) representing semantic meaning of the query. Requires an embedding model.
  10. Retrieval / Augmentation

    Retrieve & rank relevant content based on cosine similarity or other similarity/distance measures.
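    Ranking by cosine similarity can be sketched in plain Java as follows. This is an illustration, not LangChain4j's retriever: the in-memory map stands in for a vector store, the names (CosineRanker, topMatches) are hypothetical, and minScore is applied to the raw cosine value, whereas a library may rescale scores before comparing them to a threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CosineRanker {

    // Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a zero vector has no direction; treat it as not similar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the ids of up to maxResults stored vectors scoring at least minScore, best first.
    static List<String> topMatches(double[] query, Map<String, double[]> store,
                                   int maxResults, double minScore) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : store.entrySet()) {
            double score = cosine(query, entry.getValue());
            if (score >= minScore) {
                scored.add(Map.entry(entry.getKey(), score));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(maxResults, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
            "a", new double[] {1, 0},
            "b", new double[] {0, 1},
            "c", new double[] {1, 1});
        System.out.println(topMatches(new double[] {1, 0}, store, 2, 0.5));
    }
}
```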
  11. Retrieval / Augmentation

    Augment input to the LLM with related content.

    What do I need to think about?
    - Will I exceed the max number of tokens?
    - How much chat memory is available?
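    One way to reason about the token limit is to keep adding retrieved segments, best first, until a budget is reached. The sketch below assumes a crude estimate of one token per whitespace-separated word (real tokenizers count differently); PromptBudget, estimateTokens, and fit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PromptBudget {

    // Crude token estimate: one token per whitespace-separated word (an assumption).
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Keeps ranked segments, in order, while question + kept segments fit in maxTokens.
    // Stops at the first segment that would exceed the budget.
    static List<String> fit(String question, List<String> rankedSegments, int maxTokens) {
        int used = estimateTokens(question);
        List<String> kept = new ArrayList<>();
        for (String segment : rankedSegments) {
            int cost = estimateTokens(segment);
            if (used + cost > maxTokens) {
                break;
            }
            kept.add(segment);
            used += cost;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> ranked = List.of("one two three", "four five", "six");
        System.out.println(fit("what is rag", ranked, 7));
    }
}
```

    In practice the budget would also need to leave room for chat memory and for the model's answer.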
  12. Retrieval / Augmentation

    public class RagRetriever {
        @Produces
        @ApplicationScoped
        public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) {
            var contentRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(3)
                .minScore(0.75)
                .filter(metadataKey("userId").isEqualTo("12345"))
                .build();

            return DefaultRetrievalAugmentor.builder()
                .contentRetriever(contentRetriever)
                .build();
        }
    }
  13. Advanced RAG

    public class RagRetriever {
        @Produces
        @ApplicationScoped
        public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model) {
            var embeddingStoreRetriever = EmbeddingStoreContentRetriever.builder()
                .embeddingModel(model)
                .embeddingStore(store)
                .maxResults(3)
                .minScore(0.75)
                .filter(metadataKey("userId").isEqualTo("12345"))
                .build();

            var googleSearchEngine = GoogleCustomWebSearchEngine.builder()
                .apiKey(System.getenv("GOOGLE_API_KEY"))
                .csi(System.getenv("GOOGLE_SEARCH_ENGINE_ID"))
                .build();

            var webSearchRetriever = WebSearchContentRetriever.builder()
                .webSearchEngine(googleSearchEngine)
                .maxResults(3)
                .build();

            return DefaultRetrievalAugmentor.builder()
                .queryRouter(new DefaultQueryRouter(embeddingStoreRetriever, webSearchRetriever))
                .build();
        }
    }

    https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  14. Advanced RAG

    public class RagRetriever {
        @Produces
        @ApplicationScoped
        public RetrievalAugmentor create(EmbeddingStore store, EmbeddingModel model, ChatLanguageModel chatModel) {
            var embeddingStoreRetriever = ...
            var webSearchRetriever = ...

            // Let the chat model pick a retriever based on these descriptions
            var queryRouter = LanguageModelQueryRouter.builder()
                .chatLanguageModel(chatModel)
                .fallbackStrategy(FallbackStrategy.ROUTE_TO_ALL)
                .retrieverToDescription(
                    Map.of(
                        embeddingStoreRetriever, "Local Documents",
                        webSearchRetriever, "Web Search"
                    )
                )
                .build();

            return DefaultRetrievalAugmentor.builder()
                .queryRouter(queryRouter)
                .build();
        }
    }

    https://github.com/cescoffier/langchain4j-deep-dive/blob/main/4-rag/src/main/java/dev/langchain4j/quarkus/deepdive/RagRetriever.java
  15. Agent and Tools

    A tool is a function that the model can call:
    - Tools are parts of CDI beans
    - Tools are defined and described using @Tool

    Prompt (Context)
    -> Extend the context with tool descriptions
    -> Invoke the model
    -> The model asks for a tool invocation (name + parameters)
    -> The tool is invoked (on the caller) and the result sent to the model
    -> The model computes the response using the tool result
    -> Response
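    The flow above can be simulated without any LLM. The sketch below is plain Java with a hard-coded stand-in for the model (fakeModel), so every name in it is hypothetical and nothing here is LangChain4j API. It only shows the loop: the caller invokes the requested tool and feeds the result back until the model produces a final answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ToolLoopSketch {

    // What the model returns on each turn: either a tool request or a final answer.
    record ModelReply(String toolName, String toolArgument, String finalAnswer) {
        boolean wantsTool() {
            return toolName != null;
        }
    }

    // Stand-in for an LLM: requests the tool once, then answers from the tool result.
    static ModelReply fakeModel(List<String> messages) {
        String last = messages.get(messages.size() - 1);
        String prefix = "TOOL_RESULT:";
        if (last.startsWith(prefix)) {
            return new ModelReply(null, null, "The length is " + last.substring(prefix.length()));
        }
        return new ModelReply("stringLength", "hello", null);
    }

    static String run(String userMessage, Map<String, Function<String, String>> tools) {
        List<String> messages = new ArrayList<>();
        messages.add("TOOLS:" + tools.keySet()); // extend the context with tool descriptions
        messages.add("USER:" + userMessage);
        for (int turn = 0; turn < 5; turn++) {   // cap the number of round trips
            ModelReply reply = fakeModel(messages);
            if (!reply.wantsTool()) {
                return reply.finalAnswer();
            }
            Function<String, String> tool = tools.get(reply.toolName());
            if (tool == null) {
                throw new IllegalStateException("unknown tool: " + reply.toolName());
            }
            // The tool runs on the caller's side; only its result goes back to the model.
            messages.add("TOOL_RESULT:" + tool.apply(reply.toolArgument()));
        }
        throw new IllegalStateException("too many tool calls");
    }

    public static void main(String[] args) {
        Function<String, String> length = s -> String.valueOf(s.length());
        System.out.println(run("How long is hello?", Map.of("stringLength", length)));
    }
}
```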
  16. Tools

    - A tool is just a method
    - It can access databases, or invoke a remote service
    - It can also use another LLM

    Tools require memory
  17. Using tools with LangChain4j

    Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(model)
        // Objects to use as tools
        .tools(new Calculator())
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .build();

    static class Calculator {
        // Declare a tool method (description optional)
        @Tool("Calculates the length of a string")
        int stringLength(String s) {
            return s.length();
        }

        @Tool("Calculates the square root of a number")
        double sqrt(int x) {
            System.out.println("Called sqrt() with x=" + x);
            return Math.sqrt(x);
        }
    }
  18. Giving access to database (Quarkus Panache)

    @ApplicationScoped
    public class BookingRepository implements PanacheRepository<Booking> {

        @Tool("Cancel a booking")
        @Transactional
        public void cancelBooking(long bookingId, String firstName, String lastName) {
            var booking = getBookingDetails(bookingId, firstName, lastName);
            delete(booking);
        }

        @Tool("List booking for a customer")
        public List<Booking> listBookingsForCustomer(String name, String surname) {
            return Customer.find("firstName = ?1 and lastName = ?2", name, surname)
                .singleResultOptional()
                .map(found -> list("customer", found))
                .orElseGet(List::of);
        }
    }
  19. Agentic AI

    @RegisterAiService
    public interface WeatherForecastAgent {
        @SystemMessage("You are a meteorologist ...")
        @ToolBox({ CityExtractorAgent.class, ForecastService.class, GeoCodingService.class })
        String forecast(String query);
    }

    @RegisterAiService
    public interface CityExtractorAgent {
        @Tool("Extracts the city name from a given question")
        @UserMessage("Extract the city name from {question}")
        String extractCity(String question);
    }

    @RegisterRestClient
    public interface ForecastService {
        @Tool("Forecasts the weather for the given coordinates")
        @ClientQueryParam(name = "forecast_days", value = "?")
        WeatherForecast forecast(@RestQuery double latitude, @RestQuery double longitude);
    }
  20. Web Search Tools (Tavily)

    @UserMessage("""
        Search for information about the user query: {query}, and answer the question.
        """)
    @ToolBox(WebSearchTool.class)
    String chat(String query);

    Provided by quarkus-langchain4j-tavily
    Can also be used with RAG
  21. Risks

    • Things can go wrong quickly
    • Risk of prompt injection
    • Audit is very important to check the parameters
    • Distinction between read and write tools
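
    The audit and read/write points above could look roughly like this in plain Java. It is a sketch under assumptions, not a Quarkus or LangChain4j feature: AuditedTools, Kind, and the approved flag are hypothetical, and a real application would log to durable storage and ask a user or policy for approval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class AuditedTools {

    enum Kind { READ, WRITE }

    private final List<String> auditLog = new ArrayList<>();

    // Wraps a tool so every call is recorded with its argument before anything runs.
    // WRITE tools execute only when `approved` is true; otherwise the call is refused.
    Function<String, String> audited(String name, Kind kind, boolean approved,
                                     Function<String, String> tool) {
        return argument -> {
            auditLog.add(kind + " " + name + "(" + argument + ")");
            if (kind == Kind.WRITE && !approved) {
                return "DENIED: " + name + " needs approval";
            }
            return tool.apply(argument);
        };
    }

    // Read-only snapshot of the calls seen so far.
    List<String> auditLog() {
        return List.copyOf(auditLog);
    }

    public static void main(String[] args) {
        AuditedTools tools = new AuditedTools();
        Function<String, String> cancel =
            tools.audited("cancelBooking", Kind.WRITE, false, id -> "cancelled " + id);
        System.out.println(cancel.apply("42"));
        System.out.println(tools.auditLog());
    }
}
```

    Recording the parameters before the tool runs means a refused or failed call still leaves a trace to review.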