Building a RAG Application with Spring Boot and LangChain4j
In today's world of AI-powered applications, Retrieval-Augmented Generation (RAG) has emerged as a powerful technique for enhancing Large Language Models (LLMs) with external knowledge. This post walks you through building a RAG application using Spring Boot and LangChain4j, a Java library for building applications on top of LLMs.
What is RAG?
RAG combines the generative capabilities of LLMs with the ability to retrieve relevant information from external sources. This lets the model draw on up-to-date or domain-specific knowledge that is not contained in its training data, producing more accurate and contextually relevant responses.
What Is Possible with RAG?
RAG spans a wide variety of applications across industries. Here are some examples of what is possible with it:
Enhanced Customer Support
RAG can power intelligent chatbots and virtual assistants that draw on corporate knowledge bases to provide accurate, context-aware answers to customer questions, reducing the burden on human support staff and improving response times.
Personalized Content Recommendations
RAG can personalize recommendations in streaming services, e-commerce platforms, or news aggregators by combining user data with a large library of content.
Legal Research and Analysis
RAG can help lawyers and paralegals enormously: instead of the months of reading a junior lawyer might need, it surfaces relevant case law, statutes, and other legal documents almost instantly, and can produce summaries or answer specific legal questions based on that information.
Medical Diagnosis Support
With access to medical literature and patient records, RAG could help healthcare professionals by suggesting probable diagnoses, treatment options, or relevant research papers based on a patient's symptoms and history.
Financial Analysis and Reporting
RAG can analyze large volumes of financial data, company reports, and market news to produce insightful financial analyses, risk assessments, or investment recommendations.
Prerequisites
Before we get started, make sure the following is installed on your machine:
- Java Development Kit (JDK) 17 or later
- Maven or Gradle
- IDE: IntelliJ IDEA, Eclipse, or VS Code
You will also need:
- An OpenAI API key that allows you to access GPT models
- Basic knowledge of Spring Boot and Java
Setting Up the Project
Let's first set up a new Spring Boot project. Here are the steps:
- Open start.spring.io
- Select Maven and Java
- Select a Spring Boot 3.x version
- Add the following dependencies: Spring Web, Spring Data JPA, H2 Database
Generate the project and download it.
Now, open the project in your IDE and add the following LangChain4j dependencies in your pom.xml:
<dependencies>
    <!-- Other dependencies -->
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j</artifactId>
        <version>0.31.0</version>
    </dependency>
    <dependency>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-open-ai</artifactId>
        <version>0.31.0</version>
    </dependency>
</dependencies>
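If you build with Gradle instead, the equivalent dependency declarations (same coordinates, Groovy DSL) would look like this:

implementation 'dev.langchain4j:langchain4j:0.31.0'
implementation 'dev.langchain4j:langchain4j-open-ai:0.31.0'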
We will use the following components in our RAG application:
- Document Loader – to ingest data from external sources
- Embedding Model – to convert text into vector representations
- Vector Store – to index and search embeddings
- LLM – to generate responses
- RAG Chain – to tie the retrieval and generation steps together
Step 1: Document Loading
A simple document loader that reads text files from a directory:
package com.pengyu.ragdemo;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.loader.FileSystemDocumentLoader;

import java.nio.file.Path;
import java.util.List;

public class DocumentLoader {

    // Loads every file in the given directory as a Document
    public List<Document> loadDocuments(String directoryPath) {
        return FileSystemDocumentLoader.loadDocuments(Path.of(directoryPath));
    }
}
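For anything beyond short files, embedding a whole document as one segment works poorly; you would normally split documents into smaller chunks before embedding, so retrieval returns focused passages. A minimal sketch using LangChain4j's recursive splitter (the segment size and overlap below are arbitrary illustrative values, not part of this tutorial):

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;

import java.util.List;

public class DocumentSplitterExample {

    // Split a document into ~300-character segments with a 30-character overlap
    public List<TextSegment> split(Document document) {
        return DocumentSplitters.recursive(300, 30).split(document);
    }
}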
Step 2: Embedding Model
Now we will use OpenAI's embedding model to convert our documents into vector representations:
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;

public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    public EmbeddingService(String openAiApiKey) {
        // Uses OpenAI's default embedding model unless one is configured explicitly
        this.embeddingModel = OpenAiEmbeddingModel.builder()
                .apiKey(openAiApiKey)
                .build();
    }

    public EmbeddingModel getEmbeddingModel() {
        return embeddingModel;
    }
}
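As a quick sanity check, you can embed a single string and inspect the resulting vector. A hedged usage sketch (it assumes the key is exported as a hypothetical OPENAI_API_KEY environment variable; the printed dimension depends on which model the builder defaults to):

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.model.embedding.EmbeddingModel;

public class EmbeddingSanityCheck {

    public static void main(String[] args) {
        // Assumes the key is exported as OPENAI_API_KEY
        EmbeddingModel model = new EmbeddingService(System.getenv("OPENAI_API_KEY")).getEmbeddingModel();
        Embedding vector = model.embed("What is RAG?").content();
        System.out.println("Vector dimension: " + vector.dimension());
    }
}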
Step 3: Vector Store
Now let's create an in-memory vector store for indexing and searching our embeddings.
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.util.List;

public class VectorStore {

    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;

    public VectorStore(EmbeddingModel embeddingModel) {
        this.embeddingStore = new InMemoryEmbeddingStore<>();
        this.embeddingModel = embeddingModel;
    }

    // Embeds each document as a single segment and indexes it in the store
    public void addDocuments(List<Document> documents) {
        for (Document document : documents) {
            TextSegment segment = TextSegment.from(document.text());
            Embedding embedding = embeddingModel.embed(segment).content();
            embeddingStore.add(embedding, segment);
        }
    }

    public EmbeddingStore<TextSegment> getEmbeddingStore() {
        return embeddingStore;
    }
}
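You can also query the store directly to see what retrieval returns before wiring it into a chain. A minimal sketch under the same assumptions as above (the question and document path are placeholders; findRelevant is the lookup method in this LangChain4j version):

import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingMatch;

import java.util.List;

public class RetrievalCheck {

    public static void main(String[] args) {
        EmbeddingModel embeddingModel = new EmbeddingService(System.getenv("OPENAI_API_KEY")).getEmbeddingModel();
        VectorStore vectorStore = new VectorStore(embeddingModel);
        vectorStore.addDocuments(new DocumentLoader().loadDocuments("/path/to/your/documents"));

        // Find the 2 segments most similar to the question
        Embedding queryEmbedding = embeddingModel.embed("How does RAG work?").content();
        List<EmbeddingMatch<TextSegment>> matches =
                vectorStore.getEmbeddingStore().findRelevant(queryEmbedding, 2);
        matches.forEach(m -> System.out.println(m.score() + " -> " + m.embedded().text()));
    }
}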
Step 4: Language Model
We'll use OpenAI's GPT model as our language model:
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiChatModel;

public class LanguageModelService {

    private final ChatLanguageModel chatLanguageModel;

    public LanguageModelService(String openAiApiKey) {
        // Uses OpenAI's default chat model unless one is configured explicitly
        this.chatLanguageModel = OpenAiChatModel.builder()
                .apiKey(openAiApiKey)
                .build();
    }

    public ChatLanguageModel getChatLanguageModel() {
        return chatLanguageModel;
    }
}
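On its own, the chat model simply completes prompts, without any retrieval. A quick sketch (again assuming the key comes from a hypothetical OPENAI_API_KEY environment variable):

import dev.langchain4j.model.chat.ChatLanguageModel;

public class ChatModelCheck {

    public static void main(String[] args) {
        ChatLanguageModel model = new LanguageModelService(System.getenv("OPENAI_API_KEY")).getChatLanguageModel();
        // generate(String) sends a single user message and returns the completion
        String answer = model.generate("Explain RAG in one sentence.");
        System.out.println(answer);
    }
}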
Step 5: RAG Chain
Now, let's create our RAG chain that combines retrieval and generation:
import dev.langchain4j.chain.ConversationalRetrievalChain;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.retriever.EmbeddingStoreRetriever;
import dev.langchain4j.store.embedding.EmbeddingStore;

public class RagChain {

    private final ConversationalRetrievalChain chain;

    public RagChain(ChatLanguageModel chatLanguageModel,
                    EmbeddingModel embeddingModel,
                    EmbeddingStore<TextSegment> embeddingStore) {
        // Retrieve the 2 most relevant segments for each question
        EmbeddingStoreRetriever retriever = EmbeddingStoreRetriever.from(embeddingStore, embeddingModel, 2);
        this.chain = ConversationalRetrievalChain.builder()
                .chatLanguageModel(chatLanguageModel)
                .retriever(retriever)
                .build();
    }

    public String ask(String question) {
        return chain.execute(question);
    }
}
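Before moving the wiring into Spring, it can help to see all the pieces working together in plain Java. A hedged end-to-end sketch (the document path and question are placeholders, and the API key is again assumed to come from the environment):

public class RagDemo {

    public static void main(String[] args) {
        String apiKey = System.getenv("OPENAI_API_KEY");

        // Ingest and index the documents
        EmbeddingService embeddingService = new EmbeddingService(apiKey);
        VectorStore vectorStore = new VectorStore(embeddingService.getEmbeddingModel());
        vectorStore.addDocuments(new DocumentLoader().loadDocuments("/path/to/your/documents"));

        // Build the chain and ask a question
        RagChain ragChain = new RagChain(
                new LanguageModelService(apiKey).getChatLanguageModel(),
                embeddingService.getEmbeddingModel(),
                vectorStore.getEmbeddingStore());
        System.out.println(ragChain.ask("What do these documents cover?"));
    }
}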
Step 6: Application Configuration
Create a configuration class to set up our components:
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class RagConfig {

    @Value("${openai.api.key}")
    private String openAiApiKey;

    @Bean
    public DocumentLoader documentLoader() {
        return new DocumentLoader();
    }

    @Bean
    public EmbeddingService embeddingService() {
        return new EmbeddingService(openAiApiKey);
    }

    @Bean
    public VectorStore vectorStore(EmbeddingService embeddingService) {
        return new VectorStore(embeddingService.getEmbeddingModel());
    }

    @Bean
    public LanguageModelService languageModelService() {
        return new LanguageModelService(openAiApiKey);
    }

    @Bean
    public RagChain ragChain(LanguageModelService languageModelService, EmbeddingService embeddingService, VectorStore vectorStore) {
        return new RagChain(
                languageModelService.getChatLanguageModel(),
                embeddingService.getEmbeddingModel(),
                vectorStore.getEmbeddingStore()
        );
    }
}
Step 7: RESTful API
Finally, let's create a controller to expose our RAG functionality:
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagController {

    private final RagChain ragChain;
    private final DocumentLoader documentLoader;
    private final VectorStore vectorStore;

    public RagController(RagChain ragChain, DocumentLoader documentLoader, VectorStore vectorStore) {
        this.ragChain = ragChain;
        this.documentLoader = documentLoader;
        this.vectorStore = vectorStore;
    }

    // Accepts a plain-text directory path in the request body
    @PostMapping("/load-documents")
    public String loadDocuments(@RequestBody String directoryPath) {
        var documents = documentLoader.loadDocuments(directoryPath);
        vectorStore.addDocuments(documents);
        return "Documents loaded successfully";
    }

    // Accepts a plain-text question and returns the model's answer
    @PostMapping("/ask")
    public String ask(@RequestBody String question) {
        return ragChain.ask(question);
    }
}
Running the Application
- Add your OpenAI API key to application.properties (see the note after these steps on keeping it out of version control):
openai.api.key=your-api-key-here
- Run the Spring Boot application.
- Load documents into the vector store:
Invoke-WebRequest -Uri "http://localhost:8080/load-documents" -Method POST -Headers @{"Content-Type"="text/plain"} -Body "C:\path\to\your\documents"
- Ask questions:
Invoke-WebRequest -Uri "http://localhost:8080/ask" -Method POST -Headers @{"Content-Type"="text/plain"} -Body "What are Key Features of IT Ireland?"
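On macOS or Linux, the equivalent curl commands would be:

curl -X POST -H "Content-Type: text/plain" --data "/path/to/your/documents" http://localhost:8080/load-documents
curl -X POST -H "Content-Type: text/plain" --data "What are Key Features of IT Ireland?" http://localhost:8080/ask

As promised above, to keep the API key out of version control you can let Spring resolve it from an environment variable in application.properties instead of hard-coding it:

openai.api.key=${OPENAI_API_KEY}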
Conclusion
We have successfully built a RAG application using Spring Boot and LangChain4j. The application ingests documents, generates embeddings, stores them in a vector store, and uses those stored embeddings to augment an LLM's responses.
Improvements and possible extensions to consider:
- A more sophisticated document loader that supports multiple document formats.
- A persistent vector store, such as Pinecone or Weaviate, for scaling.
- Conversation history to enable context awareness across interactions.
- Error handling and input validation.
- Authentication and rate limiting to protect your API.
RAG is an extremely useful technique that enables LLMs to perform well in real-world applications. By combining the strengths of Spring Boot and LangChain4j, you can build robust, scalable, and intelligent applications that harness the power of AI while keeping the flexibility and reliability of the Spring ecosystem.
Keep in mind that API keys should be kept secret, and that third-party AI services come with usage limits and guidelines. Happy coding!