RAG Application - Talk with your PDF
For this walkthrough, we will be using a RAG-based chatbot application. It utilizes a Qdrant vector store, Ollama for LLM serving, LangChain as the "glue" between these components, and Gradio as the UI engine.
This setup enables efficient RAG by combining vector search, embeddings, and an optimized inference engine.
Model Serving
Deploy an Ollama Model Serving instance on the NERC OpenShift environment by following these instructions.
Pull the Required Model for RAG
Once the Ollama setup is successfully completed, you will be able to access the Open WebUI for Ollama as explained here.
Using Open WebUI, you can download and manage LLM models as needed. For our RAG application, we are going to use the Phi-3 model, which is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft. We are going to pull the Phi-3 model using Open WebUI as shown below:
Alternatively, you can Pull the Models using the "Terminal" connected to the Ollama pod.
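For example, from that terminal you might run something like the following minimal commands (the `phi3` tag typically resolves to the Mini variant; adjust the tag if you need a different build):

```sh
# Run inside the Ollama pod's terminal; adjust the tag for the variant you need (e.g. phi3:medium).
ollama pull phi3

# Confirm the model is now available to the server.
ollama list
```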
Deploying a Vector Database
For our RAG application, we need a Vector Database to store the embeddings of different documents. In this case, we are using Qdrant.
You can deploy and run the Qdrant vector database directly on the NERC OpenShift environment by following these instructions.
After you follow those instructions, you should have a Qdrant instance ready to be populated with documents.
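As a quick sanity check, you can confirm that the Qdrant instance is reachable before wiring it into the application. The snippet below is a minimal sketch, assuming a hypothetical in-cluster service URL (`http://qdrant:6333`) and the API key from your deployment; substitute the values from your own environment.

```python
from qdrant_client import QdrantClient

# Placeholder values; replace with your own Qdrant service URL and API key.
QDRANT_HOST = "http://qdrant:6333"
QDRANT_API_KEY = "<your-qdrant-api-key>"

client = QdrantClient(url=QDRANT_HOST, api_key=QDRANT_API_KEY)

# Listing collections is a cheap way to verify connectivity and authentication.
print(client.get_collections())
```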
Facing Rate Limits While Pulling Container Image?
If you encounter Rate Limits while pulling Container Images, refer to this guide for detailed steps on how to resolve the issue.
Deploying as a Workbench Using a Data Science Project (DSP) on NERC RHOAI
Prerequisites:
- Before proceeding, confirm that you have an active GPU quota that has been approved for your current NERC OpenShift Allocation through NERC ColdFront. Read more about How to Access GPU Resources on NERC OpenShift Allocation.
Procedure:
- Navigate to the OpenShift AI dashboard.

    Please follow these steps to access the NERC OpenShift AI dashboard.
- Please ensure that you start your Jupyter notebook server with the options depicted in the following configuration screen. This screen lets you select a notebook image and configure its options, including the Accelerator and Number of accelerators (GPUs).

    For our example project, let's name it "RAG Workbench". We'll select the TensorFlow image with the Recommended Version (selected by default), choose a Deployment size of Medium, set Accelerator to None (no GPU is needed for this setup), and allocate a Cluster storage space of 20GB.
Tip
The dashboard currently enforces a minimum storage volume size of 20GB. Please adjust this value under Cluster Storage based on your needs.
Here, you will use Environment Variables to define the key-value pairs required for connecting to the Qdrant vector database, specifically:

- `QDRANT_COLLECTION`: The name of the collection in Qdrant.
- `QDRANT_API_KEY`: The authentication key for accessing the Qdrant database.

To retrieve the value of `QDRANT__SERVICE__API_KEY` from the `qdrant-key` Secret using the `oc` command, run:

```sh
oc get secret qdrant-key -o jsonpath='{.data.QDRANT__SERVICE__API_KEY}' | base64 --decode
```

Make sure these variables are properly set to establish a secure connection.
- If this procedure is successful, you have started your RAG Workbench. When your workbench is ready, the status will change to Running and you can select "Open" to go to your environment:
- Log in by clicking "mss-keycloak" when prompted, as shown below:

    Once you have successfully authenticated, you should see the NERC RHOAI JupyterLab Web Interface, as shown below:
The Jupyter environment is currently empty. To begin, populate it with content using Git. On the left side of the navigation pane, locate the Name explorer panel, where you can create and manage your project directories.
Learn More About Working with Notebooks
For detailed guidance on using notebooks on NERC RHOAI JupyterLab, please refer to this documentation.
Importing the tutorial files into the Jupyter environment
Bring the content of this tutorial inside your Jupyter environment:
- On the toolbar, click the Git Clone icon:

- Enter the following Git Repo URL: https://github.com/nerc-project/llm-on-nerc

- Check the Include submodules option, and then click Clone.

- In the file browser, double-click the newly-created llm-on-nerc folder.
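Alternatively, if you prefer a JupyterLab terminal over the Git Clone dialog, the equivalent command would be the following (the `--recurse-submodules` flag mirrors the Include submodules option above):

```sh
# Clone the tutorial repository with its submodules from a JupyterLab terminal.
git clone --recurse-submodules https://github.com/nerc-project/llm-on-nerc
```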
Verification:
In the file browser, you should see the notebooks that you cloned from Git. Navigate to the `llm-on-nerc/examples/notebooks/langchain` directory, where you will find the Jupyter notebook file `RAG_with_sources_Langchain-Ollama-Qdrant.ipynb`, as shown below:
Double-click on this file to open it.
Very Important Note
Update Ollama's `BASE_URL` and Qdrant's `QDRANT_HOST` in this notebook to match your deployment settings.
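For reference, the notebook's configuration cell might end up looking something like this minimal sketch. The host values shown are placeholders for whatever service URLs your Ollama and Qdrant deployments expose, and the environment variables are the ones defined on the workbench earlier:

```python
import os

# Placeholder values; point these at your own Ollama and Qdrant services.
BASE_URL = "http://ollama:11434"      # Ollama server endpoint inside the cluster
QDRANT_HOST = "http://qdrant:6333"    # Qdrant service URL

# Read the values set as workbench environment variables.
QDRANT_COLLECTION = os.environ.get("QDRANT_COLLECTION", "documents")  # "documents" is a placeholder default
QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY")
```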
Integrating Qdrant with LangChain:
Once you have Qdrant set up, the next step is to integrate it with LangChain. The LangChain library provides various tools to interact with vector databases, including Qdrant.
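As an illustration of how the pieces fit together, a minimal ingestion sketch using the `langchain_community` integrations might look like the following. It assumes the configuration values from the previous step and the PDF shipped with the tutorial; the exact imports and embedding model in the provided notebook may differ depending on the LangChain version.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and chunk the sample PDF shipped with the tutorial.
loader = PyPDFLoader("datasource/The_Forgotten_Lighthouse_Book.pdf")
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(loader.load())

# Embed the chunks with Ollama and store them in the Qdrant collection.
# Any embedding-capable model served by Ollama works here; "phi3" is used as an example.
embeddings = OllamaEmbeddings(model="phi3", base_url=BASE_URL)
vector_store = Qdrant.from_documents(
    chunks,
    embeddings,
    url=QDRANT_HOST,
    api_key=QDRANT_API_KEY,
    collection_name=QDRANT_COLLECTION,
)
```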
The AI model, now enriched with additional data, including The Forgotten Lighthouse book PDF located at `llm-on-nerc/examples/notebooks/langchain/datasource/The_Forgotten_Lighthouse_Book.pdf`, can be queried. When asked, "Who is the starfish, and how do you know?", it processes the PDF and infers that Grandpa calls Sarah "my little starfish" in his letter. Since the model wasn't originally trained on this book, its response relies on RAG, demonstrating how AI can extract and infer new information without retraining.
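A retrieval chain along these lines produces that answer; the sketch below uses `RetrievalQA` with source return enabled and the vector store built above, though the notebook's own chain construction may differ:

```python
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# Point the LLM at the Ollama server and use the Qdrant vector store as the retriever.
llm = Ollama(model="phi3", base_url=BASE_URL)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True,  # keep the supporting chunks so the answer can cite its sources
)

result = qa_chain.invoke({"query": "Who is the starfish, and how do you know?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata)  # e.g. page numbers from the PDF
```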
In real life, this means you can run a pre-trained AI model (e.g., the Phi-3 model in our application) on your own data without ever sending it outside your premises for training.
The response to our query is as shown below: