RAG Application - Talk with your PDF
For this walkthrough, we will be using a RAG-based chatbot application. It utilizes a Qdrant vector store, Ollama for LLM serving, LangChain as the "glue" between these components, and Gradio as the UI engine.
This setup enables efficient RAG by combining vector search, embeddings, and an optimized inference engine.
Model Serving
Deploy an Ollama Model Serving instance on the NERC OpenShift environment by following these instructions.
Pull the Required Model for RAG
Once the Ollama setup is successfully completed, you will be able to access the Open WebUI for Ollama as explained here.
Using Open WebUI, you can download and manage LLM models as needed. For our RAG application, we are going to use the Phi-3 model, which is a family of lightweight 3B (Mini) and 14B (Medium) state-of-the-art open models by Microsoft. We are going to pull the Phi-3 model using Open WebUI as shown below:
Alternatively, you can Pull the Models using the "Terminal" connected to the Ollama pod.
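For example, from that terminal you might run something like the following minimal commands (the `phi3` tag typically resolves to the Mini variant; adjust the tag if you need a different build):

```sh
# Run inside the Ollama pod's terminal; adjust the tag for the variant you need (e.g. phi3:medium).
ollama pull phi3

# Confirm the model is now available to the server.
ollama list
```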
Deploying a Vector Database
For our RAG application, we need a Vector Database to store the embeddings of different documents. In this case, we are using Qdrant.
You can deploy and run the Qdrant vector database directly on the NERC OpenShift environment by following these instructions.
After you follow those instructions, you should have a Qdrant instance ready to be populated with documents.
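As a quick sanity check, you can confirm that the Qdrant instance is reachable before wiring it into the application. The snippet below is a minimal sketch, assuming a hypothetical in-cluster service URL (`http://qdrant:6333`) and the API key from your deployment; substitute the values from your own environment.

```python
from qdrant_client import QdrantClient

# Placeholder values; replace with your own Qdrant service URL and API key.
QDRANT_HOST = "http://qdrant:6333"
QDRANT_API_KEY = "<your-qdrant-api-key>"

client = QdrantClient(url=QDRANT_HOST, api_key=QDRANT_API_KEY)

# Listing collections is a cheap way to verify connectivity and authentication.
print(client.get_collections())
```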
Facing Rate Limits While Pulling Container Image?
If you encounter Rate Limits while pulling Container Images, refer to this guide for detailed steps on how to resolve the issue.
Deploying as a Workbench Using a Data Science Project (DSP) on NERC RHOAI
Prerequisites:
- Before proceeding, confirm that you have an active GPU quota that has been approved for your current NERC OpenShift Allocation through NERC ColdFront. Read more about How to Access GPU Resources on NERC OpenShift Allocation.
Procedure:
- Navigate to the OpenShift AI dashboard.

    Please follow these steps to access the NERC OpenShift AI dashboard.
- Please ensure that you start your Jupyter notebook server with the options depicted in the following configuration screen. This screen lets you select a notebook image and configure its options, including the Accelerator and Number of accelerators (GPUs).

    For our example project, let's name it "RAG Workbench". We'll select the TensorFlow image with the Recommended Version (selected by default), choose a Deployment size of Medium, set Accelerator to None (no GPU is needed for this setup), and allocate a Cluster storage space of 20GB.
Tip
The dashboard currently enforces a minimum storage volume size of 20GB. Please adjust this value under Cluster Storage based on your needs.
Here, you will use Environment Variables to define the key-value pairs required for connecting to the Qdrant vector database, specifically:

- `QDRANT_COLLECTION`: The name of the collection in Qdrant.
- `QDRANT_API_KEY`: The authentication key for accessing the Qdrant database.

To retrieve the value of `QDRANT__SERVICE__API_KEY` from the `qdrant-key` Secret using the `oc` command, run:

```sh
oc get secret qdrant-key -o jsonpath='{.data.QDRANT__SERVICE__API_KEY}' | base64 --decode
```

Make sure these variables are properly set to establish a secure connection.
- If this procedure is successful, you have started your RAG Workbench. When your workbench is ready, the status will change to Running and you can select "Open" to go to your environment:
- Log in by clicking "mss-keycloak" when prompted, as shown below:

    Once you have successfully authenticated, you should see the NERC RHOAI JupyterLab Web Interface, as shown below:
The Jupyter environment is currently empty. To begin, populate it with content using Git. On the left side of the navigation pane, locate the Name explorer panel, where you can create and manage your project directories.
Learn More About Working with Notebooks
For detailed guidance on using notebooks on NERC RHOAI JupyterLab, please refer to this documentation.
Importing the tutorial files into the Jupyter environment
Bring the content of this tutorial inside your Jupyter environment:
- On the toolbar, click the Git Clone icon:

- Enter the following Git Repo URL: https://github.com/nerc-project/llm-on-nerc

- Check the Include submodules option, and then click Clone.

- In the file browser, double-click the newly-created llm-on-nerc folder.
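Alternatively, if you prefer a JupyterLab terminal over the Git Clone dialog, the equivalent command would be the following (the `--recurse-submodules` flag mirrors the Include submodules option above):

```sh
# Clone the tutorial repository with its submodules from a JupyterLab terminal.
git clone --recurse-submodules https://github.com/nerc-project/llm-on-nerc
```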
Verification:
In the file browser, you should see the notebooks that you cloned from Git. Navigate to the `llm-on-nerc/examples/notebooks/langchain` directory, where you will find the Jupyter notebook file `RAG_with_sources_Langchain-Ollama-Qdrant.ipynb`, as shown below:
Double-click on this file to open it.
Very Important Note
Update Ollama's `BASE_URL` and Qdrant's `QDRANT_HOST` in this notebook to match your deployment settings.
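For reference, the notebook's configuration cell might end up looking something like this minimal sketch. The host values shown are placeholders for whatever service URLs your Ollama and Qdrant deployments expose, and the environment variables are the ones defined on the workbench earlier:

```python
import os

# Placeholder values; point these at your own Ollama and Qdrant services.
BASE_URL = "http://ollama:11434"      # Ollama server endpoint inside the cluster
QDRANT_HOST = "http://qdrant:6333"    # Qdrant service URL

# Read the values set as workbench environment variables.
QDRANT_COLLECTION = os.environ.get("QDRANT_COLLECTION", "documents")  # "documents" is a placeholder default
QDRANT_API_KEY = os.environ.get("QDRANT_API_KEY")
```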
Integrating Qdrant with LangChain:
Once you have Qdrant set up, the next step is to integrate it with LangChain. The LangChain library provides various tools to interact with vector databases, including Qdrant.
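As an illustration of how the pieces fit together, a minimal ingestion sketch using the `langchain_community` integrations might look like the following. It assumes the configuration values from the previous step and the PDF shipped with the tutorial; the exact imports and embedding model in the provided notebook may differ depending on the LangChain version.

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load and chunk the sample PDF shipped with the tutorial.
loader = PyPDFLoader("datasource/The_Forgotten_Lighthouse_Book.pdf")
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(loader.load())

# Embed the chunks with Ollama and store them in the Qdrant collection.
# Any embedding-capable model served by Ollama works here; "phi3" is used as an example.
embeddings = OllamaEmbeddings(model="phi3", base_url=BASE_URL)
vector_store = Qdrant.from_documents(
    chunks,
    embeddings,
    url=QDRANT_HOST,
    api_key=QDRANT_API_KEY,
    collection_name=QDRANT_COLLECTION,
)
```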
The AI model, now enriched with additional data, including The Forgotten Lighthouse book PDF located at `llm-on-nerc/examples/notebooks/langchain/datasource/The_Forgotten_Lighthouse_Book.pdf`, can be queried. When asked, "Who is the starfish, and how do you know?", it processes the PDF and infers that Grandpa calls Sarah "my little starfish" in his letter. Since the model wasn't originally trained on this book, its response relies on RAG, demonstrating how AI can extract and infer new information without retraining.
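A retrieval chain along these lines produces that answer; the sketch below uses `RetrievalQA` with source return enabled and the vector store built above, though the notebook's own chain construction may differ:

```python
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# Point the LLM at the Ollama server and use the Qdrant vector store as the retriever.
llm = Ollama(model="phi3", base_url=BASE_URL)
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_store.as_retriever(),
    return_source_documents=True,  # keep the supporting chunks so the answer can cite its sources
)

result = qa_chain.invoke({"query": "Who is the starfish, and how do you know?"})
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata)  # e.g. page numbers from the PDF
```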
In real life, this means you can run a pre-trained AI model (e.g., the Phi-3 model in our application) on your own data without ever sending it outside your premises for training.
The response to our query is as shown below: