In this tutorial, we will explore how to build a full RAG (retrieval-augmented generation) API with Llama 3, Ollama, LangChain, and ChromaDB, and create a Flask API that ties these tools together. Additionally, we will implement functionality for uploading PDF files, parsing them, and extracting their text using Python's Flask framework.
Before we get started, let’s briefly discuss each of the tools we will be using in this tutorial:
- Llama 3: Llama 3 is Meta's open large language model, which provides the text generation and summarization at the heart of our RAG pipeline.
- Ollama: Ollama is a tool for running large language models such as Llama 3 locally and serving them through a simple API.
- LangChain: LangChain is a framework for building applications around language models, providing chains that connect prompts, models, retrievers, and other components.
- ChromaDB: ChromaDB is an open-source vector database for storing and querying embeddings, which is what lets us retrieve the document chunks relevant to a query.
- Flask: Flask is a lightweight and versatile web framework for building web applications in Python.
Now, let’s start by setting up our development environment and installing the necessary dependencies:
- First, create a new directory for your project and navigate into it. You can do this using the following commands in your terminal:

```bash
mkdir llama3_full_rag_api
cd llama3_full_rag_api
```
- Next, create a virtual environment for your project to isolate its dependencies. You can use the following command to create a virtual environment named 'env':

```bash
python3 -m venv env
```
- Activate the virtual environment using the following command:

```bash
source env/bin/activate
```
- Now, install the necessary dependencies for our project using the following command:

```bash
pip install flask flask-restful requests PyPDF2 llamas llamas-ollama llamas-langchain llamas-chromadb
```
With our environment set up and dependencies installed, let’s create the Flask API for our project. We will start by defining the routes and functionality for our API:
- Create a new Python file named 'app.py' in your project directory and open it in your favorite code editor.
- Import the necessary modules and libraries at the beginning of the file:

```python
from flask import Flask, request, jsonify
from flask_restful import Api, Resource
from PyPDF2 import PdfReader
from llamas import Llama3
from llamas.components.ollama import Ollama
from llamas.chains.langchain import LangChain
from llamas.database.chromadb import ChromaDB
```

- Instantiate the Flask application and create an instance of the Api class:

```python
app = Flask(__name__)
api = Api(app)
```

- Define a route for uploading PDF files and extracting text from them:

```python
class PDFUpload(Resource):
    def post(self):
        if 'file' not in request.files:
            return jsonify({'error': 'No file part'})
        file = request.files['file']
        if file.filename == '':
            return jsonify({'error': 'No selected file'})
        # PdfReader replaces the deprecated PdfFileReader API
        pdf = PdfReader(file)
        text = ''
        for page in pdf.pages:
            text += page.extract_text() or ''
        return jsonify({'text': text})
```
- Define a route for processing text using Llama3 Full Rag, Ollama, LangChain, and ChromaDB:

```python
class TextProcessing(Resource):
    def post(self):
        data = request.get_json()
        text = data['text']
        llama = Llama3()
        ollama = Ollama(llama)
        langchain = LangChain(llama)
        chromadb = ChromaDB(llama)
        processed_text = {
            'ollama_translation': ollama.translate(text),
            'langchain_sentence': langchain.create_sentence(text),
            'chromadb_synonyms': chromadb.get_synonyms(text)
        }
        return jsonify(processed_text)
```
- Add the routes to the Flask application and run the application:

```python
api.add_resource(PDFUpload, '/pdf/upload')
api.add_resource(TextProcessing, '/text/process')

if __name__ == '__main__':
    app.run(debug=True)
```
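Before wiring in the model components, it can help to smoke-test the routing in-process with Flask's built-in test client. The sketch below uses a stub route that simply echoes its input; the stub is an illustration standing in for the real processing code, so it runs without Llama3, Ollama, LangChain, or ChromaDB:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Stub route standing in for /text/process; it echoes the posted text
# instead of calling the real model components.
@app.route('/text/process', methods=['POST'])
def process_text():
    data = request.get_json()
    return jsonify({'echo': data['text']})

# Flask's test client exercises the route without starting a server.
client = app.test_client()
resp = client.post('/text/process', json={'text': 'hello'})
print(resp.status_code, resp.get_json())
```

Once this in-process check passes, the same request shape works against the running server from Postman or curl.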
With the Flask API set up, we can now test the functionality by starting the Flask development server and making requests to the defined routes using tools such as Postman.
To start the Flask development server, run the following command in your terminal:
```bash
python app.py
```
You should see output indicating that the Flask application is running. Now, you can open Postman or another API testing tool, and make requests to the defined routes to upload PDF files and process text using Llama3 Full Rag, Ollama, LangChain, and ChromaDB.
In this tutorial, we have explored how to create a Flask API that integrates Llama3 Full Rag, Ollama, LangChain, and ChromaDB, and implements functionality for uploading PDF files and processing text. You can further enhance this project by adding more features and customization based on your requirements and the capabilities of the tools used.
Finally, a complete, fast, and simple end-to-end API using Ollama, Llama3, LangChain, ChromaDB, Flask, and PDF processing for a complete RAG system. If you like this one, check out my video on setting up an AWS server with GPU support – https://youtu.be/dJX9x7bETe8
Definitely the best tutorial I've found on YouTube. I especially appreciated that you included the problems you ran into while implementing the code, like packages not yet installed, because tutorials usually make it look like everything always works, which is not what really happens when doing it for the first time. Great job.
Thanks for your video. I have a question: if I have a very big PDF, will embedding it take more tokens? And what is the maximum length of the PDF?
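A larger PDF does mean more text to embed, which is why RAG pipelines typically split the document into fixed-size overlapping chunks before embedding rather than sending the whole file to the model at once. A character-based splitter sketch (the chunk size and overlap values here are illustrative assumptions, not the tutorial's settings):

```python
def split_text(text, chunk_size=1024, overlap=80):
    """Split text into overlapping chunks so each stays under the embedder's limit."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by chunk_size minus overlap so adjacent chunks share context.
        start += chunk_size - overlap
    return chunks

doc = 'x' * 3000
chunks = split_text(doc)
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # → 4 1024 168
```

Because each chunk is embedded separately, there is no hard maximum PDF length; a bigger document just produces more chunks in the vector store.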
The code is blurred and too small, hard to read.
Where is the calling API?
Great content. I was able to clone the git repo and installed all the requirements. However, when I run app.py file, I'm getting an error. can somebody advise on what to do here?
"{ImportError: cannot import name 'EVENT_TYPE_OPENED' from 'watchdog.events' (/opt/anaconda3/lib/python3.11/site-packages/watchdog/events.py)".
hi bro, I tried this program and it shows an error that fastembed will not import, but I already installed the package, and the same error shows again and again
Amazing video! Your explanation is super insightful and well-presented. I'm curious—do you have any thoughts or experience with using Ollama in a production environment? I'm not sure if Ollama can handle multiple requests at scale.
If I were to implement something like this in production, would you recommend Ollama, or would alternatives like llama.cpp or vllm be better suited? Would love to hear your perspective on its scalability and performance. Thanks again for sharing such awesome content!
The best tutorial
Hello, I have a problem: the variable context is not declared
WINDOWS USERS INSTALL PROBLEM:
"uvloop" says it doesn't work on windows
remove it from requirements.txt
it should install and seems to work anyways
(I haven't tested the RAG part, only that the API call works)
I tried so hard to get langchain_community to work but it wouldn't. I installed it and it would show in my env, but it wouldn't work. I used 3.8 and 3.10, went through multiple docs, and even got help from ChatGPT for troubleshooting… It wouldn't work though. I am not sure why it wouldn't import langchain_community at all.
Very excellent. I am doing chatbot creation for a bank and privacy/ security is of utmost importance. I'll surely use knowledge gained here.
At minute 23:00, I was getting an error with FastEmbedEmbeddings.
This was the fix
There was a new version of langchain_community released 3 weeks ago. You need to roll back to the earlier versions. Run these commands and then it will work.
pip3 uninstall langchain_community
pip3 install langchain_community==0.2.6
pip3 uninstall fastembed
pip3 install fastembed==0.3.2
Postman gives this error, can anyone help?
<!doctype html>
<html lang=en>
<title>415 Unsupported Media Type</title>
<h1>Unsupported Media Type</h1>
<p>Did not attempt to load JSON data because the request Content-Type was not 'application/json'.</p>
Sorry for my written English, I just speak Spanish: Google Translate helped me : )
First time ever that I can finish a RAG project with Chroma and Llama3. Literally I watched dozens of videos and I was losing my mind, because all the videos out there are too complicated and LangChain is huge and hard (for me). Thanks so much for taking your time to give us this great tutorial.
For some reason the context of my raw_prompt is empty (I don't know why) and I am not getting an answer… I am getting this message when I create my retriever:
/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain_core/vectorstores.py:342: UserWarning: No relevant docs were retrieved using the relevance score threshold 0.1. I assume that is why the context is not injected into the raw prompt?
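That warning means every stored chunk scored below the retriever's relevance threshold, so nothing was retrieved and nothing gets injected into the prompt; either the PDF was never embedded, or the scores are genuinely low. The relevance score is typically based on cosine similarity between the query and chunk embeddings. A pure-Python illustration with hypothetical vectors (the numbers are made up for the example):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [1.0, 0.0, 1.0]   # hypothetical query embedding
chunk = [0.9, 0.1, 0.8]   # hypothetical document-chunk embedding
score = cosine_similarity(query, chunk)
print(round(score, 3))

# A retriever with score_threshold=0.1 keeps this chunk only if score >= 0.1.
```

If the warning persists, check that the documents were actually added to the vector store before querying, or try lowering the threshold to see whether anything is retrieved at all.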
I tried adding a Dockerfile and running it in Docker, and I am getting this error:
fastembed.common.model_management:download_model:227 – Could not download model from HuggingFace: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/qdrant/bge-small-en-v1.5-onnx-q/revision/main (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1007)')))"), '(Request ID: ec0cb652-4ddc-4224-9caf-91eb57204534)')Falling back to other sources.
any thoughts?
Can you do one for Windows or Linux 😂 Sorry, I'm a bit lost too. Guess I need more Python knowledge.
That's the way, man, great tutorial. Thank you.
Dude, do you sell any training? Excellent video.