ChatWithPDF-Rag-App


Overview

ChatWithPDF-Rag-App is an intelligent document interaction system that allows users to have natural conversations with PDF documents. The application uses RAG (Retrieval-Augmented Generation) technology to provide accurate, context-aware responses to questions about PDF content.

Application Interface

Key Features

PDF Question Answering: Ask questions about any uploaded PDF document
Custom Prompt Engineering: Uses carefully crafted prompts for accurate and context-aware responses
Source Document Tracking: Displays the source documents used to generate each answer
Advanced Language Models:
- Uses sentence-transformers/all-MiniLM-L6-v2 for embeddings
- Powered by mistralai/Mistral-7B-Instruct-v0.3 for language generation

Technical Details

Backend: Python 3.11+
Frontend: Streamlit
Vector Database: FAISS for efficient similarity search
Authentication: Hugging Face API integration
Duration: Mar 2024 – Present

Installation and Setup

Clone the repository:

```
git clone https://github.com/ashrafulparan2/ChatWithPDF-Rag-App.git
cd ChatWithPDF-Rag-App
```

Install dependencies:
```
pip install -r requirements.txt
```
Set up Hugging Face authentication:
```
huggingface-cli login
```

Run the application:

```
streamlit run app.py --server.enableCORS false --server.enableXsrfProtection false
```

Implementation Details

Vector Store Setup

The application uses FAISS (Facebook AI Similarity Search) for efficient document retrieval
Requires pre-generated vectorstore files (index.faiss and index.pkl) in the vectorstore/db_faiss/ directory
Embeddings are generated using the sentence-transformers/all-MiniLM-L6-v2 model
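
A minimal sketch of checking for and loading the vector store. The filenames index.faiss and index.pkl match what LangChain's FAISS.save_local writes, so the loader below assumes a LangChain-based index; that is an assumption, not something this README confirms. The helper names missing_vectorstore_files and load_vectorstore are hypothetical.

```python
from pathlib import Path

# Paths and model name come from this README; the helper names are hypothetical.
DB_FAISS_PATH = Path("vectorstore/db_faiss")
EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
REQUIRED_FILES = ("index.faiss", "index.pkl")


def missing_vectorstore_files(directory=DB_FAISS_PATH):
    """Return the required vectorstore files that are absent from `directory`."""
    directory = Path(directory)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


def load_vectorstore(directory=DB_FAISS_PATH):
    """Load the FAISS index, assuming it was saved with LangChain's FAISS.save_local."""
    missing = missing_vectorstore_files(directory)
    if missing:
        raise FileNotFoundError(f"Missing vectorstore files in {directory}: {missing}")

    # Imported lazily so the file check above works without these packages.
    # Assumption: langchain-community and sentence-transformers are installed.
    from langchain_community.embeddings import HuggingFaceEmbeddings
    from langchain_community.vectorstores import FAISS

    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL)
    # index.pkl is a pickle file, so only load indexes you created yourself.
    return FAISS.load_local(
        str(directory), embeddings, allow_dangerous_deserialization=True
    )
```

Calling missing_vectorstore_files() before starting the app gives a clear error when the pre-generated files are not in place, rather than a failure deep inside the loader.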

Question Answering Pipeline

1. User uploads a PDF document
2. Document is processed and vectorized
3. User questions are embedded and matched against the document vectors
4. Relevant context is retrieved and fed to the Mistral-7B model
5. Model generates natural, context-aware responses
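
The retrieval and prompting steps above can be sketched in plain Python. This is an illustration only: it uses brute-force cosine similarity where the application uses FAISS, and made-up 3-dimensional vectors where the application uses all-MiniLM-L6-v2 embeddings. The function names, the prompt wording, and the sample chunks are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question_vec, chunk_vecs, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the question vector."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(question_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Toy data: invented chunks and vectors standing in for real PDF text and embeddings.
chunks = [
    "Invoices are due in 30 days.",
    "The office closes at 6 pm.",
    "Late fees are 2% per month.",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.8, 0.3, 0.1]]
question_vec = [1.0, 0.2, 0.0]

top_chunks = retrieve(question_vec, chunk_vecs, chunks, k=2)
prompt = build_prompt("When are invoices due?", top_chunks)
print(prompt)
```

The prompt string is what would be sent to the language model, and the retrieved chunks are what the source document display would show alongside the answer.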

User Interface

Clean, intuitive Streamlit interface
Drag-and-drop PDF upload
Real-time question answering
Source document display for transparency
File size limit of 200MB per PDF
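
A rough sketch of how the upload and question flow might look in Streamlit. It is not the code in app.py: answer_question is a hypothetical stand-in for the RAG pipeline, and the labels and layout are assumptions. 200 MB is also Streamlit's default upload limit, so the explicit size check here is a defensive extra.

```python
MAX_UPLOAD_MB = 200  # limit stated in this README


def within_upload_limit(size_bytes, limit_mb=MAX_UPLOAD_MB):
    """Return True if a file of `size_bytes` fits within the upload limit."""
    return size_bytes <= limit_mb * 1024 * 1024


def answer_question(pdf_bytes, question):
    """Hypothetical placeholder for the RAG pipeline; returns (answer, sources)."""
    return "(answer would appear here)", []


def main():
    # Imported lazily so the helpers above can be used without Streamlit installed.
    import streamlit as st

    st.title("Chat with PDF")
    uploaded = st.file_uploader("Upload a PDF", type="pdf")
    question = st.text_input("Ask a question about the document")

    if uploaded is None or not question:
        return
    if not within_upload_limit(uploaded.size):
        st.error(f"File exceeds the {MAX_UPLOAD_MB} MB limit.")
        return

    answer, sources = answer_question(uploaded.getvalue(), question)
    st.write(answer)
    # Show the retrieved passages so the user can verify the answer.
    with st.expander("Source documents"):
        for source in sources:
            st.write(source)


# When run with `streamlit run`, the script would call main() at the bottom.
```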

Ashraful Islam Paran