
🎓 CampusKnowledgeBase (CK Base)

Python · Flask · Next.js · Gemini API · FAISS · Google OAuth


🎯 One-Liner

Elevating Campus Learning through Retrieval-Augmented Generation. CK Base is a high-performance AI assistant that synthesizes course materials into cited, verifiable answers for university students.

Note: The current implementation is built using course materials from our college (FY Semester 1 and Computer Engineering Semester 3). Support for other colleges and programs will be added in the future as access to their academic resources becomes available.


🚨 The Problem

Students face massive information overload during exam prep, with hundreds of pages of fragmented PDFs across various platforms.

  • Unverifiability: General LLMs often hallucinate facts not present in the official syllabus.
  • Speed: Manually searching through 100+ page PDFs for a single concept is inefficient.
  • Accuracy: Academic queries require precise grounding in verified institutional data.

✨ Key Features

  • ⚡ Sub-Millisecond Retrieval: Leveraging FAISS for low-latency semantic search across thousands of document fragments.
  • 📊 Accuracy Scoring: Every response includes an Accuracy Score (0-1), generated by a secondary LLM "judge" checking for grounding.
  • 📅 Semester-Aware Filtering: Metadata-locked retrieval ensures answers are specific to the student's current year (FY/SY) and semester.
  • 🛡️ Institutional Security: Integrated with Google OAuth 2.0, restricted to verified university domains.
  • 📍 Precision Citations: Automatic references to the specific PDF and subject used to generate the answer.
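The semester-aware filtering and citation features can be illustrated with a minimal, dependency-free sketch. It assumes each chunk carries `year`, `semester`, `subject`, and `source` metadata (hypothetical field names, not taken from `rag.py`). It also uses brute-force cosine similarity in place of the FAISS index the project actually uses. Treat it as an illustration of the idea, not the repository's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical chunk record: text, its embedding, and the metadata used for filtering and citations.
    text: str
    embedding: list[float]
    year: str        # e.g. "FY" or "SY"
    semester: int
    subject: str
    source: str      # PDF file name shown in the citation


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_emb, chunks, year, semester, k=3):
    """Return the top-k chunks for the student's year and semester, best match first."""
    # Metadata lock: only chunks from the student's year and semester are candidates.
    candidates = [c for c in chunks if c.year == year and c.semester == semester]
    ranked = sorted(candidates, key=lambda c: cosine(query_emb, c.embedding), reverse=True)
    return ranked[:k]


def format_citations(results):
    """Build de-duplicated "subject (file)" citation strings in ranking order."""
    seen = []
    for c in results:
        label = f"{c.subject} ({c.source})"
        if label not in seen:
            seen.append(label)
    return seen
```

Filtering before ranking means an SY chunk can never outrank an FY chunk for an FY student, however similar it is. With FAISS, the same effect is usually achieved by keeping one index per semester or by filtering the returned IDs against a metadata table.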

🛠️ Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Backend API | Flask + Python | REST API, RAG orchestration |
| AI/ML | Google Gemini API | LLM for answer generation & evaluation |
| Embeddings | Google Text Embeddings | Semantic text representation |
| Vector Database | FAISS | Sub-millisecond similarity search |
| Frontend | Next.js + TypeScript | Interactive chat UI |
| Authentication | Google OAuth 2.0 | Secure student sign-in |
| Infrastructure | Flask Dev Server | Can be deployed on Cloud Run |
| Data | Campus course PDFs | Processed into chunks + embeddings |
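The Institutional Security feature restricts sign-in to verified university domains, but the README does not show how. One common approach, sketched here under assumptions, is to check the claims of a Google ID token *after* its signature has been verified (for example with `google.oauth2.id_token.verify_oauth2_token` from the `google-auth` library). The `email_verified` and `hd` (hosted domain) claims are standard in Google ID tokens. The `ALLOWED_DOMAINS` set and the `is_allowed_student` helper are hypothetical and not taken from `auth_routes.py`.

```python
# Hypothetical set of university domains allowed to sign in.
ALLOWED_DOMAINS = {"example.edu"}


def is_allowed_student(claims: dict, allowed_domains=ALLOWED_DOMAINS) -> bool:
    """Decide whether already-verified Google ID token claims belong to an allowed student.

    `claims` is assumed to come from a signature-verified ID token; this function
    only applies the domain policy and does not verify the token itself.
    """
    if not claims.get("email_verified"):
        return False
    email = claims.get("email", "")
    # Take the part after the last "@" and compare case-insensitively.
    domain = email.rsplit("@", 1)[-1].lower() if "@" in email else ""
    # Google Workspace accounts carry the hosted domain in the "hd" claim;
    # consumer accounts (e.g. gmail.com) do not include it.
    hosted = str(claims.get("hd", "")).lower()
    return domain in allowed_domains and hosted == domain
```

Requiring `hd` to match the email domain is a deliberate design choice: it rejects personal Google accounts even if someone enters a university-looking address elsewhere in the flow.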

🚀 Getting Started

Environment Setup

Create a .env file in the root directory with the following variables:

# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here

# Flask Configuration
FLASK_SECRET_KEY=your_secret_key_here
FLASK_ENV=development

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000

# Google OAuth (optional, for auth)
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret
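A minimal sketch of how the backend might read these variables, assuming `config.py` loads them from the process environment. The repository may instead use a library such as `python-dotenv`; that is not shown here. The sketch treats `GEMINI_API_KEY` and `FLASK_SECRET_KEY` as required and keeps the OAuth pair optional, as the comment above suggests. It includes a tiny parser for the simple `KEY=value` format used above so it runs without extra dependencies. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Optional

REQUIRED = ("GEMINI_API_KEY", "FLASK_SECRET_KEY")


def parse_env_file(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blank lines and # comments (no quoting or inline comments)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


@dataclass
class Settings:
    gemini_api_key: str
    flask_secret_key: str
    frontend_url: str
    google_client_id: Optional[str]
    google_client_secret: Optional[str]


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ); fail fast on missing required keys."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        flask_secret_key=env["FLASK_SECRET_KEY"],
        frontend_url=env.get("FRONTEND_URL", "http://localhost:3000"),
        google_client_id=env.get("GOOGLE_CLIENT_ID"),  # optional: only needed for auth
        google_client_secret=env.get("GOOGLE_CLIENT_SECRET"),
    )
```

Failing at startup on a missing key surfaces configuration mistakes immediately, instead of on the first request that calls Gemini.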

Installation Steps

1️⃣ Clone the Repository

git clone https://github.com/SohaKhare/CampusKnowledgeBase.git
cd CampusKnowledgeBase

2️⃣ Backend Setup (Python + Flask)

# Navigate to backend
cd aiml

# Initialize and sync the virtual environment (.venv)
uv sync

# (Optional) Activate the virtual environment; `uv run` below also works without activating it
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Start Flask server
uv run main.py

Server runs at: http://localhost:8000

3️⃣ Frontend Setup (Next.js)

# Navigate to frontend
cd frontend  # from the repository root (use `cd ../frontend` if you are still in aiml/)

# Install dependencies
npm install  # or yarn install

# Start development server
npm run dev  # or yarn dev

Frontend runs at: http://localhost:3000


📁 Project Structure

CampusKnowledgeBase/
├── .gitignore               # Root ignore file (node_modules, .venv, .env)
├── README.md                # Main documentation file
│
├── aiml/                    # AI/ML Backend (Flask)
│   ├── .venv/               # Virtual environment (managed by uv)
│   ├── data/                # Institutional Knowledge (PDFs)
│   │   ├── FY/              # First Year materials
│   │   └── SY/              # Second Year materials
│   ├── ingestion/           # Pipeline to process PDFs into vectors
│   │   ├── ingest.py        # Main script to run indexing
│   │   └── chunker.py       # PDF splitting logic
│   ├── routes/              # Flask Blueprints for API endpoints
│   │   ├── auth_routes.py   # Google OAuth logic
│   ├── askllm.py            # Gemini API integration & Scoring logic
│   ├── rag.py               # FAISS retrieval logic
│   ├── config.py            # Environment & app configuration
│   ├── embedder.py          # Vector embedding generation (Google AI)
│   ├── extensions.py        # Shared Flask extensions (DB, Auth, etc.)
│   ├── pyproject.toml       # uv dependency management file
│   └── main.py              # Backend entry point
│
└── frontend/                # Next.js Frontend
    ├── public/              # Static assets (logos, icons)
    ├── src/
    │   ├── app/             # Next.js App Router (pages)
    │   ├── components/      # UI components (Chat, Sidebar, Navbar)
    │   ├── lib/             # Utility functions (API callers)
    │   └── types/           # TypeScript interface definitions
    ├── .env.local           # Frontend environment variables
    ├── package.json         # Node.js dependencies
    └── next.config.ts       # Next.js configuration
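The tree describes `chunker.py` only as "PDF splitting logic". A common approach for RAG ingestion is fixed-size chunks with overlap, so that text near a chunk boundary appears in both neighbouring chunks and is less likely to be cut off from its context. The sketch below assumes the PDF text has already been extracted to a string and measures size in characters. The `chunk_text` name and the `chunk_size`/`overlap` defaults are illustrative, not values from the repository.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size].strip()
        if piece:  # skip whitespace-only windows
            chunks.append(piece)
        # Stop once this window reaches the end of the text, so the last chunk
        # is not followed by a tail that is already fully contained in it.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each resulting chunk would then be embedded (`embedder.py`) and added to the index together with its year, semester, subject, and source-file metadata.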

📊 How Accuracy Scoring Works

To reduce AI hallucinations, we implement a Self-Correction Loop:

  • Retrieval
    The system performs a semantic search to fetch the most relevant content chunks from the local data/ store based on the student’s query.

  • Verification
    Gemini cross-checks the generated answer against the retrieved chunks, checking whether every statement is grounded in the original source material.

  • Confidence Check
    A grounding score (ranging from 0.0 to 1.0) is generated. If the score falls below a predefined threshold, the UI warns the student and encourages cross-verification using the cited PDF sources.
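The confidence check reduces to a threshold comparison on the judge's score. The sketch below assumes the judge model is asked to reply with JSON containing a `score` field. The prompt wording, the JSON shape, the helper names, and the 0.6 threshold are all hypothetical; the README does not state the actual threshold. The Gemini call itself is omitted. Unparseable or out-of-range replies are treated as low confidence so the UI errs on the side of warning.

```python
import json
from dataclasses import dataclass

# Hypothetical threshold; the project's real value is not documented.
GROUNDING_THRESHOLD = 0.6

# Hypothetical prompt template for the secondary "judge" call.
JUDGE_PROMPT = (
    "Given the CONTEXT chunks and the ANSWER, rate from 0.0 to 1.0 how fully every "
    'statement in the ANSWER is supported by the CONTEXT. Reply only with JSON: {{"score": <number>}}\n\n'
    "CONTEXT:\n{context}\n\nANSWER:\n{answer}"
)


@dataclass
class Verdict:
    score: float
    warn: bool  # True -> UI shows a low-confidence warning and points to the cited PDFs


def build_judge_prompt(context_chunks, answer):
    """Fill the judge prompt with the retrieved chunks and the generated answer."""
    context = "\n---\n".join(context_chunks)
    return JUDGE_PROMPT.format(context=context, answer=answer)


def parse_verdict(judge_reply: str, threshold: float = GROUNDING_THRESHOLD) -> Verdict:
    """Turn the judge's raw reply into a score in [0, 1] and a warn flag."""
    try:
        score = float(json.loads(judge_reply)["score"])
    except (ValueError, KeyError, TypeError):
        # Malformed reply: treat as ungrounded rather than guessing.
        return Verdict(score=0.0, warn=True)
    if not 0.0 <= score <= 1.0:  # also rejects NaN
        return Verdict(score=0.0, warn=True)
    return Verdict(score=score, warn=score < threshold)
```

Keeping the parsing strict means a judge that drifts from the expected format shows up as warnings rather than as silently inflated scores.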


📧 Contact & Support

Have questions or found a bug? Open an issue or reach out to us!

Built with ❤️ by Saish, Shaurya, Soha and Bhoumik.
