Elevating Campus Learning through Retrieval-Augmented Generation. CK Base is a high-performance AI assistant that synthesizes course materials into cited, verifiable answers for university students.
Note: The current implementation is built using course materials from our college (FY Semester 1 and Computer Engineering Semester 3). Support for other colleges and programs will be added in the future as access to their academic resources becomes available.
Students face massive information overload during exam prep, with hundreds of pages of fragmented PDFs across various platforms.
- Unverifiability: General LLMs often hallucinate facts not present in the official syllabus.
- Speed: Manually searching through 100+ page PDFs for a single concept is inefficient.
- Accuracy: Academic queries require precise grounding in verified institutional data.
- ⚡ Sub-Millisecond Retrieval: Leveraging FAISS for low-latency semantic search across thousands of document fragments.
- 📊 Accuracy Scoring: Every response includes an Accuracy Score (0-1), generated by a secondary LLM "judge" checking for grounding.
- 📅 Semester-Aware Filtering: Metadata-locked retrieval ensures answers are specific to the student's current year (FY/SY) and semester.
- 🛡️ Institutional Security: Integrated with Google OAuth 2.0, restricted to verified university domains.
- 📍 Precision Citations: Automatic references to the specific PDF and subject used to generate the answer.
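Because FAISS itself has no native metadata filter, semester-aware filtering is commonly implemented by ranking all chunks and post-filtering on their metadata. Here is a minimal sketch of that pattern, using a brute-force NumPy search in place of the real FAISS index; all vectors, file names, and field names are illustrative, not the project's actual data:

```python
import numpy as np

# Toy chunk vectors and metadata (illustrative; the real system stores
# Gemini embeddings in a FAISS index).
chunks = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 5.0]])
metadata = [
    {"year": "FY", "semester": 1, "pdf": "maths1.pdf"},
    {"year": "FY", "semester": 1, "pdf": "physics.pdf"},
    {"year": "SY", "semester": 3, "pdf": "dsa.pdf"},
]

def search_filtered(query: np.ndarray, year: str, semester: int, k: int = 2):
    """Rank every chunk by distance (standing in for a FAISS search),
    then keep only chunks matching the student's year and semester."""
    order = np.argsort(np.linalg.norm(chunks - query, axis=1))
    hits = [metadata[i] for i in order
            if metadata[i]["year"] == year and metadata[i]["semester"] == semester]
    return hits[:k]

print(search_filtered(np.array([0.1, 0.0]), "FY", 1))
```

Post-filtering after retrieval keeps the index simple at the cost of over-fetching; the alternative is maintaining one index per year/semester.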
| Component | Technology | Purpose |
|---|---|---|
| Backend API | Flask + Python | REST API, RAG orchestration |
| AI/ML | Google Gemini API | LLM for answer generation & evaluation |
| Embeddings | Google Text Embeddings | Semantic text representation |
| Vector Database | FAISS | Sub-millisecond similarity search |
| Frontend | Next.js + TypeScript | Interactive chat UI |
| Authentication | Google OAuth 2.0 | Secure student sign-in |
| Infrastructure | Flask dev server | Local development; deployable to Cloud Run |
| Data | Campus course PDFs | Processed into chunks + embeddings |
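The ingestion step splits each PDF's extracted text into overlapping chunks before embedding. A minimal sketch of one common chunking strategy follows; the window size and overlap values are illustrative assumptions, not the project's actual parameters:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows so content that
    spans a chunk boundary still appears intact in at least one chunk."""
    step = size - overlap
    return [text[start:start + size]
            for start in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 500)
print(len(chunks))  # → 3 (windows 0-200, 150-350, 300-500)
```

The overlap trades a little index size for recall: a definition straddling a boundary is still retrievable as one coherent chunk.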
Create a .env file in the root directory with the following variables:
```env
# Google Gemini API
GEMINI_API_KEY=your_gemini_api_key_here

# Flask Configuration
FLASK_SECRET_KEY=your_secret_key_here
FLASK_ENV=development

# Frontend URL (for CORS)
FRONTEND_URL=http://localhost:3000

# Google OAuth (optional, for auth)
GOOGLE_CLIENT_ID=your_google_client_id
GOOGLE_CLIENT_SECRET=your_google_client_secret
```

Clone the repository:

```bash
git clone https://github.com/SohaKhare/CampusKnowledgeBase.git
cd CampusKnowledgeBase
```

Backend (Flask):

```bash
# Navigate to backend
cd aiml

# Initialize and sync the virtual environment (.venv)
uv sync

# Activate the virtual environment
source .venv/bin/activate   # On Windows: .venv\Scripts\activate

# Start Flask server
uv run main.py
```

The server runs at http://localhost:8000.

Frontend (Next.js):

```bash
# Navigate to frontend
cd frontend

# Install dependencies
npm install   # or: yarn install

# Start development server
npm run dev   # or: yarn dev
```

The frontend runs at http://localhost:3000.
```
CampusKnowledgeBase/
├── .gitignore              # Root ignore file (node_modules, .venv, .env)
├── README.md               # Main documentation file
│
├── aiml/                   # AI/ML backend (Flask)
│   ├── .venv/              # Virtual environment (managed by uv)
│   ├── data/               # Institutional knowledge (PDFs)
│   │   ├── FY/             # First Year materials
│   │   └── SY/             # Second Year materials
│   ├── ingestion/          # Pipeline to process PDFs into vectors
│   │   ├── ingest.py       # Main script to run indexing
│   │   └── chunker.py      # PDF splitting logic
│   ├── routes/             # Flask blueprints for API endpoints
│   │   └── auth_routes.py  # Google OAuth logic
│   ├── askllm.py           # Gemini API integration & scoring logic
│   ├── rag.py              # FAISS retrieval logic
│   ├── config.py           # Environment & app configuration
│   ├── embedder.py         # Vector embedding generation (Google AI)
│   ├── extensions.py       # Shared Flask extensions (DB, auth, etc.)
│   ├── pyproject.toml      # uv dependency management file
│   └── main.py             # Backend entry point
│
└── frontend/               # Next.js frontend
    ├── public/             # Static assets (logos, icons)
    ├── src/
    │   ├── app/            # Next.js App Router (pages)
    │   ├── components/     # UI components (Chat, Sidebar, Navbar)
    │   ├── lib/            # Utility functions (API callers)
    │   └── types/          # TypeScript interface definitions
    ├── .env.local          # Frontend environment variables
    ├── package.json        # Node.js dependencies
    └── next.config.ts      # Next.js configuration
```
To eliminate AI hallucinations, we implement a Self-Correction Loop:

1. **Retrieval**: The system performs a semantic search to fetch the most relevant content chunks from the local `data/` store based on the student's query.
2. **Verification**: Gemini cross-checks the generated answer against the retrieved chunks, ensuring that every statement is grounded in the original source material.
3. **Confidence Check**: A grounding score (0.0 to 1.0) is generated. If the score falls below a predefined threshold, the UI warns the student and encourages cross-verification using the cited PDF sources.
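The confidence-check step can be sketched as follows; the threshold value, judge prompt, and function names here are illustrative assumptions, and a stub stands in for the real Gemini judge call:

```python
# Illustrative threshold; the project's actual cutoff may differ.
GROUNDING_THRESHOLD = 0.7

def check_grounding(answer: str, chunks: list[str], judge) -> dict:
    """Ask a secondary 'judge' model to score how well the answer is
    grounded in the retrieved chunks, then flag low-confidence answers."""
    prompt = (
        "Score from 0.0 to 1.0 how fully this answer is supported "
        "by the sources.\n"
        f"Answer: {answer}\nSources: {' '.join(chunks)}"
    )
    score = float(judge(prompt))  # judge is expected to return a number
    return {
        "answer": answer,
        "accuracy_score": score,
        "warning": score < GROUNDING_THRESHOLD,
    }

# Stub judge for demonstration; the real system calls the Gemini API here.
result = check_grounding("Ohm's law: V = IR", ["V = IR (Ohm's law)"],
                         lambda prompt: "0.92")
print(result["warning"])  # → False
```

When `warning` is true, the UI surfaces the low score alongside the cited PDFs so the student can verify the answer directly against the source.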
Have questions or found a bug? Open an issue or reach out to us!
Built with ❤️ by Saish, Shaurya, Soha and Bhoumik.