ParkerBackend API

ParkerBackend is a FastAPI backend that turns uploaded learning materials into a queryable knowledge graph. It is built for student-facing workflows: upload course content, run AI extraction, and query relationships/subgraphs for study assistance.

This README is intentionally API-first and hackathon-ready.

Why This Project Fits the Hackathon

Student scenario: convert class documents into structured knowledge for studying.
Meaningful AI: LLM extraction + optional enrichment (OpenAlex/Canvas) directly powers graph creation.
Demo-ready: clean upload -> extract -> query flow with deterministic API contracts.
Suggested guiding category: Studying.

Demo Story (5 Minutes)

Upload one class document (POST /upload).
Extract graph (POST /extract) and show added entity/relationship counts.
Query confidence-filtered relationships (GET /query/relationships).
Query a subgraph by source or entity (/query/subgraph/...) and show how it supports review.
Close with responsible AI notes (limits, validation, error paths).

Team Roles (Suggested Ownership)

Tech Lead: API reliability, data flow, and demo infrastructure.
Product Lead: problem framing, user story, and demo script.
Ethics Lead: risk review, mitigation messaging, and responsible use guardrails.

Architecture At a Glance

flowchart TD
  A[Student Uploads File] --> B[POST /upload]
  B --> C[Size + Path Safety Checks]
  C --> D[Text Extraction textract or UTF-8 fallback]
  D --> E[Artifact JSON in /tmp/backend-placeholder/uploads]

  E --> F[POST /extract]
  F --> G[Artifact Validation + Idempotency Check]
  G --> H[LangGraph Pipeline process_document]

  H --> I[inject_schema_options]
  I --> J[extract_graph]
  I --> K[canvas_node optional]
  J --> L[validate_graph]
  K --> L
  L -->|retry| M[retry_extract_graph]
  M --> L
  L -->|done| N[openalex_gate]
  N -->|query_openalex=true| O[enrich_with_openalex]
  N -->|query_openalex=false| P[link_canvas_assignments]
  O --> P
  P --> Q[mkgraph]

  Q --> R[DuckDB Persist add_data_to_db]
  R --> S[Mark Artifact Processed]
  S --> T[ExtractResponse]

  U[Query Endpoints] --> V[query_service wrappers]
  V --> W[database.py subgraph + relationship queries]
  W --> X[GraphSubgraphResponse or RelationshipsQueryResponse]

Tech Stack

Python 3.13+
FastAPI + Pydantic
LangGraph
DuckDB
Nix (dev/build)

Run Locally

1) Enter the dev shell

nix develop

2) Set environment variables

export OPENAI_API_KEY="your-openai-key"
export OPENALEX_API_KEY="your-openalex-key"   # optional
export CANVAS_API_KEY="your-canvas-key"       # optional

3) Start the API

nix run .#deploy-backend

Server default: http://localhost:8000

Core API Conventions

Base URL: http://localhost:8000
Content-Type: application/json for JSON endpoints
Upload endpoint uses multipart/form-data
Service-layer errors are normalized to:

{
  "detail": {
    "error_code": "string",
    "message": "human-readable message"
  }
}

Validation errors from Pydantic are wrapped as HTTP 422 with error_code: "validation_error".
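A client can lean on this envelope to handle every failure the same way. The following is a minimal sketch, not part of the repository: `ApiError` and `parse_error` are hypothetical names, and the fallback for bodies that do not follow the envelope is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiError:
    """Hypothetical client-side view of the normalized error envelope."""
    status: int
    error_code: str
    message: str


def parse_error(status: int, body: str) -> ApiError:
    """Read {"detail": {"error_code", "message"}} from an error response body."""
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        detail = None
    if isinstance(detail, dict):
        return ApiError(
            status,
            str(detail.get("error_code", "unknown")),
            str(detail.get("message", "")),
        )
    # Assumption: anything outside the envelope is reported as "unknown".
    return ApiError(status, "unknown", body)
```

With this in place, a caller can branch on `error_code` (for example `validation_error`) instead of matching on message text.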

Endpoint Index

| Method | Path | Purpose |
| --- | --- | --- |
| GET | `/health` | Liveness check |
| POST | `/upload` | Ingest file, extract text, create artifact |
| POST | `/extract` | Run KG extraction pipeline from artifact |
| GET | `/query/relationships` | List relationships with confidence filters |
| GET | `/query/subgraph/source/{source_id}` | Subgraph by one source |
| POST | `/query/subgraph/sources` | Subgraph by many sources |
| GET | `/query/subgraph/entity/{entity_id_or_name}` | Subgraph by entity ID or name |
| GET | `/query/relationships/type/{relationship_type}` | Subgraph by relationship type |
| POST | `/query/subgraph/entity-types` | Subgraph by entity type list |

Detailed API Reference

GET `/health`

Simple liveness check.

Response 200:

{
  "health": "ok"
}

POST `/upload`

Uploads a file, enforces size limits, extracts text, and writes an artifact JSON used later by /extract.

Request:

multipart/form-data
field: file (UploadFile)

Size constraints:

compressed upload <= 20 MiB
decompressed payload <= 100 MiB

Success 200 (UploadResponse):

{
  "source_id": "string",
  "source_name": "lecture-2.pdf",
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "artifact_sha256": "hex_sha256",
  "metadata_status": "ok",
  "metadata_error_code": null,
  "metadata_error_message": null,
  "compressed_bytes": 12345,
  "decompressed_bytes": 67890
}

Common service errors:

413 compressed_size_exceeded
413 decompressed_size_exceeded
422 empty_extracted_text

Note on metadata_status:

A textract problem does not necessarily fail the request. When the request still succeeds, metadata_status, metadata_error_code, and metadata_error_message record the extraction warning or error state.
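As a rough illustration of the upload contract above, the sketch below builds the multipart/form-data request with the standard library only and inspects the metadata fields of an UploadResponse. The helper names are hypothetical. It assumes the raw uploaded bytes are what count against the 20 MiB compressed limit, and that any metadata_status other than "ok" signals a warning or error (only "ok" is documented here).

```python
import urllib.request
import uuid
from typing import Optional

MAX_COMPRESSED_BYTES = 20 * 1024 * 1024  # 20 MiB limit from "Size constraints"


def build_upload_request(base_url: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST /upload request with one `file` field."""
    if len(content) > MAX_COMPRESSED_BYTES:
        # Fail early instead of waiting for a 413 from the server.
        raise ValueError("file exceeds the 20 MiB upload limit")
    boundary = uuid.uuid4().hex
    # Note: this sketch does not escape quotes in `filename`.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/upload",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def extraction_warning(upload_response: dict) -> Optional[str]:
    """Return a short warning string unless metadata_status is "ok"."""
    if upload_response.get("metadata_status") == "ok":
        return None
    code = upload_response.get("metadata_error_code")
    message = upload_response.get("metadata_error_message")
    return f"{code}: {message}"
```

The request can then be sent with `urllib.request.urlopen(request)` against a running server. Any HTTP client that supports multipart uploads would work equally well.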

POST `/extract`

Runs extraction pipeline from an artifact created by /upload, persists the graph to DuckDB, and returns counts.

Request body (ExtractRequest):

{
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "query_canvas": false,
  "query_openalex": false
}

Response 200 (ExtractResponse):

{
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "artifact_sha256": "hex_sha256",
  "already_processed": false,
  "added_entities": 42,
  "added_relationships": 88,
  "sources": [
    {
      "source_id": "string",
      "source_name": "lecture-2.pdf"
    }
  ]
}

Idempotency behavior:

If the same (artifact_path, artifact_sha256) pair was processed before, the endpoint returns already_processed: true with added_entities and added_relationships both set to 0.
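The idempotency rule can be pictured with a small in-memory model. This is only a sketch of the behavior, not the service code: `extract_once` and `run_pipeline` are hypothetical names, a Python set stands in for the processed-artifact log, and it assumes the SHA-256 is computed over the artifact file bytes.

```python
import hashlib
from typing import Callable

ProcessedKey = tuple[str, str]  # (artifact_path, artifact_sha256)


def extract_once(
    processed: set[ProcessedKey],
    artifact_path: str,
    artifact_bytes: bytes,
    run_pipeline: Callable[[bytes], tuple[int, int]],
) -> dict:
    """Run the pipeline at most once per (path, content hash) pair."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    key = (artifact_path, digest)
    if key in processed:
        # Seen before: report it and add nothing.
        return {"already_processed": True, "added_entities": 0, "added_relationships": 0}
    added_entities, added_relationships = run_pipeline(artifact_bytes)
    # Mark as processed only after the pipeline succeeded.
    processed.add(key)
    return {
        "already_processed": False,
        "added_entities": added_entities,
        "added_relationships": added_relationships,
    }
```

Because the hash is part of the key, changing the artifact content at the same path counts as new work in this model.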

Common service errors:

400 artifact_path_not_allowed
400 artifact_path_invalid_suffix
404 artifact_not_found
400 artifact_invalid

GET `/query/relationships`

Returns relationship records with optional confidence bounds.

Query params:

limit (int, default 100, max 1000)
offset (int, default 0, min 0)
min_confidence (float 0..1, optional)
max_confidence (float 0..1, optional)

Validation rule:

if both are set, min_confidence <= max_confidence
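A client can apply the same checks before sending a request and avoid a round trip that would end in a 422. The sketch below is illustrative: `relationships_url` is a hypothetical helper, and the lower bound of 1 for limit is an assumption borrowed from the subgraph endpoints.

```python
from typing import Optional
from urllib.parse import urlencode


def relationships_url(
    base_url: str,
    *,
    limit: int = 100,
    offset: int = 0,
    min_confidence: Optional[float] = None,
    max_confidence: Optional[float] = None,
) -> str:
    """Build a GET /query/relationships URL after client-side validation."""
    if not 1 <= limit <= 1000:  # lower bound of 1 is an assumption
        raise ValueError("limit must be between 1 and 1000")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    for name, value in (("min_confidence", min_confidence), ("max_confidence", max_confidence)):
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0..1")
    if (
        min_confidence is not None
        and max_confidence is not None
        and min_confidence > max_confidence
    ):
        raise ValueError("min_confidence must be <= max_confidence")
    params: dict = {"limit": limit, "offset": offset}
    # Optional bounds are sent only when set.
    if min_confidence is not None:
        params["min_confidence"] = min_confidence
    if max_confidence is not None:
        params["max_confidence"] = max_confidence
    return f"{base_url}/query/relationships?{urlencode(params)}"
```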

Response 200 (RelationshipsQueryResponse):

{
  "items": [
    {
      "subject_entity_id": "ent_a",
      "object_entity_id": "ent_b",
      "relationship_type": "string",
      "confidence": 0.93,
      "data": {}
    }
  ],
  "total": 1,
  "limit": 100,
  "offset": 0
}

Validation failure:

422 validation_error
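The limit, offset, and total fields make it possible to walk a full result set page by page. A minimal sketch follows, assuming total counts every matching record rather than only the current page. `iter_relationships` and the `fetch_page` callable are hypothetical; `fetch_page(limit, offset)` is expected to return a parsed RelationshipsQueryResponse.

```python
from typing import Callable, Iterator


def iter_relationships(
    fetch_page: Callable[[int, int], dict],
    limit: int = 100,
) -> Iterator[dict]:
    """Yield relationship records across pages until `total` is reached."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page["items"]
        offset += limit
        # Stop at the end of the result set, or on an empty page as a safety net.
        if offset >= page["total"] or not page["items"]:
            return
```

Passing the HTTP call in as `fetch_page` keeps the paging logic independent of any particular client library.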

GET `/query/subgraph/source/{source_id}`

Returns a subgraph filtered by one source.

Path param:

source_id (string)

Query params:

limit (1..1000, default 100)
offset (>=0, default 0)

Response 200 (GraphSubgraphResponse):

{
  "entities": [],
  "relationships": [],
  "sources": [],
  "total_entities": 0,
  "total_relationships": 0,
  "total_sources": 0,
  "limit": 100,
  "offset": 0
}

Validation failure:

422 validation_error

POST `/query/subgraph/sources`

Returns a subgraph filtered by multiple source IDs.

Request body (SourcesSubgraphRequest):

{
  "source_ids": ["source_1", "source_2"]
}

Constraints:

list length: 1..100
blank IDs are stripped; all-empty payload is rejected

Response:

GraphSubgraphResponse (same structure as above)

Validation failure:

422 validation_error
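The constraints on source_ids can be mirrored on the client side. This sketch is an approximation rather than the server's Pydantic model: `normalize_source_ids` is a hypothetical name, "stripped" is read as "whitespace is trimmed and empty entries are dropped", and the length check is assumed to apply after that cleanup.

```python
def normalize_source_ids(source_ids: list[str]) -> list[str]:
    """Trim IDs, drop blanks, and enforce the 1..100 length constraint."""
    cleaned = [s.strip() for s in source_ids if s.strip()]
    if not cleaned:
        raise ValueError("source_ids must contain at least one non-blank ID")
    if len(cleaned) > 100:
        raise ValueError("source_ids accepts at most 100 IDs")
    return cleaned
```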

GET `/query/subgraph/entity/{entity_id_or_name}`

Returns a subgraph around a matching entity.

Path param:

entity_id_or_name (string; lookup by exact ID first, then case-insensitive name)

Query params:

limit (1..1000, default 100)
offset (>=0, default 0)

Response:

GraphSubgraphResponse

Validation failure:

422 validation_error
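The lookup order described for entity_id_or_name (exact ID first, then case-insensitive name) can be sketched over an in-memory list. The real lookup runs against DuckDB. Here `find_entity` and the `id` and `name` keys are hypothetical stand-ins.

```python
from typing import Optional


def find_entity(entities: list[dict], id_or_name: str) -> Optional[dict]:
    """Resolve an entity by exact ID, falling back to a case-insensitive name."""
    # Pass 1: an exact ID match always wins.
    for entity in entities:
        if entity["id"] == id_or_name:
            return entity
    # Pass 2: compare names case-insensitively.
    wanted = id_or_name.casefold()
    for entity in entities:
        if entity["name"].casefold() == wanted:
            return entity
    return None
```

Running the ID pass first means a string that is both one entity's ID and another entity's name resolves to the ID match.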

GET `/query/relationships/type/{relationship_type}`

Returns a subgraph filtered by relationship type.

Path param:

relationship_type (must be a valid ontology value)

Query params:

limit (1..1000, default 100)
offset (>=0, default 0)

Response:

GraphSubgraphResponse

Validation failure:

422 validation_error

POST `/query/subgraph/entity-types`

Returns a subgraph containing only relationships whose subject and object entities both belong to the selected entity types.

Request body (EntityTypesSubgraphRequest):

{
  "entity_types": ["Concept", "Method"]
}

Allowed values:

Concept, Theory, Person, Method, Assignment

Constraints:

list length: 1..20
invalid values rejected by validation

Response:

GraphSubgraphResponse

Validation failure:

422 validation_error
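To make the "both ends must match" rule concrete, here is a small sketch of the validation and the filter. It is not the server's code: `EntityType`, `parse_entity_types`, and `filter_relationships` are hypothetical names, and the `entity_type_by_id` mapping stands in for a lookup against the entity table.

```python
from enum import Enum


class EntityType(str, Enum):
    CONCEPT = "Concept"
    THEORY = "Theory"
    PERSON = "Person"
    METHOD = "Method"
    ASSIGNMENT = "Assignment"


def parse_entity_types(values: list[str]) -> set[EntityType]:
    """Validate the request list: 1..20 items, each an allowed value."""
    if not 1 <= len(values) <= 20:
        raise ValueError("entity_types must contain between 1 and 20 values")
    # EntityType(v) raises ValueError for anything outside the allowed set.
    return {EntityType(v) for v in values}


def filter_relationships(
    relationships: list[dict],
    entity_type_by_id: dict[str, EntityType],
    selected: set[EntityType],
) -> list[dict]:
    """Keep relationships whose subject and object are both of a selected type."""
    return [
        rel
        for rel in relationships
        if entity_type_by_id.get(rel["subject_entity_id"]) in selected
        and entity_type_by_id.get(rel["object_entity_id"]) in selected
    ]
```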

Data Model Notes (For Integrators)

DuckDB file location: /tmp/knowledge.duckdb
Upload/artifact root: /tmp/backend-placeholder/uploads
Artifact processing log table: PROCESSED_ARTIFACTS
Core graph tables: ENTITIES, RELATIONSHIPS, SOURCES
Confidence sorting in relationship queries: descending (NULLS LAST)
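Integrators who re-sort relationship records on the client may want to reproduce the same ordering. The sketch below mimics descending order with NULLS LAST in plain Python. `sort_by_confidence` is a hypothetical helper, and records are assumed to carry a `confidence` key that may be None.

```python
def sort_by_confidence(relationships: list[dict]) -> list[dict]:
    """Sort by confidence descending, placing missing confidences last."""
    return sorted(
        relationships,
        key=lambda rel: (
            rel.get("confidence") is None,      # False sorts first, so nulls go last
            -(rel.get("confidence") or 0.0),    # negate for descending order
        ),
    )
```

In DuckDB SQL the equivalent clause would be along the lines of `ORDER BY confidence DESC NULLS LAST`, with the column name being an assumption here.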

Responsible AI + Risk Mitigations

Input safety: artifact path sandboxing + required .json suffix to block arbitrary filesystem reads.
Resource protection: compressed/decompressed upload limits.
Robust extraction: timeout-guarded textract adapter with structured metadata errors.
Validation discipline: strong Pydantic constraints and 422 normalization.
Idempotency: repeated extract calls do not duplicate processing.
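The input safety item is the least obvious of these mitigations, so a short sketch may help. It shows one common way to sandbox a path under the upload root and require a .json suffix. It is not the contents of path_safety.py: `check_artifact_path` is a hypothetical name, the check order is an assumption, and the documented error codes are reused as exception messages purely for illustration.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/tmp/backend-placeholder/uploads")


def check_artifact_path(raw: str, root: Path = UPLOAD_ROOT) -> Path:
    """Return the resolved path if it stays inside `root` and ends in .json."""
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # attempts are judged by where they actually point.
    candidate = Path(raw).resolve()
    if not candidate.is_relative_to(root.resolve()):
        raise ValueError("artifact_path_not_allowed")
    if candidate.suffix != ".json":
        raise ValueError("artifact_path_invalid_suffix")
    return candidate
```

Resolving before comparing is the important design choice: a plain string prefix check would accept a path such as `/tmp/backend-placeholder/uploads/../../etc/passwd.json`.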

What Judges Can Verify Quickly

Technical Impressiveness (50%): full upload->extract->query pipeline with persistence and typed contracts.
Impact (20%): directly helps students turn raw notes/docs into structured study context.
Product Thinking (10%): clear problem -> API capability -> demo narrative.
Use of AI to Build (10%): AI is central in extraction/enrichment and can be explained live.
Ethics / Responsible Use (10%): explicit constraints, safe path handling, and error transparency.

Useful Commands

# Run backend
nix run .#deploy-backend

# Lint
flake8 lib/

# Build Docker image
nix build .#docker

Project Structure

lib/backend_placeholder/
  api.py                    # FastAPI routes + HTTP error translation
  models.py                 # Pydantic request/response contracts
  agent.py                  # LangGraph pipeline assembly + process_document
  database.py               # DuckDB schema, writes, and query functions
  services/
    upload_service.py       # Upload ingestion + artifact creation
    extract_service.py      # Artifact-to-graph extraction + persistence
    query_service.py        # API-facing query wrappers
    path_safety.py          # Upload root sandbox + path validation
    textract_adapter.py     # Timeout-guarded text extraction adapter

Current Scope Limitations

Automated test suite is not configured in this repository yet.
API auth/rate limiting is not implemented (hackathon prototype scope).
Upload flow currently assumes extraction-compatible documents and local tmp storage.
