# ParkerBackend

ParkerBackend is a FastAPI backend that turns uploaded learning materials into a queryable knowledge graph. It is built for student-facing workflows: upload course content, run AI extraction, and query relationships and subgraphs for study assistance.
This README is intentionally API-first and hackathon-ready.
## Hackathon Fit

- Student scenario: convert class documents into structured knowledge for studying.
- Meaningful AI: LLM extraction plus optional enrichment (OpenAlex/Canvas) directly powers graph creation.
- Demo-ready: a clean upload -> extract -> query flow with deterministic API contracts.
- Suggested guiding category: Studying.
## Demo Script

- Upload one class document (`POST /upload`).
- Extract the graph (`POST /extract`) and show added entity/relationship counts.
- Query confidence-filtered relationships (`GET /query/relationships`).
- Query a subgraph by source or entity (`/query/subgraph/...`) and show how it supports review.
- Close with responsible AI notes (limits, validation, error paths).
## Team Roles

- Tech Lead: API reliability, data flow, and demo infrastructure.
- Product Lead: problem framing, user story, and demo script.
- Ethics Lead: risk review, mitigation messaging, and responsible use guardrails.
## Architecture

```mermaid
flowchart TD
A[Student Uploads File] --> B[POST /upload]
B --> C[Size + Path Safety Checks]
C --> D[Text Extraction textract or UTF-8 fallback]
D --> E[Artifact JSON in /tmp/backend-placeholder/uploads]
E --> F[POST /extract]
F --> G[Artifact Validation + Idempotency Check]
G --> H[LangGraph Pipeline process_document]
H --> I[inject_schema_options]
I --> J[extract_graph]
I --> K[canvas_node optional]
J --> L[validate_graph]
K --> L
L -->|retry| M[retry_extract_graph]
M --> L
L -->|done| N[openalex_gate]
N -->|query_openalex=true| O[enrich_with_openalex]
N -->|query_openalex=false| P[link_canvas_assignments]
O --> P
P --> Q[mkgraph]
Q --> R[DuckDB Persist add_data_to_db]
R --> S[Mark Artifact Processed]
S --> T[ExtractResponse]
U[Query Endpoints] --> V[query_service wrappers]
V --> W[database.py subgraph + relationship queries]
W --> X[GraphSubgraphResponse or RelationshipsQueryResponse]
```
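For orientation, here is a minimal sketch of how the extract -> validate -> retry loop in the diagram can be wired with LangGraph. Node names mirror the diagram; the state shape and stub bodies are assumptions, not the code in `agent.py`:

```python
# Illustrative wiring of the extract -> validate -> retry loop only.
from typing import TypedDict

from langgraph.graph import END, StateGraph


class PipelineState(TypedDict, total=False):  # assumed state shape
    text: str
    graph: dict
    valid: bool


def after_validation(state: PipelineState) -> str:
    # Loop back into retry until the extracted graph passes validation.
    return "done" if state.get("valid") else "retry"


builder = StateGraph(PipelineState)
builder.add_node("inject_schema_options", lambda s: s)  # stub bodies
builder.add_node("extract_graph", lambda s: s)
builder.add_node("validate_graph", lambda s: s)
builder.add_node("retry_extract_graph", lambda s: s)

builder.set_entry_point("inject_schema_options")
builder.add_edge("inject_schema_options", "extract_graph")
builder.add_edge("extract_graph", "validate_graph")
builder.add_conditional_edges(
    "validate_graph",
    after_validation,
    {"retry": "retry_extract_graph", "done": END},
)
builder.add_edge("retry_extract_graph", "validate_graph")

pipeline = builder.compile()
```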
## Tech Stack

- Python 3.13+
- FastAPI + Pydantic
- LangGraph
- DuckDB
- Nix (dev/build)
## Quick Start

```bash
nix develop

export OPENAI_API_KEY="your-openai-key"
export OPENALEX_API_KEY="your-openalex-key"   # optional
export CANVAS_API_KEY="your-canvas-key"       # optional

nix run .#deploy-backend
```

Server default: `http://localhost:8000`
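Once the server is up, a quick Python smoke test against `/health` (assumes the default port and the `requests` package):

```python
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
assert resp.json() == {"health": "ok"}
print("backend is up")
```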
## API Conventions

- Base URL: `http://localhost:8000`
- Content-Type: `application/json` for JSON endpoints
- The upload endpoint uses `multipart/form-data`
- Service-layer errors are normalized to:

```json
{
  "detail": {
    "error_code": "string",
    "message": "human-readable message"
  }
}
```

- Validation errors from Pydantic are wrapped as HTTP 422 with `error_code: "validation_error"`.
## Endpoint Summary

| Method | Path | Purpose |
|---|---|---|
| GET | `/health` | Liveness check |
| POST | `/upload` | Ingest file, extract text, create artifact |
| POST | `/extract` | Run KG extraction pipeline from artifact |
| GET | `/query/relationships` | List relationships with confidence filters |
| GET | `/query/subgraph/source/{source_id}` | Subgraph by one source |
| POST | `/query/subgraph/sources` | Subgraph by many sources |
| GET | `/query/subgraph/entity/{entity_id_or_name}` | Subgraph by entity ID or name |
| GET | `/query/relationships/type/{relationship_type}` | Subgraph by relationship type |
| POST | `/query/subgraph/entity-types` | Subgraph by entity type list |
### GET /health

Simple liveness check.

Response 200:

```json
{
  "health": "ok"
}
```

### POST /upload

Uploads a file, enforces size limits, extracts text, and writes an artifact JSON used later by `/extract`.
Request:

- `multipart/form-data`
- field: `file` (UploadFile)

Size constraints:

- compressed upload <= 20 MiB
- decompressed payload <= 100 MiB
Success 200 (`UploadResponse`):

```json
{
  "source_id": "string",
  "source_name": "lecture-2.pdf",
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "artifact_sha256": "hex_sha256",
  "metadata_status": "ok",
  "metadata_error_code": null,
  "metadata_error_message": null,
  "compressed_bytes": 12345,
  "decompressed_bytes": 67890
}
```

Common service errors:

- `413 compressed_size_exceeded`
- `413 decompressed_size_exceeded`
- `422 empty_extracted_text`
Note on `metadata_status`:

- textract issues do not always fail the request; the metadata fields capture extraction warning/error state.
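A minimal upload call from Python (the file name is illustrative; the endpoint and field name follow the contract above):

```python
import requests

with open("lecture-2.pdf", "rb") as fh:  # any extraction-compatible document
    resp = requests.post(
        "http://localhost:8000/upload",
        files={"file": ("lecture-2.pdf", fh)},
        timeout=120,
    )
resp.raise_for_status()
upload = resp.json()
print(upload["source_id"], upload["artifact_path"])
```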
### POST /extract

Runs the extraction pipeline on an artifact created by `/upload`, persists the graph to DuckDB, and returns counts.
Request body (`ExtractRequest`):

```json
{
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "query_canvas": false,
  "query_openalex": false
}
```

Response 200 (`ExtractResponse`):
```json
{
  "artifact_path": "/tmp/backend-placeholder/uploads/artifact-<id>.json",
  "artifact_sha256": "hex_sha256",
  "already_processed": false,
  "added_entities": 42,
  "added_relationships": 88,
  "sources": [
    {
      "source_id": "string",
      "source_name": "lecture-2.pdf"
    }
  ]
}
```

Idempotency behavior:

- If the same `(artifact_path, artifact_sha256)` pair was processed before, the response returns `already_processed: true` and `added_*: 0`.
Common service errors:

- `400 artifact_path_not_allowed`
- `400 artifact_path_invalid_suffix`
- `404 artifact_not_found`
- `400 artifact_invalid`
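Chaining upload and extract looks like this (a sketch: `upload` is the parsed `UploadResponse` from the previous example; the timeout is a guess, since LLM extraction time varies):

```python
import requests

resp = requests.post(
    "http://localhost:8000/extract",
    json={
        "artifact_path": upload["artifact_path"],  # from the /upload response
        "query_canvas": False,
        "query_openalex": False,
    },
    timeout=600,  # generous guess; LLM extraction can be slow
)
resp.raise_for_status()
result = resp.json()
print(f"added {result['added_entities']} entities, "
      f"{result['added_relationships']} relationships")
```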
### GET /query/relationships

Returns relationship records with optional confidence bounds.
Query params:

- `limit` (int, default `100`, max `1000`)
- `offset` (int, default `0`, min `0`)
- `min_confidence` (float 0..1, optional)
- `max_confidence` (float 0..1, optional)

Validation rule:

- if both bounds are set, `min_confidence <= max_confidence`
Response 200 (`RelationshipsQueryResponse`):

```json
{
  "items": [
    {
      "subject_entity_id": "ent_a",
      "object_entity_id": "ent_b",
      "relationship_type": "string",
      "confidence": 0.93,
      "data": {}
    }
  ],
  "total": 1,
  "limit": 100,
  "offset": 0
}
```

Validation failure:

- `422 validation_error`
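Example: fetch a page of high-confidence relationships (the `0.8` threshold is illustrative):

```python
import requests

resp = requests.get(
    "http://localhost:8000/query/relationships",
    params={"limit": 50, "offset": 0, "min_confidence": 0.8},
    timeout=30,
)
resp.raise_for_status()
for rel in resp.json()["items"]:
    print(rel["subject_entity_id"], rel["relationship_type"],
          rel["object_entity_id"], rel["confidence"])
```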
### GET /query/subgraph/source/{source_id}

Returns a subgraph filtered by one source.

Path param:

- `source_id` (string)

Query params:

- `limit` (1..1000, default 100)
- `offset` (>=0, default 0)
Response 200 (`GraphSubgraphResponse`):

```json
{
  "entities": [],
  "relationships": [],
  "sources": [],
  "total_entities": 0,
  "total_relationships": 0,
  "total_sources": 0,
  "limit": 100,
  "offset": 0
}
```

Validation failure:

- `422 validation_error`
### POST /query/subgraph/sources

Returns a subgraph filtered by multiple source IDs.

Request body (`SourcesSubgraphRequest`):

```json
{
  "source_ids": ["source_1", "source_2"]
}
```

Constraints:

- list length: `1..100`
- blank IDs are stripped; an all-empty payload is rejected

Response:

- `GraphSubgraphResponse` (same structure as above)

Validation failure:

- `422 validation_error`
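Example request (source IDs are placeholders):

```python
import requests

resp = requests.post(
    "http://localhost:8000/query/subgraph/sources",
    json={"source_ids": ["source_1", "source_2"]},  # placeholder IDs
    timeout=30,
)
resp.raise_for_status()
sub = resp.json()
print(sub["total_entities"], "entities across both sources")
```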
### GET /query/subgraph/entity/{entity_id_or_name}

Returns a subgraph around a matching entity.

Path param:

- `entity_id_or_name` (string; lookup by exact ID first, then case-insensitive name)

Query params:

- `limit` (1..1000, default 100)
- `offset` (>=0, default 0)

Response:

- `GraphSubgraphResponse`

Validation failure:

- `422 validation_error`
### GET /query/relationships/type/{relationship_type}

Returns a subgraph filtered by relationship type.

Path param:

- `relationship_type` (must be a valid ontology value)

Query params:

- `limit` (1..1000, default 100)
- `offset` (>=0, default 0)

Response:

- `GraphSubgraphResponse`

Validation failure:

- `422 validation_error`
### POST /query/subgraph/entity-types

Returns a subgraph where both relationship endpoints are within the selected entity types.

Request body (`EntityTypesSubgraphRequest`):

```json
{
  "entity_types": ["Concept", "Method"]
}
```

Allowed values:

- `Concept`, `Theory`, `Person`, `Method`, `Assignment`

Constraints:

- list length: `1..20`
- invalid values are rejected by validation

Response:

- `GraphSubgraphResponse`

Validation failure:

- `422 validation_error`
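Example request using values from the allowed list above:

```python
import requests

resp = requests.post(
    "http://localhost:8000/query/subgraph/entity-types",
    json={"entity_types": ["Concept", "Method"]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["total_relationships"], "Concept/Method relationships")
```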
## Storage and Internals

- DuckDB file location: `/tmp/knowledge.duckdb`
- Upload/artifact root: `/tmp/backend-placeholder/uploads`
- Artifact processing log table: `PROCESSED_ARTIFACTS`
- Core graph tables: `ENTITIES`, `RELATIONSHIPS`, `SOURCES`
- Confidence sorting in relationship queries: descending (`NULLS LAST`); see the inspection sketch after this list
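For demo-time inspection, the DuckDB file can be opened directly. A sketch, assuming the `RELATIONSHIPS` table mirrors the API's relationship fields (the actual column names may differ):

```python
import duckdb

# Read-only so a demo query never touches live writes.
con = duckdb.connect("/tmp/knowledge.duckdb", read_only=True)
rows = con.execute(
    """
    SELECT subject_entity_id, object_entity_id, relationship_type, confidence
    FROM RELATIONSHIPS  -- column names assumed from the API response shape
    ORDER BY confidence DESC NULLS LAST
    LIMIT 10
    """
).fetchall()
for row in rows:
    print(row)
```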
## Safety and Responsible Use

- Input safety: artifact path sandboxing plus a required `.json` suffix to block arbitrary filesystem reads (see the sketch after this list).
- Resource protection: compressed/decompressed upload limits.
- Robust extraction: timeout-guarded textract adapter with structured metadata errors.
- Validation discipline: strong Pydantic constraints and 422 normalization.
- Idempotency: repeated extract calls do not duplicate processing.
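A sketch of the sandboxing idea behind `path_safety.py` (function and constant names are ours; the real checks live in the repo):

```python
from pathlib import Path

UPLOAD_ROOT = Path("/tmp/backend-placeholder/uploads").resolve()


def validate_artifact_path(raw: str) -> Path:
    """Reject paths that escape the upload root or lack a .json suffix."""
    path = Path(raw).resolve()  # collapses any ../ traversal
    if not path.is_relative_to(UPLOAD_ROOT):
        raise ValueError("artifact_path_not_allowed")     # maps to HTTP 400
    if path.suffix != ".json":
        raise ValueError("artifact_path_invalid_suffix")  # maps to HTTP 400
    return path
```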
## Judging Criteria Mapping

- Technical Impressiveness (50%): full upload -> extract -> query pipeline with persistence and typed contracts.
- Impact (20%): directly helps students turn raw notes/docs into structured study context.
- Product Thinking (10%): clear problem -> API capability -> demo narrative.
- Use of AI to Build (10%): AI is central in extraction/enrichment and can be explained live.
- Ethics / Responsible Use (10%): explicit constraints, safe path handling, and error transparency.
## Development

```bash
# Run backend
nix run .#deploy-backend

# Lint
flake8 lib/

# Build Docker image
nix build .#docker
```

## Repository Layout

```text
lib/backend_placeholder/
  api.py                  # FastAPI routes + HTTP error translation
  models.py               # Pydantic request/response contracts
  agent.py                # LangGraph pipeline assembly + process_document
  database.py             # DuckDB schema, writes, and query functions
  services/
    upload_service.py     # Upload ingestion + artifact creation
    extract_service.py    # Artifact-to-graph extraction + persistence
    query_service.py      # API-facing query wrappers
    path_safety.py        # Upload root sandbox + path validation
    textract_adapter.py   # Timeout-guarded text extraction adapter
```
## Known Limitations

- An automated test suite is not configured in this repository yet.
- API auth/rate limiting is not implemented (hackathon prototype scope).
- Upload flow currently assumes extraction-compatible documents and local tmp storage.