SciKG_Extract is an agentic framework for extracting structured scientific knowledge from research papers, guided by a semantic schema of the target domain. It uses Large Language Models (LLMs) to interpret each paper and extract the information the schema describes, and it applies supporting tools such as JSON validation and normalization against external data sources to keep the extracted data accurate and semantically consistent.
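As a rough illustration of the schema-guided extraction and validation idea described above, the sketch below asks a model for JSON, checks the result against a schema, and feeds any problems back into the next prompt. Everything in it is an assumption made for illustration: the toy `SCHEMA`, the `validate` and `extract` helpers, the `llm` callable, and the retry-with-feedback loop are hypothetical and are not taken from the SciKG_Extract code base. The hand-rolled type check is a simplified stand-in for real JSON Schema validation.

```python
import json
from typing import Any, Callable

# Hypothetical toy schema: field name -> expected Python type.
SCHEMA: dict[str, type] = {"material": str, "temperature_c": float}


def validate(record: Any, schema: dict[str, type]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    if not isinstance(record, dict):
        return ["top-level value must be a JSON object"]
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def extract(
    text: str,
    llm: Callable[[str], str],
    schema: dict[str, type],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Ask the model for JSON and retry with feedback until it validates."""
    prompt = f"Extract the fields {sorted(schema)} as a JSON object from:\n{text}"
    for _ in range(max_attempts):
        raw = llm(prompt)
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Tell the model what went wrong and try again.
            prompt += f"\nPrevious output was not valid JSON: {exc}"
            continue
        errors = validate(record, schema)
        if not errors:
            return record
        prompt += f"\nFix these problems: {errors}"
    raise ValueError("model did not produce schema-conforming JSON")
```

Passing the model in as a plain callable keeps the loop independent of any particular provider, so the same control flow can be exercised with a stub during testing.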
scikg_extract/
├── agents/  # LangGraph agentic workflow components
│   ├── extraction_agent.py  # Main extraction agent orchestrator
│   └── states.py  # State definitions for agent workflow
├── config/  # Configuration management
│   ├── envConfig.py  # Environment variables and settings
│   ├── normalizationConfig.py  # Chemical name normalization configuration
│   └── processConfig.py  # Process-specific configurations
├── models/  # LLM model adapters and interfaces
│   ├── model_adapter.py  # Base adapter interface for LLM models
│   └── openai_adapter.py  # OpenAI API adapter (GPT-4, GPT-3.5)
├── prompts/  # Prompt templates for LLM interactions
│   ├── agents/  # Agent-specific prompts
│   └── tools/  # Tool-specific prompts
│       └── structure_knowledge_extraction.py  # Prompts for structured extraction
├── services/  # External service integrations
│   └── pubchem_cid_mapping.py  # PubChem API integration and CID mapping
├── tools/  # Tools for agent workflow
│   ├── json_cleaner.py  # Clean and normalize JSON data
│   ├── json_validator.py  # Validate JSON against schemas
│   ├── pubchem_normalization.py  # Normalize chemical names with PubChem
│   └── structured_knowledge_extraction.py  # Main structured extraction tool
└── utils/  # Utility functions and helpers
    ├── dict_utils.py  # Dictionary utilities
    ├── evaluation_utils.py  # Evaluation metrics and comparison tools
    ├── file_utils.py  # File I/O operations
    ├── json_utils.py  # JSON-specific utilities
    ├── log_handler.py  # Logging configuration and management
    ├── rest_client.py  # REST API client utilities
    └── string_utils.py  # String manipulation utilities
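The PubChem entries in the tree (`services/pubchem_cid_mapping.py` and `tools/pubchem_normalization.py`) suggest that chemical names are resolved to PubChem compound IDs (CIDs). A minimal sketch of such a lookup is shown below, using the public PubChem PUG REST name-to-CID endpoint. The function names (`cid_url`, `http_get`, `lookup_cid`) and the injectable `fetch` parameter are hypothetical and are not taken from the repository, whose actual implementation may differ.

```python
import json
from typing import Callable, Optional
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_url(name: str) -> str:
    """Build the PUG REST URL that maps a compound name to its CIDs."""
    return f"{PUG_REST}/compound/name/{quote(name.strip(), safe='')}/cids/JSON"


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def lookup_cid(name: str, fetch: Callable[[str], str] = http_get) -> Optional[int]:
    """Return the first CID listed for `name`, or None if the lookup fails."""
    try:
        payload = json.loads(fetch(cid_url(name)))
    except (OSError, ValueError):
        # Network and HTTP errors are OSError subclasses; malformed JSON is ValueError.
        return None
    if not isinstance(payload, dict):
        return None
    cids = payload.get("IdentifierList", {}).get("CID", [])
    return cids[0] if cids else None
```

Calling `lookup_cid("aspirin")` performs a live request. Passing a stub as `fetch` keeps the function testable offline and makes it straightforward to add caching or rate limiting around the HTTP call.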
For questions, suggestions, or collaboration opportunities, please reach out to the project maintainers.
- Issues or Bug Reports: Please use the GitHub Issues section of this repository to report bugs or request features.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
