Skip to content

sciknoworg/scikgextract

Repository files navigation

SciKG-Extract Logo

License: Apache-2.0 pre-commit security: bandit

SciKG_Extract: An Agentic Workflow for Structured Scientific Knowledge Extraction

📋 Overview

SciKG_Extract is a comprehensive agentic framework designed to extract structured scientific knowledge from research papers based on a semantic schema representation of the target domain. The framework leverages Large Language Models (LLMs) to interpret and extract relevant information based on a schema and incorporates various tools and techniques like JSON validation, normalization using external data sources etc. to ensure accuracy and semantic consistency of the extracted data.

📁 SciKG_Extract Structure

scikg_extract/
├── agents/                                     # LangGraph agentic workflow components
│   ├── extraction_agent.py                     # Main extraction agent orchestrator
│   └── states.py                               # State definitions for agent workflow
├── config/                                     # Configuration management
│   ├── envConfig.py                            # Environment variables and settings
│   ├── normalizationConfig.py                  # Chemical name normalization configuration
│   └── processConfig.py                        # Process-specific configurations
├── models/                                     # LLM model adapters and interfaces
│   ├── model_adapter.py                        # Base adapter interface for LLM models
│   ├── openai_adapter.py                       # OpenAI API adapter (GPT-4, GPT-3.5)
├── prompts/                                    # Prompt templates for LLM interactions
│   ├── agents/                                 # Agent-specific prompts
│   └── tools/                                  # Tool-specific prompts
│       └── structure_knowledge_extraction.py   # Prompts for structured extraction
├── services/                                   # External service integrations
│   └── pubchem_cid_mapping.py                  # PubChem API integration and CID mapping
├── tools/                                      # Tools for agent workflow
│   ├── json_cleaner.py                         # Clean and normalize JSON data
│   ├── json_validator.py                       # Validate JSON against schemas
│   ├── pubchem_normalization.py                # Normalize chemical names with PubChem
│   └── structured_knowledge_extraction.py      # Main structured extraction tool
└── utils/                                      # Utility functions and helpers
    ├── dict_utils.py                           # Dictionary utilities
    ├── evaluation_utils.py                     # Evaluation metrics and comparison tools
    ├── file_utils.py                           # File I/O operations
    ├── json_utils.py                           # JSON-specific utilities
    ├── log_handler.py                          # Logging configuration and management
    ├── rest_client.py                          # REST API client utilities
    └── string_utils.py                         # String manipulation utilities

👥 Contact and Collaboration

For questions, suggestions, or collaboration opportunities, please reach out to the project maintainers.

  • Issues or Bug Reports: Please use the GitHub Issues section of this repository to report bugs or request features.

📃 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •