Merged
30 commits
d6d64bf
Add script to scrape and load profiles
AnnieZQH Nov 20, 2025
8525b85
Add workflow to check profiles
AnnieZQH Nov 20, 2025
4e7aa91
Merge pull request #18 from eScienceLab/profile-pr-script
OliverWoolland Dec 5, 2025
811e08c
Update Profile Crate link in README
elichad Dec 5, 2025
2515bd1
Fix links in description of profiles perspective
AnnieZQH Dec 11, 2025
c0e2124
Merge pull request #23 from eScienceLab/fix-links
AnnieZQH Dec 11, 2025
96fcbba
Add url string and content type validation
AnnieZQH Dec 12, 2025
6a67c65
Fix base IRI handling
AnnieZQH Dec 12, 2025
571253c
Add entity type validation
AnnieZQH Dec 12, 2025
02bf4e0
Rename script
AnnieZQH Dec 12, 2025
0ba29ff
Add profile data upload code
AnnieZQH Dec 12, 2025
b6ec410
Add workflow to upload data on merge to main
AnnieZQH Dec 12, 2025
af826a2
Refactor code
AnnieZQH Dec 12, 2025
8d9c87e
Merge pull request #25 from eScienceLab/24-add-to-profile-submission-…
AnnieZQH Dec 15, 2025
8dfda32
Replace cd in GHA with working-directory keyword
AnnieZQH Dec 15, 2025
478bebd
Merge pull request #26 from eScienceLab/22-upload-new-profile-data-to…
AnnieZQH Dec 15, 2025
5101bc5
Replace PUT with POST to append data
AnnieZQH Dec 15, 2025
15d807f
Fix upload job name
AnnieZQH Dec 15, 2025
c316b67
Merge pull request #27 from eScienceLab/fix-data-overwrite
AnnieZQH Dec 16, 2025
edff092
Merge pull request #21 from eScienceLab/profile-crate-link
OliverWoolland Jan 15, 2026
75dbe03
Update base repo and dependencies
AnnieZQH Jan 20, 2026
a6de594
Merge pull request #30 from eScienceLab/28-fix-dependabot-vulnerabili…
AnnieZQH Jan 20, 2026
40caa24
Fix date of publication facet
AnnieZQH Jan 20, 2026
c40ba79
Merge pull request #31 from eScienceLab/fix-date-of-publication-facet
AnnieZQH Jan 20, 2026
748a0dc
Replace placeholder base URI
AnnieZQH Jan 20, 2026
87316a6
Update check profile pr workflow
AnnieZQH Jan 20, 2026
ba05c58
Merge pull request #32 from eScienceLab/6-decide-a-base-uri
AnnieZQH Jan 20, 2026
c693a42
Fix git diff command in the PR check workflow (#33)
AnnieZQH Jan 22, 2026
fe701ec
Add RO-Crate spec profile
AnnieZQH Jan 22, 2026
e285451
Merge pull request #34 from eScienceLab/add-rocrate-spec-profile
AnnieZQH Jan 22, 2026
6 changes: 3 additions & 3 deletions .docker/sampo-ui/deploy.dockerfile
@@ -1,6 +1,6 @@
-FROM node:16.13.0-alpine
+FROM node:22.21.1-alpine
ARG API_URL
-ARG VERSION="3.0.0"
+ARG VERSION="3.0.0-1"

# Based on https://nodejs.org/en/docs/guides/nodejs-docker-webapp/

@@ -11,7 +11,7 @@ WORKDIR /usr/src/app
# mv commands: install app dependencies, Babel 7 presets and plugins, and bundle app source
# Remove redundant files
RUN <<EOF
-wget https://github.com/SemanticComputing/sampo-ui/archive/refs/tags/v$VERSION.zip
+wget https://github.com/UoMResearchIT/sampo-ui/archive/refs/tags/v$VERSION.zip
unzip v$VERSION.zip
mv ./sampo-ui-$VERSION/package*.json ./
mv ./sampo-ui-$VERSION/webpack*.js ./
6 changes: 3 additions & 3 deletions .docker/sampo-ui/dev.dockerfile
@@ -1,7 +1,7 @@
-FROM node:16.13.0-alpine
+FROM node:22.21.1-alpine

# Specify the Sampo UI version
-ARG VERSION="3.0.0"
+ARG VERSION="3.0.0-1"

RUN apk update && apk add bash

@@ -12,7 +12,7 @@ WORKDIR /usr/src/app
# mv commands: install app dependencies, Babel 7 presets and plugins, and bundle app source
# Remove redundant files
RUN <<EOF
-wget https://github.com/SemanticComputing/sampo-ui/archive/refs/tags/v$VERSION.zip
+wget https://github.com/UoMResearchIT/sampo-ui/archive/refs/tags/v$VERSION.zip
unzip v$VERSION.zip
mv ./sampo-ui-$VERSION/package*.json ./
mv ./sampo-ui-$VERSION/webpack*.js ./
30 changes: 30 additions & 0 deletions .github/workflows/check-profile-pr.yml
@@ -0,0 +1,30 @@
name: Check added profile link (if applicable)

on:
  pull_request:
    types: [opened, synchronize, reopened]
  workflow_dispatch:
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Check if profile URLs changed
        id: changes
        run: |
          git fetch origin ${{ github.event.pull_request.base.ref }}
          if git diff --name-only origin/${{ github.event.pull_request.base.ref }}...HEAD | grep -qx "scripts/profile_urls.txt"; then
            echo "changed=true" >> "$GITHUB_OUTPUT"
          else
            echo "changed=false" >> "$GITHUB_OUTPUT"
          fi
      - name: Install dependencies
        if: steps.changes.outputs.changed == 'true'
        run: python3 -m pip install -r requirements.workflow.txt
      - name: Run script
        if: steps.changes.outputs.changed == 'true'
        run: python upload_profiles.py --dry-run
        working-directory: ./scripts
24 changes: 24 additions & 0 deletions .github/workflows/upload-profiles.yml
@@ -0,0 +1,24 @@
name: Upload to Jena Fuseki

on:
  push:
    branches:
      - main
    paths:
      - 'scripts/profile_urls.txt'
  workflow_dispatch:
jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Install dependencies
        run: python3 -m pip install -r requirements.workflow.txt
      - name: Run script
        run: python upload_profiles.py
        working-directory: ./scripts
        env:
          FUSEKI_UPLOAD_ENDPOINT: ${{ vars.ENV_FUSEKI_UPLOAD_ENDPOINT }}
          FUSEKI_PASSWORD: ${{ secrets.ENV_FUSEKI_PASSWORD }}
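Note on the upload method: the commit list includes "Replace PUT with POST to append data". Under the SPARQL 1.1 Graph Store HTTP Protocol, a POST merges the payload into the target graph, while a PUT replaces the graph's contents. The sketch below only builds such a request with the standard library and does not send it. The function name `build_append_request` and the endpoint URL are hypothetical, and authentication is left out; the script in this PR uses `requests` with basic auth instead.

```python
import urllib.request


def build_append_request(endpoint: str, turtle: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that appends Turtle triples to a graph."""
    return urllib.request.Request(
        endpoint,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",  # POST merges into the graph; PUT would replace it
    )


if __name__ == "__main__":
    req = build_append_request(
        "http://localhost:3030/dataset/data",  # hypothetical endpoint, not from the PR
        '<https://example.org/s> <http://schema.org/keywords> "workflow" .',
    )
    print(req.get_method(), req.full_url)
```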

1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
# Environments
.env
venv/
2 changes: 1 addition & 1 deletion README.md
@@ -9,7 +9,7 @@ RO-Crates are a method for packaging research data with their metadata. RO-Crate
The profile portal is accepting contributions!

> [!IMPORTANT]
-> To be accepted, the profile must be a [Profile Crate](url) accessible on the public internet.
+> To be accepted, the profile must be a [Profile Crate](https://www.researchobject.org/ro-crate/specification/1.2/profiles.html#profile-crate) accessible on the public internet.

To add your profile (or a profile you feel is missing):
- Open `scripts/profile_urls.txt` in this repo for editing (click file, then click pencil icon)
1 change: 1 addition & 0 deletions env.template
@@ -1,6 +1,7 @@
# Template file for populating .env

FUSEKI_PASSWORD=${ENV_FUSEKI_PASSWORD}
FUSEKI_UPLOAD_ENDPOINT=

# Local http://localhost:3006/api/v1
SAMPO_API_URL=${ENV_SAMPO_API_URL}
4 changes: 4 additions & 0 deletions requirements.workflow.in
@@ -0,0 +1,4 @@
rdflib==7.4.0
validators==0.35.0
requests==2.32.5
python-dotenv==1.2.1
24 changes: 24 additions & 0 deletions requirements.workflow.txt
@@ -0,0 +1,24 @@
#
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
# pip-compile --output-file=requirements.workflow.txt requirements.workflow.in
#
certifi==2025.11.12
# via requests
charset-normalizer==3.4.4
# via requests
idna==3.11
# via requests
pyparsing==3.2.5
# via rdflib
python-dotenv==1.2.1
# via -r requirements.workflow.in
rdflib==7.4.0
# via -r requirements.workflow.in
requests==2.32.5
# via -r requirements.workflow.in
urllib3==2.6.2
# via requests
validators==0.35.0
# via -r requirements.workflow.in
17 changes: 17 additions & 0 deletions scripts/gaps.ttl
@@ -0,0 +1,17 @@
@prefix schema: <http://schema.org/> .

<https://w3id.org/workflowhub/workflow-ro-crate/1.0>
a schema:Profile ;
schema:keywords "workflow" .
<https://w3id.org/ro/wfrun/process/0.5>
schema:datePublished "2024-06-19";
schema:keywords "provenance", "process", "informatics" .
<https://w3id.org/ro/wfrun/workflow/0.5>
schema:datePublished "2024-06-19";
schema:keywords "workflow", "provenance", "workflow execution", "process", "informatics" .
<https://w3id.org/ro/wfrun/provenance/0.5>
schema:datePublished "2024-06-19";
schema:keywords "workflow", "provenance", "workflow execution", "process", "informatics" .
<https://w3id.org/5s-crate/0.4>
schema:identifier "https://w3id.org/5s-crate/0.4" ;
schema:keywords "workflow execution", "five safes", "transparency", "sensitive data", "trusted research environment", "secure data environment", "data safe haven" .
5 changes: 5 additions & 0 deletions scripts/profile_urls.txt
@@ -0,0 +1,5 @@
https://trefx.uk/5s-crate/0.4/ro-crate-metadata.json
https://www.researchobject.org/workflow-run-crate/profiles/0.5/process_run_crate/ro-crate-metadata.json
https://www.researchobject.org/workflow-run-crate/profiles/0.5/workflow_run_crate/ro-crate-metadata.json
https://www.researchobject.org/workflow-run-crate/profiles/0.5/provenance_run_crate/ro-crate-metadata.json
https://www.researchobject.org/ro-crate/specification/1.2/ro-crate-metadata.json
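Note on base IRIs: `scripts/upload_profiles.py` in this PR derives a base IRI from each URL listed above before parsing the JSON-LD. The sketch below isolates that expression so its effect is easier to see. The helper name `base_iri_for` is hypothetical, and the `example.org` URL is an illustrative assumption rather than a registered profile.

```python
from urllib.parse import urljoin


def base_iri_for(url: str) -> str:
    """Hypothetical helper mirroring the base IRI expression in upload_profiles.py."""
    if url.endswith("ro-crate-metadata.json"):
        # Resolving "." against the metadata file URL yields its parent directory.
        return urljoin(url, ".")
    # Otherwise treat the URL as a directory and ensure exactly one trailing slash.
    return f"{url.rstrip('/')}/"


if __name__ == "__main__":
    print(base_iri_for("https://trefx.uk/5s-crate/0.4/ro-crate-metadata.json"))
    print(base_iri_for("https://example.org/profile"))  # illustrative URL only
```

Relative identifiers inside a crate (such as `./`) are then resolved against this directory-style IRI rather than against the metadata file itself.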
109 changes: 109 additions & 0 deletions scripts/upload_profiles.py
@@ -0,0 +1,109 @@
import os
import re
import validators
import requests
import argparse

from datetime import datetime, date
from dotenv import load_dotenv
from hashlib import md5
from rdflib import Graph, URIRef, Literal, XSD, RDF, OWL, Namespace
from urllib.parse import urljoin


SCHEMA = Namespace("http://schema.org/")

def map_sameas_uri(uri):
    local_id = md5(uri.encode('utf-8')).hexdigest()
    new_uri = URIRef(f"https://profiles.ro-crate.org/data/profile/{local_id}")
    return new_uri

def main(dry_run):
    g = Graph()
    profile_class = URIRef("http://www.w3.org/ns/dx/prof/Profile")

    with open("profile_urls.txt", "r") as file:
        profile_urls = [line.strip() for line in file if line.strip()]

    for url in profile_urls:
        if not validators.url(url):
            raise ValueError(f"{url} is not a URL")

        headers = {
            "Accept": "application/ld+json, application/json"
        }

        try:
            response = requests.head(url, headers=headers, allow_redirects=True)
        except requests.RequestException as e:
            raise ValueError(f"Unable to reach {url} (Error: {e})")

        # Fallback to GET for servers that reject HEAD requests
        if response.status_code >= 400:
            response = requests.get(url, headers=headers)

        content_type = response.headers.get("Content-Type", "").lower()
        if "json" not in content_type:
            raise ValueError(f"{url} does not return JSON-LD (Content type: {content_type})")

        temp_g = Graph()
        base_iri = urljoin(url, '.') if url.endswith("ro-crate-metadata.json") else f"{url.rstrip('/')}/"
        temp_g.parse(url, format="json-ld", publicID=base_iri)
        if any(temp_g.subjects(RDF.type, profile_class)):
            g += temp_g
        else:
            raise ValueError(f"No profile entity found in {url}")

    g.parse("gaps.ttl", format="turtle")

    datetime_pattern = re.compile(r"^-?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?$")
    date_pattern = re.compile(r"^-?\d{4}-\d{2}-\d{2}$")
    time_pattern = re.compile(r"^\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?$")

    for s, p, o in list(g.triples((None, None, None))):  # snapshot: the loop mutates g
        typed_o = None

        if datetime_pattern.match(o):
            typed_o = Literal(o, datatype=XSD.dateTime)
        elif date_pattern.match(o):
            typed_o = Literal(o, datatype=XSD.date)
        elif time_pattern.match(o):
            typed_o = Literal(o, datatype=XSD.time)

        # Add type to datetime, date and time data
        if typed_o is not None:
            g.add((s, p, typed_o))
            g.remove((s, p, o))

        # Create profile URI for use in the profile portal
        if p == RDF.type and o == profile_class:
            new_s = map_sameas_uri(s)
            g.add((new_s, RDF.type, o))
            g.add((new_s, OWL.sameAs, s))

        # Add triple <new profile URI> schema:datePublished "yyyy-mm-dd"^^xsd:date
        if p == SCHEMA.datePublished and typed_o is not None:
            value = typed_o.toPython()
            if isinstance(value, datetime):
                new_s = map_sameas_uri(s)
                g.add((new_s, p, Literal(value.date(), datatype=XSD.date)))
            elif isinstance(value, date):  # elif: datetime is also a date
                new_s = map_sameas_uri(s)
                g.add((new_s, p, Literal(value.isoformat(), datatype=XSD.date)))

    ttl_data = g.serialize(format="turtle")
    if not dry_run:
        FUSEKI_ENDPOINT = os.getenv('FUSEKI_UPLOAD_ENDPOINT')
        FUSEKI_PASSWORD = os.getenv('FUSEKI_PASSWORD')

        response = requests.post(FUSEKI_ENDPOINT, data=ttl_data,
                                 headers={"Content-Type": "text/turtle"},
                                 auth=("admin", FUSEKI_PASSWORD))
        response.raise_for_status()

if __name__ == "__main__":
    load_dotenv()
    parser = argparse.ArgumentParser(description="Parse profile RO-Crates and upload the graph to a Jena Fuseki instance")
    parser.add_argument("--dry-run", action="store_true")
    args = parser.parse_args()
    main(args.dry_run)
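Note on the date handling: the script types plain-string literals by matching them against three regular expressions, then normalises `schema:datePublished` values to a date. The sketch below reproduces the same patterns with the standard library only, so it can be run without `rdflib`. The names `classify` and `to_xsd_date` are hypothetical helpers, not part of the PR.

```python
import re
from datetime import date, datetime

# Same patterns as in upload_profiles.py.
DATETIME_RE = re.compile(r"^-?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?$")
DATE_RE = re.compile(r"^-?\d{4}-\d{2}-\d{2}$")
TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})?$")


def classify(value: str):
    """Return the XSD datatype name the patterns would select, or None."""
    if DATETIME_RE.match(value):
        return "dateTime"
    if DATE_RE.match(value):
        return "date"
    if TIME_RE.match(value):
        return "time"
    return None


def to_xsd_date(value) -> str:
    """Normalise a date or datetime to an ISO 'YYYY-MM-DD' string."""
    # datetime is a subclass of date, so the datetime check must come first
    # and the two branches must be mutually exclusive.
    if isinstance(value, datetime):
        return value.date().isoformat()
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(f"expected date or datetime, got {type(value).__name__}")


if __name__ == "__main__":
    for sample in ["2024-06-19", "2024-06-19T10:00:00Z", "10:00:00", "workflow"]:
        print(sample, "->", classify(sample))
    print(to_xsd_date(datetime(2024, 6, 19, 10, 0)))
```

One limit worth knowing: the patterns do not accept fractional seconds, so a value such as `2024-06-19T10:00:00.123Z` is left untyped.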
4 changes: 2 additions & 2 deletions src/client/components/App.js
@@ -1,7 +1,7 @@
import React from 'react'
import { ThemeProvider, createTheme } from '@mui/material/styles'
-import AdapterMoment from '@mui/lab/AdapterMoment'
-import LocalizationProvider from '@mui/lab/LocalizationProvider'
+import { AdapterMoment } from '@mui/x-date-pickers/AdapterMoment'
+import { LocalizationProvider } from '@mui/x-date-pickers/LocalizationProvider'
import SemanticPortal from '../containers/SemanticPortal'
import portalConfig from '../../configs/portalConfig.json'

2 changes: 1 addition & 1 deletion src/client/translations/rocrate/localeEN.json
@@ -159,7 +159,7 @@
"label": "Profiles",
"facetResultsType": "Profiles",
"shortDescription": "Click here to enter the profile portal",
-"longDescription": "<p class=\"MuiTypography-root MuiTypography-body1 MuiTypography-paragraph\"><a href='https://www.researchobject.org/ro-crate/' target='_blank' rel='noopener noreferrer'>RO-Crates</a> are a method for packaging research data with their metadata. RO-Crate Profiles define the conventions, types and properties to do this for specific communities and domains.<br><br>Welcome to the RO-Crate Profile Portal. [Visit the webpage] to learn about RO-Crates, or [read this guide] to learn about profiles. Click below to enter the RO-Crate Profile Portal, or read on [how to submit your profile]</p>",
+"longDescription": "<p class=\"MuiTypography-root MuiTypography-body1 MuiTypography-paragraph\"><a href='https://www.researchobject.org/ro-crate/' target='_blank' rel='noopener noreferrer'>RO-Crates</a> are a method for packaging research data with their metadata. RO-Crate Profiles define the conventions, types and properties to do this for specific communities and domains.<br><br>Welcome to the RO-Crate Profile Portal. <a href='https://www.researchobject.org/ro-crate/' target='_blank' rel='noopener noreferrer'>Visit the webpage</a> to learn about RO-Crates, or <a href='https://www.researchobject.org/ro-crate/profiles' target='_blank' rel='noopener noreferrer'>read this guide</a> to learn about profiles. Click below to enter the RO-Crate Profile Portal, or read on <a href='https://github.com/eScienceLab/sampo-dashboard?tab=readme-ov-file#submitting-a-profile-to-the-portal' target='_blank' rel='noopener noreferrer'>how to submit your profile</a></p>",
"instancePage": {
"label": "Profile",
"description": "TODO: Instance page description"
4 changes: 2 additions & 2 deletions src/configs/rocrate/search_perspectives/profiles.json
@@ -6,7 +6,7 @@
"prefixesFile": "SparqlQueriesPrefixes.js"
},
"sparqlQueriesFile": "SparqlQueriesProfiles.js",
-"baseURI": "http://example.org/data",
+"baseURI": "https://profiles.ro-crate.org/data",
"URITemplate": "<BASE_URI>/profile/<LOCAL_ID>",
"facetClass": "prof:Profile",
"defaultConstraint": "FILTER EXISTS { <SUBJECT> owl:sameAs [] }",
@@ -139,7 +139,7 @@
"publicationDate": {
"containerClass": "five",
"filterType": "dateNoTimespanFilter",
-"predicate": "owl:sameAs/ns11:datePublished",
+"predicate": "ns11:datePublished",
"sortByPredicate": "owl:sameAs/ns11:datePublished",
"max": "2118-01-01",
"min": "2018-01-01"
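Note on how the new `baseURI` lines up with the upload script: `map_sameas_uri` in `scripts/upload_profiles.py` builds portal URIs of the form `https://profiles.ro-crate.org/data/profile/<md5 of the original IRI>`, which matches the `URITemplate` in this config. The sketch below shows that correspondence. It assumes the template is filled by plain string substitution, which is a guess for illustration only; how Sampo-UI expands the template internally is not shown in this PR, and `portal_uri` is a hypothetical name.

```python
from hashlib import md5

BASE_URI = "https://profiles.ro-crate.org/data"   # "baseURI" from profiles.json
URI_TEMPLATE = "<BASE_URI>/profile/<LOCAL_ID>"    # "URITemplate" from profiles.json


def portal_uri(profile_iri: str) -> str:
    """Build a portal URI whose local ID is the MD5 hex digest of the profile IRI."""
    local_id = md5(profile_iri.encode("utf-8")).hexdigest()
    # Assumption: simple placeholder substitution, for illustration only.
    return URI_TEMPLATE.replace("<BASE_URI>", BASE_URI).replace("<LOCAL_ID>", local_id)


if __name__ == "__main__":
    print(portal_uri("https://w3id.org/5s-crate/0.4"))
```

Because the digest depends only on the original IRI, re-running the upload for the same profile yields the same portal URI.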
Original file line number Diff line number Diff line change
@@ -14,6 +14,6 @@ export const prefixes = `
PREFIX ns2: <https://w3id.org/ro/terms/test#>
PREFIX rel: <https://www.w3.org/ns/iana/link-relations/relation#>
PREFIX wfrun: <https://w3id.org/ro/terms/workflow-run#>
-PREFIX rocrate: <http://example.org/data/> # TODO: Change URI
+PREFIX rocrate: <https://profiles.ro-crate.org/data/>
PREFIX prof: <http://www.w3.org/ns/dx/prof/>
`