Pinned
-
GeneLab_benchmark
SpaceBio-Bench: mission-held-out spaceflight transcriptomics benchmark for ML and foundation-model generalization on NASA OSDR data
Python
-
SpaceOmicsBench
A multi-omics AI benchmark for spaceflight biomedical data: 21 ML tasks across 9 modalities, plus a 100-question LLM evaluation (Inspiration4, NASA Twins, JAXA)
Python
-
grounding-atlas
Measurement-first map of biological content-grounding in language models: does the model ground a specialist's output by content or by name, and where should each capability live (train / retrieve …
Python
-
LabCraft-Eval
LabCraft-Eval: a stochastic Inspect AI environment for evaluating AI agents on benign molecular-microbiology protocols, with deterministic four-axis trajectory scoring.
Python
-
verify-or-trust
A verifiable-reward agentic benchmark: does an LLM correctly allocate verification when orchestrating a fallible biology foundation model?
Python
-
causalatlas
Measurement study of causal grounding in LLM-orchestrated single-cell perturbation foundation models.
Python


