Commit 8200e02

Merge pull request #4 from OpenMS/copilot/analyze-improve-tools
Merge redundant overlapping tools: isotope pattern trio → analyzer, MGF/mzML pair → single converter
2 parents f9a8bc7 + 6233d20 commit 8200e02

32 files changed

Lines changed: 1550 additions & 1510 deletions

.github/workflows/validate.yml

Lines changed: 8 additions & 4 deletions
@@ -20,10 +20,12 @@ jobs:
 name: Detect changed tool directories
 run: |
   # Note: github.base_ref is only available on pull_request events
-  # Find all tool directories that changed in this PR
+  # Find all tool directories that changed in this PR, keeping only
+  # those that still exist (deleted tool dirs must not be linted/tested).
   CHANGED=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'tools/' \
     | grep -oP 'tools/[^/]+/[^/]+/[^/]+/' \
     | sort -u \
+    | while read -r dir; do [ -d "$dir" ] && echo "$dir"; done \
     | jq -R -s -c 'split("\n") | map(select(length > 0))')
 
   if [ "$CHANGED" = "[]" ] || [ -z "$CHANGED" ]; then
@@ -60,9 +62,11 @@ jobs:
 run: |
   DIRS='${{ needs.detect-changes.outputs.matrix }}'
   echo "$DIRS" | jq -r '.[]' | while read -r dir; do
-    echo "::group::ruff $dir"
-    /tmp/validate_venv/bin/python -m ruff check "$dir"
-    echo "::endgroup::"
+    if [ -d "$dir" ]; then
+      echo "::group::ruff $dir"
+      /tmp/validate_venv/bin/python -m ruff check "$dir"
+      echo "::endgroup::"
+    fi
   done
 
 - name: Test changed tools
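
The directory filter added in the workflow above can be restated outside the shell pipeline. The following is a minimal Python sketch of the same idea, not code from this commit: the function name `existing_tool_dirs` and its `root` parameter are hypothetical, and only the path pattern is taken from the `grep -oP` expression in the diff.

```python
import re
from pathlib import Path

# Same pattern as the grep -oP expression in the workflow diff.
TOOL_DIR = re.compile(r"tools/[^/]+/[^/]+/[^/]+/")


def existing_tool_dirs(changed_paths, root="."):
    """Return sorted, unique tool directories that still exist under root.

    Hypothetical helper: mirrors grep -o | sort -u | [ -d "$dir" ].
    """
    dirs = set()
    for path in changed_paths:
        # finditer mirrors grep -o, which prints every match on a line.
        for match in TOOL_DIR.finditer(path):
            dirs.add(match.group(0))
    # Drop directories deleted by the change set (for example, merged tools).
    return [d for d in sorted(dirs) if (Path(root) / d).is_dir()]
```

With this shape, a path under a removed directory such as `isotope_pattern_scorer/` is dropped before linting, while files outside the three-level `tools/` layout never match the pattern at all.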

README.md

Lines changed: 5 additions & 8 deletions
@@ -1,6 +1,6 @@
 # Agentomics
 
-A growing collection of **123 standalone CLI tools** built with [pyopenms](https://pyopenms.readthedocs.io/) for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools — small, focused utilities that researchers need daily but typically write as throwaway scripts.
+A growing collection of **118 standalone CLI tools** built with [pyopenms](https://pyopenms.readthedocs.io/) for proteomics and metabolomics workflows. Every tool in this repository fills a gap not covered by existing OpenMS TOPP tools — small, focused utilities that researchers need daily but typically write as throwaway scripts.
 
 ## Why This Exists
 
@@ -155,12 +155,11 @@ Both `ruff` and `pytest` must pass with zero errors.
 | [`fasta_in_silico_digest_stats`](tools/proteomics/fasta_utils/fasta_in_silico_digest_stats/) | Digest a FASTA and report peptide-level statistics |
 | [`fasta_taxonomy_splitter`](tools/proteomics/fasta_utils/fasta_taxonomy_splitter/) | Split multi-organism FASTA by taxonomy from headers |
 
-#### File Conversion (8 tools)
+#### File Conversion (7 tools)
 
 | Tool | Description |
 |------|-------------|
-| [`mzml_to_mgf_converter`](tools/proteomics/file_conversion/mzml_to_mgf_converter/) | Convert MS2 spectra from mzML to MGF format |
-| [`mgf_to_mzml_converter`](tools/proteomics/file_conversion/mgf_to_mzml_converter/) | Convert MGF files to mzML format |
+| [`mgf_mzml_converter`](tools/proteomics/file_conversion/mgf_mzml_converter/) | Bidirectional MGF ↔ mzML converter with spectrum filtering (merged from `mgf_to_mzml_converter` + `mzml_to_mgf_converter`) |
 | [`consensus_map_to_matrix`](tools/proteomics/file_conversion/consensus_map_to_matrix/) | Convert consensusXML to flat quantification matrix |
 | [`idxml_to_tsv_exporter`](tools/proteomics/file_conversion/idxml_to_tsv_exporter/) | Export idXML identification results to flat TSV |
 | [`ms_data_to_csv_exporter`](tools/proteomics/file_conversion/ms_data_to_csv_exporter/) | Export mzML/featureXML data to CSV with column selection |
281280
| [`mass_defect_filter`](tools/metabolomics/feature_processing/mass_defect_filter/) | Filter features by mass defect and Kendrick mass defect |
282281
| [`metabolite_feature_detection`](tools/metabolomics/feature_processing/metabolite_feature_detection/) | Metabolite feature detection from LC-MS data |
283282

284-
#### Spectral Analysis (6 tools)
283+
#### Spectral Analysis (4 tools)
285284

286285
| Tool | Description |
287286
|------|-------------|
288287
| [`spectral_entropy_scorer`](tools/metabolomics/spectral_analysis/spectral_entropy_scorer/) | Compute spectral entropy similarity (Li & Fiehn 2021) |
289288
| [`neutral_loss_scanner`](tools/metabolomics/spectral_analysis/neutral_loss_scanner/) | Scan MS2 spectra for characteristic neutral losses |
290-
| [`isotope_pattern_scorer`](tools/metabolomics/spectral_analysis/isotope_pattern_scorer/) | Score observed vs. theoretical isotope patterns |
291-
| [`isotope_pattern_matcher`](tools/metabolomics/spectral_analysis/isotope_pattern_matcher/) | Generate theoretical isotope distributions and cosine similarity scoring |
292-
| [`isotope_pattern_fit_scorer`](tools/metabolomics/spectral_analysis/isotope_pattern_fit_scorer/) | Score isotope pattern fit, detect Cl/Br from M+2 enhancement |
289+
| [`isotope_pattern_analyzer`](tools/metabolomics/spectral_analysis/isotope_pattern_analyzer/) | Generate theoretical isotope distributions, cosine similarity scoring, Da/ppm tolerance, Cl/Br halogen detection (merged from `isotope_pattern_matcher` + `isotope_pattern_scorer` + `isotope_pattern_fit_scorer`) |
293290
| [`massql_query_tool`](tools/metabolomics/spectral_analysis/massql_query_tool/) | Query mzML data using MassQL-like syntax |
294291

295292
#### Compound Annotation (4 tools)
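
The merged `isotope_pattern_analyzer` row above lists cosine similarity scoring with a Da or ppm tolerance. The sketch below illustrates how those two pieces could fit together; it is not the tool's implementation. The function names, the choice to take ppm relative to the theoretical m/z, the closest-peak matching rule, and the intensities in the usage note are all assumptions made for illustration.

```python
import math


def within_tolerance(obs_mz, theo_mz, tolerance, unit="da"):
    """Check an m/z match in Da or ppm (ppm taken relative to theo_mz, an assumption)."""
    delta = abs(obs_mz - theo_mz)
    if unit == "ppm":
        return delta <= theo_mz * tolerance * 1e-6
    return delta <= tolerance


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score_pattern(observed, theoretical, tolerance=0.05, unit="da"):
    """Score (mz, intensity) lists; unmatched theoretical peaks count as 0."""
    obs_vec, theo_vec = [], []
    for theo_mz, theo_int in theoretical:
        # Closest observed peak inside the tolerance window, if any.
        hits = [
            (abs(mz - theo_mz), inten)
            for mz, inten in observed
            if within_tolerance(mz, theo_mz, tolerance, unit)
        ]
        obs_vec.append(min(hits)[1] if hits else 0.0)
        theo_vec.append(theo_int)
    return cosine_similarity(obs_vec, theo_vec)
```

For example, `score_pattern([(180.063, 100.0), (181.067, 6.5)], [(180.0634, 100.0), (181.0668, 6.9)], tolerance=10, unit="ppm")` pairs both peaks, since each m/z difference is below 10 ppm of the theoretical value. The theoretical intensities in that call are placeholders rather than values computed by pyopenms.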
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
+# Isotope Pattern Analyzer
+
+Generate theoretical isotope distributions for molecular formulas, score observed
+isotope patterns using cosine similarity, and detect halogenation (Cl/Br).
+
+This tool consolidates `isotope_pattern_matcher`, `isotope_pattern_scorer`, and
+`isotope_pattern_fit_scorer` into a single, improved utility.
+
+## Features
+
+- Theoretical isotope pattern generation via pyopenms `CoarseIsotopePatternGenerator`
+- Cosine similarity scoring between observed and theoretical patterns
+- **Da or ppm m/z tolerance** — choose your preferred unit
+- Halogen (Cl/Br) detection from M+2 peak enhancement
+- JSON output with per-peak detail
+- Terminal bar-chart preview of the theoretical distribution
+- Optional numpy acceleration for cosine computation
+
+## Installation
+
+```bash
+pip install pyopenms
+```
+
+## CLI Usage
+
+```bash
+# Generate and display the isotope pattern for glucose
+python isotope_pattern_analyzer.py --formula C6H12O6
+
+# Score observed peaks against the formula (colon-separated format)
+python isotope_pattern_analyzer.py --formula C6H12O6 \
+    --observed "180.063:100,181.067:6.5,182.070:0.5" \
+    --output result.json
+
+# Use legacy comma-separated format (one --peaks flag per peak)
+python isotope_pattern_analyzer.py --formula C6H12O6 \
+    --peaks 180.063,100.0 --peaks 181.067,6.5 \
+    --output result.json
+
+# Use ppm tolerance
+python isotope_pattern_analyzer.py --formula C6H12O6 \
+    --observed "180.063:100,181.067:6.5" \
+    --tolerance 10 --tolerance-unit ppm
+
+# Detect halogenation (chlorinated compound example)
+python isotope_pattern_analyzer.py --formula C6H5Cl \
+    --observed "112.007:100,113.011:5.5,114.004:33.0" \
+    --output halogen_result.json
+```
+
+## Output JSON Structure
+
+```json
+{
+  "formula": "C6H12O6",
+  "cosine_similarity": 0.9987,
+  "n_peaks_compared": 3,
+  "tolerance": 0.05,
+  "tolerance_unit": "da",
+  "peaks": [
+    {"peak_index": 0, "obs_mz": 180.063, "theo_mz": 180.0634, "obs_intensity": 100.0, "theo_intensity": 100.0},
+    ...
+  ],
+  "theoretical_pattern": [...],
+  "halogen_detection": {
+    "m2_ratio_observed": 0.5,
+    "m2_ratio_theoretical": 0.42,
+    "m2_excess": 0.08,
+    "halogen_flag": false,
+    "possible_halogen": "none"
+  }
+}
+```
+
+## Halogen Detection Thresholds
+
+| M+2 excess above theoretical | Interpretation |
+|------------------------------|---------------------------------|
+| < 10 % | No halogenation detected |
+| 10–20 % | Cl (weak signal) |
+| 20–70 % | Cl |
+| > 70 % | Br |
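
The threshold table above amounts to a small lookup on the M+2 excess. The following is a hedged Python sketch of that lookup, not the tool's code: the function name `classify_m2_excess` and the returned labels other than `"none"` are hypothetical, the excess is assumed to be expressed in percent, and the handling of the exact boundary values (10, 20, 70) is an assumption because the table does not say which side they fall on.

```python
def classify_m2_excess(excess_percent: float) -> str:
    """Map M+2 excess above theoretical (in percent) to an interpretation.

    Boundary handling is an assumption: 10 and 20 go to the upper band,
    70 stays in the Cl band.
    """
    if excess_percent < 10.0:
        return "none"       # no halogenation detected
    if excess_percent < 20.0:
        return "Cl (weak)"  # weak chlorine signal
    if excess_percent <= 70.0:
        return "Cl"
    return "Br"
```

Under these assumptions, the `m2_excess` of 0.08 in the JSON example falls in the lowest band, which agrees with the `"possible_halogen": "none"` value shown there.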
