Optimize legacy find-orientations and prototype new candidate algorithms by joelvbernier · Pull Request #907 · HEXRD/hexrd

joelvbernier · 2026-03-16T05:54:14Z

Summary

This PR improves the legacy seeded find-orientations workflow by reducing redundant seed-driven candidates up front, making threshold selection more data-driven, auto-selecting more useful HKLs, and cleaning up multiprocessing behavior across platforms.

The goal is to strengthen the proven discrete-fibers + paintGrid + clustering workflow without changing its core physical model.

What Changed

condense Friedel-paired seed spots before discrete fiber generation
enable Friedel pairing by default for seeded searches
simulate reflection statistics to set clustering/support thresholds more robustly
add auto-selection for active_hkls and seed_hkls
make auto-selection favor separable, low-order, high-visibility rings
clean up multiprocessing behavior for the legacy workflow across Linux/macOS/Windows
floor paintGrid eta/omega tolerances to map bin width and warn when the requested tolerance is too tight for the map resolution
add Python 3.10 compatibility fixes needed for benchmarking and development on newer machines

Details

Friedel condensation in the legacy seeded path

Seed spots are now condensed by Friedel pairing before discrete fibers are generated. This reduces redundant candidate orientations in the seeded search, which in turn reduces both the size of the discrete-fiber candidate cloud and the number of above-threshold orientations passed downstream.

This is especially helpful in denser indexing problems, where duplicated seed peaks contribute more to combinatorial growth than to useful discrimination.

Simulator-backed reflection statistics

The clustering/bootstrap heuristics now use simulated one-grain reflection visibility statistics for the exact indexing configuration instead of relying on a brittle minimum-reflection heuristic.

The statistics include:

active reflections per grain
seed reflections per grain
Friedel-reduced seed reflections per grain
visible seeded family count per grain

These are used to set support and clustering thresholds from a configurable lower-tail percentile, which makes thresholding more stable across different materials, detector layouts, and scan configurations.

Auto-selection of active and seed HKLs

The workflow can now auto-select both:

active_hkls: a broader support-oriented map set
seed_hkls: a cleaner subset used for candidate generation

The selector prefers:

separable merged 2theta ranges
lower-order reflections
higher simulated visibility / better detector coverage
eta support away from the unreliable near-polar regions
distinct fiber families rather than degenerate or multi-order duplicates

Manual HKL selection is still fully supported.

Multiprocessing cleanup

The legacy path now uses HEXRD’s platform-aware multiprocessing context and avoids pathological slowdowns on spawn-based platforms.

This includes:

context-aware pool creation
safer chunk sizing for small jobs
spawn-aware serial fallback where fine-grained process fanout is counterproductive

This keeps Linux fork behavior efficient while avoiding severe regressions on macOS/Windows.

paintGrid tolerance consistency

paintGrid now floors the effective eta/omega tolerances to the eta-omega map bin widths and emits a warning when a requested tolerance is smaller than the map can represent.

This keeps completeness scoring and Friedel matching more internally consistent with the actual map discretization.

Benchmark Summary

Synthetic ruby eta-omega benchmarks show that the updated legacy discrete-fibers workflow produces about 50% fewer candidate orientations and about 50% fewer above-threshold orientations than master.

Legacy workflow comparison

Grains	Master Total (s)	Candidate Total (s)	Total Delta	Master Candidates	Candidate Candidates	Above Threshold Delta	Matched Grains
10	13.20	14.01	`+6.1%`	213,844	106,564	`-51.8%`	`10/10 -> 10/10`
50	43.92	25.69	`-41.5%`	1,068,503	531,371	`-50.3%`	`50/50 -> 50/50`
250	195.13	106.50	`-45.4%`	5,175,105	2,606,092	`-49.3%`	`245/250 -> 250/250`

Key takeaways:

candidate count is reduced by roughly 50% across all tested sizes
the above-threshold cloud is also reduced by roughly 50%
at moderate and larger grain counts, this translates to about 40-45% lower end-to-end runtime
at 250 grains, recovery improved from 245/250 to 250/250
mean truth misorientation at 250 grains improved from 0.565° to 0.092°

At very small grain count (10 grains), the candidate branch is slightly slower overall, which appears to reflect fixed setup overhead rather than a weakness in the seeded-search cleanup itself.

Real multiruby benchmark

On the multiruby eta-omega benchmark with the legacy discrete-fibers path:

1 CPU, Friedel off: 14.314 s, 71,281 candidates, 3 grains
1 CPU, Friedel on: 10.264 s, 38,520 candidates, 3 grains
4 CPU, Friedel on (macOS spawn context): 10.188 s, 38,520 candidates, 3 grains

This shows a clear candidate-reduction and runtime improvement from Friedel condensation, while the multiprocessing cleanup prevents severe slowdown on spawn-based platforms.

Testing

py_compile on modified modules
config/unit coverage for the new seeded-search config and HKL selection logic
real-data smoke checks on the ruby example
synthetic eta-omega benchmark comparisons against master
synthetic auto-selection smoke checks for active/seed HKL selection

Notes

This PR focuses on strengthening the legacy seeded workflow that users rely on today.

Experimental continuous-fiber / pairwise proposal-generation work remains separate and is not part of the user-facing claim here.

psavery · 2026-03-25T15:33:22Z

This needs to be updated to use the simpler formulas that Don determined over the past couple of weeks.

joelvbernier · 2026-04-02T06:52:08Z

@psavery -- PTAL at the updates here and lmk what you think!

donald-e-boyce · 2026-04-02T17:41:12Z

I will take a look at this. There are extensive changes, so I will start by running it on our sample problems.

donald-e-boyce · 2026-04-02T18:38:44Z

First thoughts.

You really need to include a .md file that explains what the new parameters are and how to choose them. I picked mine from the test_find_orientations.py file, but there definitely needs to be some examples.
I ran the NIST_ruby/single_GE example. Here are the differences:

The eta-ome-maps-ruby.npz files are identical.
The scored-orientations-ruby.npz files differ. I haven't looked in detail yet.

(hexrd-py-3.12-pr209) (MacBook-Pro-2023: ruby) 1387. cmp pr-907/scored-orientations-ruby.npz master.bak/scored-orientations-ruby.npz 
pr-907/scored-orientations-ruby.npz master.bak/scored-orientations-ruby.npz differ: char 15, line 1

The new version finds 2 grains while the old version finds 1. They are all very close, and it looks like the old is the avergae of the two new ones.

(hexrd-py-3.12-pr209) (MacBook-Pro-2023: ruby) 1390. diff pr-907/grains.out master.bak/
2,3c2
< 0             1.000000      0.000000e+00  6.6946391748353062e-01   -9.8362043744221050e-01  7.3362531032015987e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00 
< 1             1.000000      0.000000e+00  6.6326579715733758e-01   -1.0104893586171402e+00  7.5477420307336163e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00 
---
> 0             1.000000      0.000000e+00  6.6622257391883510e-01   -9.9485825060686206e-01  7.4134698683940792e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00

So, to start with:

We need to find out how and why the scored orientations differ.
We need documentaion on the new seed-search parameters and some examples.
Also, we need to improve the duplication.

donald-e-boyce · 2026-04-03T14:12:01Z

Time to address the elephant in the room.

This is a massive pull request. It involves 9 files, 4,269 new lines of code and 230 old lines removed. There is no way this will go through any time soon.

We need to break this into smaller requests. You effectively have 5 or 10 new features.

To start with, we might look at the changes to indexer.py and paintgrid. That could be a single pull request I think. In fact, there is active pull request (#909) involving indexer.py, but it's just documentation and formatting. Plus, your changes alter the scored_orientation files for the test examples, so we definitely need to explain that.

On another note, you mention python 3.10 compatibility. In fact, python 3.10 is end of life in October this year. You may want to make this runs on the latest python versions.

In the meantime, though, we can leave this pull request as active and people can and should exercise the code using your branch. We can spread the word at our regular hexrd meetings.

joelvbernier · 2026-04-04T17:54:44Z

@donald-e-boyce @psavery -- the changes are relatively straightforward from a technical aspect.

Please refer to the MD note at the head of the PR here.

TL;DR --

The candidate fibers, defined by detected peaks from the eta omega maps, are now reduced by condensing with their Friedel pairs. Recall that these would be identical fibers. It uses an existing function in rotations.py, and roughly halves the number of candidates, massively cleaning up the search space.
There is an enhanced grain reflection statistics calculator using the existing reflection simulator, to support better clustering statistics than a simple min_samples floor based on heuristics. Additionally, this supports an "auto-select" feature for some of the more esoteric indexing parameters for uninitiated users. Note that the config is not backward-compatible; the new options to automatically select active and seed HLKs require a version update, so this would likely be tagged with a v0.10.x number. I am adding user documentation.
find-orientations previously was not using the global platform-aware multiprocessing harness, but rather had some bespoke multiprocessing that was causing problems on Mac OS (and Windows) due to the lack of fork (so sad, Mac OS...). It is now making use of the module-level multiprocessing hooks that are platform aware, so it doesn't bog down in a spawn environment (works much better with fork, but that's just the reality).

There is a benchmarking script, scripts/benchmark_find_orientations_scaling.py, that uses simulated ruby eta omega maps with options for noise, dilation, and number of grains. This could form a basis for simulations for a long-awaited paper on the algorithm.

Besides passing the canonical multiruby test we use for CI, it outperforms the existing implementation soundly, particularly as the number of grains increases (see the results in the main PR documentation note above). Not only is there a ~2x speedup, but the massive reduction in the candidate space leads to both a reduction in the number of missed grains when the count is >200, but also a notable reduction in the number of false positive orientations (that are usually filtered out by fit-grains). The slight increase in overhead, apparent for small numbers of grains (<10) is justified by the significant performance and robustness gains for large numbers.

So while there are 9 files changed, one is wholly new, and for the most part, the changes are reasonably isolated. I suggest that users take advantage of this performance upgrade ASAP.

joelvbernier · 2026-04-04T18:14:56Z

First thoughts.

1. You really need to include a `.md` file that explains what the new parameters are and how to choose them.  I picked mine from the `test_find_orientations.py` file, but there definitely needs to be some examples.

2. I ran the `NIST_ruby/single_GE` example. Here are the differences:


* The `eta-ome-maps-ruby.npz` files are identical.

* The `scored-orientations-ruby.npz` files differ. I haven't looked in detail yet.

(hexrd-py-3.12-pr209) (MacBook-Pro-2023: ruby) 1387. cmp pr-907/scored-orientations-ruby.npz master.bak/scored-orientations-ruby.npz 
pr-907/scored-orientations-ruby.npz master.bak/scored-orientations-ruby.npz differ: char 15, line 1

* The new version finds 2 grains while the old version finds 1.  They are all very close, and it looks like the old is the avergae of the two new ones.

(hexrd-py-3.12-pr209) (MacBook-Pro-2023: ruby) 1390. diff pr-907/grains.out master.bak/
2,3c2
< 0             1.000000      0.000000e+00  6.6946391748353062e-01   -9.8362043744221050e-01  7.3362531032015987e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00 
< 1             1.000000      0.000000e+00  6.6326579715733758e-01   -1.0104893586171402e+00  7.5477420307336163e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00 
---
> 0             1.000000      0.000000e+00  6.6622257391883510e-01   -9.9485825060686206e-01  7.4134698683940792e-01   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   1.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00   0.0000000000000000e+00

So, to start with:

1. We need to find out how and why the scored orientations differ.

2. We need documentaion on the new `seed-search` parameters and some examples.

3. Also, we need to improve the duplication.

@donald-e-boyce -- the change in scored orientations is the result of condensing Friedel pairs; this is precisely the desired result. I believe the original GE ruby did indeed have two subdomains (I recall doing NF on it ages ago), and depending on the indexing config, it is sometimes found, sometimes not. The two raw orientations are ~2 degrees apart in their cluster centroids, but are pulled closer together by fit_grains. The second grain is very incomplete and could be an artifact, especially given the truncated rotation range (only 120 degrees).

If min_completeness is on the lower end, spurious fiber tangles can appear as a distinct cluster, especially if there are noise "peaks" in the set used. There is also the effect that min_samples for the clustering imposes. Taking a deeper look. Confirmed the candidate branch runs WAY faster.

You can also examine the benchmarking results using simulated data, as well as the 3-ruby (multiruby) example, which is the CI test for find-orientations.

joelvbernier · 2026-04-04T18:20:05Z

Time to address the elephant in the room.

This is a massive pull request. It involves 9 files, 4,269 new lines of code and 230 old lines removed. There is no way this will go through any time soon.

We need to break this into smaller requests. You effectively have 5 or 10 new features.

To start with, we might look at the changes to indexer.py and paintgrid. That could be a single pull request I think. In fact, there is active pull request (#909) involving indexer.py, but it's just documentation and formatting. Plus, your changes alter the scored_orientation files for the test examples, so we definitely need to explain that.

On another note, you mention python 3.10 compatibility. In fact, python 3.10 is end of life in October this year. You may want to make this runs on the latest python versions.

In the meantime, though, we can leave this pull request as active and people can and should exercise the code using your branch. We can spread the word at our regular hexrd meetings.

I'll admit that the changes to paintGrid -- to bring it in line with the package-level multiprocessing configuration -- are extensive and should have been a separate PR. The other changes are not as extensive as they might seem on their face, and one of the files is a new benchmarking script to test performance (I think we can use this to write a paper).

There is also a bunch of new prototype only code that mistakenly made it in that is playing with pairwise fiber distance-based indexing; Let me mark those areas as they are not germane to the PR -- they have no impact on user-facing code.

joelvbernier · 2026-04-04T18:28:55Z

You are right in pointing out my overzealousness, as I included a bunch of non-user-facing prototype code to play with new indexing strategies @donald-e-boyce .

To help scope review, please focus only on the changes that directly affect the legacy seeded discrete-fibers indexing path.

The intended review scope is:

Friedel-pair condensation before discrete fiber generation
simulator-backed reflection statistics for clustering/support thresholds
auto-selection of active_hkls and seed_hkls for the legacy workflow
multiprocessing cleanup for the legacy find-orientations path
paintGrid eta/omega tolerance flooring to map bin width

Concretely, the main review surface is:

hexrd/hedm/findorientations.py
- _pair_friedel_seed_peaks()
- _collect_seed_reflections()
- generate_orientation_fibers()
- compute_reflection_statistics()
- load_eta_ome_maps()
- create_clustering_parameters()
hexrd/hedm/config/findorientations.py
- friedel_pairing
- reflection_statistics_samples
- reflection_statistics_percentile
- active_hkl_selection
- hkl_seed_selection
- auto_select_count
hexrd/hedm/indexer.py
- paintGrid() tolerance flooring / warnings
tests
- tests/config/test_find_orientations.py
- tests/test_indexer.py

Please feel free to ignore, for purposes of this review, the prototype continuous-fiber / pairwise candidate-generation work:

pairwise
pairwise-greedy
pairwise-consensus
pairwise-cover

Those paths were exploratory and are not part of the user-facing claim I’m asking reviewers to evaluate here. The main user-facing changes I want reviewed are the legacy discrete-fiber cleanup and its associated config/test coverage.

joelvbernier · 2026-04-04T18:29:57Z

This needs to be updated to use the simpler formulas that Don determined over the past couple of weeks.

what is this in reference to @psavery ?

donald-e-boyce · 2026-04-06T16:38:05Z

This needs to be updated to use the simpler formulas that Don determined over the past couple of weeks.

what is this in reference to @psavery ?

I sent you this new write-up, much cleaner, but in case you missed it, here it is again (attaching). I also had Claude implement a simple function just for solving the systems (still tweaking this, here and there). I am playing with this a little bit, having hexrd write out the hkls and sample gvecs to feed to the solver. Very interesting so far, but I want to run it on bigger data. I'll keep you in the loop on that.

fiber_intersections.py
fiber_intersections.pdf

donald-e-boyce · 2026-04-08T16:32:38Z

We really need to break this into smaller pull requests.

I think a good place to start is with the indexer.py file. For one, we just updated that file (using Claude, mainly, with some minor tweaks). The file has clean documentation, docstrings on the functions, and type hints. It also uses a TypedDict data structure to hold the paintgrid parameters. You'll need to work from the latest master branch.

I also ran some tests to see how the new changes perform. I ran a real problem, torsion data, to check things out. I ran it first with the multiprocesisng input set to 8. I ran your verison on the same one, and it only used a single process. Then I changed the multiprocessing to 1, to make sure the first one was really in parallel mode. Here are excerpts from the log files. The load-0-mp1 uses recent hexrd with 1 process, ...-mp8 uses 8 processes, and the ...-pr907 uses your code with 8 procees.

(hexrd-py-3.12) (Mac: layer-0) 1149. grep paint load-0-*/*strip
load-0-mp1/find-orientations-lshr.log.strip:hexrd.hedm.indexer - paintGrid took 391.873 seconds

load-0-mp8/find-orientations-lshr.log.strip:hexrd.hedm.indexer - paintGrid took 81.173 seconds

load-0-pr907/find-orientations-lshr.log.strip:hexrd.hedm.indexer - running paintGrid in serial mode on spawn multiprocessing context
load-0-pr907/find-orientations-lshr.log.strip:hexrd.hedm.indexer - paintGrid took 259.286 seconds

Note that your version was scoring fewer orientations.

(hexrd-py-3.12) (Mac: layer-0) 1150. grep scor load-0-*/*strip
load-0-mp1/find-orientations-lshr.log.strip:hexrd.hedm.findorientations - 	Saving 12724266 scored orientations with max completeness 100.000000%

load-0-mp8/find-orientations-lshr.log.strip:hexrd.hedm.findorientations - 	Saving 12724266 scored orientations with max completeness 100.000000%

load-0-pr907/find-orientations-lshr.log.strip:hexrd.hedm.findorientations - Saving 8358366 scored orientations with max completeness 100.000000%

So it looks like the single process times are comparable, but your update broke the parallel execution. So that needs to be fixed.

Prototype pairwise seeding and optimize find-orientations

6b17424

joelvbernier requested review from ZackAttack614 and psavery March 16, 2026 05:54

joelvbernier self-assigned this Mar 16, 2026

joelvbernier added 2 commits March 15, 2026 23:19

Fix ProcessedImageSeries indexing on Python 3.10

703c9ec

Add Python 3.10 fallback for NotRequired

09f3174

joelvbernier marked this pull request as draft March 16, 2026 07:25

joelvbernier added 2 commits March 16, 2026 01:02

Simulate reflection statistics for indexing thresholds

3b0845b

Add auto selection for indexing hkls

e68679e

joelvbernier marked this pull request as ready for review April 2, 2026 06:50

joelvbernier changed the title ~~Prototype pairwise seeding and optimize find-orientations~~ Optimize legacy find-orientations and prototype new candidate algorithms Apr 2, 2026

joelvbernier requested a review from donald-e-boyce April 4, 2026 17:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize legacy find-orientations and prototype new candidate algorithms#907

Optimize legacy find-orientations and prototype new candidate algorithms#907
joelvbernier wants to merge 5 commits intomasterfrom
jbernier/find-orientations-cleanup

joelvbernier commented Mar 16, 2026 •

edited

Loading

Uh oh!

psavery commented Mar 25, 2026

Uh oh!

joelvbernier commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 3, 2026

Uh oh!

joelvbernier commented Apr 4, 2026 •

edited

Loading

Uh oh!

joelvbernier commented Apr 4, 2026 •

edited

Loading

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

donald-e-boyce commented Apr 6, 2026

Uh oh!

donald-e-boyce commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

joelvbernier commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What Changed

Details

Friedel condensation in the legacy seeded path

Simulator-backed reflection statistics

Auto-selection of active and seed HKLs

Multiprocessing cleanup

paintGrid tolerance consistency

Benchmark Summary

Legacy workflow comparison

Real multiruby benchmark

Testing

Notes

Uh oh!

psavery commented Mar 25, 2026

Uh oh!

joelvbernier commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 2, 2026

Uh oh!

donald-e-boyce commented Apr 3, 2026

Uh oh!

joelvbernier commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joelvbernier commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

joelvbernier commented Apr 4, 2026

Uh oh!

donald-e-boyce commented Apr 6, 2026

Uh oh!

donald-e-boyce commented Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

joelvbernier commented Mar 16, 2026 •

edited

Loading

joelvbernier commented Apr 4, 2026 •

edited

Loading

joelvbernier commented Apr 4, 2026 •

edited

Loading