Optimize legacy find-orientations and prototype new candidate algorithms#907
Optimize legacy find-orientations and prototype new candidate algorithms#907joelvbernier wants to merge 5 commits intomasterfrom
Conversation
|
This needs to be updated to use the simpler formulas that Don determined over the past couple of weeks. |
|
@psavery -- PTAL at the updates here and lmk what you think! |
|
I will take a look at this. There are extensive changes, so I will start by running it on our sample problems. |
|
First thoughts.
So, to start with:
|
|
Time to address the elephant in the room. This is a massive pull request. It involves 9 files, 4,269 new lines of code and 230 old lines removed. There is no way this will go through any time soon. We need to break this into smaller requests. You effectively have 5 or 10 new features. To start with, we might look at the changes to On another note, you mention python 3.10 compatibility. In fact, python 3.10 is end of life in October this year. You may want to make this runs on the latest python versions. In the meantime, though, we can leave this pull request as active and people can and should exercise the code using your branch. We can spread the word at our regular hexrd meetings. |
|
@donald-e-boyce @psavery -- the changes are relatively straightforward from a technical aspect. Please refer to the MD note at the head of the PR here. TL;DR --
There is a benchmarking script, Besides passing the canonical multiruby test we use for CI, it outperforms the existing implementation soundly, particularly as the number of grains increases (see the results in the main PR documentation note above). Not only is there a ~2x speedup, but the massive reduction in the candidate space leads to both a reduction in the number of missed grains when the count is >200, but also a notable reduction in the number of false positive orientations (that are usually filtered out by So while there are 9 files changed, one is wholly new, and for the most part, the changes are reasonably isolated. I suggest that users take advantage of this performance upgrade ASAP. |
@donald-e-boyce -- the change in scored orientations is the result of condensing Friedel pairs; this is precisely the desired result. I believe the original GE ruby did indeed have two subdomains (I recall doing NF on it ages ago), and depending on the indexing config, it is sometimes found, sometimes not. The two raw orientations are ~2 degrees apart in their cluster centroids, but are pulled closer together by If You can also examine the benchmarking results using simulated data, as well as the 3-ruby (multiruby) example, which is the CI test for |
I'll admit that the changes to There is also a bunch of new prototype only code that mistakenly made it in that is playing with pairwise fiber distance-based indexing; Let me mark those areas as they are not germane to the PR -- they have no impact on user-facing code. |
|
You are right in pointing out my overzealousness, as I included a bunch of non-user-facing prototype code to play with new indexing strategies @donald-e-boyce . To help scope review, please focus only on the changes that directly affect the legacy seeded The intended review scope is:
Concretely, the main review surface is:
Please feel free to ignore, for purposes of this review, the prototype continuous-fiber / pairwise candidate-generation work:
Those paths were exploratory and are not part of the user-facing claim I’m asking reviewers to evaluate here. The main user-facing changes I want reviewed are the legacy discrete-fiber cleanup and its associated config/test coverage. |
what is this in reference to @psavery ? |
I sent you this new write-up, much cleaner, but in case you missed it, here it is again (attaching). I also had Claude implement a simple function just for solving the systems (still tweaking this, here and there). I am playing with this a little bit, having hexrd write out the hkls and sample gvecs to feed to the solver. Very interesting so far, but I want to run it on bigger data. I'll keep you in the loop on that. |
|
We really need to break this into smaller pull requests. I think a good place to start is with the I also ran some tests to see how the new changes perform. I ran a real problem, torsion data, to check things out. I ran it first with the Note that your version was scoring fewer orientations. So it looks like the single process times are comparable, but your update broke the parallel execution. So that needs to be fixed. |
Summary
This PR improves the legacy seeded
find-orientationsworkflow by reducing redundant seed-driven candidates up front, making threshold selection more data-driven, auto-selecting more useful HKLs, and cleaning up multiprocessing behavior across platforms.The goal is to strengthen the proven
discrete-fibers + paintGrid + clusteringworkflow without changing its core physical model.What Changed
active_hklsandseed_hklspaintGrideta/omega tolerances to map bin width and warn when the requested tolerance is too tight for the map resolutionDetails
Friedel condensation in the legacy seeded path
Seed spots are now condensed by Friedel pairing before discrete fibers are generated. This reduces redundant candidate orientations in the seeded search, which in turn reduces both the size of the discrete-fiber candidate cloud and the number of above-threshold orientations passed downstream.
This is especially helpful in denser indexing problems, where duplicated seed peaks contribute more to combinatorial growth than to useful discrimination.
Simulator-backed reflection statistics
The clustering/bootstrap heuristics now use simulated one-grain reflection visibility statistics for the exact indexing configuration instead of relying on a brittle minimum-reflection heuristic.
The statistics include:
These are used to set support and clustering thresholds from a configurable lower-tail percentile, which makes thresholding more stable across different materials, detector layouts, and scan configurations.
Auto-selection of active and seed HKLs
The workflow can now auto-select both:
active_hkls: a broader support-oriented map setseed_hkls: a cleaner subset used for candidate generationThe selector prefers:
2thetarangesManual HKL selection is still fully supported.
Multiprocessing cleanup
The legacy path now uses HEXRD’s platform-aware multiprocessing context and avoids pathological slowdowns on spawn-based platforms.
This includes:
This keeps Linux
forkbehavior efficient while avoiding severe regressions on macOS/Windows.paintGrid tolerance consistency
paintGridnow floors the effective eta/omega tolerances to the eta-omega map bin widths and emits a warning when a requested tolerance is smaller than the map can represent.This keeps completeness scoring and Friedel matching more internally consistent with the actual map discretization.
Benchmark Summary
Synthetic ruby eta-omega benchmarks show that the updated legacy
discrete-fibersworkflow produces about 50% fewer candidate orientations and about 50% fewer above-threshold orientations thanmaster.Legacy workflow comparison
+6.1%-51.8%10/10 -> 10/10-41.5%-50.3%50/50 -> 50/50-45.4%-49.3%245/250 -> 250/250Key takeaways:
250grains, recovery improved from245/250to250/250250grains improved from0.565°to0.092°At very small grain count (
10grains), the candidate branch is slightly slower overall, which appears to reflect fixed setup overhead rather than a weakness in the seeded-search cleanup itself.Real multiruby benchmark
On the multiruby eta-omega benchmark with the legacy
discrete-fiberspath:1 CPU, Friedel off:14.314 s,71,281candidates,3grains1 CPU, Friedel on:10.264 s,38,520candidates,3grains4 CPU, Friedel on (macOS spawn context):10.188 s,38,520candidates,3grainsThis shows a clear candidate-reduction and runtime improvement from Friedel condensation, while the multiprocessing cleanup prevents severe slowdown on spawn-based platforms.
Testing
py_compileon modified modulesmasterNotes
This PR focuses on strengthening the legacy seeded workflow that users rely on today.
Experimental continuous-fiber / pairwise proposal-generation work remains separate and is not part of the user-facing claim here.