SOLR-13309: Introduce FloatRangeField to expose Lucene 'FloatRange'#4229
SOLR-13309: Introduce FloatRangeField to expose Lucene 'FloatRange'#4229gerlowskija wants to merge 3 commits intoapache:mainfrom
Conversation
This commit adds a new field type, FloatRangeField, that can be used to
hold singular or multi-dimensional (up to 4) ranges of floats.
FloatRangeField is compatible with the previously added
`{!numericRange}` and supports similar syntax.
| * Regex fragment matching a comma-separated list of signed floating-point numbers (integers or | ||
| * floating-point literals). | ||
| */ | ||
| protected static final String COMMA_DELIMITED_FP_NUMS = |
There was a problem hiding this comment.
[0] All of these regexes are a bit messy.
Ultimately the idea here is that we want the validation for each implementing type (int, float, etc.) to be specific to what that type looks like. IntRangeField needs to be able to reject fp-values, etc.
This validation is done by regex. The regexes live in this base class because some code here that invotes this verification. Individual sub-classes specify the regex pattern they want to use by using overrideable methods getRangePattern and getSingleBoundPattern.
So that's my rationale here. Open to other ways of doing it if folks can see a better approach....
HoustonPutman
left a comment
There was a problem hiding this comment.
In general looks great, the validation for floating point numbers could always be improved in the future.
| * {!numericRange criteria="within" field=float_range}[0.0 TO 9.99] | ||
| * | ||
| * // Multi-dimensional queries (bounding boxes, cubes, tesseracts) | ||
| * {!numericRange criteria="intersects" field=bbox}[0,0 TO 10,10] |
There was a problem hiding this comment.
should bbox be renamed here? Might be confusing since we have a bbox field type not supported here, right?
| protected static final String COMMA_DELIMITED_FP_NUMS = | ||
| "-?\\d+(?:\\.\\d+)?(?:\\s*,\\s*-?\\d+(?:\\.\\d+)?)*"; | ||
|
|
||
| private static final String FP_RANGE_PATTERN_STR = |
There was a problem hiding this comment.
I noticed a small gap in the float validation regex: it doesn't accept scientific notation. I tested FP_RANGE_PATTERN_STR pattern on regex101.com, and an input like [1.1 TO 1.5e10] does not match.
Was the exclusion of scientific notation intentional? Or are values like this expected to be rejected before reaching this validation step?
Description
We recently added Solr field types "IntRangeField" and "LongRangeField" to allow users to index and search integer and long ranges (e.g. prices, business hours, grade ranges). But not everything is an int/long. Lucene has additional "range" types:
DoubleRange, andFloatRange.This PR tackles exposing
FloatRangein Solr asFloatRangeField. This is ideal for things that may not fit as a standard "whole number".Solution
This approach takes a very similar approach to that taken by #4192. AbstractNumericRangeFieldType contains most of the plumbing common to these "range" types, so the new implementation added by this PR
FloatRangeFieldcontains only that code that differs from its base/parent class.The main difference between FloatRangeField and the pre-existing IntRangeField and LongRangeField is the value parsing, which accepts floating point values that (e.g.
3.14) that would be rejected as bounds in an IntRangeField.Tests
FloatRangeFieldTest has unit tests for the new functionality, with more "integration" functionality tested in NumericRangeQParserPluginFloatTest. Both of these test classes borrow heavily from their int and long counterparts.
Checklist
Please review the following and check all that apply:
mainbranch../gradlew check.