-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[Feature](function) Support function cross_product of DuckDB #59223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
| qt_sql "SELECT cross_product([0, 1, 0], [1, 0, 0])" | ||
|
|
||
| // abnormal test cases | ||
| try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont use this. we have test {sql, exception} for abnormal test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| // under the License. | ||
|
|
||
| suite("test_array_cross_product_function") { | ||
| // normal test cases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should create a table and test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
|
||
| public static final List<FunctionSignature> SIGNATURES = ImmutableList.of( | ||
| FunctionSignature.ret(ArrayType.of(FloatType.INSTANCE)) | ||
| .args(ArrayType.of(FloatType.INSTANCE), ArrayType.of(FloatType.INSTANCE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not double type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I referenced the vector distance functions in function_array_distance.h in the same directory. Should these be changed to double type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's interesting.. change your function to double first. we will discuss for function_array_distance.h
| const auto& arg1 = block.get_by_position(arguments[0]); | ||
| const auto& arg2 = block.get_by_position(arguments[1]); | ||
|
|
||
| auto col1 = arg1.column->convert_to_full_column_if_const(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use vector_const and const_vector to deal constancy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to confirm my understanding:
should I explicitly handle ColumnConst instead of calling convert_to_full_column_if_const()?
If my understanding is incorrect, I would appreciate your guidance on how to handle this.
| const ColumnArray* arr2 = nullptr; | ||
|
|
||
| if (const auto* nullable = | ||
| typeid_cast<const ColumnNullable*>(col1.get())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont use typeid_cast
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| auto col1 = arg1.column->convert_to_full_column_if_const(); | ||
| auto col2 = arg2.column->convert_to_full_column_if_const(); | ||
|
|
||
| if (col1->size() != col2->size()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these sizes are not array's size ,but array column's size. should be FatalError and make errmsg more proper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| float c2 = a3 * b1 - a1 * b3; | ||
| float c3 = a1 * b2 - a2 * b1; | ||
|
|
||
| nested_res->insert_value(c1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could pre-alloc all result length. and just set the value here.(resize() first, operator[] every time insert)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
| get_name(), size1, size2); | ||
| } | ||
|
|
||
| if (size1 != 3 || size2 != 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
L156 and L162 are same check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. One ensures length consistency between the two inputs, while the other enforces a fixed-size 3 requirement.
| const ColumnFloat32* float2 = nullptr; | ||
|
|
||
| if (const auto* nullable = | ||
| typeid_cast<const ColumnNullable*>(arr1->get_data_ptr().get())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in Doris BE, column inside column_array should always be ColumnNullable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
and remember to format your code ~ |
| BinaryExpression, AlwaysNotNullable { | ||
|
|
||
| public static final List<FunctionSignature> SIGNATURES = ImmutableList.of( | ||
| FunctionSignature.ret(ArrayType.of(FloatType.INSTANCE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
可能最好是实现一个double
因为我看duckdb也是实现double和float两个的
ScalarFunctionSet ArrayCrossProductFun::GetFunctions() {
ScalarFunctionSet set("array_cross_product");
auto float_array = LogicalType::ARRAY(LogicalType::FLOAT, 3);
auto double_array = LogicalType::ARRAY(LogicalType::DOUBLE, 3);
set.AddFunction(
ScalarFunction({float_array, float_array}, float_array, ArrayFixedCombine<float, CrossProductOp, 3>));
set.AddFunction(
ScalarFunction({double_array, double_array}, double_array, ArrayFixedCombine<double, CrossProductOp, 3>));
for (auto &func : set.functions) {
func.SetFallible();
}
return set;
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
嗯,符号改成double的
| * cosine_distance function | ||
| */ | ||
| public class CrossProduct extends ScalarFunction implements ExplicitlyCastableSignature, | ||
| BinaryExpression, AlwaysNotNullable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duckdb的这个函数在输入为null的时候会返回null吧。
for (idx_t i = 0; i < count; i++) {
const auto lhs_idx = lhs_format.sel->get_index(i);
const auto rhs_idx = rhs_format.sel->get_index(i);
if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
FlatVector::SetNull(result, i, true);
continue;
}
What problem does this PR solve?
Issue Number: #48203
Related PR: #xxx
Problem Summary:
Release note
doc: apache/doris-website#3209
The CROSS_PRODUCT function is used to compute the cross product of two arrays of size 3.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)