Skip to content

Conversation

@juruo-c
Copy link

@juruo-c juruo-c commented Dec 21, 2025

What problem does this PR solve?

Issue Number: #48203

Related PR: #xxx

Problem Summary:

Release note

doc: apache/doris-website#3209
The CROSS_PRODUCT function is used to compute the cross product of two arrays of size 3.

SELECT CROSS_PRODUCT([1, 2, 3], [2, 3, 4]), CROSS_PRODUCT([1, 0, 0], [0, 1, 0]);
+-------------------------------------+-------------------------------------+
| CROSS_PRODUCT([1, 2, 3], [2, 3, 4]) | CROSS_PRODUCT([1, 0, 0], [0, 1, 0]) |
+-------------------------------------+-------------------------------------+
| [-1, 2, -1]                         | [0, 0, 1]                           |
+-------------------------------------+-------------------------------------+

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@juruo-c juruo-c requested a review from zclllyybb as a code owner December 21, 2025 14:11
@Thearas
Copy link
Contributor

Thearas commented Dec 21, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zclllyybb zclllyybb self-assigned this Dec 21, 2025
qt_sql "SELECT cross_product([0, 1, 0], [1, 0, 0])"

// abnormal test cases
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont use this. we have test {sql, exception} for abnormal test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// under the License.

suite("test_array_cross_product_function") {
// normal test cases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should create a table and test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(ArrayType.of(FloatType.INSTANCE))
.args(ArrayType.of(FloatType.INSTANCE), ArrayType.of(FloatType.INSTANCE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not double type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referenced the vector distance functions in function_array_distance.h in the same directory. Should these be changed to double type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's interesting.. change your function to double first. we will discuss for function_array_distance.h

const auto& arg1 = block.get_by_position(arguments[0]);
const auto& arg2 = block.get_by_position(arguments[1]);

auto col1 = arg1.column->convert_to_full_column_if_const();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use vector_const and const_vector to deal constancy

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to confirm my understanding:
should I explicitly handle ColumnConst instead of calling convert_to_full_column_if_const()?
If my understanding is incorrect, I would appreciate your guidance on how to handle this.

const ColumnArray* arr2 = nullptr;

if (const auto* nullable =
typeid_cast<const ColumnNullable*>(col1.get())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dont use typeid_cast

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

auto col1 = arg1.column->convert_to_full_column_if_const();
auto col2 = arg2.column->convert_to_full_column_if_const();

if (col1->size() != col2->size()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these sizes are not array's size ,but array column's size. should be FatalError and make errmsg more proper

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

float c2 = a3 * b1 - a1 * b3;
float c3 = a1 * b2 - a2 * b1;

nested_res->insert_value(c1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could pre-alloc all result length. and just set the value here.(resize() first, operator[] every time insert)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

get_name(), size1, size2);
}

if (size1 != 3 || size2 != 3) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L156 and L162 are same check?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. One ensures length consistency between the two inputs, while the other enforces a fixed-size 3 requirement.

const ColumnFloat32* float2 = nullptr;

if (const auto* nullable =
typeid_cast<const ColumnNullable*>(arr1->get_data_ptr().get())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in Doris BE, column inside column_array should always be ColumnNullable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@zclllyybb
Copy link
Contributor

and remember to format your code ~

BinaryExpression, AlwaysNotNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(ArrayType.of(FloatType.INSTANCE))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可能最好是实现一个double
因为我看duckdb也是实现double和float两个的

ScalarFunctionSet ArrayCrossProductFun::GetFunctions() {
	ScalarFunctionSet set("array_cross_product");

	auto float_array = LogicalType::ARRAY(LogicalType::FLOAT, 3);
	auto double_array = LogicalType::ARRAY(LogicalType::DOUBLE, 3);
	set.AddFunction(
	    ScalarFunction({float_array, float_array}, float_array, ArrayFixedCombine<float, CrossProductOp, 3>));
	set.AddFunction(
	    ScalarFunction({double_array, double_array}, double_array, ArrayFixedCombine<double, CrossProductOp, 3>));
	for (auto &func : set.functions) {
		func.SetFallible();
	}
	return set;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯,符号改成double的

* cosine_distance function
*/
public class CrossProduct extends ScalarFunction implements ExplicitlyCastableSignature,
BinaryExpression, AlwaysNotNullable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duckdb的这个函数在输入为null的时候会返回null吧。

	for (idx_t i = 0; i < count; i++) {
		const auto lhs_idx = lhs_format.sel->get_index(i);
		const auto rhs_idx = rhs_format.sel->get_index(i);

		if (!lhs_format.validity.RowIsValid(lhs_idx) || !rhs_format.validity.RowIsValid(rhs_idx)) {
			FlatVector::SetNull(result, i, true);
			continue;
		}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants