GH-15483: [C++] Add a Fixed Shape Tensor canonical ExtensionType #8510
jorisvandenbossche merged 39 commits into apache:main from
|
Currently, only the shape is stored. Is this enough? That does assume a fixed row-major order? |
I think we either assume that or also store strides / dimension order. I am not sure how dimension order changes are done in other frameworks (TF, pytorch, etc.) but I would assume they don't reorder tensors in memory. So I would go for storing strides. |
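For context on the shape-versus-strides question, here is a minimal sketch of how byte strides follow from a shape under the row-major assumption. The helper names `row_major_strides` and `offset` are made up for illustration and are not part of Arrow.

```python
def row_major_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) tensor with this shape."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape from the back.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def offset(index, strides):
    """Byte offset of one element, given its index and the strides."""
    return sum(i * s for i, s in zip(index, strides))


shape = (2, 3, 4)
strides = row_major_strides(shape, itemsize=8)
print(strides)  # (96, 32, 8)

# A transposed view only permutes shape and strides; the buffer is untouched,
# so both index/stride pairs below address the same byte offset.
print(offset((1, 2, 3), strides))        # 184
print(offset((3, 2, 1), strides[::-1]))  # 184
```

If only the shape is stored, a reader has to assume strides like these; storing strides (or a dimension order) is what allows a non-row-major layout to be described without reordering the data.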
d4608a9 to 356c300
b5a8643 to a5b19d7
|
In the context of testing metadata equality within multiple parquet files in a dataset, equality on shape and strides may be a very strict requirement. Would relaxing the equality requirement to only compare the number of tensor dimensions negatively impact the design? |
Good point. By tensor dimensions you mean shape, right? |
I was thinking even looser: def __eq__(self, other):
    return len(self.shape) == len(other.shape) |
Done. |
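To make the looser comparison concrete, here is a small sketch using a hypothetical `LooseTensorType` class. It is a stand-in written for this illustration, not the Arrow extension type, and it does not claim to show what the merged implementation does.

```python
class LooseTensorType:
    """Hypothetical stand-in type that compares only the number of dimensions."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __eq__(self, other):
        if not isinstance(other, LooseTensorType):
            return NotImplemented
        # Loose rule from the comment above: equal rank means equal type.
        return len(self.shape) == len(other.shape)

    def __hash__(self):
        # Keep hashing consistent with the loose equality.
        return hash(len(self.shape))


a = LooseTensorType((2, 3))
b = LooseTensorType((5, 7))
c = LooseTensorType((2, 3, 4))
print(a == b)  # True: both are 2-dimensional
print(a == c)  # False: ranks differ
```

The trade-off is that two files whose tensors have the same rank but different shapes would compare as the same type, so a consumer could no longer rely on the type alone to know the size of each cell.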
|
@jorisvandenbossche @sjperkins @pitrou is there interest to get this in? |
|
Currently we don't ship any standard extension types. I recommend discussing this on the mailing-list. |
|
fyi, the ray project created its own Tensor type: |
|
Indeed I think having a built-in Tensor value type (implemented using extension arrays) in Arrow/pyarrow would be better than having third party projects rolling their own. |
|
@wesm would there be interest in folding the Pandas side of these third-party extensions into Pandas also? |
That will be something to discuss in the pandas project. |
|
@rok, you are awesome! 👍 |
jorisvandenbossche
left a comment
The failing CI is unrelated? (it seems the R failures are being worked on, and the C++ failures are related to LLVM update #34768)
|
Great news! |
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
They seem unrelated indeed and I don't think they obscure any new problems as the change was fairly minimal. |
|
Merged after 2.5 years ;) Thanks @rok! |
|
Thanks for all the input and reviews everyone, very happy to see this merged! @jorisvandenbossche now let's talk about strides @ #34797 :D |
|
Benchmark runs are scheduled for baseline = 81c828e and contender = a84a39b. a84a39b is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
### Rationale for this change
The fixed shape tensor canonical extension type is implemented in C++ (#8510), so we can add bindings to the extension type in Python.
### What changes are included in this PR?
Bindings for the fixed shape tensor canonical extension type.
### Are these changes tested?
Yes.
### Are there any user-facing changes?
No.
* Closes: #34882

Lead-authored-by: Alenka Frim <frim.alenka@gmail.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
apache#8510)
> [ARROW-1614](https://issues.apache.org/jira/browse/ARROW-1614): In an Arrow table, we would like to add support for a column that has values cells each containing a tensor value, with all tensors having the same dimensions. These would be stored as a binary value, plus some metadata to store type and shape/strides.

* Closes: apache#15483

Lead-authored-by: Rok Mihevc <rok@mihevc.org>
Co-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Ben Harkins <60872452+benibus@users.noreply.github.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
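The idea in the quoted description, one tensor per cell with every tensor sharing the same shape, can be sketched in plain Python. This is a rough illustration only: the helpers `flatten`, `pack_column`, and `unpack_cell` are hypothetical, and serializing the metadata as JSON with a `"shape"` key is an assumption made here, since the description only says "some metadata".

```python
import json
from math import prod


def flatten(tensor):
    """Flatten nested lists in row-major order."""
    if isinstance(tensor, (list, tuple)):
        return [x for sub in tensor for x in flatten(sub)]
    return [tensor]


def pack_column(tensors, shape):
    """Store each tensor as one flat fixed-length row plus shared metadata."""
    size = prod(shape)
    rows = [flatten(t) for t in tensors]
    for row in rows:
        if len(row) != size:
            raise ValueError(f"expected {size} values per cell, got {len(row)}")
    # Assumed metadata layout: the shape is stored once for the whole column.
    return rows, json.dumps({"shape": list(shape)})


def unpack_cell(row, shape):
    """Rebuild the nested tensor for one cell from its flat row."""
    if len(shape) == 1:
        return list(row)
    step = prod(shape[1:])
    return [unpack_cell(row[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


tensors = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
rows, metadata = pack_column(tensors, (2, 3))
print(rows[0])    # [1, 2, 3, 4, 5, 6]
print(metadata)   # {"shape": [2, 3]}
print(unpack_cell(rows[1], (2, 3)) == tensors[1])  # True
```

Because the shape is fixed for the column, it is stored once in the metadata rather than per cell, and every row has the same length.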
|
We started a mailing list discussion about potential |