feat: CXL documentation
#2260
| No matter where `CXL` is used, it always manifests in queries. | ||
| For example, [a calculated element](./cdl/#calculated-elements) defined in an entity will be resolved | ||
| to the respective calculation in the generated query when the entity is queried. |
You can use expressions in various other places:
- translated to EDMX expressions
- to define projections between types
- projections can be resolved at runtime (runtime views)
- expressions can be evaluated in memory
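To make the last point concrete, here is a minimal sketch of evaluating a filter expression in memory against plain JavaScript objects. The tree shape (`val`, `ref`, `op`/`args`) is a hypothetical simplification loosely inspired by CXN, not the actual CXN format or the CAP runtime's evaluator. It uses plain two-valued logic, and the sample rows are made up.

```js
// Hypothetical, simplified expression tree (NOT the actual CXN format):
//   { val: <literal> }            a literal value
//   { ref: ['a', 'b'] }           a path, resolved step by step on the row
//   { op: '<op>', args: [x, y] }  a binary operator
function evaluate(expr, row) {
  if ('val' in expr) return expr.val;
  if ('ref' in expr) {
    // Follow the path; a missing step yields null (like an unmatched association)
    return expr.ref.reduce((obj, step) => (obj == null ? null : obj[step] ?? null), row);
  }
  const [left, right] = expr.args.map((arg) => evaluate(arg, row));
  switch (expr.op) {
    case '=': return left === right; // simplified: no three-valued null handling
    case '<': return left < right;
    case '>': return left > right;
    case 'and': return left && right;
    case 'or': return left || right;
    default: throw new Error(`unsupported operator: ${expr.op}`);
  }
}

// Made-up rows with an already resolved genre association
const books = [
  { title: 'B1', genre: { name: 'Fantasy' } },
  { title: 'B2', genre: { name: 'Mystery' } },
];

// where genre.name = 'Fantasy'
const fantasy = { op: '=', args: [{ ref: ['genre', 'name'] }, { val: 'Fantasy' }] };
const result = books.filter((b) => evaluate(fantasy, b)).map((b) => b.title);
console.log(result);
```

The same expression tree could equally be handed to a SQL or EDMX translator; evaluating it in memory is just one more vehicle.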
A query does not necessarily mean a database query. There can be multiple vehicles for an expression. Conceptually, though, it can be understood as part of a query, whether that query is sent to the database, converted to EDMX (and then sent to the backend again), or evaluated in memory.
This is an important point though. Expressions are not only meant for database queries.
Keeping this open -> happy for suggestions on how to formulate this.
assets/cxl/sets-leftjoin.drawio.svg
Outdated
Whenever I see Venn diagrams explaining JOINs I'm reminded of this article/rant, where the author makes the case that set theory is a false analogy as JOINs are rather a cross product filtered by a predicate with an optional UNION:
https://blog.jooq.org/say-no-to-venn-diagrams-when-explaining-joins/
Food for thought 🤔 The JOIN diagrams shown there could also come in helpful in some Capire guides.
Thanks for the link! This topic also came up when writing the theory section, because the inner join is indeed
Intersections (and thus Venn diagrams) are possible though if the sets are made union-compatible:
The key point of the blog entry regarding why this cannot be shown as a Venn diagram is that the sets (Authors and Books) are of different types and are not union-compatible. So
Rather, we implicitly look at the cross product
with each circle representing:
Getting the actual authors back is now only an exercise of projection:
-> Select distinct a.* from Authors a full join Books b on 1 = 1
Since the sets are independent of the projection (what we select), any join operation simply filters the cross product, which imho can be displayed nicely using Venn diagrams. So I do disagree with the author of the blog, even though they have a point. I also think the join diagrams could be helpful.
One of the key points in CQL / CXL is that path navigation allows us to work on sets and only consider the select list (projection in relational algebra) at the end. When actually writing relational algebra, the tuple needs to be defined explicitly. CQL and SQL also handle null values nicely. When writing relational algebra, null values need to be included explicitly.
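A small in-memory sketch of that view, with made-up `Authors` and `Books` rows: the inner join is just the cross product filtered by the join predicate, and projection happens only at the end. The last step mirrors `select distinct a.* ... on 1 = 1`, which gives back all authors only because `Books` is non-empty here.

```js
// Made-up sample data; Cid has not written a book
const authors = [
  { ID: 1, name: 'Ann' },
  { ID: 2, name: 'Bob' },
  { ID: 3, name: 'Cid' },
];
const books = [
  { ID: 10, title: 'B1', author_id: 1 },
  { ID: 20, title: 'B2', author_id: 2 },
  { ID: 21, title: 'B3', author_id: 2 },
];

// Cross product Authors x Books as pairs (a, b)
const cross = authors.flatMap((a) => books.map((b) => [a, b]));

// Inner join = cross product filtered by the join predicate
const joined = cross.filter(([a, b]) => b.author_id === a.ID);

// Projection only at the end: distinct author names (like SELECT DISTINCT a.name)
const authorsWithBooks = [...new Set(joined.map(([a]) => a.name))];

// Projecting the unfiltered cross product yields every author (only if Books is non-empty)
const allAuthors = [...new Set(cross.map(([a]) => a.name))];
```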
explaining JOINs
But we don't want to explain joins! We have sets of instances of some entity type that are associated with instances in a different set. Sometimes our SQL backends use a JOIN in SQL to compute a cross product of these sets. But this is an implementation detail of the SQL backends.
Co-authored-by: Steffen Waldmann <[email protected]>
| ??? | ||
| In this example, we select all books and order them by the date of birth of their authors. | ||
| The table alias for the `author` association is used in the order by clause of the SQL query. | ||
please add a hint that any path expression used in order by should also be on the select list
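For reference, ordering by a path expression can be sketched in memory as follows, with made-up data where each book already carries its resolved `author`. Following the suggestion, the path value is also kept in the result rows (`authorBirth` is a hypothetical alias).

```js
// Made-up books with a resolved author association
const books = [
  { title: 'B1', author: { name: 'Ann', dateOfBirth: '1818-07-30' } },
  { title: 'B2', author: { name: 'Bob', dateOfBirth: '1809-01-19' } },
  { title: 'B3', author: { name: 'Cid', dateOfBirth: '1929-05-01' } },
];

// SELECT from Books { title, author.dateOfBirth as authorBirth } order by author.dateOfBirth
const rows = books
  .map((b) => ({ title: b.title, authorBirth: b.author.dateOfBirth }))
  // ISO 8601 date strings sort correctly as plain strings
  .sort((x, y) => (x.authorBirth < y.authorBirth ? -1 : x.authorBirth > y.authorBirth ? 1 : 0));
```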
cds/cxl.md
Outdated
| Every entity defines a set of all possible instances: |
| Every entity defines a set of all possible instances: | |
| Every entity defines a set of all instances: |
cds/cxl.md
Outdated
| Every entity defines a set of all possible instances: | ||
| $$\{ b \in \text{Books} \}$$ | ||
| A simple select query on Books returns the complete set → all books. |
| A simple select query on Books returns the complete set → all books. | |
| A simple select from Books returns the complete set → all books. |
cds/cxl.md
Outdated
| group by genre | ||
| order by genre asc` |
| group by genre | |
| order by genre asc` | |
| group by genre.name | |
| order by genre.name` |
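The suggested change groups by the scalar element `genre.name` instead of the association `genre`. An in-memory sketch of that grouping and ordering, with made-up data (counting books per genre is only an assumed aggregate for illustration):

```js
// Made-up books with a resolved genre association
const books = [
  { title: 'B1', genre: { ID: 11, name: 'Drama' } },
  { title: 'B2', genre: { ID: 12, name: 'Fantasy' } },
  { title: 'B3', genre: { ID: 11, name: 'Drama' } },
];

// group by genre.name: the path yields a scalar grouping key
const counts = new Map();
for (const b of books) {
  const key = b.genre.name;
  counts.set(key, (counts.get(key) ?? 0) + 1);
}

// order by genre.name
const grouped = [...counts]
  .map(([name, count]) => ({ genre: name, count }))
  .sort((x, y) => x.genre.localeCompare(y.genre));
```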
cds/cxl.md
Outdated
| Using the infix notation to specify the query modifiers is just | ||
| syntactic sugar: |
No. Using the infix notation to specify the query modifiers makes it really hard to understand (and explain) what the infix does. I think the plain infix filter is easy to understand though.
cds/cxl.md
Outdated
| syntactic sugar: | ||
| ```js | ||
| await cds.ql` |
this example is easy to understand
cds/cxl.md
Outdated
| } | ||
| GROUP BY books.author.ID` | ||
| ## ordering term <Badge class="badge-inline" type="tip" text="💡 clickable diagram" /> { #ordering-term } |
This is related to the query language and not to the expression language, I think. Move to CQL chapter?
cds/cxl.md
Outdated
| $$\text{books} = \{ (a,b) \in \text{Authors} \times \text{Books} \mid b.\text{author\_id} = a.\text{id} \}$$ | ||
| We can select this set using the path expression `Authors:books` in the [from clause](#in-from-clause). |
why not Authors.books ?
cds/cql-draft.md
Outdated
| ``` | ||
| ::: | ||
| TODO explanation |
The path expression Books:author denotes the set of all author entities that can be reached by following the association author from any Books entity, i.e., the set of authors who have written a book.
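The set semantics described above can be sketched in memory like this (made-up data, foreign key named `author_id` as in the formula): following `author` from every book yields each reachable author once, and authors without books are not part of the set.

```js
// Made-up sample data
const authors = [
  { ID: 1, name: 'Ann' },
  { ID: 2, name: 'Bob' },
  { ID: 3, name: 'Cid' }, // has not written a book
];
const books = [
  { ID: 10, title: 'B1', author_id: 1 },
  { ID: 20, title: 'B2', author_id: 2 },
  { ID: 21, title: 'B3', author_id: 2 },
];

// Books:author -> follow the association from every book, collect distinct targets
const reachableIds = new Set(books.map((b) => b.author_id));
const booksAuthor = authors.filter((a) => reachableIds.has(a.ID));
```

Bob appears only once although two books point to him, because the result is a set of authors, not a set of book/author pairs.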
cds/cql-draft.md
Outdated
| ### in the where clause {#in-where-clause} | ||
| A path expression can also be used as part of the where clause to filter based on elements of related entities: |
| A path expression can also be used as part of the where clause to filter based on elements of related entities: | |
| A single-valued path expression can also be used as part of the where clause to filter based on elements of related entities: |
cds/cql-draft.md
Outdated
| ::: | ||
| In this example, we select all books that belong to the `Fantasy` genre. | ||
| The table alias for the `genre` association is used in the where clause of the SQL query. |
| The table alias for the `genre` association is used in the where clause of the SQL query. |
This implementation detail does not apply to CAP Java. In CAP Java, we use table aliases T0, T1, ..., which is also an implementation detail not worth mentioning.
cds/cql-draft.md
Outdated
| ::: | ||
| In this example, we select all books and order them by the date of birth of their authors. | ||
| The table alias for the `author` association is used in the order by clause of the SQL query. |
| The table alias for the `author` association is used in the order by clause of the SQL query. |
see above
cds/cql-draft.md
Outdated
| ```js [CQL] {3} | ||
| > await cds.ql` | ||
| SELECT from Authors { name, | ||
| books[ price < 19.99 ] as cheapBooks { |
| books[ price < 19.99 ] as cheapBooks { | |
| books[ price < 20.00 ] as cheapBooks { |
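A minimal sketch of what the infix filter means at the data level, with made-up rows: the filter narrows the associated books per author before the nested projection. That an author without matching books keeps an empty array is an assumption of this sketch, not a statement about the runtimes.

```js
// Made-up sample data
const authors = [
  { ID: 1, name: 'Ann' },
  { ID: 2, name: 'Bob' },
];
const books = [
  { title: 'B1', price: 12.5, author_id: 1 },
  { title: 'B2', price: 25, author_id: 1 },
  { title: 'B3', price: 30, author_id: 2 },
];

// SELECT from Authors { name, books[ price < 20.00 ] as cheapBooks { title, price } }
const result = authors.map((a) => ({
  name: a.name,
  // the infix filter restricts the associated books before the nested projection
  cheapBooks: books
    .filter((b) => b.author_id === a.ID && b.price < 20.0)
    .map(({ title, price }) => ({ title, price })),
}));
```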
cds/cql-draft.md
Outdated
| TODO: This is database specific | ||
| In this example, the runtime makes use of JSON functions to aggregate the related `books` into a JSON array. |
| In this example, the runtime makes use of JSON functions to aggregate the related `books` into a JSON array. | |
| In this example, the CAP Node.js runtime makes use of JSON functions to aggregate the related `books` into a JSON array. |
cds/cql-draft.md
Outdated
| In this example, the runtime makes use of JSON functions to aggregate the related `books` into a JSON array. | ||
| This is because SQL databases do not have a native concept of nested result sets. | ||
| > TODO: Link to guide about JSON functions, What about java? |
What about java?
How CAP Java executes this depends on the expand optimization mode used. We always do multiple queries:
join (default):
SELECT T0.name FROM Authors T0
SELECT T1.title, T1.price FROM Authors T0 JOIN Books T1 ON T1.author_id = T0.ID AND T1.price < 20.00
parent-keys:
SELECT T0.ID, T0.name FROM Authors T0 -- yields <set of author IDs>
SELECT T0.title, T0.price FROM Books T0 WHERE T0.author_id IN (<set of author IDs>) AND T0.price < 20.00
load-single (legacy):
SELECT T0.ID, T0.name FROM Authors T0 -- yields <set of author IDs>
for (id in <set of author IDs>):
  SELECT T0.title, T0.price FROM Books T0 WHERE T0.author_id = id AND T0.price < 20.00
subquery:
SELECT T0.ID, T0.name FROM Authors T0 -- yields <set of author IDs>
SELECT T0.title, T0.price FROM Books T0 WHERE T0.ID IN (SELECT T1.ID FROM Authors T0 JOIN Books T1 ON T1.author_id = T0.ID AND T1.price < 20.00)
(Not 100% sure about subquery.)
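To make the parent-keys mode listed above tangible, here is an in-memory sketch in JavaScript (not CAP Java code). `queryAuthors` and `queryBooks` are hypothetical stand-ins for the two database round trips over made-up tables; unlike the SQL above, the child query here also returns `author_id`, which this sketch needs to attach books to their authors.

```js
// Made-up tables standing in for the database
const AUTHORS = [
  { ID: 1, name: 'Ann' },
  { ID: 2, name: 'Bob' },
];
const BOOKS = [
  { title: 'B1', price: 12.5, author_id: 1 },
  { title: 'B2', price: 25, author_id: 1 },
  { title: 'B3', price: 9.99, author_id: 2 },
];

let queryCount = 0;

// Hypothetical stand-in for: SELECT T0.ID, T0.name FROM Authors T0
function queryAuthors() {
  queryCount++;
  return AUTHORS.map(({ ID, name }) => ({ ID, name }));
}

// Hypothetical stand-in for: SELECT ... FROM Books T0 WHERE T0.author_id IN (...) AND T0.price < 20.00
function queryBooks(authorIds) {
  queryCount++;
  return BOOKS.filter((b) => authorIds.includes(b.author_id) && b.price < 20.0);
}

// parent-keys: fetch parents, then all matching children at once, then stitch in memory
const parents = queryAuthors();
const children = queryBooks(parents.map((a) => a.ID));
const result = parents.map((a) => ({
  name: a.name,
  cheapBooks: children
    .filter((b) => b.author_id === a.ID)
    .map(({ title, price }) => ({ title, price })),
}));
```

The number of queries stays at two regardless of the number of authors, whereas the load-single mode issues one extra query per author.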
Thanks for listing these. Imho we should include the explanation of how expands are executed in a database-specific section. While developers should not need to interact with them, or at most simply choose an optimization mode, it does help in understanding what is actually happening -> demystifying.
| ### in the from clause {#in-from-clause} | ||
| A path expression can also be used in the `from` clause of a query to navigate to a related entity: | ||
| When navigating along a to-many association to a leaf element, the result is flattened: |
is flattened
No. You create a cross product of the associated entities. This is nothing we should promote, as it might have slipped in by accident. Also, it might only work in the translation to SQL. The example below will result in a left outer join of books and authors. Also, authors who have not written a book will be included.
Please either expand books from authors or start the selection from books:
Select from Authors { books { title } as title, name as author } (gives a nested result)
or
Select from Books { title, author.name as author }
(will not include authors that have written no books)
The important point here is: the expression language CXL makes no assumption about whether or how it is translated to SQL.
In your example below, the path expression books.title is just the set of titles of the books which are associated with a particular author via the Authors entity's books association. Hence the query
SELECT from Authors { books.title as title, name as author }
should return something like
[
{ title: ['Wuthering Heights'], author: 'Emily Brontë' },
{ title: ['Jane Eyre'], author: 'Charlotte Brontë' },
{ title: ['Eleonora', 'The Raven'], author: 'Edgar Allen Poe' },
{ title: ['Catweazle'], author: 'Richard Carpenter' }
]
which it doesn't. Instead it returns the "flattened" result which you mention. I think it does so by accident!
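The difference between the expected set-valued result and the flattened one can be sketched in memory with made-up data. `nested` follows the set semantics described above; `flattened` mimics what a left outer join produces (one row per author/book pair, `null` for authors without books).

```js
// Made-up sample data
const authors = [
  { ID: 1, name: 'Ann' },
  { ID: 2, name: 'Bob' },
  { ID: 3, name: 'Cid' }, // no books
];
const books = [
  { title: 'B1', author_id: 1 },
  { title: 'B2', author_id: 2 },
  { title: 'B3', author_id: 2 },
];

// Set semantics: books.title is the set of titles per author -> one row per author
const nested = authors.map((a) => ({
  title: books.filter((b) => b.author_id === a.ID).map((b) => b.title),
  author: a.name,
}));

// Left-outer-join translation: one row per (author, book) pair, null if there is no book
const flattened = authors.flatMap((a) => {
  const own = books.filter((b) => b.author_id === a.ID);
  return own.length > 0
    ? own.map((b) => ({ title: b.title, author: a.name }))
    : [{ title: null, author: a.name }];
});
```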
To make things worse, think about:
SELECT FROM Authors {
  name,
  books.title book,
  addresses.{city, street} as address,
  phonenumbers.{country_code, area_code, extension}
}
This would result in an unwanted, unexpected 4-way left outer join.
I think there is no valid use case for path expressions involving to-many associations besides in expands and exists!
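A quick count sketch of why several to-many paths in one select list are problematic when translated to left outer joins. The per-author counts are made up; with a left outer join each association contributes at least one row, and the joins multiply.

```js
// Made-up number of associated rows per author for each to-many association
const perAuthor = [
  { name: 'Ann', books: 2, addresses: 3, phonenumbers: 2 },
  { name: 'Bob', books: 0, addresses: 1, phonenumbers: 4 },
];

// Left outer join keeps the author even without matches: max(count, 1) rows per association,
// and the joined associations multiply
const rowsPerAuthor = perAuthor.map(
  (a) => Math.max(a.books, 1) * Math.max(a.addresses, 1) * Math.max(a.phonenumbers, 1)
);
const totalRows = rowsPerAuthor.reduce((sum, n) => sum + n, 0);
```

Two authors turn into 16 result rows here, none of which corresponds to a meaningful combination of book, address, and phone number.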
This was an ask by Daniel -> we should discuss this in our next review meeting
cds/cxl.md
Outdated
| ### in the where clause {#in-where-clause} | ||
| A path expression can also be used as part of the where clause to filter based on elements of related entities: | ||
| ::: info 💡 Associations are **forward-declared joins** |
I still think we should sunset the phrase "forward-declared joins"
@danjoa the forward declaration terminology according to Wikipedia does not quite match the behavior of associations.
Let's discuss this in our next review meeting.
Earlier discussion: #2260 (comment)
| > await cds.ql`SELECT from Books { title } where genre.name = 'Fantasy'` // [!code focus] | ||
| [ { title: 'Catweazle' } ] | ||
| ``` | ||
| The join condition is defined **ahead of time** as part of the association. |
| The join condition is defined **ahead of time** as part of the association. | |
| The foreign-key relationship is defined **ahead of time** as part of the association. |
I do think join condition is accurate as it does not need to be a foreign key relationship, although it pretty much always includes one. A join is the SQL terminology and well understood, even if cxl can also manifest in other ways.
@danjoa the PR is now ready for an initial version. CQL-specific sections have been moved out to a separate branch for a follow-up PR. We also moved out the scientific background section, as it still needs some work to be useful. @agoerler I've gone through your feedback for the parts that are in scope for the merge and will go through the rest later.
agoerler left a comment
I still think this documentation is way too much focused on explaining how CQL is converted to SQL. Instead, we should provide documentation that explains the fundamentals of the expression language:
Values
- literals
- refs
- paths
- filtered paths -> predicates
- paths
- parameters
- value functions
- value expressions
Predicates
- comparison predicates
- connectives (AND, OR)
- negation
- Boolean functions
I don't see why you would need SQL to explain any of this.
cds/cxl.md
Outdated
| ```cds | ||
| annotate AdminService.Authors:dateOfDeath with @assert: (case | ||
| when dateOfDeath > $now then 'Cannot be in the future' |
dateOfDeath > $now
$now is a timestamp, whereas dateOfDeath is a (local) date, hence the types are incompatible.
We should introduce $today, which would be quite handy.
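To illustrate the suggestion, a sketch of a hypothetical `$today` in JavaScript: the current timestamp is truncated to an ISO date, so the check compares a date with a date. Using UTC for the truncation is an assumption; a real `$today` might use a different time zone.

```js
// Hypothetical $today: the current timestamp truncated to an ISO date (YYYY-MM-DD), in UTC
function today(now = new Date()) {
  return now.toISOString().slice(0, 10);
}

// Date-only check in the spirit of the @assert annotation: a date of death must not be in the future
function checkDateOfDeath(dateOfDeath, now = new Date()) {
  // ISO dates of equal format compare correctly as strings
  return dateOfDeath > today(now) ? 'Cannot be in the future' : null;
}
```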
| :::code-group | ||
| ```js [CQL] | ||
| > await cds.ql`SELECT from Books:author { name }` // [!code focus] | ||
| > await cds.ql `SELECT from Authors { books.title as title, name as author }` // [!code focus] |
please avoid the navigation over the to-many association books, as it introduces the aforementioned join
As mentioned, it was requested by Daniel -> need to discuss this together. Until then, I'm keeping it in the PR
| ::: info 💡 A binary operator is an operator that operates on two operands. | ||
| ::: info A binary operator is an operator that operates on two operands. |
Here would be the place to document which operators there are: that = does a three-valued comparison whereas == does a two-valued comparison, what the precedence of operators is, and that we support || for string concatenation. None of this is documented so far, but it's very fundamental. We can't explain infix filters without explaining filter predicates.
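As a starting point for that documentation, here is a minimal sketch of the distinction described above. It assumes that `=` follows SQL-style three-valued logic (any comparison with null yields unknown, modeled as `null`) and that `==` compares null like any other value; this is an illustration, not a specification of CXL semantics.

```js
// Three-valued equality: unknown (null) if either operand is null
function eq3(a, b) {
  return a === null || b === null ? null : a === b;
}

// Two-valued equality: null is compared like any other value, result is always true/false
function eq2(a, b) {
  return a === b;
}

// Three-valued AND (Kleene logic): false dominates, otherwise unknown dominates
function and3(p, q) {
  if (p === false || q === false) return false;
  if (p === null || q === null) return null;
  return true;
}

// A where clause keeps only rows whose predicate is exactly true
const rows = [{ genre: 'Fantasy' }, { genre: null }];
const viaEq3 = rows.filter((r) => eq3(r.genre, null) === true);
const viaEq2 = rows.filter((r) => eq2(r.genre, null) === true);
```

With `eq3`, comparing against null never selects a row, while `eq2` finds the row whose genre is null.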
Yes, we still need to document the operators (and also the literals) more.
Here's also some explanation for operators, but interestingly only for the node impl variant:
https://cap.cloud.sap/docs/guides/databases?impl-variant=node#standard-operators
Java then has its own explanation in the Java CQL docs:
https://cap.cloud.sap/docs/java/working-with-cql/query-api#comparison-operators
I like the way it is documented in Java, we can probably do something similar here. Do you want to contribute this?
The samples all include the corresponding SQL to make transparent what actually happens in the background. This is quite important for humans and especially for LLMs. The SQL is always hidden behind an extra tab, so it is not the focus. If CQL (which CXL is a part of) were really its own language, it would make sense to describe and specify everything in detail from the ground up. Since it is based on SQL, we can use that fact to build on pre-existing knowledge. Regarding the structure of Values vs. Predicates, I don't agree: predicates are also "values", or rather expressions, simply Boolean expressions. Some constructs require an expression to return a Boolean (case when, ternary, filters, ...), but this Boolean can also come from a literal, a ref, etc.

TBD