Add documentation on preferring binary enums over booleans #59
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| # Prefer Binary Enums Over Booleans | ||
|
|
||
| When modeling data that has two possible states, it may be tempting to use a boolean type (`true`/`false`). However, in many cases, a two-element enumeration (binary enum) is the better choice. This document explains why and when to prefer binary enums over booleans in your LinkML schemas. | ||
|
|
||
| ## The Case for Binary Enums | ||
|
|
||
| The [Tidy Design Principles](https://design.tidyverse.org/boolean-strategies.html) from the tidyverse project articulate several compelling reasons to prefer enums even when there are only two choices. | ||
|
|
||
| ### 1. Extensibility | ||
|
|
||
| If you later discover a third (or fourth, or fifth) option, a boolean forces a breaking change to the interface, because the slot's type itself has to change. With an enum, you only add a new value. | ||
|
|
||
| **Example:** Consider a data submission status. You might initially think "submitted" or "not submitted" covers it: | ||
|
|
||
| ```yaml | ||
| # Boolean approach - seems simple at first | ||
| slots: | ||
|   is_submitted: | ||
|     range: boolean | ||
| ``` | ||
|
|
||
| But what about "pending review", "rejected", or "withdrawn"? With a boolean, you're stuck. With an enum, you simply add new values: | ||
|
|
||
| ```yaml | ||
| # Enum approach - extensible | ||
| enums: | ||
|   SubmissionStatus: | ||
|     permissible_values: | ||
|       SUBMITTED: | ||
|       NOT_SUBMITTED: | ||
|       PENDING_REVIEW:  # Easy to add later | ||
|       REJECTED:        # Easy to add later | ||
| ``` | ||
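To connect the enum to data, a slot can take it as its range in place of the boolean `is_submitted`. The following is a minimal sketch; the slot name `submission_status` and its description are hypothetical choices, not taken from the original schema.

```yaml
# Hypothetical slot replacing `is_submitted`; the name is illustrative only.
slots:
  submission_status:
    range: SubmissionStatus
    description: Current state of the data submission
```

An instance would then carry a value such as `submission_status: PENDING_REVIEW` rather than a bare `true` or `false`.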
|
|
||
| ### 2. Clarity of Intent | ||
|
|
||
| Boolean values often have asymmetric clarity. `something = TRUE` tells you what *will* happen, but `something = FALSE` only tells you what *won't* happen, not what will happen instead. | ||
|
|
||
| **Example from the Tidy Design Principles:** Base R's `sort()` function uses `decreasing = TRUE/FALSE`. Reading `decreasing = FALSE` leaves ambiguity: | ||
| - Does it mean "sort in increasing order"? | ||
| - Or does it mean "don't sort at all"? | ||
|
|
||
| Compare this with `vctrs::vec_sort()` which uses `direction = "asc"` or `direction = "desc"`. Both options are explicit and self-documenting. | ||
|
|
||
| ### 3. Avoiding Cryptic Negations | ||
|
|
||
| Boolean parameters often require mental gymnastics to interpret, especially with negated names. | ||
|
|
||
| **Example from the Tidy Design Principles:** Base R's `cut()` function has a `right` parameter: | ||
| - `right = TRUE`: right-closed, left-open intervals `(a, b]` | ||
| - `right = FALSE`: right-open, left-closed intervals `[a, b)` | ||
|
|
||
| A clearer design would be `open_side = c("right", "left")` or `bounds = c("[)", "(]")`. | ||
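In LinkML terms, the same idea might be expressed as a two-value enum that names which end of the interval is closed. This is only a sketch: `IntervalClosure`, its permissible values, and the `interval_closure` slot are hypothetical names chosen for illustration.

```yaml
# Hypothetical enum and slot; all names here are illustrative assumptions.
enums:
  IntervalClosure:
    permissible_values:
      RIGHT_CLOSED:
        description: "Intervals of the form (a, b]"
      LEFT_CLOSED:
        description: "Intervals of the form [a, b)"

slots:
  interval_closure:
    range: IntervalClosure
    description: Which end of each interval is closed
```

Either value can be read on its own, without working out what the opposite of `right = TRUE` would be.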
|
|
||
| ### 4. Self-Documenting Code | ||
|
|
||
| Enums make data and code more readable without needing to consult documentation. | ||
|
|
||
| ```yaml | ||
| # What does this mean? Need to check docs. | ||
| sample: | ||
|   is_control: false | ||
|
|
||
| # Self-explanatory | ||
| sample: | ||
|   sample_type: EXPERIMENTAL | ||
| ``` | ||
|
|
||
| ### 5. The "Name the Scale" Pattern | ||
|
|
||
| When converting booleans to enums, consider naming the scale with values that represent points on it. This signals that intermediate values could be added. | ||
|
|
||
| **Example:** Instead of `verbose = TRUE/FALSE`, use: | ||
|
|
||
| ```yaml | ||
| enums: | ||
|   VerbosityLevel: | ||
|     permissible_values: | ||
|       NONE: | ||
|         description: No output | ||
|       MINIMAL: | ||
|         description: Errors only | ||
|       NORMAL: | ||
|         description: Standard output | ||
|       VERBOSE: | ||
|         description: Detailed output | ||
|       DEBUG: | ||
|         description: All available information | ||
| ``` | ||
|
|
||
| ## When Booleans Are Acceptable | ||
|
|
||
| Booleans remain appropriate in certain cases: | ||
|
|
||
| 1. **Truly binary states**: The states are fundamentally and permanently binary (e.g., physical properties like "alive/dead" in certain contexts) | ||
|
|
||
| 2. **Well-named parameters**: The parameter name makes both states crystal clear (e.g., `include_header` where `false` clearly means "exclude header") | ||
|
|
||
| 3. **Toggle operations**: When the operation is clearly about enabling/disabling something (`enabled = true/false`) | ||
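For contrast, a boolean slot that meets the "well-named parameter" criterion might look like the sketch below. It reuses the `include_header` name from the list above; the description text is an assumption added for illustration.

```yaml
# A boolean that stays readable: both states follow from the slot name.
slots:
  include_header:
    range: boolean
    description: Whether to write a header row; false means the header is omitted
```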
|
|
||
| ## LinkML Examples | ||
|
|
||
| ### Binary Enum Pattern | ||
|
|
||
| ```yaml | ||
| enums: | ||
|   SortDirection: | ||
|     permissible_values: | ||
|       ASCENDING: | ||
|         description: Sort from lowest to highest | ||
|         meaning: SIO:001395  # ascending order | ||
|       DESCENDING: | ||
|         description: Sort from highest to lowest | ||
|         meaning: SIO:001396  # descending order | ||
|
|
||
|   StrandOrientation: | ||
|     permissible_values: | ||
|       FORWARD: | ||
|         description: Forward/plus strand | ||
|         meaning: SO:0000853  # intended term: forward_strand; verify this CURIE against the Sequence Ontology | ||
|       REVERSE: | ||
|         description: Reverse/minus strand | ||
|         meaning: SO:0000854  # intended term: reverse_strand; verify this CURIE against the Sequence Ontology | ||
|
|
||
|   PresenceStatus: | ||
|     permissible_values: | ||
|       PRESENT: | ||
|         description: The entity is present | ||
|       ABSENT: | ||
|         description: The entity is absent | ||
|       NOT_DETERMINED: | ||
|         description: Presence could not be determined | ||
| ``` | ||
|
|
||
|
|
||
| ### Applying to Slots | ||
|
|
||
| ```yaml | ||
| slots: | ||
|   sort_direction: | ||
|     range: SortDirection | ||
|     description: Direction for sorting results | ||
|
|
||
|   strand: | ||
|     range: StrandOrientation | ||
|     description: DNA strand orientation | ||
|
|
||
|   presence: | ||
|     range: PresenceStatus | ||
|     description: Whether the feature was detected | ||
| ``` | ||
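Slots only take effect once a class uses them. As a rough sketch, a class could pull in two of the slots defined above; the class name `GenomicFeature` and its description are hypothetical and do not come from the original document.

```yaml
# Hypothetical class; only the slot names come from the snippet above.
classes:
  GenomicFeature:
    description: A feature located on a DNA strand
    slots:
      - strand
      - presence
```

A data instance of this class might then read `strand: FORWARD` and `presence: NOT_DETERMINED`, with every value drawn from the enum's permissible values.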
|
|
||
| ## Summary | ||
|
|
||
| | Aspect | Boolean | Binary Enum | | ||
| |--------|---------|-------------| | ||
| | Extensibility | Poor - breaking change to add states | Good - add new values easily | | ||
| | Clarity | Often asymmetric | Both values explicit | | ||
| | Documentation | Requires external docs | Self-documenting | | ||
| | Ontology mapping | Not possible | Supports `meaning` annotations | | ||
| | Future-proofing | Risky | Safe | | ||
|
|
||
| When in doubt, prefer a two-element enum. The small additional effort pays dividends in clarity, maintainability, and extensibility. | ||
|
|
||
| ## References | ||
|
|
||
| - [Tidy Design Principles: Prefer an enum, even if only two choices](https://design.tidyverse.org/boolean-strategies.html) | ||
| - [Tidy Design Principles: Explicit Strategies](https://design.tidyverse.org/explicit-strategies.html) | ||
| - [Tidy Design Principles: Extract strategies into objects](https://design.tidyverse.org/strategy-objects.html) | ||
The YAML snippet under “Self-Documenting Code” contains two `sample:` top-level keys in the same YAML document, which makes the example invalid (the second key overwrites the first). Consider splitting this into two separate code blocks or using distinct keys (e.g., `sample1`/`sample2`) so readers can copy/paste valid YAML.
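
One possible shape for the distinct-keys option, sketched with the `sample1`/`sample2` names suggested above (placeholders, not names from the schema):

```yaml
# What does this mean? Need to check docs.
sample1:
  is_control: false

# Self-explanatory
sample2:
  sample_type: EXPERIMENTAL
```

With unique keys the two mappings coexist in one document, so the boolean and enum forms can still be compared side by side.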