Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
168 changes: 168 additions & 0 deletions docs/explanations/binary-enums-vs-booleans.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,168 @@
# Prefer Binary Enums Over Booleans

When modeling data that has two possible states, it may be tempting to use a boolean type (`true`/`false`). However, in many cases, a two-element enumeration (binary enum) is the better choice. This document explains why and when to prefer binary enums over booleans in your LinkML schemas.

## The Case for Binary Enums

The [Tidy Design Principles](https://design.tidyverse.org/boolean-strategies.html) from the tidyverse project articulate several compelling reasons to prefer enums even when there are only two choices.

### 1. Extensibility

If you later discover a third (or fourth, or fifth) option, you'll need to change the interface. With an enum, adding new values is straightforward. With a boolean, you face a breaking change.

**Example:** Consider a data submission status. You might initially think "submitted" or "not submitted" covers it:

```yaml
# Boolean approach - seems simple at first
slots:
is_submitted:
range: boolean
```

But what about "pending review", "rejected", or "withdrawn"? With a boolean, you're stuck. With an enum, you simply add new values:

```yaml
# Enum approach - extensible
enums:
SubmissionStatus:
permissible_values:
SUBMITTED:
NOT_SUBMITTED:
PENDING_REVIEW: # Easy to add later
REJECTED: # Easy to add later
```

### 2. Clarity of Intent

Boolean values often have asymmetric clarity. `something = TRUE` tells you what *will* happen, but `something = FALSE` only tells you what *won't* happen, not what will happen instead.

**Example from tidyverse:** The `sort()` function uses `decreasing = TRUE/FALSE`. Reading `decreasing = FALSE` leaves ambiguity:
- Does it mean "sort in increasing order"?
- Or does it mean "don't sort at all"?

Compare this with `vctrs::vec_sort()` which uses `direction = "asc"` or `direction = "desc"`. Both options are explicit and self-documenting.

### 3. Avoiding Cryptic Negations

Boolean parameters often require mental gymnastics to interpret, especially with negated names.

**Example from tidyverse:** The `cut()` function has a `right` parameter:
- `right = TRUE`: right-closed, left-open intervals `(a, b]`
- `right = FALSE`: right-open, left-closed intervals `[a, b)`

A clearer design would be `open_side = c("right", "left")` or `bounds = c("[)", "(]")`.

### 4. Self-Documenting Code

Enums make data and code more readable without needing to consult documentation.

```yaml
# What does this mean? Need to check docs.
sample:
is_control: false

# Self-explanatory
sample:
Comment on lines +61 to +65
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The YAML snippet under “Self-Documenting Code” contains two sample: top-level keys in the same YAML document, which makes the example invalid (the second key overwrites the first). Consider splitting this into two separate code blocks or using distinct keys (e.g., sample1/sample2) so readers can copy/paste valid YAML.

Suggested change
sample:
is_control: false
# Self-explanatory
sample:
sample1:
is_control: false
# Self-explanatory
sample2:

Copilot uses AI. Check for mistakes.
sample_type: EXPERIMENTAL
```

### 5. The "Name the Scale" Pattern

When converting booleans to enums, consider naming the scale with values that represent points on it. This signals that intermediate values could be added.

**Example:** Instead of `verbose = TRUE/FALSE`, use:

```yaml
enums:
VerbosityLevel:
permissible_values:
NONE:
description: No output
MINIMAL:
description: Errors only
NORMAL:
description: Standard output
VERBOSE:
description: Detailed output
DEBUG:
description: All available information
```

## When Booleans Are Acceptable

Booleans remain appropriate in certain cases:

1. **Truly binary states**: The states are fundamentally and permanently binary (e.g., physical properties like "alive/dead" in certain contexts)

2. **Well-named parameters**: The parameter name makes both states crystal clear (e.g., `include_header` where `false` clearly means "exclude header")

3. **Toggle operations**: When the operation is clearly about enabling/disabling something (`enabled = true/false`)

## LinkML Examples

### Binary Enum Pattern

```yaml
enums:
SortDirection:
permissible_values:
ASCENDING:
description: Sort from lowest to highest
meaning: SIO:001395 # ascending order
DESCENDING:
description: Sort from highest to lowest
meaning: SIO:001396 # descending order

StrandOrientation:
permissible_values:
FORWARD:
description: Forward/plus strand
meaning: SO:0000853 # forward_strand
REVERSE:
description: Reverse/minus strand
meaning: SO:0000854 # reverse_strand

PresenceStatus:
permissible_values:
PRESENT:
description: The entity is present
ABSENT:
description: The entity is absent
NOT_DETERMINED:
description: Presence could not be determined
```
Comment on lines +103 to +133
Copy link

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section is titled “Binary Enum Pattern”, but the PresenceStatus example includes three permissible values (adds NOT_DETERMINED). That’s a good illustration of extensibility, but it’s no longer a binary enum and may confuse readers. Consider moving PresenceStatus into its own “Extending a binary enum” example or renaming the section to reflect that it includes non-binary cases.

Copilot uses AI. Check for mistakes.

### Applying to Slots

```yaml
slots:
sort_direction:
range: SortDirection
description: Direction for sorting results

strand:
range: StrandOrientation
description: DNA strand orientation

presence:
range: PresenceStatus
description: Whether the feature was detected
```

## Summary

| Aspect | Boolean | Binary Enum |
|--------|---------|-------------|
| Extensibility | Poor - breaking change to add states | Good - add new values easily |
| Clarity | Often asymmetric | Both values explicit |
| Documentation | Requires external docs | Self-documenting |
| Ontology mapping | Not possible | Supports `meaning` annotations |
| Future-proofing | Risky | Safe |

When in doubt, prefer a two-element enum. The small additional effort pays dividends in clarity, maintainability, and extensibility.

## References

- [Tidy Design Principles: Prefer an enum, even if only two choices](https://design.tidyverse.org/boolean-strategies.html)
- [Tidy Design Principles: Explicit Strategies](https://design.tidyverse.org/explicit-strategies.html)
- [Tidy Design Principles: Extract strategies into objects](https://design.tidyverse.org/strategy-objects.html)
4 changes: 4 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,10 @@ A collection of commonly used value sets
- [Agentic IDE Support](how-to-guides/agentic-ide-support.md)
- [Sync UniProt Species](how-to-guides/sync-uniprot-species.md)

## Explanations

- [Prefer Binary Enums Over Booleans](explanations/binary-enums-vs-booleans.md) - Why two-element enums are often better than boolean types

Note: this schema consists ONLY of enums, so it is normal
that classes and slots are empty.

Expand Down