Add wildcard CQL #866

Open
ldecicco-USGS wants to merge 6 commits into DOI-USGS:develop from ldecicco-USGS:develop

Conversation

@ldecicco-USGS
Collaborator

This PR includes some minor fixes for the field and combined meta.

It switches the WQP table reader to data.table::fread. There are a few minor differences in the WQX3 table, but the fread conversion looks closer to the actual returned text.

The big functional change is adding a wildcard CQL2 template which will allow this:

hucs <- read_waterdata_combined_meta(
  hydrologic_unit_code = c("11010008", "11010009"),
  site_type = c("Stream", "Spring")
)
Requesting:
https://api.waterdata.usgs.gov/ogcapi/v0/collections/combined-metadata/items?f=json&lang=en-US&limit=50000
Remaining requests this hour: 2275
> unique(hucs$site_type)
[1] "Stream" "Spring"
> unique(hucs$hydrologic_unit_code)
 [1] "110100080704" "110100090104" "110100090202" "110100080302" "110100080203"
 [6] "11010008"     "110100080212" "110100080102" "110100080802" "110100090107"
[11] "110100080904" "110100080603" "110100090306" "110100080502" "110100080701"
[16] "110100080209" "110100080902" "110100080206" "110100080309" "110100080210"
[21] "110100090201" "110100090204" "110100080111" "110100080801" "11010009"    
[26] "110100080211" "110100080401" "110100080104" "110100080308" "110100080903"
[31] "110100080806" "110100080101" "110100090302" "110100090305" "110100080112"
[36] "110100090206" "110100090103"

So the logic is: if a user specifies more than one HUC, we'll use the wildcard "like" and "or" template (HUC 0123% OR HUC 01222%). Any other parameters will be tacked on with ANDs.
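As a hypothetical sketch of the logic described above (not the package's actual template code): each HUC becomes a CQL2 `LIKE` clause with a trailing `%`, and the clauses are joined with `OR`; any other parameter would then be appended with `AND`.

```r
# Illustrative only: build the wildcard "like"/"or" filter for multiple HUCs.
hucs <- c("11010008", "11010009")
like_clauses <- sprintf("hydrologic_unit_code LIKE '%s%%'", hucs)
cql_filter <- paste(like_clauses, collapse = " OR ")
cql_filter
#> "hydrologic_unit_code LIKE '11010008%' OR hydrologic_unit_code LIKE '11010009%'"
```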

Collaborator

@jzemmels jzemmels left a comment


Looks good, I like the new CQL templates. No breaking changes, so I'll approve. Just some scattered thoughts.

In the future, exposing the templates more explicitly might be a good, flexible way for users to make more sophisticated queries. Perhaps the format could be building a CQL query string piece-by-piece using a combination of the template helper functions. From a set algebra perspective, I think the primitive operations are and, or, and not.
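The composable-query idea above might look something like the following. This is purely an illustrative sketch; the helper names (`cql_and`, `cql_or`, `cql_not`) are hypothetical, not part of the dataRetrieval API.

```r
# Hypothetical primitives for composing a CQL query string piece by piece.
cql_and <- function(...) paste0("(", paste(c(...), collapse = " AND "), ")")
cql_or  <- function(...) paste0("(", paste(c(...), collapse = " OR "), ")")
cql_not <- function(x)   paste0("NOT ", x)

q <- cql_and(
  cql_or("huc LIKE '11010008%'", "huc LIKE '11010009%'"),
  cql_not("site_type = 'Well'")
)
q
#> "((huc LIKE '11010008%' OR huc LIKE '11010009%') AND NOT site_type = 'Well')"
```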


# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){
  template_path <- system.file("templates/param.CQL2.like", package = "dataRetrieval")

Really cool! Is the assumption that if a user wants data for a certain HUC level that they'll also want data for the sub-HUCs (no idea if that's the right terminology)?

return(whisker::whisker.render(template, parameter_list))

# Wildcards:
if(names(parameter) %in% c("hydrologic_unit_code")){

I'm trying to think if there are other parameters that a user might want to match "like" with. Maybe something like `site_type_cd` to capture ST and related site types? Perhaps that's not "assumed" default behavior?

retval <- data.table::fread(text = doc, data.table = FALSE,
                            quote = ifelse(csv, '\"', ""),
                            delim = ifelse(csv, ",", "\t"))

😎

#' What and where the control of flow is for the gage pool.
#' @param measurement_rated `r get_ogc_params("field-measurements")$measurement_rated`
#' Rated measurement based on the hydrologic/hydraulic conditions in which the measurement was made
#' (excellent (2 percent), good (5 percent), fair (8 percent), or poor (more than 8 percent). percent), or poor (more than 8 percent)]

Suggested change
#' (excellent (2 percent), good (5 percent), fair (8 percent), or poor (more than 8 percent). percent), or poor (more than 8 percent)]
#' (excellent (2 percent), good (5 percent), fair (8 percent), or poor (more than 8 percent). percent)

```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))

complete_df <- data.frame(time = seq.Date(from = range[1],

Does the range vector need to be defined if it's only used here?

Suggested change
complete_df <- data.frame(time = seq.Date(from = range[1],
complete_df <- data.frame(time = seq.Date(from = as.Date("2025-01-01"),

```{r, message=FALSE, warning=FALSE}
range <- as.Date(c("2025-01-01", "2026-02-01"))
```{r}
range <- as.Date(c("2025-01-01", "2026-03-02"))

Delete range if start date is hardcoded on next line

Suggested change
range <- as.Date(c("2025-01-01", "2026-03-02"))

This section will focus on important differences between the statistics and OGC-compliant APIs and other tips for working with the endpoint.

* **No request limit or API token**: at time of writing, the statistics API does not limit the number of requests that can be made per hour. It also does not require you sign up for an API token. Requesting data from the statistics API does not count against your total request limit to the OGC-compliant APIs.
* **Higher API limits**: at time of writing, the statistics API is limited to the default limits for any api.gov service, which is 4000 requests per hour per IP. The API token used for the other USGS OGC-compliant APIs is included in the request, and changes the limit to 4000 requests per hour per token. There is a chance this could make a difference if you are running code on a shared IP: for example, a large office, GitHub, GitLab, etc.

Thanks for fixing.

Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the daterange over which the average was calculated.
The first row shows the average January, 2024 discharge was about 112 cubic feet per second.
We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.
Instead of `time_of_year` and `time_of_year_type` columns, this output contains `start_date`, `end_date`, and `interval_type` columns representing the daterange over which the average was calculated. The first row shows the average January, 2024 discharge was about 112 cubic feet per second. We again have extra rows: the second row contains the **calendar** year 2024 average and the third contains the **water** year 2024 average.

Personal preference, but I like starting new sentences in the same paragraph on a new line in markdown. I find it more readable. These changes are fine, though.

@@ -35,7 +42,7 @@
Consider the output below, where we request day-of-year discharge averages for January 1 and January 2.
Note that the `start_date` and `end_date` are set in `month-year` format to describe the day-of-year range.

My typo

Suggested change
Note that the `start_date` and `end_date` are set in `month-year` format to describe the day-of-year range.
Note that the `start_date` and `end_date` are set in `month-day` format to describe the day-of-year range.

multiyear_daterange_mean
```

Before we move on, consider the following example where we create a Monthly mean statistics table similar to what you'd find in the [Water Year Summaries](https://rconnect.chs.usgs.gov/water-year-summaries-dev/?_inputs_&render_button=1&site_no_select=%2205428500%22&wateryear_select=%222024%22).

Something I noticed while writing but forgot to mention. I still think this is helpful as an example, but the WYS use the more complex, "proper" rounding rules.

Suggested change
Before we move on, consider the following example where we create a Monthly mean statistics table similar to what you'd find in the [Water Year Summaries](https://rconnect.chs.usgs.gov/water-year-summaries-dev/?_inputs_&render_button=1&site_no_select=%2205428500%22&wateryear_select=%222024%22). Note that the values reported here are slightly different from what you'll find in the Water Year Summary because of differences in how values are rounded.
