Conversation
markdroth
left a comment
There was a problem hiding this comment.
This looks good! My comments are mostly minor cosmetic things, but there are two small substantive comments.
Please let me know if you have any questions. Thanks!
ejona86
left a comment
There was a problem hiding this comment.
So you understand why I'm looking at this so much more closely now vs before: Earlier this was "some metrics y'all wanted for yourselves". And I knew why you'd want them, and why they'd be useful to others, but there wasn't an idea we'd move quickly to implement this cross-language for wide availability. With the idea these are going to be on by default, that means these are seen as being more important/widely used metrics. They are now intended for the masses. Thus I started reviewing them as such.
Since I'm going to be unavailable next week: I'd said the comments I need to say. Once the threads are actually resolved (not just marked resolved), it seems it would be good to go.
ejona86
left a comment
There was a problem hiding this comment.
I had thought grpc.tcp.connections_created would be paired with a closed counter, as that's what we've done in the past instead of UpDown counters. But I can live with this.
Thank you!
| | grpc.tcp.min\_rtt | Histogram (floating-point) | s | TCP's current estimate of minimum round trip time (RTT). It can be used as an indication of the network health between two endpoints. Corresponds to `tcpi_min_rtt` from `struct tcp_info`. | | ||
| | grpc.tcp.delivery\_rate | Histogram (floating-point) | By/s | TCP’s most recent measure of the connection’s "non-app-limited" throughput. The term non-app-limited means that the link is saturated by the application. The delivery rate is only reported when it is non-app-limited. Corresponds to `tcpi_delivery_rate` from `tcp_info` when `tcpi_delivery_rate_app_limited` is `false`. | | ||
| | grpc.tcp.packets\_sent | Counter (integer) | {packet} | Total packets sent by TCP including retransmissions and spurious retransmissions. Corresponds to `tcpi_data_segs_out` from `struct tcp_info`. | | ||
| | grpc.tcp.packets\_retransmitted | Counter (integer) | {packet} | Total packets sent by TCP except those sent for the first time. A packet may be retransmitted multiple times and will be counted multiple times as retransmitted. Retransmission counts include spurious retransmissions. Corresponds to `tcpi_total_retrans` from `struct tcp_info`. | |
There was a problem hiding this comment.
Since tcpi_total_retrans tracks cumulative retransmissions over a specific connection's lifetime, it should be recorded as a Gauge. Using a Counter seems incorrect because a reset to zero in the kernel would break the monotonicity required for Counters, whereas a Gauge accurately captures the connection's current state and handles resets gracefully.
There was a problem hiding this comment.
Under what conditions is the tcpi_total_retrans reset in the kernel?
There was a problem hiding this comment.
You're right—it doesn't reset mid-connection, so a monotonic Counter is appropriate here.
My concern is that struct tcp_info provides absolute cumulative totals, while the OpenTelemetry Counter API expects increments (deltas). Since implementations are forced to calculate these deltas manually between polling intervals, shouldn't we explicitly state that deltas must be reported? Without it, implementations might mistakenly report the ever-growing absolute kernel values directly.
No description provided.