diff --git a/docs/tools/google-cloud/bigquery-agent-analytics.md b/docs/tools/google-cloud/bigquery-agent-analytics.md
index aff590f68..4e2cf7382 100644
--- a/docs/tools/google-cloud/bigquery-agent-analytics.md
+++ b/docs/tools/google-cloud/bigquery-agent-analytics.md
@@ -1,10 +1,17 @@
# BigQuery Agent Analytics Plugin
- Supported in ADKPython v1.19.0Preview
+ Supported in ADK Python v1.21.0 (Preview)
-The BigQuery Agent Analytics Plugin significantly enhances the Agent Development Kit (ADK) by providing a robust solution for in-depth agent behavior analysis. Using the ADK Plugin architecture and the BigQuery Storage Write API, it captures and logs critical operational events directly into a Google BigQuery table, empowering you with advanced capabilities for debugging, real-time monitoring, and comprehensive offline performance evaluation.
+The BigQuery Agent Analytics Plugin (v2.0) significantly enhances the Agent Development Kit (ADK) by providing a robust, scalable, and feature-rich solution for in-depth agent behavior analysis. Using the high-throughput BigQuery Storage Write API, it captures and logs critical operational events directly into a Google BigQuery table. This empowers you with advanced capabilities for debugging, real-time monitoring, and comprehensive offline performance evaluation.
+
+Version 2.0 is a complete rewrite of the plugin, introducing major improvements:
+
+- **High-Throughput Ingestion:** Now uses the BigQuery Storage Write API for asynchronous, streaming ingestion, making it suitable for high-volume production environments.
+- **Multi-Modal Content Logging:** Can now log complex, multi-modal inputs and outputs, with capabilities to offload large content (like images, audio, video) to Google Cloud Storage (GCS).
+- **Richer Data Schema:** The BigQuery table schema is redesigned to be more structured, using `JSON` data types and nested fields to capture more detailed event data, including OpenTelemetry-style tracing (`trace_id`, `span_id`) and detailed latency metrics.
+- **Enhanced Configuration:** Offers more granular control over batching, retries, content handling, and more.
!!! example "Preview release"
@@ -27,15 +34,13 @@ The BigQuery Agent Analytics Plugin significantly enhances the Agent Development
asynchronously in a separate thread to avoid blocking the main agent
execution. Designed to handle high event volumes, the plugin preserves
event order via timestamps.
-
-The agent event data recorded varies based on the ADK event type. For more
-information, see [Event types and payloads](#event-types).
+- **Multi-modal Agent Analysis:** Track and store not just text, but also images, audio, and other binary data used or generated by your agent by offloading them to Google Cloud Storage.
## Prerequisites
-- **Google Cloud Project** with the **BigQuery API** enabled.
+- **Google Cloud Project** with the **BigQuery API** and **BigQuery Storage Write API** enabled.
- **BigQuery Dataset:** Create a dataset to store logging tables before
- using the plugin. The plugin automatically would create the necessary events table within the dataset if the table does not exist. By default, this table is named agent_events, while you can customize this with the table_id parameter in the plugin configuration.
+ using the plugin. The plugin automatically creates the necessary events table within the dataset if the table does not exist.
- **Authentication:**
- **Local:** Run `gcloud auth application-default login`.
- **Cloud:** Ensure your service account has the required permissions.
@@ -43,68 +48,72 @@ information, see [Event types and payloads](#event-types).
### IAM permissions
For the agent to work properly, the principal (e.g., service account, user account) under which the agent is running needs these Google Cloud roles:
-* `roles/bigquery.jobUser` at Project Level to run BigQuery queries in your project. This role doesn't grant access to any data on its own.
-* `roles/bigquery.dataEditor` at Table Level to write log/event data to a BigQuery Table of your choice.
-If you need the agent to create this table, you need to grant the `roles/bigquery.dataEditor` on the BigQuery dataset where you want the table to be created.
+* `roles/bigquery.jobUser` at the Project Level to run BigQuery jobs.
+* `roles/bigquery.dataEditor` at the Table or Dataset Level to write event data.
+* If you enable GCS offloading, the principal also needs `roles/storage.objectCreator` on the target GCS Bucket.
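+
+Before granting roles, it can help to confirm which project and principal your Application Default Credentials resolve to. The following is a minimal sketch using `google.auth` (the same library the agent example below imports); the printed principal is only available for service-account style credentials:
+
+```python
+import google.auth
+
+# Resolve Application Default Credentials and the associated project.
+credentials, project = google.auth.default(
+    scopes=["https://www.googleapis.com/auth/cloud-platform"]
+)
+print(f"ADC project: {project}")
+# Service-account credentials expose the principal to grant roles to.
+print(f"Principal: {getattr(credentials, 'service_account_email', '(user credentials)')}")
+```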
## Use with agent
You use the BigQuery Analytics Plugin by configuring and registering it with
-your ADK agent's App object. The following example shows an implementation of an
-agent with this plugin and BigQuery tools enabled:
+your ADK agent's `App` object. The following example shows a typical implementation.
```python title="my_bq_agent/agent.py"
# my_bq_agent/agent.py
import os
import google.auth
from google.adk.apps import App
-from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryAgentAnalyticsPlugin
+from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryAgentAnalyticsPlugin, BigQueryLoggerConfig
from google.adk.agents import Agent
from google.adk.models.google_llm import Gemini
-from google.adk.tools.bigquery import BigQueryToolset, BigQueryCredentialsConfig
# --- Configuration ---
+
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
DATASET_ID = os.environ.get("BIG_QUERY_DATASET_ID", "your-big-query-dataset-id")
-LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "your-gcp-project-location") # use the location of your google cloud project
+LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1") # The location of your dataset
+GCS_BUCKET = os.environ.get("GCS_BUCKET_FOR_OFFLOAD", "your-gcs-bucket-name") # Optional: for multi-modal
if PROJECT_ID == "your-gcp-project-id":
raise ValueError("Please set GOOGLE_CLOUD_PROJECT or update the code.")
if DATASET_ID == "your-big-query-dataset-id":
raise ValueError("Please set BIG_QUERY_DATASET_ID or update the code.")
-if LOCATION == "your-gcp-project-location":
- raise ValueError("Please set GOOGLE_CLOUD_LOCATION or update the code.")
# --- CRITICAL: Set environment variables BEFORE Gemini instantiation ---
+
os.environ['GOOGLE_CLOUD_PROJECT'] = PROJECT_ID
os.environ['GOOGLE_CLOUD_LOCATION'] = LOCATION
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True' # Make sure you have Vertex AI API enabled
# --- Initialize the Plugin ---
-bq_logging_plugin = BigQueryAgentAnalyticsPlugin(
- project_id=PROJECT_ID, # project_id is required input from user
- dataset_id=DATASET_ID, # dataset_id is required input from user
- table_id="agent_events" # Optional: defaults to "agent_events". The plugin automatically creates this table if it doesn't exist.
-)
-# --- Initialize Tools and Model ---
-credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
-bigquery_toolset = BigQueryToolset(
- credentials_config=BigQueryCredentialsConfig(credentials=credentials)
+# Configure the plugin for GCS offloading and specify a new table name
+bq_plugin_config = BigQueryLoggerConfig(
+ table_id="my_agent_events_v2",
+ gcs_bucket_name=GCS_BUCKET,
+ log_multi_modal_content=True,
+ batch_size=5, # Write to BigQuery in batches of 5 events
+ batch_flush_interval=2.0 # or every 2 seconds
)
-llm = Gemini(
- model="gemini-2.5-flash",
+bq_logging_plugin = BigQueryAgentAnalyticsPlugin(
+ project_id=PROJECT_ID,
+ dataset_id=DATASET_ID,
+ config=bq_plugin_config,
+ location=LOCATION
)
+# --- Initialize Model and Agent ---
+
+llm = Gemini(model="gemini-1.5-flash-001")
+
root_agent = Agent(
model=llm,
name='my_bq_agent',
- instruction="You are a helpful assistant with access to BigQuery tools.",
- tools=[bigquery_toolset]
+ instruction="You are a helpful assistant.",
)
# --- Create the App ---
+
app = App(
name="my_bq_agent",
root_agent=root_agent,
@@ -114,421 +123,184 @@ app = App(
### Run and test agent
-Test the plugin by running the agent and making a few requests through the chat
-interface, such as ”tell me what you can do” or "List datasets in my cloud project “. These actions create events which are
-recorded in your Google Cloud project BigQuery instance. Once these events have
-been processed, you can view the data for them in the [BigQuery Console](https://console.cloud.google.com/bigquery), using this query
+Run your agent and make a few requests. These actions create events which are
+recorded in your BigQuery table. You can then query the data in the [BigQuery Console](https://console.cloud.google.com/bigquery).
```sql
-SELECT timestamp, event_type, content
-FROM `your-gcp-project-id.your-big-query-dataset-id.agent_events`
-ORDER BY timestamp DESC
+SELECT
+ timestamp,
+ event_type,
+ JSON_QUERY(content, '$') AS content_json,
+ status,
+ error_message
+FROM
+ `your-gcp-project-id.your-big-query-dataset-id.my_agent_events_v2`
+ORDER BY
+ timestamp DESC
LIMIT 20;
+
```
## Configuration options
-You can customize the plugin using `BigQueryLoggerConfig`.
-
-- **`enabled`** (`bool`, default: `True`): To disable the plugin from logging agent data to the BigQuery table, set this parameter to False.
-- **`event_allowlist`** (`Optional[List[str]]`, default: `None`): A list
- of event types to log. If `None`, all events are logged except those in
- `event_denylist`. For a comprehensive list of supported event types, refer
- to the [Event types and payloads](#event-types) section.
-- **`event_denylist`** (`Optional[List[str]]`, default: `None`): A list of
- event types to skip logging. For a comprehensive list of supported event
- types, refer to the [Event types and payloads](#event-types) section.
-- **`content_formatter`** (`Optional[Callable[[Any], str]]`, default:
- `None`): An optional function to format event content before logging. The
- following code illustrates how to implement the content formatter.
-- **`shutdown_timeout`** (`float`, default: `5.0`): Seconds to wait for
- logs to flush during shutdown.
-- **`client_close_timeout`** (`float`, default: `2.0`): Seconds to wait
- for the BigQuery client to close.
-- **`max_content_length`** (`int`, default: `500`): The maximum length of
- content parts before truncation.
-
-The following code sample shows how to define a configuration for the
-BigQuery Agent Analytics plugin:
-
-```python
-import json
-import re
-
-from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryLoggerConfig
-
-def redact_dollar_amounts(event_content: Any) -> str:
- """
- Custom formatter to redact dollar amounts (e.g., $600, $12.50)
- and ensure JSON output if the input is a dict.
- """
- text_content = ""
- if isinstance(event_content, dict):
- text_content = json.dumps(event_content)
- else:
- text_content = str(event_content)
-
- # Regex to find dollar amounts: $ followed by digits, optionally with commas or decimals.
- # Examples: $600, $1,200.50, $0.99
- redacted_content = re.sub(r'\$\d+(?:,\d{3})*(?:\.\d+)?', 'xxx', text_content)
-
- return redacted_content
-
-config = BigQueryLoggerConfig(
- enabled=True,
- event_allowlist=["LLM_REQUEST", "LLM_RESPONSE"], # Only log these events
- # event_denylist=["TOOL_STARTING"], # Skip these events
- shutdown_timeout=10.0, # Wait up to 10s for logs to flush on exit
- client_close_timeout=2.0, # Wait up to 2s for BQ client to close
- max_content_length=500, # Truncate content to 500 chars (default)
- content_formatter=redact_dollar_amounts, # Redact the dollar amounts in the logging content
-
-)
-
-plugin = BigQueryAgentAnalyticsPlugin(..., config=config)
-```
+You can customize the plugin by passing a `BigQueryLoggerConfig` object during initialization.
+
+- **`enabled`** (`bool`, default: `True`): Enable or disable the plugin.
+- **`event_allowlist`** (`Optional[List[str]]`, default: `None`): If set, only log events of these types.
+- **`event_denylist`** (`Optional[List[str]]`, default: `None`): If set, skip logging for these event types.
+- **`max_content_length`** (`int`, default: `512000`): The maximum length in characters for inline text content before truncation or offloading.
+- **`table_id`** (`str`, default: `agent_events_v2`): The ID of the BigQuery table to log events to.
+- **`clustering_fields`** (`List[str]`, default: `["event_type", "agent", "user_id"]`): A list of fields to use for clustering the BigQuery table, which can improve query performance and reduce costs.
+- **`log_multi_modal_content`** (`bool`, default: `True`): If `True`, logs detailed information about multi-modal content parts into the `content_parts` field.
+- **`gcs_bucket_name`** (`Optional[str]`, default: `None`): If provided, large content parts (like images, audio, or long text) will be offloaded to this Google Cloud Storage bucket.
+- **`connection_id`** (`Optional[str]`, default: `None`): If provided, this connection ID will be used as the authorizer for ObjectRef columns when offloading to GCS. Format: `"location.connection_id"`.
+- **`batch_size`** (`int`, default: `1`): The number of event rows to batch together before writing to BigQuery. Increasing this can improve ingestion performance.
+- **`batch_flush_interval`** (`float`, default: `1.0`): The maximum time in seconds to wait before flushing a batch, even if it hasn't reached `batch_size`.
+- **`shutdown_timeout`** (`float`, default: `10.0`): The number of seconds to wait for the event queue to drain during a graceful shutdown.
+- **`queue_max_size`** (`int`, default: `10000`): The maximum number of events to hold in the in-memory queue before dropping new events.
+- **`retry_config`** (`RetryConfig`, default: `RetryConfig()`): Configuration for the retry mechanism on failed writes. See the source code for `RetryConfig` attributes.
+- **`content_formatter`** (`Optional[Callable]`, default: `None`): A custom function to format or redact the `content` payload before logging.
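+
+For example, a `content_formatter` can redact sensitive values before they reach BigQuery. The following sketch combines a redaction formatter with the batching options described above; the regex, the option values, and the assumption that the formatter receives the raw event payload and returns a string are illustrative:
+
+```python
+import json
+import re
+from typing import Any
+
+from google.adk.plugins.bigquery_agent_analytics_plugin import (
+    BigQueryAgentAnalyticsPlugin,
+    BigQueryLoggerConfig,
+)
+
+def redact_dollar_amounts(event_content: Any) -> str:
+    """Redacts dollar amounts (e.g., $600, $1,200.50) from event content."""
+    if isinstance(event_content, dict):
+        text_content = json.dumps(event_content)
+    else:
+        text_content = str(event_content)
+    # Replace $ followed by digits, optional thousands separators and decimals.
+    return re.sub(r'\$\d+(?:,\d{3})*(?:\.\d+)?', 'xxx', text_content)
+
+config = BigQueryLoggerConfig(
+    table_id="my_agent_events_v2",
+    event_allowlist=["LLM_REQUEST", "LLM_RESPONSE"],  # Only log LLM traffic
+    batch_size=10,                 # Buffer up to 10 events per write
+    batch_flush_interval=2.0,      # ...or flush at least every 2 seconds
+    content_formatter=redact_dollar_amounts,
+)
+
+plugin = BigQueryAgentAnalyticsPlugin(
+    project_id="your-gcp-project-id",
+    dataset_id="your-big-query-dataset-id",
+    config=config,
+)
+```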
## Schema and production setup
-The plugin automatically creates the table if it does not exist. However, for
-production, we recommend creating the table manually with **partitioning** and
-**clustering** for performance and cost optimization.
+The plugin automatically creates the table if it does not exist using a default schema. For production environments, we strongly recommend creating the table manually with **partitioning** and **clustering** to optimize performance and costs.
**Recommended DDL:**
```sql
-CREATE TABLE `your-gcp-project-id.adk_agent_logs.agent_events`
-(
- timestamp TIMESTAMP NOT NULL OPTIONS(description="The UTC time at which the event was logged."),
- event_type STRING OPTIONS(description="Indicates the type of event being logged (e.g., 'LLM_REQUEST', 'TOOL_COMPLETED')."),
- agent STRING OPTIONS(description="The name of the ADK agent or author associated with the event."),
- session_id STRING OPTIONS(description="A unique identifier to group events within a single conversation or user session."),
- invocation_id STRING OPTIONS(description="A unique identifier for each individual agent execution or turn within a session."),
- user_id STRING OPTIONS(description="The identifier of the user associated with the current session."),
- content STRING OPTIONS(description="The event-specific data (payload). Format varies by event_type."),
- error_message STRING OPTIONS(description="Populated if an error occurs during the processing of the event."),
- is_truncated BOOLEAN OPTIONS(description="Boolean flag indicates if the content field was truncated due to size limits.")
+CREATE TABLE `your-gcp-project-id.your-dataset-id.agent_events_v2` (
+ timestamp TIMESTAMP NOT NULL OPTIONS(description="The UTC timestamp when the event occurred."),
+ event_type STRING OPTIONS(description="The category of the event (e.g., 'LLM_REQUEST', 'TOOL_CALL')."),
+ agent STRING OPTIONS(description="The name of the agent that generated this event."),
+ session_id STRING OPTIONS(description="A unique identifier for the entire conversation session."),
+ invocation_id STRING OPTIONS(description="A unique identifier for a single turn or execution within a session."),
+ user_id STRING OPTIONS(description="The identifier of the end-user participating in the session, if available."),
+ trace_id STRING OPTIONS(description="OpenTelemetry trace ID for distributed tracing."),
+ span_id STRING OPTIONS(description="OpenTelemetry span ID for this specific operation."),
+ parent_span_id STRING OPTIONS(description="OpenTelemetry parent span ID to reconstruct the operation hierarchy."),
+ content JSON OPTIONS(description="The primary payload of the event, stored as a JSON object."),
+  content_parts ARRAY<STRUCT<
+    text STRING,
+    part_index INT64,
+    part_attributes STRING,
+    storage_mode STRING,
+    uri STRING
+  >> OPTIONS(description="For multi-modal events, contains a list of content parts (text, images, etc.)."),
+ attributes JSON OPTIONS(description="A JSON object for additional event metadata."),
+ latency_ms JSON OPTIONS(description="A JSON object containing latency measurements, such as 'total_ms' and 'time_to_first_token_ms'."),
+ status STRING OPTIONS(description="The outcome of the event, typically 'OK' or 'ERROR'."),
+ error_message STRING OPTIONS(description="Detailed error message if the status is 'ERROR'."),
+ is_truncated BOOL OPTIONS(description="Boolean flag indicating if any content was truncated.")
)
PARTITION BY DATE(timestamp)
CLUSTER BY event_type, agent, user_id;
```
-### Event types and payloads {#event-types}
-
-The `content` column contains a formatted string specific to the `event_type`.
-The following sections describe these events and their corresponding content.
-
-!!! note
-
-    All variable content fields (e.g., user input, model response, tool arguments, system prompt)
-    are truncated to `max_content_length` characters (configured in
-    `BigQueryLoggerConfig`, default 500) to manage log size.
-
-#### LLM interactions (plugin lifecycle)
-
-These events track the raw requests sent to and responses received from the
-LLM.
-
-- **`LLM_REQUEST`** (trigger: `before_model_callback`). Content format:
-  `Model: {model} | Prompt: {formatted_contents} | System Prompt: {system_prompt} | Params: {params} | Available Tools: {tool_names}`.
-  Example: `Model: gemini-2.5-flash | Prompt: user: text: 'Hello' | System Prompt: You are a helpful assistant. | Params: {temperature=1.0} | Available Tools: ['bigquery_tool']`
-- **`LLM_RESPONSE`** (trigger: `after_model_callback`). If a tool call:
-  `Tool Name: {func_names} | Token Usage: {usage}`. If text:
-  `Tool Name: text_response, text: '{text}' | Token Usage: {usage}`.
-  Example: `Tool Name: text_response, text: 'Here is the data.' | Token Usage: {prompt: 10, candidates: 5, total: 15}`
-- **`LLM_ERROR`** (trigger: `on_model_error_callback`). No content; error details are in the `error_message` column.
-
-#### Tool usage (plugin lifecycle)
-
-These events track the execution of tools by the agent.
-
-- **`TOOL_STARTING`** (trigger: `before_tool_callback`). Content format:
-  `Tool Name: {name}, Description: {desc}, Arguments: {args}`.
-  Example: `Tool Name: list_datasets, Description: Lists datasets..., Arguments: {'project_id': 'my-project'}`
-- **`TOOL_COMPLETED`** (trigger: `after_tool_callback`). Content format:
-  `Tool Name: {name}, Result: {result}`.
-  Example: `Tool Name: list_datasets, Result: ['dataset_1', 'dataset_2']`
-- **`TOOL_ERROR`** (trigger: `on_tool_error_callback`). Content format:
-  `Tool Name: {name}, Arguments: {args}` (error details in `error_message`).
-  Example: `Tool Name: list_datasets, Arguments: {}`
-
-#### Agent lifecycle (plugin lifecycle)
-
-These events track the start and end of agent execution, including
-sub-agents.
-
-- **`INVOCATION_STARTING`** (trigger: `before_run_callback`). No content.
-- **`INVOCATION_COMPLETED`** (trigger: `after_run_callback`). No content.
-- **`AGENT_STARTING`** (trigger: `before_agent_callback`). Content format: `Agent Name: {agent_name}`.
-  Example: `Agent Name: sub_agent_researcher`
-- **`AGENT_COMPLETED`** (trigger: `after_agent_callback`). Content format: `Agent Name: {agent_name}`.
-  Example: `Agent Name: sub_agent_researcher`
-
-#### User and generic events (Event stream)
-
-These events are derived from the `Event` objects yielded by the agent or the
-runner.
-
-- **`USER_MESSAGE_RECEIVED`** (trigger: `on_user_message_callback`). Content format: `User Content: {formatted_message}`.
-  Example: `User Content: text: 'Show me the sales data.'`
-- **`TOOL_CALL`** (trigger: `event.get_function_calls()` is true). Content format: `call: {func_name}`.
-  Example: `call: list_datasets`
-- **`TOOL_RESULT`** (trigger: `event.get_function_responses()` is true). Content format: `resp: {func_name}`.
-  Example: `resp: list_datasets`
-- **`MODEL_RESPONSE`** (trigger: `event.content` has parts). Content format: `text: '{text}'`.
-  Example: `text: 'I found 2 datasets.'`
-
-## Advanced analysis queries
+## Event data structure
+
+The `content` and `attributes` columns are now `JSON` fields, providing a more structured way to store event data. The format of the `content` JSON object varies by `event_type`.
-The following example queries demonstrate how to extract information from the
-recorded ADK agent event analytics data in BigQuery. You can run these queries
-using the [BigQuery Console](https://console.cloud.google.com/bigquery).
+#### LLM_REQUEST
-Before executing these queries, ensure you update the GCP project ID, BigQuery dataset ID, and the table ID (defaulting to "agent_events" if unspecified) within the provided SQL.
+- **content:** `{"prompt": [{"role": "user", "content": "..."}], "system_prompt": "..."}`
+- **attributes:** `{"llm_config": {"temperature": 1.0, ...}, "tools": ["tool_name_1"]}`
+
+#### LLM_RESPONSE
+
+- **content:** `{"response": "text: 'Here is the data.'", "usage": {"prompt": 10, "completion": 5, "total": 15}}`
+
+#### TOOL_STARTING
+
+- **content:** `{"tool": "list_datasets", "args": {"project_id": "my-project"}}`
+
+#### TOOL_COMPLETED
+
+- **content:** `{"tool": "list_datasets", "result": ["dataset_1", "dataset_2"]}`
+
+#### Multi-modal data (`content_parts`)
+
+When `log_multi_modal_content` is `True`, detailed information about each part of a multi-modal request or response is stored in the `content_parts` array.
+
+- **`storage_mode`** indicates how the part is stored:
+ - `INLINE`: The content is stored directly in the `text` field.
+ - `GCS_REFERENCE`: The content was offloaded to GCS, and the `uri` field contains the `gs://` path.
+ - `EXTERNAL_URI`: The content was already a URI provided by the user.
+- **`object_ref`** contains a structured reference to the GCS object that can be queried directly if you have the correct permissions.
+
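+If you need to read an offloaded part outside of BigQuery, the `gs://` URI from `content_parts.uri` can be fetched with the standard Cloud Storage client. This is a minimal sketch; the bucket path is a placeholder, and the caller needs read access (for example `roles/storage.objectViewer`) on the bucket:
+
+```python
+from google.cloud import storage
+
+def download_offloaded_part(gcs_uri: str) -> bytes:
+    """Downloads a content part that the plugin offloaded to GCS."""
+    client = storage.Client()
+    blob = storage.Blob.from_string(gcs_uri, client=client)
+    return blob.download_as_bytes()
+
+# Example: a URI taken from the content_parts.uri column (placeholder path).
+data = download_offloaded_part("gs://your-gcs-bucket-name/path/to/offloaded-part")
+```
+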
+## Advanced analysis queries
+
+The following examples demonstrate how to query the new `JSON`-based schema.
**Trace a specific conversation turn**
```sql
-SELECT timestamp, event_type, agent, content
-FROM `your-gcp-project-id.your-dataset-id.agent_events`
-WHERE invocation_id = 'your-invocation-id'
-ORDER BY timestamp ASC;
+SELECT
+ timestamp,
+ event_type,
+ agent,
+ JSON_QUERY(content) AS content_payload,
+ status
+FROM
+ `your-gcp-project-id.your-dataset-id.agent_events_v2`
+WHERE
+ invocation_id = 'your-invocation-id'
+ORDER BY
+ timestamp ASC;
+
```
**Daily invocation volume**
```sql
-SELECT DATE(timestamp) as log_date, COUNT(DISTINCT invocation_id) as count
-FROM `your-gcp-project-id.your-dataset-id.agent_events`
-WHERE event_type = 'INVOCATION_STARTING'
-GROUP BY log_date ORDER BY log_date DESC;
+SELECT
+ DATE(timestamp) as log_date,
+ COUNT(DISTINCT invocation_id) as count
+FROM
+ `your-gcp-project-id.your-dataset-id.agent_events_v2`
+WHERE
+ event_type = 'INVOCATION_STARTING'
+GROUP BY
+ log_date
+ORDER BY
+ log_date DESC;
+
```
-**Token usage analysis**
+**Average token usage**
+
+Extract total tokens from the `content` JSON payload for `LLM_RESPONSE` events.
```sql
SELECT
- AVG(CAST(REGEXP_EXTRACT(content, r"Token Usage:.*total: ([0-9]+)") AS INT64)) as avg_tokens
-FROM `your-gcp-project-id.your-dataset-id.agent_events`
-WHERE event_type = 'LLM_RESPONSE';
+ AVG(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as avg_total_tokens
+FROM
+ `your-gcp-project-id.your-dataset-id.agent_events_v2`
+WHERE
+ event_type = 'LLM_RESPONSE'
+ AND JSON_VALUE(content, '$.usage.total') IS NOT NULL;
```
-**Error monitoring**
+**Find offloaded content**
+
+Query the `content_parts` array to find all events where content was offloaded to GCS.
```sql
-SELECT timestamp, event_type, error_message
-FROM `your-gcp-project-id.your-dataset-id.agent_events`
-WHERE error_message IS NOT NULL
-ORDER BY timestamp DESC LIMIT 50;
+SELECT
+ timestamp,
+ event_type,
+ invocation_id,
+ part.uri AS gcs_uri
+FROM
+ `your-gcp-project-id.your-dataset-id.agent_events_v2`,
+ UNNEST(content_parts) AS part
+WHERE
+ part.storage_mode = 'GCS_REFERENCE';
```
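+
+**Run queries programmatically**
+
+The same analyses can be run from Python, for example to feed a monitoring dashboard. The sketch below uses the standard BigQuery client library to average the per-event latency recorded in `latency_ms` (assuming the `total_ms` key described in the schema above; project, dataset, and table IDs are placeholders):
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client(project="your-gcp-project-id")
+
+query = """
+SELECT
+  event_type,
+  AVG(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS FLOAT64)) AS avg_latency_ms
+FROM `your-gcp-project-id.your-dataset-id.agent_events_v2`
+WHERE JSON_VALUE(latency_ms, '$.total_ms') IS NOT NULL
+GROUP BY event_type
+ORDER BY avg_latency_ms DESC
+"""
+
+# Runs the query and iterates over the result rows.
+for row in client.query(query).result():
+    print(f"{row.event_type}: {row.avg_latency_ms:.1f} ms")
+```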
## Additional resources
- [BigQuery Storage Write API](https://cloud.google.com/bigquery/docs/write-api)
+- [Querying JSON data in BigQuery](https://cloud.google.com/bigquery/docs/reference/standard-sql/json-functions)
- [BigQuery product documentation](https://cloud.google.com/bigquery/docs)