Update agent examples, quickstart and skills to support Lakebase Autoscaling (#151)
Force-pushed 248b026 to e39267c
dhruv0811 left a comment:
Looks good overall!! Left some comments for local testing cleanup and some clarifications on lakebase options.
```
# TODO: Update with your Lakebase instance for session storage
# Option 1: Provisioned instance (set instance name)
# LAKEBASE_INSTANCE_NAME=
# Option 2: Autoscaling instance (set project and branch)
```
Did we not have a third option with just autoscaling_endpoint?
the thinking here is that when users develop locally, they do so by passing in the project/branch name. The autoscaling_endpoint is only used when we are in the app environment, where we can read it directly from PGENDPOINT (since it's not as user-friendly to pass in) - hence adding this check: https://github.com/databricks/app-templates/pull/151/changes#diff-1e8ffff9ea9c91fc9dce7d8af129175d1fcfe357eabbcec14705feff13eeacacR53
```python
# Check for Lakebase access/connection errors
if any(keyword in error_msg for keyword in ["permission"]):
    logger.error(f"Lakebase access error: {e}")
    lakebase_desc = LAKEBASE_INSTANCE_NAME or LAKEBASE_AUTOSCALING_ENDPOINT or f"{LAKEBASE_AUTOSCALING_PROJECT}/{LAKEBASE_AUTOSCALING_BRANCH}"
```
Is it worth doing some validation like we did in the SDK to ensure that we only have one valid combination of variables set here? Say a user has project and branch set, but accidentally also set instance name, they should be notified rather than silently picking up instance name and short circuiting here.
since the SDK covers all these cases and will throw errors, I think adding validation here would be a bit redundant
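For reference, the kind of local guard discussed above could be sketched roughly as follows. The function name is hypothetical and the env var names come from this diff; the SDK's actual checks may differ:

```python
import os

# Hypothetical local guard mirroring the SDK's behavior: exactly one
# Lakebase configuration style (provisioned vs. autoscaling) may be set.
def check_lakebase_config(env=None):
    env = os.environ if env is None else env
    instance = env.get("LAKEBASE_INSTANCE_NAME")
    project = env.get("LAKEBASE_AUTOSCALING_PROJECT")
    branch = env.get("LAKEBASE_AUTOSCALING_BRANCH")

    if instance and (project or branch):
        raise ValueError(
            "Both LAKEBASE_INSTANCE_NAME and autoscaling project/branch are set; "
            "unset one so the wrong instance isn't silently picked up."
        )
    if bool(project) != bool(branch):
        raise ValueError(
            "LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH "
            "must be set together."
        )
```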
```python
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
```
bit of a chicken & egg problem here when DABs are not ready:
- in order for us to be able to pass in PGENDPOINT as a var, we need the autoscaling instance as a resource on the app
- DAB deployment will overwrite app resources, so it would remove a manually-added postgres resource when we deploy
- we will therefore use LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH as static env vars supported by our agent SDK
if we want endpoint extension in future, code will look something like:
```python
# Autoscaling params: in the app environment, PGENDPOINT is provided automatically;
# for local dev, use project/branch names directly.
_is_app_env = bool(os.getenv("DATABRICKS_APP_NAME"))
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None
```
dhruv0811 left a comment:
Added some comments based on a dry run with Claude.
agent-langgraph-long-term-memory/scripts/grant_lakebase_permissions.py (thread resolved; outdated)
```
    value: "<your-project-name>"
  - name: LAKEBASE_AUTOSCALING_BRANCH
    value: "<your-branch-name>"
# Use for provisioned lakebase resource
```
adding the app.yaml back for the stateful agent examples because we need an app.yaml to deploy the app for the following flow to properly add postgres as resource:
```
databricks bundle deploy
  │
  │ Uploads source code (including app.yaml) to workspace
  │ Sets app config from databricks.yml (command, env vars, resources)
  │
  ▼
databricks bundle run <app_resource_name>
  │
  │ Starts/restarts the app using config from databricks.yml
  │ App boots with LAKEBASE_AUTOSCALING_PROJECT/BRANCH env vars
  │ But postgres resource is NOT attached (wiped by bundle deploy)
  │
  ▼
databricks api patch /api/2.0/apps/ --json '{"resources": [...]}'
  │
  │ Re-adds postgres resource to the app
  │ Grants SP access to Lakebase + injects PG* env vars for frontend
  │ But running app process doesn't have the new env vars yet
  │
  ▼
databricks api post /api/2.0/apps//deployments --json '{"source_code_path": "..."}'
  │
  │ Triggers a redeploy (like pressing "Deploy" in UI)
  │ Reads app.yaml from source code for command + env vars
  │ App restarts with postgres resource env vars now available
  │
  ▼
App is running with all env vars + postgres resource ✓
```
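The sequence above could be scripted as a small driver like the sketch below. The app name, source path, and resources payload are placeholders, and `dry_run` just collects the commands instead of invoking the databricks CLI:

```python
import subprocess

# Sketch of the four-step deploy flow. Arguments are placeholders;
# with dry_run=True the commands are returned instead of executed.
def deploy_with_postgres(app_name, source_code_path, resources_json, dry_run=False):
    cmds = [
        ["databricks", "bundle", "deploy"],
        ["databricks", "bundle", "run", app_name],
        ["databricks", "api", "patch", f"/api/2.0/apps/{app_name}",
         "--json", resources_json],
        ["databricks", "api", "post", f"/api/2.0/apps/{app_name}/deployments",
         "--json", f'{{"source_code_path": "{source_code_path}"}}'],
    ]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```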
couple comments:
- inconsistencies between the oai stm, lg stm, and lg ltm databricks.yml files
- quickstart should be replacing / deleting the set of env vars that are NOT relevant to the lakebase instance type selected
- adding tests so we fully understand the behavior of quickstart.py
- no need to exclude lakebase client perms grant script from all the templates
in a followup, let's also make sure to update + run the integration tests
updates:
- Add autoscaling (project/branch) as an alternative to provisioned Lakebase instances across all 3 memory templates
- Detect app environment via DATABRICKS_APP_NAME and use PGENDPOINT (auto-injected by platform) for autoscaling in deployed apps
- Update quickstart to support creating new Lakebase instances or connecting to existing provisioned/autoscaling instances
- Use autoscaling_endpoint parameter (renamed in databricks-ai-bridge)
- Add databricks-ai-bridge git dependency for autoscaling support
- Update tests to remove update_databricks_yml_lakebase references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed bf85d09 to 2cdc939
```python
"agent-langgraph-short-term-memory": {
    "sdk": "langgraph",
    "bundle_name": "agent_langgraph_short_term_memory",
    "has_memory": True,
```
can we just use the databricks.yml as a source of truth as to whether or not we have a database? this way we don't have to remember to change as many things
the openai long running agent example has a lakebase persistence layer but isn't using it as a "memory store" the way the other stateful agents are. so are we sure this is the best proxy?
https://github.com/databricks/app-templates/tree/main/agent-openai-agents-sdk-long-running-agent
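If the databricks.yml route were taken, a minimal detection sketch could look like the following. The marker strings checked for are assumptions about what the bundle files contain, not a schema guarantee, and the caveat above (a Lakebase persistence layer that isn't a memory store) would still need a decision:

```python
from pathlib import Path

# Illustrative only: derive "has a database" from the template's
# databricks.yml instead of a hand-maintained has_memory flag.
# The marker substrings are assumptions about the bundle contents.
def template_has_database(template_dir: Path) -> bool:
    bundle = template_dir / "databricks.yml"
    if not bundle.exists():
        return False
    text = bundle.read_text()
    return "database" in text or "LAKEBASE_" in text
```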
bbqiu left a comment:
before merging, could we please
- manually test + add code tests for the happy paths for provisioned and autoscaling in the quickstart (should modify the .env.example and databricks.yml files)
- verify that all templates are ready to deploy after running quickstart for both paths
.scripts/source/quickstart.py (outdated)
```python
# If --lakebase-provisioned-name was provided, use it directly
if provisioned_name:
    print(f"Using provided provisioned Lakebase instance: {provisioned_name}")
    if not validate_lakebase_instance(profile_name, provisioned_name):
```
can we have a similar check for autoscaling?
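One possible shape for such a check, sketched below. The function, its messages, and the name pattern are all hypothetical; a real validation would presumably also verify that the project/branch exist via the workspace API, the way validate_lakebase_instance does:

```python
import re
from typing import Optional

# Assumed, permissive name pattern; not the SDK's actual rule.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")

# Hypothetical counterpart to validate_lakebase_instance for the
# autoscaling path; only checks shape, not existence in the workspace.
def validate_lakebase_autoscaling(project: Optional[str], branch: Optional[str]) -> bool:
    if not project or not branch:
        print("Autoscaling requires both a project and a branch name.")
        return False
    for label, value in (("project", project), ("branch", branch)):
        if not _NAME_RE.match(value):
            print(f"Invalid Lakebase autoscaling {label} name: {value!r}")
            return False
    return True
```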
```python
def run_quickstart(template_dir: Path, profile: str, lakebase: str | None = None):
    """Run `uv run quickstart --profile <profile>`, optionally with --lakebase."""
def run_quickstart(
```
TODO: will update agent-integration-tests in follow-up PR to cover autoscaling cases
```python
update_env_file("LAKEBASE_AUTOSCALING_BRANCH", autoscaling_branch)
update_env_file("LAKEBASE_INSTANCE_NAME", "")

update_env_file("PGUSER", username)
```
do we not need pghost when using autoscaling? afaict, it's not looking for PGENDPOINT
stamping to unblock, but please address comments before merging! i also ran all integration tests for the lakebase provisioned version of things and it worked!
thank you for getting this out so quickly!
non-blocking -- we can probably simplify the setup_lakebase script a bit, but it's fine as is for now
dhruv0811 left a comment:
Looks good overall. I ran a couple dry runs with Claude, and noticed a few issues:
- DELETE privilege doesn't exist
- .env updates are creating new lines
- Should be pinning the latest SDK version, since the grant-privileges script breaks if not on latest
- deploy sequence ordering?
- some other local vars snuck into pyproject.toml, I think
```python
instance_info = validate_lakebase_instance(profile_name, provisioned_name)
if not instance_info:
    sys.exit(1)
update_env_file("LAKEBASE_INSTANCE_NAME", provisioned_name)
```
It seems update_env_file appends to the env file and creates duplicate entries since it doesn't match the comment line in .env.example. This is more of a nit, but it'd be cleaner to properly replace the commented env var rather than add a new line with the var?
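One way to get the replace-instead-of-append behavior is sketched below. The signature and default path are assumptions; the real update_env_file in quickstart.py may differ:

```python
import re
from pathlib import Path

# Sketch of an update_env_file that replaces an existing (possibly
# commented-out) assignment in place instead of appending a duplicate.
def update_env_file(key: str, value: str, path: Path = Path(".env")) -> None:
    # Match "KEY=..." or "# KEY=..." on its own line.
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}=.*$", re.MULTILINE)
    line = f"{key}={value}"
    text = path.read_text() if path.exists() else ""
    if pattern.search(text):
        text = pattern.sub(line, text, count=1)
    else:
        text = text + ("" if text.endswith("\n") or not text else "\n") + line + "\n"
    path.write_text(text)
```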
.claude/skills/add-tools-langgraph/examples/lakebase-autoscaling.md (thread resolved; outdated)
…update quickstart to replace env vars









updated quickstart experience to select instance.
examples work.
ex app: https://dbc-fb904e0e-bca7.staging.cloud.databricks.com/apps/agent-longtermj?o=4146361371977516
uses AskUserQuestion to get Lakebase information upfront to run the quickstart script