Update agent examples, quickstart and skills to support Lakebase Autoscaling#151

Merged
jennsun merged 40 commits into databricks:main from jennsun:lakebase-autoscaling-updates-v2
Mar 13, 2026

Conversation

@jennsun (Contributor) commented Mar 4, 2026:

updated the quickstart experience to select an instance, and verified the examples work (screenshots in the PR)

ex app: https://dbc-fb904e0e-bca7.staging.cloud.databricks.com/apps/agent-longtermj?o=4146361371977516

uses askuserquestion to get Lakebase information upfront to run the quickstart script

@jennsun changed the title from "Update agent examples to support Lakebase Autoscaling" to "Update agent examples and quickstart to support Lakebase Autoscaling" on Mar 4, 2026
@jennsun jennsun marked this pull request as ready for review March 4, 2026 21:44
@jennsun jennsun force-pushed the lakebase-autoscaling-updates-v2 branch from 248b026 to e39267c Compare March 5, 2026 23:21
@dhruv0811 (Contributor) left a comment:

Looks good overall!! Left some comments for local testing cleanup and some clarifications on lakebase options.

# TODO: Update with your Lakebase instance for session storage
# Option 1: Provisioned instance (set instance name)
# LAKEBASE_INSTANCE_NAME=
# Option 2: Autoscaling instance (set project and branch)
Contributor:

Did we not have a third option with just autoscaling_endpoint?

@jennsun (Contributor Author):

the thinking here is that when users develop locally, they do so via passing in project/branch name. the autoscaling_endpoint is only used when we are in the app environment and we can directly read from PGENDPOINT (since it's not as user friendly to pass in) - hence adding this check: https://github.com/databricks/app-templates/pull/151/changes#diff-1e8ffff9ea9c91fc9dce7d8af129175d1fcfe357eabbcec14705feff13eeacacR53
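The resolution order described above could be sketched as follows; the env-var names come from the quoted diff, but this helper itself is hypothetical, not code from the PR:

```python
import os

def resolve_autoscaling_target() -> dict:
    """Pick the autoscaling connection parameters based on the environment."""
    if os.getenv("DATABRICKS_APP_NAME"):
        # In the app environment, the platform injects PGENDPOINT directly.
        return {"autoscaling_endpoint": os.getenv("PGENDPOINT")}
    # For local development, users pass project/branch names instead.
    return {
        "project": os.getenv("LAKEBASE_AUTOSCALING_PROJECT"),
        "branch": os.getenv("LAKEBASE_AUTOSCALING_BRANCH"),
    }
```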

# Check for Lakebase access/connection errors
if any(keyword in error_msg for keyword in ["permission"]):
    logger.error(f"Lakebase access error: {e}")
    lakebase_desc = LAKEBASE_INSTANCE_NAME or LAKEBASE_AUTOSCALING_ENDPOINT or f"{LAKEBASE_AUTOSCALING_PROJECT}/{LAKEBASE_AUTOSCALING_BRANCH}"
Contributor:

Is it worth doing some validation like we did in the SDK to ensure that only one valid combination of variables is set here? If a user has project and branch set but accidentally also sets the instance name, they should be notified rather than the code silently picking up the instance name and short-circuiting here.
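A minimal sketch of the suggested mutual-exclusion check, assuming the three env-var names from the quoted diff; the agent SDK's actual validation may differ:

```python
import os

def validate_lakebase_config() -> str:
    """Require exactly one Lakebase configuration style to be set."""
    has_provisioned = bool(os.getenv("LAKEBASE_INSTANCE_NAME"))
    has_autoscaling = bool(os.getenv("LAKEBASE_AUTOSCALING_PROJECT")) and bool(
        os.getenv("LAKEBASE_AUTOSCALING_BRANCH")
    )
    if has_provisioned and has_autoscaling:
        raise ValueError(
            "Set either LAKEBASE_INSTANCE_NAME or "
            "LAKEBASE_AUTOSCALING_PROJECT/BRANCH, not both"
        )
    if not (has_provisioned or has_autoscaling):
        raise ValueError("No Lakebase configuration found")
    return "provisioned" if has_provisioned else "autoscaling"
```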

@jennsun (Contributor Author):

since the sdk covers all these cases/will throw errors, I think adding validation here would be a bit redundant

@jennsun jennsun requested review from bbqiu and dhruv0811 March 6, 2026 02:36
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None

@jennsun (Contributor Author):

bit of a chicken & egg prob here when DABs are not ready -

  • in order for us to be able to pass in PGENDPOINT as a var, we need the autoscaling instance as a resource on the app
  • DAB deployment will overwrite app resources, so it would remove manually-added postgres resource when we deploy
  • we will therefore use LAKEBASE_AUTOSCALING_PROJECT and LAKEBASE_AUTOSCALING_BRANCH as
    static env vars supported by our agent SDK

if we want endpoint extension in future, code will look something like:

# Autoscaling params: in the app environment, PGENDPOINT is provided automatically;
# for local dev, use project/branch names directly.
_is_app_env = bool(os.getenv("DATABRICKS_APP_NAME"))
LAKEBASE_AUTOSCALING_ENDPOINT = os.getenv("PGENDPOINT") if _is_app_env else None
LAKEBASE_AUTOSCALING_PROJECT = os.getenv("LAKEBASE_AUTOSCALING_PROJECT") or None
LAKEBASE_AUTOSCALING_BRANCH = os.getenv("LAKEBASE_AUTOSCALING_BRANCH") or None

@dhruv0811 (Contributor) left a comment:

Added some comments based on a dry run with Claude.

@jennsun changed the title from "Update agent examples and quickstart to support Lakebase Autoscaling" to "Update agent examples, quickstart and skills to support Lakebase Autoscaling" on Mar 10, 2026
    value: "<your-project-name>"
  - name: LAKEBASE_AUTOSCALING_BRANCH
    value: "<your-branch-name>"
  # Use for provisioned lakebase resource
@jennsun (Contributor Author):

adding the app.yaml back for the stateful agent examples because we need an app.yaml to deploy the app; the following flow then properly adds postgres as a resource:
databricks bundle deploy

│ Uploads source code (including app.yaml) to workspace
│ Sets app config from databricks.yml (command, env vars, resources)
⚠️ Overwrites app resources → wipes manually-added postgres resource


databricks bundle run <app_resource_name>

│ Starts/restarts the app using config from databricks.yml
│ App boots with LAKEBASE_AUTOSCALING_PROJECT/BRANCH env vars
│ But postgres resource is NOT attached (wiped by bundle deploy)


databricks api patch /api/2.0/apps/ --json '{"resources": [...]}'

│ Re-adds postgres resource to the app
│ Grants SP access to Lakebase + injects PG* env vars for frontend
│ But running app process doesn't have the new env vars yet


databricks api post /api/2.0/apps//deployments --json '{"source_code_path": "..."}'

│ Triggers a redeploy (like pressing "Deploy" in UI)
│ Reads app.yaml from source code for command + env vars
│ App restarts with postgres resource env vars now available


App is running with all env vars + postgres resource ✓

@bbqiu (Contributor) left a comment:

couple comments:

  • inconsistencies between the oai stm, lg stm, and lg ltm databricks.yml files
  • quickstart should be replacing / deleting the set of env vars that are NOT relevant to the lakebase instance type selected
  • adding tests so we fully understand the behavior of quickstart.py
  • no need to exclude lakebase client perms grant script from all the templates

in a followup, let's also make sure to update + run the integration tests

@jennsun (Contributor Author) commented Mar 11, 2026:

updates:

  • make sure app.yaml/databricks.yml is synced across stateful examples
  • quickstart will sync env variables/replace unneeded ones depending on if lakebase is provisioned or autoscaling instance
  • updated test quickstart script
  • made lakebase permission granting script sync across all templates
  • added test for granting lakebase permissions script

@jennsun jennsun requested review from bbqiu and dhruv0811 March 11, 2026 21:48
jennsun and others added 13 commits March 11, 2026 16:44
- Add autoscaling (project/branch) as an alternative to provisioned
  Lakebase instances across all 3 memory templates
- Detect app environment via DATABRICKS_APP_NAME and use PGENDPOINT
  (auto-injected by platform) for autoscaling in deployed apps
- Update quickstart to support creating new Lakebase instances or
  connecting to existing provisioned/autoscaling instances
- Use autoscaling_endpoint parameter (renamed in databricks-ai-bridge)
- Add databricks-ai-bridge git dependency for autoscaling support
- Update tests to remove update_databricks_yml_lakebase references

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jennsun jennsun force-pushed the lakebase-autoscaling-updates-v2 branch from bf85d09 to 2cdc939 Compare March 11, 2026 23:45
"agent-langgraph-short-term-memory": {
"sdk": "langgraph",
"bundle_name": "agent_langgraph_short_term_memory",
"has_memory": True,
Contributor:

can we just use the databricks.yml as a source of truth as to whether or not we have a database? this way we don't have to remember to change as many things
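The suggestion could be sketched like this, assuming each template keeps its config in a databricks.yml at the template root; a real version would parse the YAML rather than string-scan it, and `template_has_database` is a hypothetical name:

```python
from pathlib import Path

def template_has_database(template_dir: Path) -> bool:
    """Infer from databricks.yml whether a template declares a database resource."""
    config = template_dir / "databricks.yml"
    if not config.exists():
        return False
    # Cheap stdlib scan; swap in a YAML parser for robustness.
    return "database" in config.read_text()
```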

@jennsun (Contributor Author):

the openai long running agent example has a lakebase persistence layer but isn't using it as a "memory store" the way the other stateful agents are. so are we sure this is the best proxy?

https://github.com/databricks/app-templates/tree/main/agent-openai-agents-sdk-long-running-agent

@bbqiu (Contributor) left a comment:

before merging, could we please

  1. manually test + add code tests for the happy paths for provisioned and autoscaling in the quickstart
    a. should modify the .env.example and databricks.yml file.
  2. verify that all templates are ready to deploy after running quickstart for both paths

# If --lakebase-provisioned-name was provided, use it directly
if provisioned_name:
    print(f"Using provided provisioned Lakebase instance: {provisioned_name}")
    if not validate_lakebase_instance(profile_name, provisioned_name):
Contributor:

can we have a similar check for autoscaling?

@jennsun (Contributor Author) commented Mar 12, 2026:

added validation for autoscaling instance (screenshots in the PR)

@jennsun (Contributor Author) commented Mar 12, 2026:

added provisioned + autoscaling quickstart tests and tested stateful agents, manually and with Claude (screenshots in the PR)

@jennsun jennsun requested a review from bbqiu March 12, 2026 17:26

def run_quickstart(template_dir: Path, profile: str, lakebase: str | None = None):
    """Run `uv run quickstart --profile <profile>`, optionally with --lakebase."""
def run_quickstart(
@jennsun (Contributor Author):

TODO: will update agent-integration-tests in follow-up PR to cover autoscaling cases

update_env_file("LAKEBASE_AUTOSCALING_BRANCH", autoscaling_branch)
update_env_file("LAKEBASE_INSTANCE_NAME", "")

update_env_file("PGUSER", username)
Contributor:

do we not need pghost when using autoscaling? afaict, it's not looking for PGENDPOINT

Contributor:

+1 on this

@jennsun (Contributor Author):

added back all the PG vars we need for the frontend UI to work properly (screenshot in the PR)

@bbqiu (Contributor) left a comment:

stamping to unblock, but please address comments before merging! i also ran all integration tests for the lakebase provisioned version of things and it worked!

thank you for getting this out so quickly!

non-blocking -- we can probably simplify the setup_lakebase script a bit, but it's fine as is for now

@dhruv0811 (Contributor) left a comment:

Looks good overall. I ran a couple of dry runs with Claude and noticed a few issues:

  1. DELETE privilege doesn't exist
  2. .env updates are creating new lines
  3. should pin the latest SDK version, since the grant-privileges script breaks if not on latest
  4. deploy sequence ordering?
  5. some other local vars snuck into pyproject.toml I think

instance_info = validate_lakebase_instance(profile_name, provisioned_name)
if not instance_info:
    sys.exit(1)
update_env_file("LAKEBASE_INSTANCE_NAME", provisioned_name)
Contributor:

It seems update_env_file appends to the env file and creates duplicate entries since it doesn't match the comment line in .env.example. This is more of a nit, but it'd be cleaner to properly replace the commented env var rather than add a new line with the var?

@jennsun (Contributor Author):

updated so it should replace instead! (screenshot in the PR)
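For illustration, a replace-in-place `update_env_file` might look like the following sketch; this is hypothetical, not the PR's actual implementation, and it also matches commented-out `# KEY=` lines from .env.example:

```python
import re
from pathlib import Path

def update_env_file(key: str, value: str, path: str = ".env") -> None:
    """Set KEY=value in an env file, replacing an existing (possibly
    commented-out) line instead of appending a duplicate entry."""
    env_path = Path(path)
    text = env_path.read_text() if env_path.exists() else ""
    pattern = re.compile(rf"^#?\s*{re.escape(key)}=.*$", re.MULTILINE)
    replacement = f"{key}={value}"
    if pattern.search(text):
        # Replace the first matching line in place.
        text = pattern.sub(replacement, text, count=1)
    else:
        # Key not present yet: append on its own line.
        if text and not text.endswith("\n"):
            text += "\n"
        text += replacement + "\n"
    env_path.write_text(text)
```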

@jennsun jennsun merged commit 10c59f9 into databricks:main Mar 13, 2026