Deep debug #262
Changes from all commits: d7259d2, 64f34f3, bacade9, 16b5251, 0fc1732, 1f76e35, c4d3c4a, 73c64ce, 6a4c808
@@ -0,0 +1,77 @@

```python
"""
Note: This script is a convenience script to launch experiments instead of using
the command line.

Copy this script and modify at will, but don't push your changes to the
repository.
"""

import logging
from copy import deepcopy

import bgym

from agentlab.agents.tool_use_agent.tool_use_agent import (
    DEFAULT_PROMPT_CONFIG,
    GPT_4_1,
    ToolUseAgentArgs,
)
from agentlab.experiments.study import Study

logging.getLogger().setLevel(logging.INFO)

config = deepcopy(DEFAULT_PROMPT_CONFIG)
# config.keep_last_n_obs = 1
config.obs.use_som = True

agent_configs = [
    ToolUseAgentArgs(
        model_args=GPT_4_1,
        config=config,
    ),
    # ToolUseAgentArgs(
    #     model_args=GPT_4_1,
    #     config=config,
    # ),
]

for agent_config in agent_configs:
    agent_config.config.action_subsets = ("workarena",)  # use the workarena action set
```
Comment on lines +28 to +40

Agent Configuration Should Be Separated

What is the issue?
Agent configuration is mixed with experiment setup code, violating the Single Responsibility Principle.

Why this matters
Mixing configuration with setup code makes it harder to maintain different agent configurations and reduces code reusability across different experiments.

Suggested change
Extract agent configuration into a separate factory or builder class:

```python
class AgentConfigFactory:
    @staticmethod
    def create_workarena_config(base_config):
        config = deepcopy(base_config)
        config.action_subsets = ("workarena",)
        return ToolUseAgentArgs(model_args=GPT_4_1, config=config)


agent_configs = [AgentConfigFactory.create_workarena_config(DEFAULT_PROMPT_CONFIG)]
```
```python
# ## select the benchmark to run on
# benchmark = "miniwob_tiny_test"
benchmark = "workarena_l1"

benchmark = bgym.DEFAULT_BENCHMARKS[benchmark](n_repeats=4)  # type: bgym.Benchmark
benchmark = benchmark.subset_from_glob("task_name", "*create*")

# for env_args in benchmark.env_args_list:
#     print(env_args.task_name)
#     env_args.max_steps = 15

relaunch = False

## Number of parallel jobs
n_jobs = 10  # Make sure to use 1 job when debugging in VSCode
parallel_backend = "ray"
```
Comment on lines +58 to +59

Static Parallel Job Configuration

What is the issue?
Hard-coded number of parallel jobs without considering system resources (CPU cores, memory) could lead to suboptimal performance.

Why this matters
Setting a fixed number of jobs might either underutilize available resources or overload the system, causing overhead from context switching and memory pressure.

Suggested change
Use system information to determine optimal job count:

```python
from multiprocessing import cpu_count

n_jobs = min(cpu_count(), 10)  # Cap at 10 but adjust based on available cores
```
```python
# parallel_backend = "sequential"  # activate sequential backend for debugging in VSCode

if __name__ == "__main__":  # necessary for dask backend

    if relaunch:
        # relaunch an existing study
        study = Study.load_most_recent(contains=None)
        study.find_incomplete(include_errors=True)

    else:
        study = Study(agent_configs, benchmark, logging_level_stdout=logging.WARNING)

    study.run(
        n_jobs=n_jobs,
        parallel_backend=parallel_backend,  # "ray", "joblib" or "sequential"
        strict_reproducibility=False,
        n_relaunch=3,
    )
```
Basic root logger configuration

What is the issue?
The script uses the root logger without a proper logger configuration (no handler, formatter, or name).

Why this matters
Without proper logger configuration, logs may lack contextual information like timestamps and source, making debugging production issues difficult.

Suggested change
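The reviewer's suggested snippet is not rendered in this view. As a rough sketch of what such a configuration could look like (not the reviewer's actual suggestion; the format string and logger name are illustrative), the fix amounts to configuring the root logger once with a handler and formatter, then using a named logger in the script:

```python
import logging

# Configure the root logger once with a timestamped format instead of only
# raising its level (illustrative format string, adjust as needed).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Prefer a module-level named logger over the bare root logger.
logger = logging.getLogger("launch_experiments")  # hypothetical name
logger.info("logging configured")
```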