Summary
The SembleIndex.from_path() SDK method has an include_text_files: bool = False parameter that, when enabled, indexes .md / .yaml / .json / .toml alongside code. The CLI (src/semble/cli.py:79) and MCP server (src/semble/mcp.py) both call SembleIndex.from_path(source) without forwarding this parameter, so prose-heavy repos are invisible to CLI and MCP queries.
Concrete impact
I hit this while indexing a personal knowledge-base / agent-playbook repo where ~80% of the content is markdown practice docs and skill READMEs. The default CLI returned almost nothing useful for queries like "lethal trifecta" or "skill capability frontmatter"; switching to the SDK with include_text_files=True returned the canonical doc as the top hit for both queries. The MCP server inherits the same gap, so sub-agents using the recommended semble init template hit the same blind spot on prose-heavy projects.
Proposed change
- CLI: add a --include-text-files (or --include-docs) boolean flag to search, find-related, and init. Default False so code-only repos see no behaviour change.
- MCP server: expose the same flag as a tool parameter, or as a server-startup env var (SEMBLE_INCLUDE_TEXT_FILES=1) for users who index the same repo every session.
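To make the "both" option concrete, here is a rough sketch of how the flag and the env var could combine, with the env var supplying the default and the flag overriding it. This is not taken from src/semble/cli.py: the helper names (env_default, build_parser), the set of accepted truthy values, and the use of argparse are my assumptions for illustration only.

```python
import argparse
import os

# Hypothetical: which env var values count as "on". Not defined by semble today.
TRUTHY = {"1", "true", "yes", "on"}


def env_default(environ=None) -> bool:
    """Read the proposed SEMBLE_INCLUDE_TEXT_FILES env var; unset means False."""
    environ = os.environ if environ is None else environ
    value = environ.get("SEMBLE_INCLUDE_TEXT_FILES", "")
    return value.strip().lower() in TRUTHY


def build_parser(environ=None) -> argparse.ArgumentParser:
    """Illustrative parser only; the real CLI may be structured differently."""
    parser = argparse.ArgumentParser(prog="semble-sketch")
    parser.add_argument("source", help="path of the repo to index")
    # BooleanOptionalAction (Python 3.9+) also generates --no-include-text-files,
    # so an explicit flag can override an env var set for the whole session.
    parser.add_argument(
        "--include-text-files",
        action=argparse.BooleanOptionalAction,
        default=env_default(environ),
        help="also index .md / .yaml / .json / .toml files",
    )
    return parser
```

With this shape, the parsed args.include_text_files value would simply be forwarded to SembleIndex.from_path(source, include_text_files=...), and code-only repos with no flag and no env var keep today's behaviour.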
Workaround in use today
A short SDK wrapper script bridges the gap, but it duplicates the CLI's argument parsing and result-rendering. Happy to send a PR if you'd take the change — just wanted to confirm the design preference (flag vs. env vs. both) before opening one.
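For reference, the wrapper is roughly the following shape (simplified). SembleIndex.from_path(source, include_text_files=True) is the call described above; the `from semble import SembleIndex` import path, the `index.search(query)` call, and printing each result with print() are assumptions in this sketch and may not match the real SDK exactly.

```python
import argparse


def load_index(source: str, include_text_files: bool):
    """Build the index via the SDK so text files can be included."""
    # ASSUMPTION: import path. Imported lazily so this module loads without semble.
    from semble import SembleIndex

    return SembleIndex.from_path(source, include_text_files=include_text_files)


def main(argv=None, index_loader=load_index) -> int:
    # This argument parsing is what duplicates the CLI.
    parser = argparse.ArgumentParser(description="semble search including text files")
    parser.add_argument("source", help="path of the repo to index")
    parser.add_argument("query", help="search query")
    args = parser.parse_args(argv)

    index = index_loader(args.source, True)

    # ASSUMPTION: a search(query) method returning an iterable of results.
    # The plain print() stands in for the CLI's result rendering.
    for result in index.search(args.query):
        print(result)
    return 0
```

Calling main(["path/to/repo", "lethal trifecta"]) runs one query; the index_loader parameter only exists so the sketch can be exercised without semble installed.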
Thanks for shipping semble — the static-embeddings + RRF + code-aware reranking stack is exactly the right shape for this use case.