Expose include_text_files in CLI and MCP server #83

@cdarius0

Description

Summary

The SembleIndex.from_path() SDK method has an include_text_files: bool = False parameter that, when enabled, also indexes .md / .yaml / .json / .toml files alongside code. The CLI (src/semble/cli.py:79) and the MCP server (src/semble/mcp.py) both call SembleIndex.from_path(source) without forwarding this parameter, so the text files in prose-heavy repos are never indexed and stay invisible to CLI and MCP queries.
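A minimal sketch of the call site before and after forwarding the option. It runs without semble installed: StubIndex is a hypothetical stand-in that only records the arguments from_path() receives (its signature mirrors the SDK parameter described above), and open_index_today / open_index are illustrative names, not code taken from src/semble/cli.py or src/semble/mcp.py.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StubIndex:
    """Hypothetical stand-in for SembleIndex; it only records how it was built."""

    source: str
    options: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_path(cls, source: str, include_text_files: bool = False) -> "StubIndex":
        # Mirrors the SDK default described in the issue: text files are off unless requested.
        return cls(source, {"include_text_files": include_text_files})


def open_index_today(source: str, index_cls=StubIndex):
    # Current behaviour: the option is never forwarded, so the default (False) always wins.
    return index_cls.from_path(source)


def open_index(source: str, include_text_files: bool = False, index_cls=StubIndex):
    # Proposed behaviour: pass the caller's choice straight through to from_path().
    return index_cls.from_path(source, include_text_files=include_text_files)
```

With the stub, open_index_today("notes/") always records include_text_files=False, while open_index("notes/", include_text_files=True) records True. Keeping the default at False in the forwarding helper is what preserves today's behaviour for code-only repos.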

Concrete impact

I'm indexing a personal knowledge-base / agent-playbook repo in which ~80% of the content is markdown practice docs and skill READMEs. The default CLI returned almost nothing useful for queries like "lethal trifecta" or "skill capability frontmatter"; switching to the SDK with include_text_files=True returned the canonical doc as the top hit for both queries. The MCP server inherits the same gap, so sub-agents using the recommended semble init template hit the same blind spot on prose-heavy projects.

Proposed change

  • CLI: add an --include-text-files (or --include-docs) boolean flag to the search, find-related, and init commands. It defaults to False, so code-only repos see no change in behaviour.
  • MCP server: expose the same option as a tool parameter, or as a server-startup environment variable (SEMBLE_INCLUDE_TEXT_FILES=1) for users who index the same repo every session.
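One possible shape for the flag and the env var, sketched with the standard-library argparse module. This is a hypothetical outline, not semble's actual CLI: the parser only models the new flag (each subcommand's real positional arguments are omitted), and build_parser, env_flag and resolve_include_text_files are illustrative names. SEMBLE_INCLUDE_TEXT_FILES is the variable name proposed above; the set of truthy spellings it accepts is an assumption.

```python
import argparse
import os
from collections.abc import Mapping

# Env var name proposed in this issue; not an existing semble setting.
ENV_VAR = "SEMBLE_INCLUDE_TEXT_FILES"


def env_flag(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    # Treat common truthy spellings as "on"; anything else (or unset) is "off".
    return environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="semble")
    sub = parser.add_subparsers(dest="command", required=True)
    # Shared parent parser so search, find-related and init all accept the same flag.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--include-text-files",
        action="store_true",
        default=False,
        help="also index .md/.yaml/.json/.toml files alongside code",
    )
    for name in ("search", "find-related", "init"):
        sub.add_parser(name, parents=[common])
    return parser


def resolve_include_text_files(
    args: argparse.Namespace, environ: Mapping[str, str] = os.environ
) -> bool:
    # The flag enables text files for one invocation; the env var enables them for every run.
    return bool(args.include_text_files) or env_flag(ENV_VAR, environ)
```

For example, build_parser().parse_args(["find-related", "--include-text-files"]) yields include_text_files=True, and an init run with SEMBLE_INCLUDE_TEXT_FILES=1 in the environment resolves to True even without the flag. Because the env var is OR-ed with the flag here, it can only switch text files on; a per-call opt-out would need a --no-include-text-files flag (for instance via argparse.BooleanOptionalAction) plus a resolver that lets an explicit flag take precedence over the environment.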

Workaround in use today

A short SDK wrapper script bridges the gap, but it duplicates the CLI's argument parsing and result rendering. I'm happy to send a PR if you'd take the change; I just wanted to confirm the design preference (flag, env var, or both) before opening one.

Thanks for shipping semble: the static-embeddings + RRF + code-aware reranking stack is exactly the right shape for this use case.

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
