@AyanSinhaMahapatra AyanSinhaMahapatra commented Nov 17, 2025

This PR improves package scan performance by....

References:

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Use multiregex with cached regex path patterns and a
datafile handlers mapping to detect package datafiles faster.

Reference: #4064
Reference: #4061
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
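
A rough sketch of the idea in the commit above: compile every datafile path pattern into one regex, keep the compiled result cached, and map a match back to its handler. This uses only the standard library (`re`, `fnmatch`) as a stand-in and does not use the multiregex API or the PR's actual code. The patterns, handler names, and the `find_handler` helper are hypothetical.

```python
import fnmatch
import re
from functools import lru_cache

# Hypothetical glob patterns and handler names, for illustration only.
HANDLERS_BY_PATTERN = {
    "*/package.json": "npm_package_json",
    "*/Cargo.toml": "cargo_toml",
    "*/go.mod": "go_mod",
    "*.gemspec": "gemspec",
}


@lru_cache(maxsize=None)
def build_matcher():
    """Compile all path patterns into one alternation, built once per process."""
    handlers = []
    groups = []
    for index, (pattern, handler) in enumerate(HANDLERS_BY_PATTERN.items()):
        handlers.append(handler)
        # One named group per pattern so a match can be traced to its handler.
        groups.append(f"(?P<h{index}>{fnmatch.translate(pattern)})")
    return re.compile("|".join(groups)), tuple(handlers)


def find_handler(path):
    """Return the first handler whose pattern matches the path, or None."""
    regex, handlers = build_matcher()
    match = regex.match(path)
    if match is None:
        return None
    # lastgroup is the name of the group that matched, e.g. "h2".
    return handlers[int(match.lastgroup[1:])]
```

With this sketch, `find_handler("project/package.json")` returns the hypothetical `"npm_package_json"` handler name and an unrelated path returns `None`. A single alternation reports only the first matching pattern, so a real implementation that allows several handlers per path would need a different matching strategy.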
@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as draft November 17, 2025 09:49
@AyanSinhaMahapatra AyanSinhaMahapatra changed the title from "Fast package scan" to "Improve package scan performance" Nov 17, 2025
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as ready for review November 19, 2025 09:47
Introduce a new option --binary-packages which looks for
package/dependency data in binaries.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
We do not need the license index in a --package-only scan,
as this scan is designed for fast package detection only
and skips license detection. Since loading the license index
takes a couple of seconds each time, skipping it makes the
package-only scan much faster.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
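
The reasoning in the commit above can be sketched as lazy loading: the expensive index is only built on the code path that needs it, so a package-only scan never pays for it. All names here (`get_license_index`, `detect_packages`, `detect_licenses`, `run_scan`) are hypothetical stand-ins and not ScanCode functions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_license_index():
    """Stand-in for an expensive index load; runs at most once per process."""
    return {"mit": "MIT License"}


def detect_packages(path):
    """Hypothetical package detection that needs no license index."""
    return ["npm"] if path.endswith("package.json") else []


def detect_licenses(path, index):
    """Hypothetical license detection that requires the loaded index."""
    return sorted(index)


def run_scan(paths, package_only=False):
    results = []
    for path in paths:
        result = {"path": path, "packages": detect_packages(path)}
        if not package_only:
            # The index is loaded only on this branch, on first use.
            result["licenses"] = detect_licenses(path, get_license_index())
        results.append(result)
    return results
```

In this sketch, calling `run_scan(paths, package_only=True)` never touches `get_license_index`, which mirrors the saving the commit describes.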
--system-package Scan ``<input>`` for installed system package
databases.

-b, --binary-package Scan <input> for package and dependency related

What about this:

--package-in-exec Scan compiled executable binaries such as ELF, WinPE and Mach-O files, looking for structured package and dependency metadata as found for example in Go and Rust binaries.


Or

--package-in-compiled Scan compiled executable binaries such as ELF, WinPE and Mach-O files, looking for structured package and dependency metadata as found for example in Go and Rust compiled binaries.


./configure --dev
venv/bin/scancode-reindex-licenses
venv/bin/scancode-cache-package-patterns

What about naming this venv/bin/scancode-reindex-package-patterns to be consistent?



# These handlers are special as they use filetype to
# detect these binaries instead of datafile path patterns

Suggested change
- # detect these binaries instead of datafile path patterns
+ # detect these compiled executable binaries instead of datafile path patterns
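
For context on why such handlers cannot rely on path patterns: compiled executables are recognized by their content, not their file name. A minimal sketch, assuming a check on the leading magic bytes only; this is not the project's own filetype detection, and `sniff_binary_format` is a hypothetical helper.

```python
ELF_MAGIC = b"\x7fELF"
PE_MAGIC = b"MZ"  # DOS header that precedes a PE image
MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce",  # 32-bit, big-endian
    b"\xce\xfa\xed\xfe",  # 32-bit, little-endian
    b"\xfe\xed\xfa\xcf",  # 64-bit, big-endian
    b"\xcf\xfa\xed\xfe",  # 64-bit, little-endian
}


def sniff_binary_format(header: bytes):
    """Guess the executable format from the first bytes of a file.

    Rough check only: "MZ" marks a DOS header, so a stricter check would
    also confirm the PE signature. Mach-O fat binaries are left out because
    their magic number is shared with Java class files.
    """
    if header.startswith(ELF_MAGIC):
        return "elf"
    if header[:4] in MACHO_MAGICS:
        return "macho"
    if header.startswith(PE_MAGIC):
        return "pe"
    return None
```

A caller would read the first few bytes of each file in binary mode and pass them in, so a Go or Rust executable named `server` is still recognized even though no path pattern could match it.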

PACKAGE_INDEX_DIR = 'package_patterns_index'
PACKAGE_INDEX_FILENAME = 'index_cache'
PACKAGE_LOCKFILE_NAME = 'scancode_package_index_lockfile'
PACKAGE_CHECKSUM_FILE = 'scancode_package_index_tree_checksums'

This is not used anymore (also should be dropped from licensing)

Suggested change
- PACKAGE_CHECKSUM_FILE = 'scancode_package_index_tree_checksums'

@pombredanne pombredanne left a comment


Here are some nits for your consideration!

