2 changes: 1 addition & 1 deletion .github/workflows/commoncode-release.yml
@@ -1,4 +1,4 @@
name: Create library release archives, create a GH release and publish PyPI wheel and sdist on tag in main branch
name: Create and release commoncode wheels on GitHub and Pypi


# This is executed automatically on a tag in the main branch
2 changes: 1 addition & 1 deletion .github/workflows/licensedcode-data-index-release.yml
@@ -1,4 +1,4 @@
name: Create library release archives, create a GH release and publish PyPI wheel and sdist on tag in main branch
name: Create and release licensedcode index & data wheels on GitHub and Pypi


# This is executed automatically on a tag in the main branch
14 changes: 14 additions & 0 deletions azure-pipelines.yml
@@ -183,6 +183,20 @@ jobs:
venv/bin/scancode -i --verbose samples/ -n3 --json foo.json;
done

################################################################################
# Tests with released commoncode instead of local editable commoncode
################################################################################

- template: etc/ci/azure-posix.yml
parameters:
job_name: ubuntu_test_released_commocode
image_name: ubuntu-22.04
python_versions: ['3.14']
python_architecture: x64
test_suites:
all:
venv/bin/pip uninstall -y commoncode && venv/bin/pip install commoncode && venv/bin/pytest -n 2 -vvs tests/scancode/test_cli.py --reruns 2


################################################################################
# Tests using a plain pip install to get the latest of all wheels
23 changes: 23 additions & 0 deletions commoncode-CHANGELOG.rst
@@ -1,6 +1,29 @@
Release notes
=============

Version 32.5.2 - (2026-06-11)
-----------------------------

- Bump version properly.

Version 32.5.1 - (2026-06-11)
-----------------------------

- Minor fix in pyproject.toml to properly release wheels
  to PyPI.

Version 32.5.0 - (2026-06-11)
-----------------------------

- Merge commoncode back into scancode-toolkit
  https://github.com/aboutcode-org/scancode-toolkit/pull/5116

- Add support for creating a codebase from multiple input paths
  by starting the codebase walk from these inputs and then ignoring
  paths based on path patterns. This improves the performance of
  codebase and resource collection and creation for multi-path
  scan inputs
  https://github.com/aboutcode-org/scancode-toolkit/pull/5055

Version 32.4.2 - (2025-01-08)
-----------------------------

11 changes: 10 additions & 1 deletion configure
@@ -256,6 +256,15 @@ install_packages() {
$1
}

install_packages_with_local() {
    # commoncode is a dependency of some of our dependencies, so we first
    # build and install it from the local source: this way the local
    # commoncode is tested instead of the released commoncode
    "$CFG_BIN_DIR/flot" --pyproject pyproject-commoncode.toml
    "$CFG_BIN_DIR/pip" install ./dist/commoncode*.whl
    install_packages "$CFG_REQUIREMENTS"
}


################################
cli_help() {
@@ -313,7 +322,7 @@ PIP_EXTRA_ARGS="$PIP_EXTRA_ARGS"
find_python
create_virtualenv "$VIRTUALENV_DIR"
install_packages "$FLOT_REQUIREMENTS"
install_packages "$CFG_REQUIREMENTS"
install_packages_with_local
. "$CFG_BIN_DIR/activate"
"$CFG_BIN_DIR/scancode-train-gibberish-model"

3 changes: 3 additions & 0 deletions configure.bat
@@ -162,6 +162,9 @@ if %ERRORLEVEL% neq 0 (
%PIP_EXTRA_ARGS% ^
%FLOT_REQUIREMENTS%

"%CFG_BIN_DIR%\flot" --pyproject pyproject-commoncode.toml
"%CFG_BIN_DIR%\pip" install ./dist/commoncode*.whl

"%CFG_BIN_DIR%\pip" install ^
--upgrade ^
%CFG_QUIET% ^
95 changes: 95 additions & 0 deletions docs/source/reference/scancode-cli/cli-core-options.rst
@@ -145,3 +145,98 @@ Comparing progress message options

This would scan the file ``samples/levelone/leveltwo/file`` but ignore
``samples/levelone/leveltwo/levelthree/file``

----

.. _cli-ignore-option:

``--ignore <pattern>``
----------------------

In a scan, all files inside the directories given as input arguments are scanned. To skip files
you don't want to scan, use the ``--ignore`` option with a pattern matching their names or paths.

**Example**

.. code-block:: shell

    scancode --ignore "*.java" samples samples.json

Here, ScanCode ignores files ending with ``.java`` and continues with other files as usual.

For more information, see :ref:`glob-pattern-matching`.
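To illustrate the idea, here is a minimal Python sketch of filtering paths with an ignore
pattern. It assumes ``fnmatch``-style matching against each file name only; ``filter_ignored``
and the sample paths are hypothetical, and ScanCode's own matching logic may differ in details.

.. code-block:: python

    import fnmatch
    import posixpath


    def filter_ignored(paths, pattern):
        """Return the paths whose file name does not match ``pattern``.

        Hypothetical helper for illustration: it matches the glob pattern
        against the last path segment (the file name) only.
        """
        return [
            path
            for path in paths
            if not fnmatch.fnmatchcase(posixpath.basename(path), pattern)
        ]


    # Made-up paths standing in for the files of a scanned directory.
    paths = ["samples/src/Main.java", "samples/README", "samples/zlib/adler32.c"]
    print(filter_ignored(paths, "*.java"))
    # prints ['samples/README', 'samples/zlib/adler32.c']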

----

.. _cli-config-option:

``--config-file <path>``
------------------------

Path patterns which should be ignored in the scan can also be provided
through a configuration file.

**Example**

.. code-block:: shell

    scancode --config-file scancode-config.yaml samples samples.json

where ``scancode-config.yaml`` contains:

.. code-block:: yaml

    ignored_patterns:
      - '*.java'
      - '*/licenses/*'

Here, ScanCode ignores files ending with ``.java`` and files inside ``licenses`` directories,
and continues with other files as usual.

This configuration file is also compatible with the ``ignored_patterns`` setting of the `ScanCode.io configuration file <https://scancodeio.readthedocs.io/en/latest/project-configuration.html#ignored-patterns>`_.
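The effect of ``ignored_patterns`` can be approximated with a short Python sketch. It assumes a
path is skipped when any pattern matches its full relative path; with ``fnmatch``, ``*`` also
matches ``/``, so ``*/licenses/*`` matches files nested below a ``licenses`` directory. The
patterns are written inline instead of being loaded from YAML to keep the example
dependency-free, and ``is_ignored`` is a hypothetical helper, not a ScanCode API.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Patterns mirroring the ignored_patterns list of the YAML example above.
    IGNORED_PATTERNS = ("*.java", "*/licenses/*")


    def is_ignored(path, patterns=IGNORED_PATTERNS):
        """Return True if any glob pattern matches the full relative path.

        Hypothetical helper: with fnmatch, '*' also matches '/', so
        '*/licenses/*' matches files nested anywhere below a 'licenses'
        directory that has at least one parent directory.
        """
        return any(fnmatchcase(path, pattern) for pattern in patterns)


    for path in [
        "samples/src/Main.java",
        "samples/licenses/apache-2.0.LICENSE",
        "samples/README",
    ]:
        print(path, "ignored" if is_ignored(path) else "scanned")

Checking every pattern with ``any()`` keeps the rule simple: adding a new pattern can only
ignore more files, never fewer.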

----

.. _glob-pattern-matching:

Glob Pattern Matching
---------------------

Several options, such as ``--ignore`` and ``--facet``, use glob pattern matching, so the basics
of glob pattern matching are discussed briefly below.

Glob pattern matching is useful for selecting a group of files using patterns over their
names. Files matched by these patterns can then be grouped and treated differently as required.

Here are some rules from the `Linux Manual <http://man7.org/linux/man-pages/man7/glob.7.html>`_
on glob patterns. Refer to it for more detailed information.

A string is a wildcard pattern if it contains one of the characters '?', '*' or '['. Globbing
is the operation that expands a wildcard pattern into the list of pathnames matching the
pattern. Matching is defined by:

- A '?' (not between brackets) matches any single character.

- A '*' (not between brackets) matches any string, including the empty string.

- An expression "[...]" where the first character after the leading '[' is not an '!' matches a
  single character, namely any of the characters enclosed by the brackets.

- There is one special convention: two characters separated by '-' denote a range.

- An expression "[!...]" matches a single character, namely any character that is not matched
  by the expression obtained by removing the first '!' from it.

- A '/' in a pathname cannot be matched by a '?' or '*' wildcard, or by a range like "[.-0]".

Note that wildcard patterns are not regular expressions, although they are a bit similar.
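Python's ``fnmatch`` module implements the same ``?``, ``*`` and bracket rules, so the rules
above can be checked interactively. One difference to keep in mind: ``fnmatch`` does not treat
``/`` specially, so unlike pathname globbing its ``*`` and ``?`` also match ``/``. The names
below are made up for illustration.

.. code-block:: python

    from fnmatch import fnmatchcase

    # Each tuple is (pattern, name, expected match) illustrating one rule above.
    CASES = [
        ("file?.txt", "file1.txt", True),     # '?' matches exactly one character
        ("file?.txt", "file10.txt", False),   # ... but not two
        ("*.txt", ".txt", True),              # '*' also matches the empty string
        ("[abc].c", "b.c", True),             # '[...]' matches one listed character
        ("[a-c].c", "d.c", False),            # 'a-c' is a range: a, b or c
        ("[!a-c].c", "d.c", True),            # '[!...]' negates the set
        ("*.txt", "docs/readme.txt", True),   # fnmatch only: '*' also matches '/'
    ]

    for pattern, name, expected in CASES:
        result = fnmatchcase(name, pattern)
        assert result == expected, (pattern, name)
        print(f"{pattern!r:12} {name!r:18} -> {result}")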

For more information on glob pattern matching, refer to these resources:

- `Linux Manual <http://man7.org/linux/man-pages/man7/glob.7.html>`_
- `Wildcard Match Documentation <https://facelessuser.github.io/wcmatch/glob/>`_.

You can also use these Python standard library modules to practice UNIX-style pattern matching:

- `fnmatch <https://docs.python.org/3/library/fnmatch.html>`_ for file name matching
- `glob <https://docs.python.org/3/library/glob.html#module-glob>`_ for file path matching
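Unlike ``fnmatch``, the ``glob`` module matches patterns against real files and treats ``/``
as a directory separator: ``*`` stays within one directory, while ``**`` with
``recursive=True`` spans any number of nested directories. The sketch below builds a throwaway
directory tree so it runs on its own; ``matching_files`` and the file names are hypothetical.

.. code-block:: python

    import glob
    import os
    import tempfile


    def matching_files(root, pattern):
        """Return paths under ``root`` matching ``pattern``, relative to ``root``.

        With recursive=True, '**' matches any number of nested directories.
        """
        matches = glob.glob(os.path.join(root, pattern), recursive=True)
        return sorted(os.path.relpath(path, root) for path in matches)


    # Build a small temporary tree so the example is self-contained.
    with tempfile.TemporaryDirectory() as root:
        for rel in ["src/Main.java", "src/util/Helper.java", "README"]:
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()

        print(matching_files(root, "src/*.java"))  # only the file directly in src
        print(matching_files(root, "**/*.java"))   # both .java files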

68 changes: 47 additions & 21 deletions docs/source/reference/scancode-cli/cli-help-text-options.rst
@@ -125,8 +125,6 @@ The following help text is displayed for ScanCode version 32.0.0:
such that all paths have a common root directory.

pre-scan:
--ignore <pattern> Ignore files matching <pattern>.
--include <pattern> Include files matching <pattern>.
--classify Classify files with flags indicating whether the file is a
legal, readme, test or similar file.
--facet <facet>=<pattern> Add the <facet> to files with a path matching
@@ -169,11 +167,13 @@ The following help text is displayed for ScanCode version 32.0.0:
at the file and directory level.

core:
--ignore <pattern> Ignore files matching <pattern>.
--timeout <seconds> Stop an unfinished file scan after a timeout in
seconds. [default: 120 seconds]
-n, --processes INT Set the number of parallel processes to use. Disable
parallel processing if 0. Also disable threading if
-1. [default: (number of CPUs)-1]
-c, --config-file FILENAME Path to the configuration file.
-q, --quiet Do not print summary or progress.
-v, --verbose Print progress as file-by-file path instead of a
progress bar. Print verbose scan counters.
@@ -512,7 +512,7 @@ for ScanCode Version 32.0.0.
--------------------------------------------
Plugin: scancode_post_scan:classify class: summarycode.classify_plugin:FileClassifier
codebase_attributes:
resource_attributes: is_legal, is_manifest, is_readme, is_top_level, is_key_file
resource_attributes: is_legal, is_manifest, is_readme, is_top_level, is_key_file, is_community
sort_order: 4
required_plugins:
options:
@@ -690,6 +690,19 @@ for ScanCode Version 32.0.0.
- packages


--------------------------------------------
Plugin: scancode_post_scan:todo class: summarycode.todo:AmbiguousDetectionsToDoPlugin
codebase_attributes: todo
resource_attributes: for_todo
sort_order: 3
required_plugins:
options:
help_group: post-scan, name: todo: --todo
help: Summarize scans by providing all ambiguous detections which are todo items and needs manual review.
doc:
Summarize a scan by compiling review items of ambiguous detections.


--------------------------------------------
Plugin: scancode_pre_scan:facet class: summarycode.facet:AddFacet
codebase_attributes:
@@ -705,21 +718,6 @@ for ScanCode Version 32.0.0.
test vs. data, etc.


--------------------------------------------
Plugin: scancode_pre_scan:ignore class: scancode.plugin_ignore:ProcessIgnore
codebase_attributes:
resource_attributes:
sort_order: 100
required_plugins:
options:
help_group: pre-scan, name: ignore: --ignore
help: Ignore files matching <pattern>.
help_group: pre-scan, name: include: --include
help: Include files matching <pattern>.
doc:
Include or ignore files matching patterns.


--------------------------------------------
Plugin: scancode_scan:copyrights class: cluecode.plugin_copyright:CopyrightScanner
codebase_attributes:
@@ -761,10 +759,23 @@ for ScanCode Version 32.0.0.
Tag a file as generated.


--------------------------------------------
Plugin: scancode_scan:go_symbol class: go_inspector.plugin:GoSymbolScannerPlugin
codebase_attributes:
resource_attributes: go_symbols
sort_order: 100
required_plugins:
options:
help_group: primary scans, name: go_symbol: --go-symbol
help: Collect Go symbols.
doc:
Scan a Go binary for symbols using GoReSym.


--------------------------------------------
Plugin: scancode_scan:info class: scancode.plugin_info:InfoScanner
codebase_attributes:
resource_attributes: date, sha1, md5, sha256, mime_type, file_type, programming_language, is_binary, is_text, is_archive, is_media, is_source, is_script
resource_attributes: date, sha1, md5, sha256, sha1_git, mime_type, file_type, programming_language, is_binary, is_text, is_archive, is_media, is_source, is_script
sort_order: 0
required_plugins:
options:
@@ -779,7 +790,7 @@ for ScanCode Version 32.0.0.
Plugin: scancode_scan:licenses class: licensedcode.plugin_license:LicenseScanner
codebase_attributes: license_detections
resource_attributes: detected_license_expression, detected_license_expression_spdx, license_detections, license_clues, percentage_of_license_text
sort_order: 4
sort_order: 5
required_plugins:
options:
help_group: primary scans, name: license: -l, --license
@@ -804,13 +815,15 @@ for ScanCode Version 32.0.0.
Plugin: scancode_scan:packages class: packagedcode.plugin_package:PackageScanner
codebase_attributes: packages, dependencies
resource_attributes: package_data, for_packages
sort_order: 3
sort_order: 4
required_plugins: scan:licenses
options:
help_group: primary scans, name: package: -p, --package
help: Scan <input> for application package and dependency manifests, lockfiles and related data.
help_group: primary scans, name: system_package: --system-package
help: Scan <input> for installed system package databases.
help_group: primary scans, name: package_in_compiled: --package-in-compiled
help: Scan <input> for package and dependency related data in compiled binaries. Currently supported compiled binaries: Go, Rust.
help_group: primary scans, name: package_only: --package-only
help: Scan for system and application package data and skip license/copyright detection and top-level package creation.
help_group: documentation, name: list_packages: --list-packages
@@ -821,6 +834,19 @@ for ScanCode Version 32.0.0.
level.


--------------------------------------------
Plugin: scancode_scan:rust_symbol class: rust_inspector.plugin:RustSymbolScannerPlugin
codebase_attributes:
resource_attributes: rust_symbols
sort_order: 100
required_plugins:
options:
help_group: primary scans, name: rust_symbol: --rust-symbol
help: Collect Rust symbols from rust binaries.
doc:
Scan a Rust binary for symbols using blint, lief and symbolic.


--------------------------------------------
Plugin: scancode_scan:urls class: cluecode.plugin_url:UrlScanner
codebase_attributes:
50 changes: 50 additions & 0 deletions docs/source/reference/scancode-cli/cli-post-scan-options.rst
@@ -17,6 +17,56 @@ To see all plugins available via command line help, use ``--plugins``.

----

.. _cli-classify-option:

``--classify``
--------------

.. admonition:: Sub-option

    The options ``--license-clarity-score`` and ``--tallies-key-files`` are sub-options of
    ``--classify``. Both are post-scan options.

**Example**

.. code-block:: shell

    scancode -clpieu --json-pp sample_facet.json samples --classify

This option makes ScanCode further classify scanned files and directories to determine whether
they fall into the following categories:

- legal
- readme
- top-level
- manifest

  A manifest file in computing is a file containing metadata for a group of accompanying
  files that are part of a set or coherent unit.

- key-file

  A key file serves as a keystone element, containing essential
  information about a software package, such as its dependencies,
  versioning and licensing. It often contains the
  ``primary-license`` or the overall license of the package, along with
  other general or ecosystem-specific package metadata.

These extra attributes are added to the JSON object of each scanned file:

.. code-block:: json

    {
      "is_legal": false,
      "is_manifest": false,
      "is_readme": true,
      "is_top_level": true,
      "is_key_file": true
    }
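As an illustration, a scan saved with ``--json-pp`` can be post-processed to list the files
flagged by ``--classify``. This minimal sketch assumes the usual ScanCode JSON layout, a
top-level ``files`` list whose entries carry a ``path`` plus the attributes shown above;
``classified_paths`` is a hypothetical helper and the inline scan data is made up.

.. code-block:: python

    import json

    # A tiny, made-up scan result in the shape assumed above.
    SCAN_JSON = """
    {
      "files": [
        {"path": "samples/README", "is_readme": true, "is_key_file": true},
        {"path": "samples/LICENSE", "is_legal": true, "is_key_file": true},
        {"path": "samples/zlib/adler32.c", "is_legal": false, "is_key_file": false}
      ]
    }
    """


    def classified_paths(scan, flag):
        """Return the paths of files whose ``flag`` attribute is true.

        Missing attributes are treated as false.
        """
        return [entry["path"] for entry in scan.get("files", []) if entry.get(flag)]


    scan = json.loads(SCAN_JSON)
    print(classified_paths(scan, "is_key_file"))
    print(classified_paths(scan, "is_legal"))

To use it on a real scan, load the output file instead, for example with ``json.load`` on the
``sample_facet.json`` file written by the command above.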

----

.. _cli-mark-source-option:

``--mark-source``