
Commit 85378f0

Merge branch 'main' into fix-duplicate-pr-labels-20250803-143123
2 parents a1cb78f + 5b3f082 commit 85378f0

8 files changed, +353 -98 lines changed


.github/workflows/tests.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -36,7 +36,7 @@ jobs:
3636
include:
3737
# Only test oldest supported and latest python version to reduce
3838
# GitHub API calls, as they can get rate limited
39-
- python-version: 3.9
39+
- python-version: "3.10"
4040
- python-version: 3.x
4141

4242
steps:

README.md

Lines changed: 7 additions & 12 deletions
@@ -1,28 +1,23 @@
11
# github-activity
22

3-
Generate simple markdown changelogs for GitHub repositories written in Python.
4-
53
[![continuous-integration](https://github.com/executablebooks/github-activity/actions/workflows/tests.yaml/badge.svg)](https://github.com/executablebooks/github-activity/actions/workflows/tests.yaml)
64

7-
This package provides a CLI to do two things:
8-
9-
**Scrape all GitHub activity over a period of time for a repository**.
10-
Given a GitHub org, repository, an initial git reference or date, use the [GitHub GraphQL API](https://developer.github.com/v4/) to return a Pandas DataFrame of all issue and PR activity for this time period.
11-
12-
**Render this as a markdown changelog**.
13-
Convert this DataFrame to markdown that is suitable for generating changelogs or community updates.
5+
Generate markdown changelogs for GitHub repositories with more control over types of contributions and metadata used to create the changelogs.
146

157
For an example, see the [changelog of this package](https://github-activity.readthedocs.io/en/latest/changelog).
168

17-
## Use this tool
9+
See [the GitHub Activity documentation](https://github-activity.readthedocs.io) for full details.
10+
11+
## Install and use this tool
1812

19-
Use this tool via the command line like so:
13+
Install and use this tool via the command line like so:
2014

2115
```bash
16+
pip install github-activity
2217
github-activity [<org>/<repo>] --since <date or ref> --until <date or ref>
2318
```
2419

25-
See [the User Guide for details on how to install and use this tool](https://github-activity.readthedocs.io/en/latest/use).
20+
See [the GitHub Activity documentation](https://github-activity.readthedocs.io) for full details.
2621

2722
## Contribute to this package
2823

docs/index.md

Lines changed: 28 additions & 28 deletions
@@ -1,44 +1,44 @@
11
# github-activity
22

3-
Generate simple markdown changelogs for GitHub repositories written in Python.
3+
Generate markdown changelogs for GitHub repositories with more control over the kinds of contributions that are included.
44

5-
This package does two things:
5+
GitHub Activity allows you to include more than just "PR author" in your changelogs, such as PR reviewers and issue commenters. This allows you to give credit to a wider group of contributors around your project. Below are some examples; see [](#how-does-this-tool-define-contributions-in-the-reports) for more information.
66

7-
1. Given a GitHub org, repository, an initial git reference or date, use the
8-
[GitHub GraphQL API](https://developer.github.com/v4/) to return a DataFrame
9-
of all issue and PR activity for this time period.
10-
2. A CLI to render this activity as markdown, suitable for generating changelogs or
11-
community updates.
7+
- PR Authors
8+
- PR Reviewers and Mergers
9+
- Issue and PR commenters
1210

13-
## Installation
11+
It also allows you to split PRs into sections in your changelog, using either PR labels or PR title metadata (e.g., `[ENH]`). See [](#prefixes-and-tags) for more information.
1412

15-
The easiest way to install this package is to do so directly from GitHub with `pip`:
13+
GitHub Activity uses the [GitHub GraphQL API](https://docs.github.com/en/graphql), along with some basic pagination and caching to efficiently pull data from GitHub.
1614

17-
```
18-
pip install github-activity
15+
```{seealso}
16+
See [the JupyterHub Team changelog](https://github.com/jupyterhub/jupyterhub/blob/5.3.0/docs/source/reference/changelog.md) for an example of this tool in action.
1917
```
2018

21-
```{toctree}
22-
use
23-
contribute
24-
changelog
25-
```
19+
## Installation and basic usage
20+
21+
The easiest way to install this package is from PyPI with `pip`:
2622

27-
(how-does-this-tool-define-contributions-in-the-reports)=
23+
```bash
24+
pip install github-activity
25+
```
2826

29-
## How we define contributors in the reports
27+
You can then use it like so:
3028

31-
GitHub Activity tries to automatically determine the unique list of contributors within a given window of time.
32-
There are many ways to define this, and there isn't necessarily a "correct" method out there.
29+
```bash
30+
github-activity [<org>/<repo>] --since <date or ref> --until <date or ref>
31+
```
3332

34-
We try to balance the two extremes of "anybody who shows up is recognized as contributing" and "nobody is recognized as contributing".
35-
We've chosen a few rules that try to reflect sustained engagement in issues/PRs, or contributions in the form of help in _others'_ issues or contributing code.
33+
## Why use this tool?
3634

37-
Here are the rules we follow for finding a list of contributors within a time window. A contributor is anyone who has:
35+
We created `github-activity` because there is a lot more that goes into building open source tools than just making a pull request. This tool tries to surface more diverse contributions around a release, like reviews, comments, etc. It tries to paint a more complete picture of all the work that goes into building open source software.
3836

39-
- Contributed to a PR merged in that window (includes opening, merging, committing, commenting, or committing)
40-
- Commented on >= 2 issues that weren't theirs
41-
- Commented >= 6 times on any one issue
42-
- Known bot accounts are generally not considered contributors
37+
You might want to use this tool if you're hoping to give credit and attribution to more people in your open source community. This gives your community a feeling of more appreciation, and can create more incentives for others to contribute.
4338

44-
We'd love feedback on whether this is a good set of rules to use.
39+
```{toctree}
40+
:maxdepth: 2
41+
use
42+
contribute
43+
changelog
44+
```

docs/use.md

Lines changed: 144 additions & 3 deletions
@@ -1,5 +1,10 @@
11
# User guide
22

3+
This tool has two main user interfaces:
4+
5+
1. **A Python library**: Given a GitHub org, a repository, and an initial git reference or date, use the [GitHub GraphQL API](https://developer.github.com/v4/) to return a DataFrame of all issue and PR activity for this time period.
6+
2. **A Command Line Interface** to render this activity as markdown, suitable for generating changelogs or community updates.
7+
38
These sections describe how to control the major functionality of this tool.
49

510
## Generate a markdown changelog
@@ -19,8 +24,7 @@ The `[<org>/<repo>]` argument is **optional**.
1924
If you do not give it, then `github-activity` will attempt to infer this value by running `git remote -v` and using either `upstream` or `origin` (preferring `upstream` if both are available).
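As a rough illustration of the inference described above, here is a minimal sketch that picks `org/repo` out of `git remote -v` output and prefers `upstream` over `origin`. The function name `infer_target` and the URL regular expression are assumptions made for this example; the tool's own parsing may differ.

```python
import re

# Hypothetical pattern for GitHub remotes (SSH or HTTPS); an assumption, not
# necessarily what github-activity uses internally.
_GITHUB_URL = re.compile(r"github\.com[:/]([^/\s]+)/([^/\s]+?)(?:\.git)?$")


def infer_target(remote_output):
    """Return "org/repo" from `git remote -v` output, preferring `upstream`."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Keep the first URL seen for each remote name (fetch and push repeat it).
            remotes.setdefault(parts[0], parts[1])
    for name in ("upstream", "origin"):
        match = _GITHUB_URL.search(remotes.get(name, ""))
        if match:
            return f"{match.group(1)}/{match.group(2)}"
    return None


sample = """\
origin\tgit@github.com:someuser/github-activity.git (fetch)
origin\tgit@github.com:someuser/github-activity.git (push)
upstream\thttps://github.com/executablebooks/github-activity.git (fetch)
upstream\thttps://github.com/executablebooks/github-activity.git (push)
"""
print(infer_target(sample))
```

With both remotes present, the sketch resolves to the `upstream` repository; with only `origin`, it falls back to that one.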
2025

2126
The (optional) arguments in `--since` (or `-s`) and `--until` (or `-u`) can either be
22-
a date, or a ref (such as a commit hash or tag). `github-activity` will pull the activity
23-
between the dates corresponding to these values.
27+
a date, or a ref (such as a commit hash or tag). `github-activity` will pull the activity between the dates corresponding to these values.
2428

2529
```{margin}
2630
There are many other options with the `github-activity` CLI, run `github-activity -h`
@@ -41,7 +45,22 @@ You can find the [resulting markdown here](sample_notebook_activity).
4145
For repositories that use multiple branches, it may be necessary to filter PRs by a branch name. This can be done using the `--branch` parameter in the CLI. Other git references can be used as well in place of a branch name.
4246
```
4347

44-
### Split PRs by tags and prefixes
48+
## Choose a date or a tag to filter activity
49+
50+
By default, `github-activity` will pull the activity _after_ the latest GitHub release or git tag. You can choose to manually control the date ranges as well.
51+
52+
To specify a **start date**, use the `-s` (or `--since`) parameter. To specify an **end date**, use the `-u` or `--until` parameter.
53+
54+
Each of these accepts either:
55+
56+
1. A date string. This can be anything that [`dateutil.parser.parse`](https://dateutil.readthedocs.io/en/stable/parser.html) accepts.
57+
2. A git `ref`. For example, a `commit hash` or a `tag`.
58+
59+
If no `-u` parameter is given, then all activity until today will be included.
60+
61+
(prefixes-and-tags)=
62+
63+
## Split PRs by tags and prefixes
4564

4665
Often you wish to split your PRs into multiple categories so that they are easier
4766
to scan and parse. You may also _only_ want to keep some PRs (e.g. features, or API
@@ -68,8 +87,70 @@ left-most column above.
6887

6988
If a pull request has multiple labels that match different categories, it will appear in **only the first matching section** based on the order of categories processed. For example, a PR labeled with both `api-change` and `enhancement` will appear only in the "API and Breaking Changes" section, not in "Enhancements made". The categories are processed in the same order as they show above.
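The "first matching section wins" behaviour can be sketched as a loop over an ordered category list. The `CATEGORIES` table and `assign_section` helper below are hypothetical; only the two labels and section titles come from the paragraph above.

```python
# Hypothetical ordered category table. The order matters: earlier entries win.
CATEGORIES = [
    ("API and Breaking Changes", {"api-change"}),
    ("Enhancements made", {"enhancement"}),
]


def assign_section(pr_labels):
    """Return the title of the first category that shares a label with the PR."""
    for title, labels in CATEGORIES:
        if labels & set(pr_labels):
            return title
    return None  # the PR matches no category


print(assign_section(["enhancement", "api-change"]))
```

Because the loop returns on the first hit, the order of labels on the PR does not matter; only the order of the categories does.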
7089

90+
## Include Pull Request reviewers and commenters in your changelog
91+
92+
By default, GitHub Activity will include anybody who _reviews_ or _comments_ on a pull request in the item for that PR. This is included in a list of authors at the end of each item. See [the JupyterHub Changelog](https://jupyterhub.readthedocs.io/en/stable/reference/changelog.html) for examples.
93+
94+
## Include a list of all contributors at the end of your changelog
95+
96+
By default, this tool will include a long list of contributors at the end of your changelog. This is the unique set of all contributors that contributed to the release.
97+
98+
(how-does-this-tool-define-contributions-in-the-reports)=
99+
100+
### How we define contributors in a changelog
101+
102+
GitHub Activity tries to automatically determine the unique list of contributors within a given window of time.
103+
There are many ways to define this, and there isn't necessarily a "correct" method out there.
104+
105+
We try to balance the two extremes of "anybody who shows up is recognized as contributing" and "nobody is recognized as contributing".
106+
We've chosen a few rules that try to reflect sustained engagement in issues/PRs, or contributions in the form of help in _others'_ issues or contributing code.
107+
108+
Here are the rules we follow for finding a list of contributors within a time window. A contributor is anyone who has:
109+
110+
- Contributed to a PR merged in that window (includes opening, merging, committing, or commenting)
111+
- Commented on >= 2 issues that weren't theirs
112+
- Commented >= 6 times on any one issue
113+
- Known bot accounts are generally not considered contributors
114+
115+
We'd love feedback on whether this is a good set of rules to use.
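To make the rules above concrete, here is a small sketch that applies them to toy data. The input shapes, the `find_contributors` name, and the placeholder bot set are all assumptions for illustration; this is not the tool's actual implementation.

```python
from collections import Counter

BOTS = {"dependabot"}  # placeholder; the real tool ships its own list of bots


def find_contributors(merged_pr_participants, issue_comments):
    """Apply the contributor rules to hypothetical inputs.

    merged_pr_participants: usernames who opened, merged, committed to, or
        commented on a PR merged in the window.
    issue_comments: (commenter, issue_author, issue_number) tuples.
    """
    contributors = set(merged_pr_participants)

    per_issue = Counter()  # (commenter, issue_number) -> comment count
    others_issues = {}     # commenter -> issues opened by somebody else
    for commenter, issue_author, number in issue_comments:
        per_issue[(commenter, number)] += 1
        if commenter != issue_author:
            others_issues.setdefault(commenter, set()).add(number)

    # Commented on >= 2 issues that weren't theirs
    contributors |= {u for u, issues in others_issues.items() if len(issues) >= 2}
    # Commented >= 6 times on any one issue
    contributors |= {u for (u, _), count in per_issue.items() if count >= 6}
    # Known bot accounts are not considered contributors
    return sorted(contributors - BOTS)


comments = (
    [("alice", "bob", 1), ("alice", "carol", 2)]  # alice helps on two issues
    + [("dave", "dave", 3)] * 6                   # dave comments six times on one issue
    + [("erin", "bob", 1)]                        # erin comments once
)
print(find_contributors(["frank", "dependabot"], comments))
```

In this toy data, `erin` falls below both comment thresholds and `dependabot` is filtered out as a bot.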
116+
117+
## Strip PR type metadata from the changelog titles
118+
119+
If you follow the [title prefix convention used to split PRs](#prefixes-and-tags), you can remove these prefixes when you generate your changelog, so that they don't clutter the output.
120+
121+
To strip title prefix metadata, use the `--strip-brackets` flag.
122+
123+
For example, `[DOC] Add some documentation` will be rendered as `Add some documentation`.
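The effect of `--strip-brackets` on a title can be approximated with a regular expression. This is only a sketch: the `strip_prefix` helper and its pattern are assumptions, and the tool may handle edge cases (such as several prefixes) differently.

```python
import re


def strip_prefix(title):
    """Remove one leading `[TAG]` prefix and the whitespace around it."""
    return re.sub(r"^\s*\[[^\]]*\]\s*", "", title)


print(strip_prefix("[DOC] Add some documentation"))
print(strip_prefix("Fix a bug in [core]"))  # brackets that are not a prefix stay
```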
124+
125+
## Change the heading level for your changelog items
126+
127+
To change the starting heading level for changelog items, use the `--heading-level N` flag, where `N` is the starting heading level (e.g., `2` corresponds to `##`).
128+
129+
This is useful if you want to _embed_ your changelog into a larger one (e.g., `CHANGELOG.md`).
130+
131+
## Include issues in your changelog
132+
133+
To include closed issues in your changelog, use the `--include-issues` flag.
134+
135+
## Include opened issues in your changelog
136+
137+
To include Issues and Pull Requests that were _opened_ in a time period, use the `--include-opened` flag.
138+
71139
(use:token)=
72140

141+
## Remove bots from the changelog
142+
143+
`github-activity` ships with a known list of bot usernames, but your project may use ones not on our list.
144+
To ignore additional usernames from the changelog, use the `--ignore-contributor` flag:
145+
146+
```bash
147+
github-activity ... --ignore-contributor robot-one --ignore-contributor 'robot-two*'
148+
```
149+
150+
Wildcards are matched as per [filename matching semantics](https://docs.python.org/3/library/fnmatch.html#fnmatch.fnmatch).
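To see how these patterns behave, the sketch below checks a few usernames against the two patterns from the command above using the standard-library `fnmatch.fnmatch`. The `is_ignored` helper is a hypothetical name for this example, not a function exported by `github-activity`.

```python
from fnmatch import fnmatch


def is_ignored(username, patterns):
    """Return True if the username matches any ignore pattern."""
    return any(fnmatch(username, pattern) for pattern in patterns)


patterns = ["robot-one", "robot-two*"]
for user in ["robot-one", "robot-two-staging", "robot-three", "alice"]:
    print(user, is_ignored(user, patterns))
```

Note that `fnmatch.fnmatch` normalizes case according to the operating system, so case sensitivity can differ between platforms. Quoting `'robot-two*'` on the command line keeps the shell from expanding the wildcard before the tool sees it.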
151+
152+
If this is a generic bot username, consider contributing it back to [our list](https://github.com/executablebooks/github-activity/blob/main/github_activity/github_activity.py#L73).
153+
73154
## Use a GitHub API token
74155

75156
`github-activity` uses the GitHub API to pull information about a repository's activity.
@@ -106,3 +187,63 @@ To do so, follow these steps:
106187
- Assign the token to an environment variable called `GITHUB_ACCESS_TOKEN`.
107188
If you run `github-activity` and this variable is defined, it will be used.
108189
You may also pass a token via the `--auth` parameter (though this is not the best security practice).
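If you call the tool from your own scripts, the two token sources can be combined along these lines. This is a sketch: `resolve_token` is a hypothetical helper, and giving the explicit value precedence over the environment variable is an assumption rather than documented behaviour.

```python
import os


def resolve_token(auth=None, environ=None):
    """Return an explicit token if given, else fall back to GITHUB_ACCESS_TOKEN."""
    environ = os.environ if environ is None else environ
    # Assumed precedence: an explicitly passed token wins over the environment.
    return auth or environ.get("GITHUB_ACCESS_TOKEN")


token = resolve_token()
print("token found" if token else "no token configured")
```

Reading the token from the environment keeps it out of your shell history and process list, which is why it is preferred over `--auth`.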
190+
191+
## Use the Python API
192+
193+
You can do most of the above from Python as well.
194+
This is not as well-documented as the CLI, but should have most functionality available.
195+
196+
### Generate markdown changelogs with the Python API
197+
198+
For generating markdown changelogs from Python, here's an example:
199+
200+
```python
201+
from github_activity import generate_activity_md
202+
203+
markdown = generate_activity_md(
204+
target="executablebooks/github-activity",
205+
since="2023-01-01",
206+
until="2023-12-31",
207+
kind=None,
208+
auth="your-github-token",
209+
tags=None,
210+
include_issues=True,
211+
include_opened=True,
212+
strip_brackets=True,
213+
heading_level=1,
214+
branch=None,
215+
)
216+
217+
# Print or save the markdown
218+
print(markdown)
219+
```
220+
221+
### Return GitHub Activity queries as a DataFrame
222+
223+
For scraping GitHub and returning the data as a DataFrame, here's an example:
224+
225+
```python
226+
from github_activity import get_activity
227+
228+
# Get activity data as a DataFrame
229+
from github_activity import get_activity
230+
231+
df = get_activity(
232+
target="executablebooks/github-activity",
233+
since="2023-01-01",
234+
until="2023-12-31",
235+
auth="your-github-token",
236+
kind=None,
237+
cache=None
238+
)
239+
```
240+
241+
In some cases, metadata will be nested inside the resulting dataframe.
242+
There are some helper functions for this. For example, to extract nested comments inside the activity dataframe:
243+
244+
```python
245+
from github_activity import get_activity, extract_comments
246+
247+
df = get_activity(...)
248+
comments_df = extract_comments(df['comments'])
249+
```

github_activity/cli.py

Lines changed: 7 additions & 0 deletions
@@ -21,6 +21,7 @@
2121
"include-opened": False,
2222
"strip-brackets": False,
2323
"all": False,
24+
"ignore-contributor": [],
2425
}
2526

2627
parser = argparse.ArgumentParser(description=DESCRIPTION)
@@ -130,6 +131,11 @@
130131
action="store_true",
131132
help=("""Whether to include all the GitHub tags"""),
132133
)
134+
parser.add_argument(
135+
"--ignore-contributor",
136+
action="append",
137+
help="Do not include this GitHub username as a contributor in the changelog",
138+
)
133139

134140
# Hidden argument so that target can be optionally passed as a positional argument
135141
parser.add_argument(
@@ -214,6 +220,7 @@ def main():
214220
include_opened=bool(args.include_opened),
215221
strip_brackets=bool(args.strip_brackets),
216222
branch=args.branch,
223+
ignored_contributors=args.ignore_contributor,
217224
)
218225

219226
if args.all:
