scikit-learn
# For interactive plots
from bokeh.plotting import figure, show, output_notebook
from bokeh.models import TeX
output_notebook()
A snapshot of development activity on the scikit-learn project.
Issues
import json

import numpy as np
from myst_nb import glue

query_date = np.datetime64("2020-01-01 00:00:00")
# Load data
with open("devstats-data/scikit-learn_issues.json", "r") as fh:
    issues = [item["node"] for item in json.load(fh)]
glue("devstats-data/scikit-learn_query_date", str(query_date.astype("M8[D]")))
New issues
4803 new issues have been opened since 2020-01-01, of which 3791 (79%) have been closed.
The median lifetime of new issues that were created and closed in this period is 121 hours.
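The two figures above come down to a filter, a count, and a median. What follows is a minimal sketch, not the code that produced this report. It assumes each issue record is a dict with ISO 8601 `createdAt` and `closedAt` strings, where `closedAt` is `None` while the issue is open. These names mirror GitHub's GraphQL field names, but the layout of the devstats JSON is an assumption. `parse_ts` and `new_issue_stats` are hypothetical helpers.

```python
from datetime import datetime, timezone
import statistics


def parse_ts(ts):
    """Parse a GitHub-style UTC timestamp such as '2020-01-02T10:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def new_issue_stats(issues, query_date):
    """Return (opened, closed, median lifetime in hours) for issues opened since query_date.

    Assumes hypothetical records with 'createdAt' and 'closedAt' keys.
    """
    new = [iss for iss in issues if parse_ts(iss["createdAt"]) >= query_date]
    closed = [iss for iss in new if iss["closedAt"] is not None]
    lifetimes_h = [
        (parse_ts(iss["closedAt"]) - parse_ts(iss["createdAt"])).total_seconds() / 3600
        for iss in closed
    ]
    median_h = statistics.median(lifetimes_h) if lifetimes_h else None
    return len(new), len(closed), median_h


# Toy data, not scikit-learn's
issues = [
    {"createdAt": "2019-12-30T00:00:00Z", "closedAt": None},  # before the query date
    {"createdAt": "2020-01-02T00:00:00Z", "closedAt": "2020-01-02T10:00:00Z"},  # 10 h
    {"createdAt": "2020-01-03T00:00:00Z", "closedAt": "2020-01-04T06:00:00Z"},  # 30 h
    {"createdAt": "2020-01-05T00:00:00Z", "closedAt": None},  # still open
]
opened, closed, median_h = new_issue_stats(issues, datetime(2020, 1, 1, tzinfo=timezone.utc))
print(opened, closed, f"{closed / opened:.0%}", median_h)  # 3 2 67% 20.0
```

The median, rather than the mean, keeps a few very long-lived issues from dominating the summary.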
query_date = np.datetime64("2020-01-01 00:00:00")
# Load data
with open("devstats-data/scikit-learn_issues.json", "r") as fh:
    issues = [item["node"] for item in json.loads(fh.read())]
glue("scikit-learn_query_date", str(query_date.astype("M8[D]")))
Time to response
Of the 4801 issues that are at least 24 hours old, 4477 (93%) have received at least one comment. The median time until an issue is first responded to is 8 hours.
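Here is a sketch of how these response figures could be derived. It assumes, without confirmation from this report, that each issue carries a `comments` list whose entries have a `createdAt` timestamp, and that any comment counts as a response. The reference time `now` and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
import statistics


def parse_ts(ts):
    """Parse a GitHub-style UTC timestamp such as '2020-01-10T04:00:00Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def response_stats(issues, now, min_age=timedelta(hours=24)):
    """Return (eligible, commented, median hours to first comment).

    Only issues at least `min_age` old at time `now` are considered.
    """
    eligible = [iss for iss in issues if now - parse_ts(iss["createdAt"]) >= min_age]
    waits_h = []
    for iss in eligible:
        if iss["comments"]:
            first = min(parse_ts(c["createdAt"]) for c in iss["comments"])
            waits_h.append((first - parse_ts(iss["createdAt"])).total_seconds() / 3600)
    median_h = statistics.median(waits_h) if waits_h else None
    return len(eligible), len(waits_h), median_h


# Toy data, not scikit-learn's
now = datetime(2020, 2, 1, tzinfo=timezone.utc)
issues = [
    {"createdAt": "2020-01-10T00:00:00Z",
     "comments": [{"createdAt": "2020-01-11T00:00:00Z"}, {"createdAt": "2020-01-10T04:00:00Z"}]},  # 4 h
    {"createdAt": "2020-01-12T00:00:00Z", "comments": []},  # never commented on
    {"createdAt": "2020-01-15T00:00:00Z", "comments": [{"createdAt": "2020-01-15T12:00:00Z"}]},  # 12 h
    {"createdAt": "2020-01-31T12:00:00Z", "comments": [{"createdAt": "2020-01-31T13:00:00Z"}]},  # too young
]
print(response_stats(issues, now))  # (3, 2, 8.0)
```

The 24-hour age cutoff keeps brand-new issues, which have had little chance to receive a reply, from lowering the response rate.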
First responders
Contributor | # of times commented first
---|---
glemaitre | 895
thomasjpfan | 370
ogrisel | 320
adrinjalali | 278
NicolasHug | 206
lesteve | 205
jeremiedbb | 191
jnothman | 190
rth | 116
betatim | 115
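The table counts, for each user, how many issues they were the first to comment on. This is a minimal sketch using `collections.Counter`. It assumes hypothetical issue records whose `comments` entries hold a plain `author` login string and a `createdAt` timestamp; GitHub's GraphQL API nests the login inside an `author` object, and that nesting is flattened here for illustration.

```python
from collections import Counter


def first_responders(issues, top=10):
    """Return the `top` users who most often posted the first comment on an issue."""
    counts = Counter()
    for iss in issues:
        if iss["comments"]:
            # UTC timestamps in one fixed ISO 8601 format sort chronologically as strings
            first = min(iss["comments"], key=lambda c: c["createdAt"])
            counts[first["author"]] += 1
    return counts.most_common(top)


# Toy data, not scikit-learn's
issues = [
    {"comments": [{"author": "alice", "createdAt": "2020-01-02T00:00:00Z"},
                  {"author": "bob", "createdAt": "2020-01-01T00:00:00Z"}]},
    {"comments": [{"author": "alice", "createdAt": "2020-01-03T00:00:00Z"}]},
    {"comments": [{"author": "bob", "createdAt": "2020-01-04T00:00:00Z"}]},
    {"comments": []},
]
print(first_responders(issues))  # [('bob', 2), ('alice', 1)]
```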
Pull Requests
# Filters

# The following filters are applied to the PRs for the following analysis:
# (...)
# Only look at PRs to the main development branch - ignore backports,
# gh-pages, etc.
default_branches = {"main", "master"}  # Account for default branch update
# Some entries in `prs` are None (failed queries); indexing them raised
# TypeError: 'NoneType' object is not subscriptable, so drop them first
prs = [pr for pr in prs if pr is not None and pr["baseRefName"] in default_branches]

# Drop data where PR author is unknown (e.g. github account no longer exists)
prs = [pr for pr in prs if pr["author"]]  # Failed author query results in None
Merged PRs over time
A look at merged PRs over time.
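One simple way to prepare such a view is to bin merge timestamps by calendar month. This sketch assumes a hypothetical list of ISO 8601 `mergedAt` strings. Months with no merges do not appear in its output.

```python
from collections import Counter


def merges_per_month(merged_at):
    """Count merged PRs per calendar month from ISO 8601 timestamps."""
    # The first 7 characters of '2020-03-14T09:26:53Z' are the month key '2020-03'
    counts = Counter(ts[:7] for ts in merged_at)
    return sorted(counts.items())


# Toy data, not scikit-learn's
merged_at = ["2020-01-05T10:00:00Z", "2020-03-01T08:30:00Z", "2020-01-20T16:45:00Z"]
print(merges_per_month(merged_at))  # [('2020-01', 2), ('2020-03', 1)]
```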
PR lifetime
The following plot shows the “survival” of PRs over time: for each number of days, it shows how many PRs stayed open for at least that long. PRs that were merged and PRs that are still open are shown separately; PRs that were closed without being merged are currently not included.
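The survival count at day d is the number of PRs whose lifetime is at least d days. Below is a sketch using NumPy broadcasting. It assumes lifetimes have already been computed in days, with each group passed separately: merge time minus creation time for merged PRs, and query time minus creation time for open PRs. `survival_counts` is a hypothetical helper.

```python
import numpy as np


def survival_counts(lifetimes_days, max_days):
    """For each d in 0..max_days, count the PRs that stayed open at least d days."""
    lifetimes = np.asarray(lifetimes_days, dtype=float)
    days = np.arange(max_days + 1)
    # Shape (max_days + 1, n_prs): True where a PR's lifetime reaches threshold d
    reached = lifetimes[np.newaxis, :] >= days[:, np.newaxis]
    return days, reached.sum(axis=1)


# Toy lifetimes in days, not scikit-learn's
days, merged = survival_counts([0.5, 2.0, 3.5, 10.0], max_days=4)
print(merged.tolist())  # [4, 3, 3, 2, 1]
```

The comparison matrix has one row per day threshold, so memory grows with `max_days` times the number of PRs. For long histories, sorting the lifetimes and using `np.searchsorted` yields the same counts more cheaply.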
Mergeability of Open PRs
Number of PR participants
Where contributions come from
There have been a total of merged PRs[1] submitted by unique authors. of these are “fly-by” PRs, i.e. PRs from users who have contributed to the project only once to date.
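The counts described above follow from tallying merged PRs per author. This is a minimal sketch. It assumes one author login per merged PR and treats a merged PR as the unit of contribution, so a fly-by PR is the only merged PR of its author. The function name is hypothetical.

```python
from collections import Counter


def contribution_summary(merged_pr_authors):
    """Return (merged PRs, unique authors, fly-by PRs) from one author login per merged PR."""
    per_author = Counter(merged_pr_authors)
    # A fly-by PR is the single merged PR of an author who has contributed only once
    n_flyby = sum(1 for n in per_author.values() if n == 1)
    return len(merged_pr_authors), len(per_author), n_flyby


# Toy data, not scikit-learn's
authors = ["ana", "ben", "ana", "cai", "dee", "ana"]
print(contribution_summary(authors))  # (6, 4, 3)
```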
Pony factor
Another way to look at these data is in terms of the pony factor, described as:
The minimum number of contributors whose total contribution constitutes a majority of the contributions.
For this analysis, we will consider merged PRs as the metric for contribution. Considering all merged PRs over the lifetime of the project, the pony factor is: .
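Computing the pony factor means sorting contributors by merged PRs, largest first, and counting how many are needed before their cumulative share passes one half. Taking the largest contributors first guarantees the smallest such group. This sketch assumes one author login per merged PR and reads "majority" as strictly more than half. `pony_factor` is a hypothetical helper.

```python
from collections import Counter


def pony_factor(merged_pr_authors):
    """Minimum number of authors whose merged PRs make up a strict majority of all merged PRs."""
    counts = sorted(Counter(merged_pr_authors).values(), reverse=True)
    total = sum(counts)
    running = 0
    for k, n in enumerate(counts, start=1):
        running += n
        if 2 * running > total:  # strictly more than half
            return k
    return 0  # no merged PRs at all


# Toy data, not scikit-learn's: 10 merged PRs from 4 authors
authors = ["ana"] * 5 + ["ben"] * 3 + ["cai", "dee"]
print(pony_factor(authors))  # 2
```

Here the top author alone accounts for exactly half (5 of 10 merged PRs). That is not a strict majority, so a second author is needed.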