Shixiang Wang

contribution - A Tiny Contribution Table Generator Based on ggplot2

王诗翔 · 2019-07-13

Categories: r  
Tags: r   package   contribution  

The goal of contribution is to generate contribution tables for credit assignment in a project. It is inspired by Nick Steinmetz (see his tweet: https://twitter.com/SteinmetzNeuro/status/1147241138291527681).

Author contribution sections are new and still don’t appear in major journals, but they can be improved. Rather than text listing each author’s contributions, the same data can be presented as a table, with one row per contribution and one column per author.

Advantages include:

  1. Graphics are faster and easier to comprehend. They are also easier to locate in the document.

  2. This format can be readily augmented with additional information, such as distinguishing between ‘major’ and ‘minor’ contributions of each type.

  3. This format can be read as easily in either direction, answering both “Who did X?” and “What did person Y do?” equally.

  4. This format lends itself to a nice extension for including your author contribution information on your CV: a similar table where each column is one of your papers.

One difficulty with this approach is the diversity of terminology in use, which would be helped by refinement and more widespread adoption of the CRediT framework (https://www.casrai.org/credit.html).

Installation

You can install the released version of contribution from CRAN with:

install.packages("contribution")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("openbiox/contribution")

Basic example

This is a basic example which shows you how to plot a simple contribution table:

library(contribution)
data("demo")
demo
#> # A tibble: 5 x 4
#>   Class              WWP   NAS   TM   
#>   <chr>              <chr> <chr> <chr>
#> 1 Designed research  <NA>  Major Major
#> 2 Performed research <NA>  Major <NA> 
#> 3 Developed tool     Major Minor <NA> 
#> 4 Analyzed data      Major Minor Minor
#> 5 Wrote the paper    Major Minor Major

The accepted data format is a data.frame whose first column shows the role and whose other columns show the people or projects. This format is easy to create in R or in other tools such as Excel.

For a 3-level contribution table, only Minor and Major are valid values; put NA in a cell to indicate no contribution.
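As a minimal sketch of this format, the table below is built by hand in base R. The author names (Alice, Bob) and the helper check_contribution_table() are hypothetical and exist only for illustration; they are not part of the contribution package.

```r
# Hypothetical authors and contribution levels, for illustration only.
my_table <- data.frame(
  Class = c("Designed research", "Analyzed data", "Wrote the paper"),
  Alice = c("Major", "Minor", NA),
  Bob   = c(NA, "Major", "Major"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): checks that every cell
# outside the first column is "Minor", "Major", or NA.
check_contribution_table <- function(tb) {
  cells <- unlist(tb[-1], use.names = FALSE)
  all(is.na(cells) | cells %in% c("Minor", "Major"))
}

check_contribution_table(my_table)
#> [1] TRUE
```

A table that passes this check has the shape described above and should be suitable input for generate().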

generate(demo)

The white box represents no contribution, the grey box represents minor contribution, and the black box represents major contribution.

The table is clean and easy to read. The result is a ggplot object, so you can modify it however you like!
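Because the result is a ggplot object, ordinary ggplot2 layers can be added with `+`. The sketch below assumes both contribution and ggplot2 are installed; the title text is made up, and only standard ggplot2 functions (labs(), theme(), element_text()) are used on top of generate().

```r
library(contribution)
library(ggplot2)

data("demo")

# generate() returns a ggplot object, so ggplot2 components can be added.
p <- generate(demo) +
  labs(title = "Author contributions") +            # made-up title
  theme(plot.title = element_text(face = "bold"))   # bold title text
p
```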

You can also use other colors by passing a scale_fill_* function from ggplot2 to color_map:

library(ggplot2)
generate(demo, text_angle_x = 20, color_map = scale_fill_brewer(palette = "Oranges"))

When the meaning of the colors is not obvious, you can show the legend:

generate(demo, text_angle_x = 20, color_map = scale_fill_brewer(palette = "Set1"), show_legend = TRUE)

GitHub contribution

To show how to generate a numeric contribution table, here we use the commit numbers of GitHub projects as an example.

To get data from GitHub, I created the function pull_github().

# GitHub project contributions
my_contr = dplyr::tibble(
           repo = c("UCSCXenaTools", "maftools"),
           owner = c("ShixiangWang", "PoisonAlien"),
           username = "ShixiangWang",
           role = c("Developer", "Contributor"))
my_contr
#> # A tibble: 2 x 4
#>   repo          owner        username     role       
#>   <chr>         <chr>        <chr>        <chr>      
#> 1 UCSCXenaTools ShixiangWang ShixiangWang Developer  
#> 2 maftools      PoisonAlien  ShixiangWang Contributor
contr_tb = pull_github(data = my_contr)
contr_tb
#> # A tibble: 2 x 3
#>   role        `PoisonAlien/maftools` `ShixiangWang/UCSCXenaTools`
#>   <chr>                        <int>                        <int>
#> 1 Contributor                      8                            0
#> 2 Developer                        0                          203

Now we have data in a format that generate() accepts, so let's show the plot!

generate(contr_tb, show_legend = TRUE, hjust_x = 0)
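To make the wide shape of contr_tb concrete, the sketch below rebuilds a table of the same shape offline from long-form counts, using only base R. It is not how pull_github() is implemented; the commit numbers are copied from the output above, and the variable names (long, wide, contr_like) are hypothetical.

```r
# Commit counts in long form: one row per (role, project) pair.
# The numbers are copied from the pull_github() output shown above.
long <- data.frame(
  role    = c("Developer", "Contributor"),
  project = c("ShixiangWang/UCSCXenaTools", "PoisonAlien/maftools"),
  commits = c(203L, 8L),
  stringsAsFactors = FALSE
)

# Cross-tabulate roles against projects; empty combinations become 0.
wide <- tapply(long$commits, list(long$role, long$project), sum)
wide[is.na(wide)] <- 0L

# First column holds the role, remaining columns hold one project each.
contr_like <- data.frame(role = rownames(wide), wide,
                         check.names = FALSE, stringsAsFactors = FALSE)
rownames(contr_like) <- NULL
contr_like
```

The result has one role column followed by one numeric column per project, which is the layout generate() is given in the numeric case.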

Palette

To change the colors of the contribution graph, I created a palette dataset based on the themes provided by the github-contribution-color-graph project.

We can show the palette and choose the color we like.

show_palette()

Then pass the palette name to the palette_name option.

generate(contr_tb, show_legend = TRUE, hjust_x = 0,
         palette_name = "psychedelic")

generate(contr_tb, show_legend = TRUE, hjust_x = 0,
         palette_name = "cyan")

For more customization options, run ?generate in the R console.

CRediT

CRediT can help you assign roles. Its role definitions are stored in this package as a dataset.

data("CRediT")

Show the whole table:

knitr::kable(CRediT, format = "html")
| Role | Definition |
| --- | --- |
| Conceptualization | Ideas; formulation or evolution of overarching research goals and aims. |
| Data curation | Management activities to annotate (produce metadata), scrub data and maintain research data (including software code, where it is necessary for interpreting the data itself) for initial use and later re-use. |
| Formal analysis | Application of statistical, mathematical, computational, or other formal techniques to analyse or synthesize study data. |
| Funding acquisition | Acquisition of the financial support for the project leading to this publication. |
| Investigation | Conducting a research and investigation process, specifically performing the experiments, or data/evidence collection. |
| Methodology | Development or design of methodology; creation of models. |
| Project administration | Management and coordination responsibility for the research activity planning and execution. |
| Resources | Provision of study materials, reagents, materials, patients, laboratory samples, animals, instrumentation, computing resources, or other analysis tools. |
| Software | Programming, software development; designing computer programs; implementation of the computer code and supporting algorithms; testing of existing code components. |
| Supervision | Oversight and leadership responsibility for the research activity planning and execution, including mentorship external to the core team. |
| Validation | Verification, whether as a part of the activity or separate, of the overall replication/reproducibility of results/experiments and other research outputs. |
| Visualization | Preparation, creation and/or presentation of the published work, specifically visualization/data presentation. |
| Writing – original draft | Preparation, creation and/or presentation of the published work, specifically writing the initial draft (including substantive translation). |
| Writing – review & editing | Preparation, creation and/or presentation of the published work by those from the original research group, specifically critical review, commentary or revision – including pre- or post-publication stages. |
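One possible way to put these roles to work is to start from an empty template and fill in cells as credit is assigned. The sketch below uses base R only; make_template() and the author labels A1 and A2 are hypothetical, and the role names are copied from the CRediT table.

```r
# Hypothetical helper (not part of the contribution package): builds an
# empty table with one row per role and one all-NA column per author.
make_template <- function(roles, authors) {
  tb <- data.frame(Role = roles, stringsAsFactors = FALSE)
  for (a in authors) {
    tb[[a]] <- NA_character_
  }
  tb
}

# Role names copied from the CRediT table; author labels are made up.
tb <- make_template(
  roles   = c("Conceptualization", "Software", "Visualization"),
  authors = c("A1", "A2")
)

# Fill in individual cells; untouched cells stay NA (no contribution).
tb[tb$Role == "Software", "A1"] <- "Major"
tb[tb$Role == "Visualization", "A2"] <- "Minor"
tb
```

Starting from NA everywhere means a role only shows up as a contribution once someone has been explicitly credited for it.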

More about GitHub stats

The contribution package supports the following GitHub stats: the number of commits and the number of code lines (total, additions, or deletions).

First, define our targets.

library(contribution)

my_contr = dplyr::tibble(
           repo = c("UCSCXenaTools", "XenaShiny", "metawho", 
                    "maftools", "DoAbsolute", "contribution"),
           owner = c("ropensci", "openbiox", "ShixiangWang",
                     "PoisonAlien", "ShixiangWang", "openbiox"),
           username = "ShixiangWang",
           role = c("Developer", "Project leader", "Developer",
                    "Contributor", "Developer", "Developer"))
my_contr
#> # A tibble: 6 x 4
#>   repo          owner        username     role          
#>   <chr>         <chr>        <chr>        <chr>         
#> 1 UCSCXenaTools ropensci     ShixiangWang Developer     
#> 2 XenaShiny     openbiox     ShixiangWang Project leader
#> 3 metawho       ShixiangWang ShixiangWang Developer     
#> 4 maftools      PoisonAlien  ShixiangWang Contributor   
#> 5 DoAbsolute    ShixiangWang ShixiangWang Developer     
#> 6 contribution  openbiox     ShixiangWang Developer

Commit number

By default, pull_github() pulls the number of commits from the GitHub API.

contr_tb = pull_github(data = my_contr)
contr_tb
#> # A tibble: 3 x 7
#>   role  `openbiox/contr… `openbiox/XenaS… `PoisonAlien/ma… `ropensci/UCSCX…
#>   <chr>            <int>            <int>            <int>            <int>
#> 1 Cont…                0                0                8                0
#> 2 Deve…                0                0                0              203
#> 3 Proj…                0                0                0                0
#> # … with 2 more variables: `ShixiangWang/DoAbsolute` <int>,
#> #   `ShixiangWang/metawho` <int>
generate(contr_tb, show_legend = TRUE, hjust_x = 0,
         palette_name = "indigo", text_angle_x = 60)

Number of code lines

To get the number of code lines instead, set report_lines = TRUE and choose what to count with the type option.

# Total number
contr_tb1 = pull_github(data = my_contr, report_lines = TRUE, type = 'all')
contr_tb1
#> # A tibble: 3 x 7
#>   role  `openbiox/contr… `openbiox/XenaS… `PoisonAlien/ma… `ropensci/UCSCX…
#>   <chr>            <int>            <int>            <int>            <int>
#> 1 Cont…                0                0              246                0
#> 2 Deve…                0                0                0           127542
#> 3 Proj…                0                0                0                0
#> # … with 2 more variables: `ShixiangWang/DoAbsolute` <int>,
#> #   `ShixiangWang/metawho` <int>

# Number of additions
contr_tb2 = pull_github(data = my_contr, report_lines = TRUE, type = 'add')
contr_tb2
#> # A tibble: 3 x 7
#>   role  `openbiox/contr… `openbiox/XenaS… `PoisonAlien/ma… `ropensci/UCSCX…
#>   <chr>            <int>            <int>            <int>            <int>
#> 1 Cont…                0                0              210                0
#> 2 Deve…                0                0                0            81687
#> 3 Proj…                0            37233                0                0
#> # … with 2 more variables: `ShixiangWang/DoAbsolute` <int>,
#> #   `ShixiangWang/metawho` <int>

# Number of deletions
contr_tb3 = pull_github(data = my_contr, report_lines = TRUE, type = 'del')
contr_tb3
#> # A tibble: 3 x 7
#>   role  `openbiox/contr… `openbiox/XenaS… `PoisonAlien/ma… `ropensci/UCSCX…
#>   <chr>            <int>            <int>            <int>            <int>
#> 1 Cont…                0                0               36                0
#> 2 Deve…                0                0                0            45855
#> 3 Proj…                0            29053                0                0
#> # … with 2 more variables: `ShixiangWang/DoAbsolute` <int>,
#> #   `ShixiangWang/metawho` <int>
generate(contr_tb1, show_legend = TRUE, hjust_x = 0,
         palette_name = "pink", text_angle_x = 60)

generate(contr_tb2, show_legend = TRUE, hjust_x = 0,
         palette_name = "teal", text_angle_x = 60)

generate(contr_tb3, show_legend = TRUE, hjust_x = 0,
         palette_name = "summer", text_angle_x = 60)
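The three printed tables can be cross-checked with a little arithmetic. In the PoisonAlien/maftools and ropensci/UCSCXenaTools columns, the 'all' value equals the 'add' value plus the 'del' value. The openbiox/XenaShiny column does not follow this pattern in the printed output (0 for 'all' but non-zero additions and deletions), so treat this as an observation about these two columns rather than a guaranteed definition of type = 'all'. The data frame name line_counts is hypothetical.

```r
# Values copied from the printed tables above (PoisonAlien/maftools and
# ropensci/UCSCXenaTools columns only).
line_counts <- data.frame(
  project = c("PoisonAlien/maftools", "ropensci/UCSCXenaTools"),
  all = c(246L, 127542L),
  add = c(210L, 81687L),
  del = c(36L, 45855L),
  stringsAsFactors = FALSE
)

# In both columns the total equals additions plus deletions.
line_counts$add + line_counts$del == line_counts$all
#> [1] TRUE TRUE
```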

GitHub limit and token management

For unauthenticated requests, the GitHub API rate limit allows up to 60 requests per hour. For API requests using Basic Authentication or OAuth, you can make up to 5000 requests per hour. Here we use a token to get the higher limit. Obtain a personal access token (PAT) here: https://github.com/settings/tokens.

You can check how many requests you have left with the pull_github_limit() function.

pull_github_limit()
#> {
#>   "resources": {
#>     "core": {
#>       "limit": 60,
#>       "remaining": 25,
#>       "reset": 1596954378
#>     },
#>     "graphql": {
#>       "limit": 0,
#>       "remaining": 0,
#>       "reset": 1596955038
#>     },
#>     "integration_manifest": {
#>       "limit": 5000,
#>       "remaining": 5000,
#>       "reset": 1596955038
#>     },
#>     "search": {
#>       "limit": 10,
#>       "remaining": 10,
#>       "reset": 1596951498
#>     }
#>   },
#>   "rate": {
#>     "limit": 60,
#>     "remaining": 25,
#>     "reset": 1596954378
#>   }
#> }
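The "reset" fields are Unix timestamps in seconds (UTC), so they are easier to read after conversion. The sketch below uses base R only and copies the numbers from the "core" entry above; the variable names are hypothetical.

```r
# Numbers copied from the "core" entry printed above.
core <- list(limit = 60, remaining = 25, reset = 1596954378)

# "reset" is a Unix timestamp: seconds since 1970-01-01 00:00:00 UTC.
reset_time <- as.POSIXct(core$reset, origin = "1970-01-01", tz = "UTC")
format(reset_time, "%Y-%m-%d %H:%M:%S")
#> [1] "2020-08-09 06:26:18"

# Requests already used in the current rate-limit window.
core$limit - core$remaining
#> [1] 35
```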

Typically, you set the token as the GITHUB_PAT variable in your .Renviron file, using the following format:

GITHUB_PAT=8c70fd8419398999c9ac5bacf3192882193cadf2

Alternatively, you can set it in your .Rprofile file, using the following format:

Sys.setenv(GITHUB_PAT="8c70fd8419398999c9ac5bacf3192882193cadf2")
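After editing either file, restart R so that the file is read again. A quick way to confirm that the variable is visible in the current session is sketched below; has_github_pat() is a hypothetical helper, not a function from the contribution package, and it only inspects the environment variable without contacting GitHub.

```r
# Hypothetical helper: TRUE when GITHUB_PAT is set to a non-empty value.
has_github_pat <- function() {
  nzchar(Sys.getenv("GITHUB_PAT"))
}

if (has_github_pat()) {
  message("GITHUB_PAT is set.")
} else {
  message("GITHUB_PAT is not set; unauthenticated rate limits apply.")
}
```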