Sharing Things

GitHub Pages and the R Markdown & Quarto family

You have been using it all week, and before! All the course material has been developed as an R Markdown based website, a distill website to be precise (see here for more). It is also a great way to publish and share your notebooks with your team and the broader community. Also check out Quarto, a cross-language tool to render documents from code!

Interactive web applications

When you have complex data that you want not only to visualize but also to let users interact with, or when you want users to customize the parameters of an analysis, interactive web applications are a great way to increase engagement. The good news is that you no longer need to be a web developer to spin up such applications or dashboards.

R Shiny

R Shiny is a framework that lets you build web applications entirely in R: Shiny generates the HTML and JavaScript of the user interface for you, so you can develop a web application without having to learn any new programming language. Note that you will need a server running R to host the application.
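
As a minimal sketch of what such an app can look like (assuming the shiny package is installed; the IDs bins and hist are hypothetical names chosen for this illustration), the code below pairs a slider with a histogram of R's built-in faithful data set:

```r
library(shiny)

# User interface: a title, one slider, and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
  plotOutput("hist")
)

# Server logic: the histogram is redrawn whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    # A single number passed to `breaks` is a suggested number of bins
    hist(faithful$waiting,
         breaks = input$bins,
         xlab = "Waiting time between eruptions (min)")
  })
}

# Combine both parts into an app object
app <- shinyApp(ui = ui, server = server)

# Launch the app only when running in an interactive R session
if (interactive()) runApp(app)
```

Running this locally starts a small web server on your own machine; sharing the app with others is where the hosting server comes in.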

Check out this gallery of shiny apps: https://shiny.rstudio.com/gallery/

Getting started with Shiny: https://shiny.rstudio.com/tutorial/

HTML widgets

HTML widgets offer some great interactive data visualizations. They are more limited than Shiny because the user cannot change parameters that modify the data used, but they have the advantage that they do not need a server to run and can be inserted directly into an R Markdown document.
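
As one possible illustration (assuming the DT package, which is one of several widget packages built on htmlwidgets, is installed), the sketch below builds a sortable, searchable table from the built-in mtcars data set and saves it as an HTML file. The output file name is a hypothetical choice.

```r
library(DT)

# Build an interactive table with a filter box above each column
widget <- datatable(mtcars, filter = "top", options = list(pageLength = 10))

# Save the widget as an HTML page that opens in any browser, no R server needed.
# selfcontained = FALSE writes the JavaScript/CSS to a folder next to the file;
# selfcontained = TRUE bundles everything into one file but requires pandoc.
out_file <- file.path(tempdir(), "mtcars_table.html")
htmlwidgets::saveWidget(widget, out_file, selfcontained = FALSE)
```

Inside an R Markdown document, simply printing the widget object in a code chunk is enough to embed it in the rendered page.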

To get started: https://www.htmlwidgets.org/

plotly

plotly enables you to develop great interactive data visualizations. It has the advantage of being available in R, Python, and JavaScript. One great thing in R is that if you are a ggplot master, you can write your plot code using ggplot and transform it into a plotly plot with one line of code (here for an example).

Note that there are also other Python libraries to create interactive plots; here are a few: https://mode.com/blog/python-interactive-plot-libraries/
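
A minimal sketch of that one-line conversion, assuming the ggplot2 and plotly packages are installed and using the built-in mtcars data set:

```r
library(ggplot2)
library(plotly)

# A regular static ggplot
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# One line turns the static plot into an interactive one
interactive_plot <- ggplotly(p)

# Printing the object displays it (in the viewer, or embedded in R Markdown)
interactive_plot
```

The result is itself an HTML widget, so it can be embedded in an R Markdown page without a server.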

Code Repositories

GitHub
GitLab (open source!!): https://about.gitlab.com/
Bitbucket: https://bitbucket.org/product
…

Don’t forget to provide information about the versions of the software and libraries that were used when running this specific analysis.

It is also a great idea to add a license to your project so people know how they can use your code: https://choosealicense.com/

Binder & Jupyter

Binder turns your Git-based repository into an interactive Jupyter environment, so other researchers can run your code without having to install anything!

Try it: https://mybinder.org/
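
Binder works out what to install from configuration files stored in the repository. For R projects, the Binder documentation describes a runtime.txt file (R version plus a package snapshot date) and an install.R script. The sketch below writes both files from R; the directory name, the snapshot date, and the package list are hypothetical, and the exact runtime.txt format is worth checking against the current Binder documentation.

```r
# Hypothetical project directory; in practice, the root of your repository
repo_dir <- file.path(tempdir(), "my-analysis")
dir.create(repo_dir, showWarnings = FALSE)

# runtime.txt: requests R 4.3 with packages as they were on the given date
# (assumed format: r-<R version>-<YYYY>-<MM>-<DD>)
writeLines("r-4.3-2023-06-16", file.path(repo_dir, "runtime.txt"))

# install.R: an R script run at build time to install the packages you need
writeLines(
  c('install.packages("ggplot2")',
    'install.packages("plotly")'),
  file.path(repo_dir, "install.R")
)

# Check what was written
list.files(repo_dir)
```

Once these files are committed and pushed, pasting the repository URL into https://mybinder.org/ builds the environment.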

Citing your code

Note that it is also possible to assign a DOI to cite a specific version of your repository. For example, see here for more information on how to link Zenodo and GitHub.

Data Repositories

As we discussed earlier, code repositories are not necessarily the best home for your data sets, especially if their format is not text based or if their size is large (>100MB). In addition, data repositories offer better support for the metadata standards that help you describe your data and thus make them more discoverable. Like code repositories, data repositories will version your data, creating a history of your data sets that you can navigate. Most data repositories will also offer to mint a DOI so you can cite your data (to be precise, a specific version of your data) in a convenient and unambiguous way.

We will talk more in depth about data repositories in the Fall, but for now we will mention two entryways to environmental data federation:

DataONE: a federation of data repositories, https://www.dataone.org/
re3data: a registry of research data repositories, https://www.re3data.org/

Searching for data in your field and seeing where other researchers archive theirs is often a great way to determine which data repository could be a good home for your own data.

You as an Author

It is important to be able to refer to yourself as a researcher and as an author of your work in an unambiguous manner. From their website: ORCID is a great way to create a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. ORCID is also used more and more as an authentication system for many services (e.g. data repositories).

What about your computing environment?

Session info

Your analysis was done with specific versions of the program used and of all the packages involved, on an operating system (OS) with its own specifications. The good news is that there are tools to let you capture this information in a systematic manner.

sessionInfo()

R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.3.1    fastmap_1.2.0     cli_3.6.3        
 [5] tools_4.3.1       htmltools_0.5.8.1 yaml_2.3.10       rmarkdown_2.28   
 [9] knitr_1.48        jsonlite_1.8.8    xfun_0.47         digest_0.6.37    
[13] rlang_1.1.4       evaluate_0.24.0

or even better:

devtools::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       Ubuntu 22.04.4 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  C.UTF-8
 ctype    C.UTF-8
 tz       UTC
 date     2024-09-04
 pandoc   2.9.2.1 @ /usr/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cachem        1.1.0   2024-05-16 [1] CRAN (R 4.3.1)
 cli           3.6.3   2024-06-21 [1] CRAN (R 4.3.1)
 devtools      2.4.5   2022-10-11 [1] any (@2.4.5)
 digest        0.6.37  2024-08-19 [1] CRAN (R 4.3.1)
 ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.3.1)
 evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.3.1)
 fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.3.1)
 fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.1)
 glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.1)
 htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 httpuv        1.6.15  2024-03-26 [1] CRAN (R 4.3.1)
 jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.1)
 knitr         1.48    2024-07-07 [1] CRAN (R 4.3.1)
 later         1.3.2   2023-12-06 [1] CRAN (R 4.3.1)
 lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.1)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.1)
 memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.1)
 mime          0.12    2021-09-28 [1] CRAN (R 4.3.1)
 miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.1)
 pkgbuild      1.4.4   2024-03-17 [1] CRAN (R 4.3.1)
 pkgload       1.4.0   2024-06-28 [1] CRAN (R 4.3.1)
 profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.1)
 promises      1.3.0   2024-04-05 [1] CRAN (R 4.3.1)
 purrr         1.0.2   2023-08-10 [1] CRAN (R 4.3.1)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.1)
 Rcpp          1.0.13  2024-07-17 [1] CRAN (R 4.3.1)
 remotes       2.5.0   2024-03-17 [1] CRAN (R 4.3.1)
 rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.1)
 rmarkdown     2.28    2024-08-17 [1] any (@2.28)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.1)
 shiny         1.9.1   2024-08-01 [1] CRAN (R 4.3.1)
 stringi       1.8.4   2024-05-06 [1] CRAN (R 4.3.1)
 stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.1)
 usethis       3.0.0   2024-07-29 [1] CRAN (R 4.3.1)
 vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.1)
 xfun          0.47    2024-08-17 [1] CRAN (R 4.3.1)
 xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.1)
 yaml          2.3.10  2024-07-26 [1] CRAN (R 4.3.1)

 [1] /home/runner/work/_temp/Library
 [2] /opt/R/4.3.1/lib/R/site-library
 [3] /opt/R/4.3.1/lib/R/library

──────────────────────────────────────────────────────────────────────────────

You can save all this content to a session_info.txt file and upload it to your repository.
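
One way to do this from R is sketched below, using only base R functions. The file is written to the current working directory, which is assumed to be the root of your project.

```r
# Capture the printed session information as a character vector, one line each
info <- capture.output(sessionInfo())

# Write it to a plain-text file that can be committed with the code
out_file <- "session_info.txt"
writeLines(info, out_file)

# The same pattern works with the richer devtools output, if it is installed:
# writeLines(capture.output(devtools::session_info()), out_file)
```

Regenerating the file each time the analysis is rerun keeps it in sync with the results it documents.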

In Python, using pip freeze > requirements.txt or conda list --export > requirements.txt will create a text file listing all the libraries (and their versions) installed in a specific Python environment. You can use this file to (re)install the same packages and versions into a new Python environment: pip install -r requirements.txt for a file created by pip, or conda create --name <env> --file requirements.txt for a file created by conda. It is also great practice to add this file to your repository.

Containers

A helpful abstraction for capturing the computing environment is a container, whereby a container is created from a set of instructions in a recipe. For the most common containerisation software, Docker, this recipe is called a Dockerfile. Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure and ship the containers to others. A Docker container can be seen as a computer inside your computer.
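
To make the idea of a recipe concrete, the sketch below writes a small Dockerfile from R. It assumes the rocker/r-ver image from the Rocker project as a starting point; the package list, the paths, and the analysis.R script name are hypothetical placeholders to adapt to your own project.

```r
# Each element is one line of the recipe
dockerfile <- c(
  "# Start from an image that pins the R version used in the analysis",
  "FROM rocker/r-ver:4.3.1",
  "",
  "# Install the R packages the analysis needs",
  "RUN R -e \"install.packages(c('rmarkdown', 'knitr'))\"",
  "",
  "# Copy the project into the image and make it the working directory",
  "COPY . /home/analysis",
  "WORKDIR /home/analysis",
  "",
  "# Hypothetical entry point: run the main script when the container starts",
  "CMD [\"Rscript\", \"analysis.R\"]"
)

# Write the recipe; in practice the file lives at the root of your repository
out_file <- file.path(tempdir(), "Dockerfile")
writeLines(dockerfile, out_file)
```

With Docker installed, docker build -t my-analysis . turns the recipe into an image and docker run --rm my-analysis starts a container from it (my-analysis is an arbitrary image name).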

A few good readings:

Docker for scientific reproducibility: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008316
The Whole Tale project: combining containers with data repositories https://wholetale.org/
Docker tutorial: http://jsta.github.io/r-docker-tutorial/
Environment Management with Docker: https://environments.rstudio.com/docker
Sharing and Running R code using Docker: https://aboland.ie/Docker.html

Code friendly Presentations

xaringan

xaringan is an R package to create slide decks using R Markdown: https://github.com/yihui/xaringan

remotes::install_github('yihui/xaringan')

Here is a good introduction to it: https://www.favstats.eu/post/xaringan_tut/

Quarto Presentations

https://quarto.org/docs/presentations/

https://meghan.rbind.io/blog/quarto-slides/