GitHub Pages & the R Markdown & Quarto family
You have been using it all week, and before! All the course material has been developed as an R Markdown based website, a distill website to be precise (see here for more). It is also a great way to publish and share your notebooks with your team and the broader community. Also check out Quarto, a cross-language tool for rendering documents from code.
Interactive web applications
When you have complex data that you want not only to visualize, but also to let users explore or to let them customize the parameters used in an analysis, interactive web applications are a great way to increase engagement. The good news is that you no longer need to be a web developer to spin up such applications or dashboards.
R Shiny
R Shiny is a framework that lets you write R code from which the HTML and JavaScript of the user interface are generated for you, so you can develop web applications without having to learn a new programming language. Note that you will need a server running R to host the application.
Check out this gallery of shiny apps: https://shiny.rstudio.com/gallery/
Getting started with Shiny: https://shiny.rstudio.com/tutorial/
plotly
plotly enables you to develop great interactive data visualizations. It has the advantage of being available in R, Python, and JavaScript. One great thing in R is that if you are a ggplot master, you can write your plot code using ggplot and transform it into a plotly plot with one line of code (here for an example).
Note that there are also other Python libraries for creating interactive plots. Here are a few: https://mode.com/blog/python-interactive-plot-libraries/
Code Repositories
Don’t forget to provide information about the versions of the software and libraries that were used to run a specific analysis.
It is also a great idea to add a license to your project so people know how they can use your code: https://choosealicense.com/
Binder & Jupyter
Transform your Git-based repo into an interactive Jupyter notebook environment with Binder (https://mybinder.org/), so other researchers can run your code without having to install anything!
Try it: https://mybinder.org/
Citing your code
Note that it is also possible to assign a DOI to cite a specific version of your repository. For example, see here for more information on how to link Zenodo and GitHub.
Data Repositories
As we discussed earlier, code repositories are not necessarily the best home for your data sets, especially if their format is not text based and if their size is large (>100MB). In addition, data repositories offer better support for metadata standards, which help you describe your data and thus make them more discoverable. Like code repositories, data repositories will version your data, creating a history of your data sets that you can navigate. Most data repositories will also offer to mint a DOI so that your data (to be precise, a specific version of your data) can be cited in a convenient and unambiguous way.
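To check which files in a project exceed the size mentioned above, a small script can help. The following is a minimal sketch, assuming the project lives in a local folder; the function name `find_large_files` and the choice to skip the `.git` folder are illustrative assumptions, not part of any particular tool.

```python
from pathlib import Path

# 100 MB, the size mentioned above as a sign that a file may belong
# in a data repository rather than a code repository.
SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files under `root` larger than `limit`."""
    root = Path(root)
    large = []
    for path in root.rglob("*"):
        # Skip Git's internal storage (assumption: it is not of interest here).
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and not path.is_symlink():
            size = path.stat().st_size
            if size > limit:
                large.append((path, size))
    # Largest files first, since they are the best candidates to move.
    return sorted(large, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / 1024 / 1024:8.1f} MB  {path}")
```

Running the script from the top of a repository prints the files that are candidates for a data repository, largest first.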
We will talk more in depth about data repositories in the Fall, but for now we will mention two entryways to federated environmental data:
- DataONE: a federation of data repositories, https://www.dataone.org/
- re3data: a registry of research data repositories, https://www.re3data.org/
Searching for data in your field and seeing where other researchers archive theirs is often a great way to determine which data repository could be a good home for your own data.
You as an Author
It is important to be able to refer to yourself as a researcher and as an author of your work in an unambiguous manner. ORCID is a great way to do so: in the words of their website, it provides a persistent digital identifier (an ORCID iD) that you own and control, and that distinguishes you from every other researcher. ORCID is also increasingly used as an authentication system for many services (e.g. data repositories).
What about your computing environment
Session info
Your analysis was done with specific versions of the program used and of all the packages involved, on a specific Operating System (OS). The good news is that there are tools that let you capture this information in a systematic manner. In R, sessionInfo() prints:
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] htmlwidgets_1.6.4 compiler_4.3.1 fastmap_1.2.0 cli_3.6.3
[5] tools_4.3.1 htmltools_0.5.8.1 yaml_2.3.10 rmarkdown_2.28
[9] knitr_1.48 jsonlite_1.8.8 xfun_0.47 digest_0.6.37
[13] rlang_1.1.4 evaluate_0.24.0
or even better, using session_info() from the sessioninfo package (also available through devtools):
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os Ubuntu 22.04.4 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate C.UTF-8
ctype C.UTF-8
tz UTC
date 2024-09-04
pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cachem 1.1.0 2024-05-16 [1] CRAN (R 4.3.1)
cli 3.6.3 2024-06-21 [1] CRAN (R 4.3.1)
devtools 2.4.5 2022-10-11 [1] any (@2.4.5)
digest 0.6.37 2024-08-19 [1] CRAN (R 4.3.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.3.1)
evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.3.1)
fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.3.1)
fs 1.6.4 2024-04-25 [1] CRAN (R 4.3.1)
glue 1.7.0 2024-01-09 [1] CRAN (R 4.3.1)
htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.3.1)
httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.3.1)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.1)
knitr 1.48 2024-07-07 [1] CRAN (R 4.3.1)
later 1.3.2 2023-12-06 [1] CRAN (R 4.3.1)
lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.1)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.3.1)
mime 0.12 2021-09-28 [1] CRAN (R 4.3.1)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.3.1)
pkgbuild 1.4.4 2024-03-17 [1] CRAN (R 4.3.1)
pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.3.1)
profvis 0.3.8 2023-05-02 [1] CRAN (R 4.3.1)
promises 1.3.0 2024-04-05 [1] CRAN (R 4.3.1)
purrr 1.0.2 2023-08-10 [1] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.1)
Rcpp 1.0.13 2024-07-17 [1] CRAN (R 4.3.1)
remotes 2.5.0 2024-03-17 [1] CRAN (R 4.3.1)
rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.1)
rmarkdown 2.28 2024-08-17 [1] any (@2.28)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.1)
shiny 1.9.1 2024-08-01 [1] CRAN (R 4.3.1)
stringi 1.8.4 2024-05-06 [1] CRAN (R 4.3.1)
stringr 1.5.1 2023-11-14 [1] CRAN (R 4.3.1)
urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.3.1)
usethis 3.0.0 2024-07-29 [1] CRAN (R 4.3.1)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.1)
xfun 0.47 2024-08-17 [1] CRAN (R 4.3.1)
xtable 1.8-4 2019-04-21 [1] CRAN (R 4.3.1)
yaml 2.3.10 2024-07-26 [1] CRAN (R 4.3.1)
[1] /home/runner/work/_temp/Library
[2] /opt/R/4.3.1/lib/R/site-library
[3] /opt/R/4.3.1/lib/R/library
──────────────────────────────────────────────────────────────────────────────
You can save all this content to a session_info.txt file and upload it to your repository.
In Python, using pip freeze > requirements.txt or conda list --export > requirements.txt will create a text file listing all the libraries (and their versions) used in a specific Python environment. You can use this file to (re)install the same packages and versions into a new Python environment, with pip install -r requirements.txt for a pip file or conda create --name <env> --file requirements.txt for a conda one (the two formats are not interchangeable). It is also great practice to add this file to your repository.
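A requirements file lists packages but not the Python version or the operating system. As a complement, here is a minimal sketch of a Python counterpart to the R session info shown earlier, using only the standard library. The function names and the report layout are illustrative assumptions; this is not the output format of any existing tool.

```python
import platform
import sys
from importlib import metadata


def collect_session_info():
    """Return a text report of the Python version, OS, and installed packages."""
    lines = [
        f"python   {sys.version.split()[0]}",
        f"os       {platform.platform()}",
        f"machine  {platform.machine()}",
        "",
        "packages:",
    ]
    # Every installed distribution records its name and version in its metadata.
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip incomplete installs
            packages[name.lower()] = f"{name}=={version}"
    lines.extend(packages[key] for key in sorted(packages))
    return "\n".join(lines) + "\n"


def write_session_info(path):
    """Write the report to `path` and return the text that was written."""
    report = collect_session_info()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(report)
    return report


if __name__ == "__main__":
    print(collect_session_info())
```

Calling write_session_info("session_info.txt") from the environment used for the analysis produces a file that can be committed next to requirements.txt.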
Containers
A helpful abstraction for capturing the computing environment is a container, whereby a container is created from a set of instructions in a recipe. For the most common containerisation software, Docker, this recipe is called a Dockerfile. Docker is an open platform for developing, shipping, and running applications. Docker enables you to separate your applications from your infrastructure and ship the containers to others. A Docker container can be seen as a computer inside your computer.
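To make the idea of a recipe concrete, here is a minimal sketch that generates the text of a small Dockerfile for a pip-based Python project. It assumes the project has a requirements.txt at its top level; the function name `render_dockerfile`, the default Python version, and the script name analysis.py are hypothetical placeholders to adapt to your own project.

```python
def render_dockerfile(python_version="3.11", entry_script="analysis.py"):
    """Return the text of a minimal Dockerfile for a pip-based Python project."""
    instructions = [
        # Start from an official Python image with a pinned version.
        f"FROM python:{python_version}-slim",
        # Work inside /app in the container.
        "WORKDIR /app",
        # Install the pinned libraries first, then copy the project files.
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        # Command run when the container starts (hypothetical script name).
        f'CMD ["python", "{entry_script}"]',
    ]
    return "\n".join(instructions) + "\n"


if __name__ == "__main__":
    print(render_dockerfile())
```

Saving the printed text as a file named Dockerfile at the top of the project, then running docker build -t my-analysis . followed by docker run --rm my-analysis, builds the environment and runs the script inside it (my-analysis is an arbitrary image name).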
A few good readings:
Code friendly Presentations