Week 2: Accessing data in a programmatic way

Monday 10/04/2021 – Lecture

Using APIs to programatically retrieve data


The manual way

Using website to retrieve data is great interactive way to search and explore data of interest. Let’s start our week by searching some data on the data repository DateONE: https://search.dataone.org/data

In case you need some inspiration, look at this dataset about historic precipitation in Alaska: https://doi.org/10.5063/N29VCQ


The programmatic way

Although discovering data through a web interface is convenient and offers a great experience, it is often hard to scale this approach or to integrate it into a reproducible workflow.

Slides

Wednesday 10/06/2021 – Using APIs Lab

USGS dataRetrieval R package to retrieve hydrological data

USGS is managing a vast network of gauges to monitor freshwater across the US.

Exercise 1

Start a new Markdown document to plot the discharge time-series for the Ventura River from 2019-10-01 to 2020-10-05

Webpage: https://waterdata.usgs.gov/nwis/uv?site_no=11118500

Tutorial for the package can be found here: https://cran.r-project.org/web/packages/dataRetrieval/vignettes/dataRetrieval.html#daily-data

Bonus

How would you try to determine when this stream gauge record started using the API?

metajam

The metajam R package relies on the dataONE API to download data and metadata into your R Environment. It is currently supporting KNB, ADC and EDI repositories because they rely on the metadata standard EML.

Short intro to the package

Exercise 2

Let’s determine what percentage of Alaskan household are speaking only English!

The data: https://doi.org/10.5063/F1N58JPP

  1. Read the metadata
  2. Download the data household_language.csv using metajam
  3. Read the data into R using metajam
  4. Write a piece of code that will compute the percentage of Alaskan household speaking only English for the year 2009 to 2015
  5. Create a plot to visualize this data

Bonus

How does it compare to French?