APIs, time-series, and weather Data

API

Application Programming Interface

Climate Metrics

Climate Metrics: ClimdEX

Indices representing extreme aspects of climate derived from daily data:

alt text

Climate Change Research Centre (CCRC) at University of New South Wales (UNSW) (climdex.org).

27 Core indices

For example:

  • FD Number of frost days: Annual count of days when TN (daily minimum temperature) < 0C.
  • SU Number of summer days: Annual count of days when TX (daily maximum temperature) > 25C.
  • ID Number of icing days: Annual count of days when TX (daily maximum temperature) < 0C.
  • TR Number of tropical nights: Annual count of days when TN (daily minimum temperature) > 20C.
  • GSL Growing season length: Annual (1st Jan to 31st Dec in Northern Hemisphere (NH), 1st July to 30th June in Southern Hemisphere (SH)) count between first span of at least 6 days with daily mean temperature TG>5C and first span after July 1st (Jan 1st in SH) of 6 days with TG<5C.
  • TXx Monthly maximum value of daily maximum temperature
  • TN10p Percentage of days when TN < 10th percentile
  • Rx5day Monthly maximum consecutive 5-day precipitation
  • SDII Simple pricipitation intensity index

Weather Data

Climate Data Online

CDO

GHCN

ghcn

Options for downloading data in R

FedData package

  • National Elevation Dataset digital elevation models (1 and 1/3 arc-second; USGS)
  • National Hydrography Dataset (USGS)
  • Soil Survey Geographic (SSURGO) database
  • International Tree Ring Data Bank.
  • Global Historical Climatology Network (GHCN)

NOAA API

noaa api

National Climatic Data Center application programming interface (API).

rNOAA package

Handles downloading data directly from NOAA APIv2.

  • buoy_* NOAA Buoy data from the National Buoy Data Center
  • ghcnd_* GHCND daily data from NOAA
  • isd_* ISD/ISH data from NOAA
  • homr_* Historical Observing Metadata Repository
  • ncdc_* NOAA National Climatic Data Center (NCDC)
  • seaice Sea ice
  • storm_ Storms (IBTrACS)
  • swdi Severe Weather Data Inventory (SWDI)
  • tornadoes From the NOAA Storm Prediction Center

Libraries

Station locations

Download the GHCN station inventory with ghcnd_stations().

id latitude longitude elevation state name gsn_flag wmo_id element first_year last_year
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD TMAX 1949 1949
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD TMIN 1949 1949
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD PRCP 1949 1949
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD SNOW 1949 1949
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD SNWD 1949 1949
ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD PGTM 1949 1949

GHCND Variables: 5 core values

  • PRCP Precipitation (tenths of mm)
  • SNOW Snowfall (mm)
  • SNWD Snow depth (mm)
  • TMAX Maximum temperature
  • TMIN Minimum temperature

And ~50 others! For example:

  • ACMC Average cloudiness midnight to midnight from 30-second ceilometer
  • AWND Average daily wind speed
  • FMTM Time of fastest mile or fastest 1-minute wind
  • MDSF Multiday snowfall total

filter() to temperature and precipitation

Map GHCND stations

First, get a global country polygon

Plot all stations:

It’s hard to see all the points, let’s bin them…

Your turn

Produce a binned map (like above) with the following modifications:

  • include only stations with a data record that starts before 1950 and ends after 2000 (keeping only complete records during that time).
  • include only tmax

Download daily data from GHCN

ghcnd() will download a .dly file for a particular station. But how to choose?

geocode in ggmap package useful for geocoding place names

Geocodes a location (find latitude and longitude) using either (1) the Data Science Toolkit (http://www.datasciencetoolkit.org/about) or (2) Google Maps.

## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=University%20at%20Buffalo,%20NY&sensor=false
##         lon      lat
## 1 -78.80388 42.95973

However, you have to be careful:

## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=My%20Grandma's%20house&sensor=false
## Warning: geocode failed with status OVER_QUERY_LIMIT, location = "My
## Grandma's house"
##   lon lat
## 1  NA  NA

But this is pretty safe for well known places.

## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=Buffalo,%20NY&sensor=false
##            lon      lat
## [1,] -78.88642 42.89606

Now use that location to spatially filter stations with a rectangular box.

## # A tibble: 3 x 11
##   id    latitude longitude elevation state name  gsn_flag wmo_id element
##   <chr>    <dbl>     <dbl>     <dbl> <chr> <chr> <chr>    <chr>  <chr>  
## 1 USC0…     42.9     -78.9    -1000. NY    BUFF… ""       ""     TMAX   
## 2 USC0…     42.9     -78.9      177. NY    BUFF… ""       ""     TMAX   
## 3 USW0…     42.9     -78.7      211. NY    BUFF… ""       HCN 7… TMAX   
## # ... with 2 more variables: first_year <int>, last_year <int>

You could also spatially filter using sf package…

See CDO Daily Description and raw GHCND metadata for more details. If you want to download multiple stations at once, check out meteo_pull_monitors()

Quality Control: MFLAG

Measurement Flag/Attribute

  • Blank no measurement information applicable
  • B precipitation total formed from two twelve-hour totals
  • H represents highest or lowest hourly temperature (TMAX or TMIN) or average of hourly values (TAVG)
  • K converted from knots

See CDO Description

Quality Control: QFLAG

  • Blank did not fail any quality assurance check
  • D failed duplicate check
  • G failed gap check
  • I failed internal consistency check
  • K failed streak/frequent-value check
  • N failed naught check
  • O failed climatological outlier check
  • S failed spatial consistency check
  • T failed temporal consistency check
  • W temperature too warm for snow

See CDO Description

Quality Control: SFLAG

Indicates the source of the data…

Summarize QC flags

Summarize the QC flags. How many of which type are there? Should we be more conservative?

## 
##           G     I     S 
## 29026     2     7     4
  • G failed gap check
  • I failed internal consistency check
  • S failed spatial consistency check

Filter with QC data and change units

Plot temperatures

Your Turn

Plot a 30-day rolling “right” aligned sum of precipitation.

Time Series analysis

Temporal autocorrelation

Values are highly correlated!

## Warning: Removed 12 rows containing missing values (geom_point).

Autocorrelation functions

  • autocorrelation \(x\) vs. \(x_{t-1}\) (lag=1)
  • partial autocorrelation. \(x\) vs. \(x_{n}\) after controlling for correlations \(\in t-1:n\)

Autocorrelation

Partial Autocorrelation

Compute temporal aggregation indices

Group by month, season, year, and decade.

How to convert years into ‘decades’?

## [1] 1938
## [1] 1940
## [1] 193
## [1] 1930

Now we can make a ‘decade’ grouping variable.

Calculate seasonal and decadal mean temperatures.

date month year season dec tmax
1938-05-01 5 1938 Spring 1930 14.4
1938-05-02 5 1938 Spring 1930 21.1
1938-05-03 5 1938 Spring 1930 16.7
1938-05-04 5 1938 Spring 1930 20.6
1938-05-05 5 1938 Spring 1930 31.1
1938-05-06 5 1938 Spring 1930 19.4

Timeseries models

How to assess change? Simple differences?

period n tmin tmax prcp
early 13394 4.199753 13.67348 25.09372
late 15645 4.764507 13.75706 28.44032

But be careful, there were lots of missing data in the beginning of the record

## # A tibble: 2 x 2
##    year     n
##   <dbl> <int>
## 1  1938   245
## 2  2017   304

Plot 10-year means (excluding years without complete data):

Look for specific events: was 2017 unusually hot in Buffalo, NY?

Let’s compare 2017 with all the previous years in the dataset. First add ‘day of year’ to the data to facilitate showing all years on the same plot.

Then plot all years (in grey) and add 2017 in red.

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Then ‘zoom’ into just the past few months and add 2017 in red.

So there was an unusually warm spell in late September.

Summarize by season

Linear regression of maximum temperature in fall

## List of 13
##  $ coefficients : Named num [1:2] -23.551 0.0197
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "year"
##  $ residuals    : Named num [1:79] 0.566 0.187 -0.676 -0.349 -0.147 ...
##   ..- attr(*, "names")= chr [1:79] "1" "2" "3" "4" ...
##  $ effects      : Named num [1:79] -137.232 4.023 -0.752 -0.424 -0.222 ...
##   ..- attr(*, "names")= chr [1:79] "(Intercept)" "year" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:79] 14.7 14.7 14.7 14.7 14.7 ...
##   ..- attr(*, "names")= chr [1:79] "1" "2" "3" "4" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:79, 1:2] -8.888 0.113 0.113 0.113 0.113 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:79] "1" "2" "3" "4" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "year"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.11 1.17
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 77
##  $ na.action    : 'omit' Named int 9
##   ..- attr(*, "names")= chr "9"
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = tmin ~ year, data = s1)
##  $ terms        :Classes 'terms', 'formula'  language tmin ~ year
##   .. ..- attr(*, "variables")= language list(tmin, year)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "tmin" "year"
##   .. .. .. ..$ : chr "year"
##   .. ..- attr(*, "term.labels")= chr "year"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. ..- attr(*, "predvars")= language list(tmin, year)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "tmin" "year"
##  $ model        :'data.frame':   79 obs. of  2 variables:
##   ..$ tmin: num [1:79] 15.2 14.9 14 14.4 14.6 ...
##   ..$ year: num [1:79] 1938 1939 1940 1941 1942 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula'  language tmin ~ year
##   .. .. ..- attr(*, "variables")= language list(tmin, year)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "tmin" "year"
##   .. .. .. .. ..$ : chr "year"
##   .. .. ..- attr(*, "term.labels")= chr "year"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##   .. .. ..- attr(*, "predvars")= language list(tmin, year)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "tmin" "year"
##   ..- attr(*, "na.action")= 'omit' Named int 9
##   .. ..- attr(*, "names")= chr "9"
##  - attr(*, "class")= chr "lm"
## 
## Call:
## lm(formula = tmin ~ year, data = s1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.00360 -0.45550 -0.02385  0.57754  1.90919 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -23.551009   8.042522  -2.928  0.00448 ** 
## year          0.019713   0.004066   4.848 6.33e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8298 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2339, Adjusted R-squared:  0.2239 
## F-statistic: 23.51 on 1 and 77 DF,  p-value: 6.333e-06

Print a summary table:

term estimate std.error statistic p.value
(Intercept) -23.5510090 8.0425217 -2.928311 0.0044803
year 0.0197133 0.0040659 4.848415 0.0000063

Autoregressive models

See Time Series Analysis Task View for summary of available packages/models.

  • Moving average (MA) models
  • autoregressive (AR) models
  • autoregressive moving average (ARMA) models
  • frequency analysis
  • Many, many more…

Other Climate Metrics

Climdex indices

ClimDex

Format data for climdex

Generate the climdex object

Cumulative dry days

Diurnal Temperature Range

Frost Days