Indices representing extreme aspects of climate derived from daily data:
Climate Change Research Centre (CCRC) at University of New South Wales (UNSW) (climdex.org).
For example:
FedData
packageNational Climatic Data Center application programming interface (API).
rNOAA
packageHandles downloading data directly from NOAA APIv2.
buoy_*
NOAA Buoy data from the National Buoy Data Centerghcnd_*
GHCND daily data from NOAAisd_*
ISD/ISH data from NOAAhomr_*
Historical Observing Metadata Repositoryncdc_*
NOAA National Climatic Data Center (NCDC)seaice
Sea icestorm_
Storms (IBTrACS)swdi
Severe Weather Data Inventory (SWDI)tornadoes
From the NOAA Storm Prediction CenterDownload the GHCN station inventory with ghcnd_stations()
.
id | latitude | longitude | elevation | state | name | gsn_flag | wmo_id | element | first_year | last_year |
---|---|---|---|---|---|---|---|---|---|---|
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | TMAX | 1949 | 1949 | |||
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | TMIN | 1949 | 1949 | |||
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | PRCP | 1949 | 1949 | |||
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | SNOW | 1949 | 1949 | |||
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | SNWD | 1949 | 1949 | |||
ACW00011604 | 17.1167 | -61.7833 | 10.1 | ST JOHNS COOLIDGE FLD | PGTM | 1949 | 1949 |
filter()
to temperature and precipitationFirst, get a global country polygon
Plot all stations:
ggplot(data=st_filtered,aes(y=latitude,x=longitude)) +
facet_grid(element~.)+ #make panels for each variable
geom_sf(data=world,inherit.aes = F,size=.1,fill="grey",colour="black")+
geom_point(size=.1,col="red")+
coord_sf()
It’s hard to see all the points, let’s bin them…
ggplot(st_filtered,aes(y=latitude,x=longitude)) +
geom_sf(data=world,inherit.aes = F,size=.1,fill="grey",colour="black")+
facet_grid(element~.)+
stat_bin2d(bins=100)+
scale_fill_distiller(palette="YlOrRd",trans="log",direction=-1,
breaks = c(1,10,100,1000))+
coord_sf()
Produce a binned map (like above) with the following modifications:
tmax
ggplot(filter(st_filtered,
first_year<=1950 &
last_year>=2000 &
element=="TMAX"),
aes(y=latitude,x=longitude)) +
geom_sf(data=world, inherit.aes = F, size=.1, fill="grey", colour="black")+
stat_bin2d(bins=75)+
scale_fill_distiller(palette="YlOrRd",trans="log",direction=-1,
breaks = c(1,10,50))+
coord_sf()
ghcnd()
will download a .dly
file for a particular station. But how to choose?
geocode
in ggmap package useful for geocoding place namesGeocodes a location (find latitude and longitude) using either (1) the Data Science Toolkit (http://www.datasciencetoolkit.org/about) or (2) Google Maps.
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=University%20at%20Buffalo,%20NY&sensor=false
## lon lat
## 1 -78.80388 42.95973
However, you have to be careful:
## Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=My%20Grandma's%20house&sensor=false
## Warning: geocode failed with status OVER_QUERY_LIMIT, location = "My
## Grandma's house"
## lon lat
## 1 NA NA
But this is pretty safe for well known places.
## Information from URL : http://www.datasciencetoolkit.org/maps/api/geocode/json?address=Buffalo,%20NY&sensor=false
## lon lat
## [1,] -78.88642 42.89606
Now use that location to spatially filter stations with a rectangular box.
bufferwidth=1 #size of rectangular buffer in decimal degrees
dplyr::filter(st_filtered,
grepl("BUFFALO",name)&
between(latitude, coords[2] - bufferwidth, coords[2] + bufferwidth) &
between(longitude, coords[1] - bufferwidth, coords[1] + bufferwidth)&
element=="TMAX")
## # A tibble: 3 x 11
## id latitude longitude elevation state name gsn_flag wmo_id element
## <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
## 1 USC0… 42.9 -78.9 -1000. NY BUFF… "" "" TMAX
## 2 USC0… 42.9 -78.9 177. NY BUFF… "" "" TMAX
## 3 USW0… 42.9 -78.7 211. NY BUFF… "" HCN 7… TMAX
## # ... with 2 more variables: first_year <int>, last_year <int>
You could also spatially filter using sf
package…
See CDO Daily Description and raw GHCND metadata for more details. If you want to download multiple stations at once, check out meteo_pull_monitors()
Measurement Flag/Attribute
See CDO Description
See CDO Description
Indicates the source of the data…
Summarize the QC flags. How many of which type are there? Should we be more conservative?
##
## G I S
## 29026 2 7 4
Plot temperatures
d_rollmean%>%
ggplot(aes(ymax=tmax,ymin=tmin,x=date))+
geom_ribbon(fill="grey")+
geom_line(aes(y=(tmin+tmax)/2),col=grey(0.4),size=.5)+
geom_line(aes(y=tmax.60),col="red")+
geom_line(aes(y=tmax.b60),col="darkred")
Plot a 30-day rolling “right” aligned sum of precipitation.
tp=d_filtered_recent %>%
arrange(date) %>%
mutate(prcp.30 = rollsum(x = prcp, 30, align = "right", fill = NA))
ggplot(tp,aes(y=prcp,x=date))+
geom_line(aes(y=prcp.30),col="black")+
geom_line(col="red")
## Warning: Removed 40 rows containing missing values (geom_path).
## Warning: Removed 11 rows containing missing values (geom_path).
Values are highly correlated!
## Warning: Removed 12 rows containing missing values (geom_point).
How to convert years into ‘decades’?
## [1] 1938
## [1] 1940
## [1] 193
## [1] 1930
Now we can make a ‘decade’ grouping variable.
Calculate seasonal and decadal mean temperatures.
d_filtered2=d_filtered%>%
mutate(month=as.numeric(format(date,"%m")),
year=as.numeric(format(date,"%Y")),
season=ifelse(month%in%c(12,1,2),"Winter",
ifelse(month%in%c(3,4,5),"Spring",
ifelse(month%in%c(6,7,8),"Summer",
ifelse(month%in%c(9,10,11),"Fall",NA)))),
dec=(floor(as.numeric(format(date,"%Y"))/10)*10))
d_filtered2%>%dplyr::select(date,month,year,season,dec,tmax)%>%head()%>%kable()
date | month | year | season | dec | tmax |
---|---|---|---|---|---|
1938-05-01 | 5 | 1938 | Spring | 1930 | 14.4 |
1938-05-02 | 5 | 1938 | Spring | 1930 | 21.1 |
1938-05-03 | 5 | 1938 | Spring | 1930 | 16.7 |
1938-05-04 | 5 | 1938 | Spring | 1930 | 20.6 |
1938-05-05 | 5 | 1938 | Spring | 1930 | 31.1 |
1938-05-06 | 5 | 1938 | Spring | 1930 | 19.4 |
How to assess change? Simple differences?
d_filtered2%>%
mutate(period=ifelse(year<=1976-01-01,"early","late"))%>% #create two time periods before and after 1976
group_by(period)%>% # divide the data into the two groups
summarize(n=n(), # calculate the means between the two periods
tmin=mean(tmin,na.rm=T),
tmax=mean(tmax,na.rm=T),
prcp=mean(prcp,na.rm=T))%>%
kable()
period | n | tmin | tmax | prcp |
---|---|---|---|---|
early | 13394 | 4.199753 | 13.67348 | 25.09372 |
late | 15645 | 4.764507 | 13.75706 | 28.44032 |
But be careful, there were lots of missing data in the beginning of the record
# which years don't have complete data?
d_filtered2%>%
group_by(year)%>%
summarize(n=n())%>%
filter(n<360)
## # A tibble: 2 x 2
## year n
## <dbl> <int>
## 1 1938 245
## 2 2017 304
Plot 10-year means (excluding years without complete data):
Let’s compare 2017 with all the previous years in the dataset. First add ‘day of year’ to the data to facilitate showing all years on the same plot.
Then plot all years (in grey) and add 2017 in red.
ggplot(df,aes(x=doydate,y=tmax,group=year))+
geom_line(col="grey",alpha=.5)+ # plot each year in grey
stat_smooth(aes(group=1),col="black")+ # Add a smooth GAM to estimate the long-term mean
geom_line(data=filter(df,year>2016),col="red")+ # add 2017 in red
scale_x_date(labels = date_format("%b"),date_breaks = "2 months")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
Then ‘zoom’ into just the past few months and add 2017 in red.
ggplot(df,aes(x=doydate,y=tmax,group=year))+
geom_line(col="grey",alpha=.5)+
stat_smooth(aes(group=1),col="black")+
geom_line(data=filter(df,year>2016),col="red")+
scale_x_date(labels = date_format("%b"),date_breaks = "2 months",
lim=c(as.Date("2017-08-01"),as.Date("2017-10-31")))
So there was an unusually warm spell in late September.
## List of 13
## $ coefficients : Named num [1:2] -23.551 0.0197
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "year"
## $ residuals : Named num [1:79] 0.566 0.187 -0.676 -0.349 -0.147 ...
## ..- attr(*, "names")= chr [1:79] "1" "2" "3" "4" ...
## $ effects : Named num [1:79] -137.232 4.023 -0.752 -0.424 -0.222 ...
## ..- attr(*, "names")= chr [1:79] "(Intercept)" "year" "" "" ...
## $ rank : int 2
## $ fitted.values: Named num [1:79] 14.7 14.7 14.7 14.7 14.7 ...
## ..- attr(*, "names")= chr [1:79] "1" "2" "3" "4" ...
## $ assign : int [1:2] 0 1
## $ qr :List of 5
## ..$ qr : num [1:79, 1:2] -8.888 0.113 0.113 0.113 0.113 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:79] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:2] "(Intercept)" "year"
## .. ..- attr(*, "assign")= int [1:2] 0 1
## ..$ qraux: num [1:2] 1.11 1.17
## ..$ pivot: int [1:2] 1 2
## ..$ tol : num 1e-07
## ..$ rank : int 2
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 77
## $ na.action : 'omit' Named int 9
## ..- attr(*, "names")= chr "9"
## $ xlevels : Named list()
## $ call : language lm(formula = tmin ~ year, data = s1)
## $ terms :Classes 'terms', 'formula' language tmin ~ year
## .. ..- attr(*, "variables")= language list(tmin, year)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "tmin" "year"
## .. .. .. ..$ : chr "year"
## .. ..- attr(*, "term.labels")= chr "year"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(tmin, year)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "tmin" "year"
## $ model :'data.frame': 79 obs. of 2 variables:
## ..$ tmin: num [1:79] 15.2 14.9 14 14.4 14.6 ...
## ..$ year: num [1:79] 1938 1939 1940 1941 1942 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language tmin ~ year
## .. .. ..- attr(*, "variables")= language list(tmin, year)
## .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:2] "tmin" "year"
## .. .. .. .. ..$ : chr "year"
## .. .. ..- attr(*, "term.labels")= chr "year"
## .. .. ..- attr(*, "order")= int 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(tmin, year)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:2] "tmin" "year"
## ..- attr(*, "na.action")= 'omit' Named int 9
## .. ..- attr(*, "names")= chr "9"
## - attr(*, "class")= chr "lm"
##
## Call:
## lm(formula = tmin ~ year, data = s1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.00360 -0.45550 -0.02385 0.57754 1.90919
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.551009 8.042522 -2.928 0.00448 **
## year 0.019713 0.004066 4.848 6.33e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8298 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.2339, Adjusted R-squared: 0.2239
## F-statistic: 23.51 on 1 and 77 DF, p-value: 6.333e-06
Print a summary table:
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -23.5510090 | 8.0425217 | -2.928311 | 0.0044803 |
year | 0.0197133 | 0.0040659 | 4.848415 | 0.0000063 |
See Time Series Analysis Task View for summary of available packages/models.
climdex