Andy Pickering

Adventures in Data Science

Summer in February?

An analysis of the seemingly unseasonable temperatures this February.

It’s been unseasonably warm the past few weeks here in Colorado. I wanted to see just how abnormal this weather is based on historical records.

I’ll use the tidyverse and lubridate packages:

rm(list=ls())
# Load packages quietly: the tidyverse provides readr, dplyr and ggplot2;
# lubridate provides yday() and year()
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))

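First we need to get weather data. I’ll use daily data for the Denver airport (station code KDEN), which can be downloaded as CSV from wunderground. I’ll define a function that downloads one year of data for a specified station, keeps the daily maximum, mean and minimum temperatures, and returns only the first 79 days of the year (roughly January 1 through March 20):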

# Download daily weather history for one year at one station (e.g. "KDEN"),
# keeping only the first 79 days of the year
get_yearly_weather <- function(year, st_code) {
        url <- paste0("http://www.wunderground.com/history/airport/", st_code, "/", year,
                      "/1/1/CustomHistory.html?dayend=31&monthend=12&yearend=", year,
                      "&req_city=NA&req_state=NA&req_statename=NA&format=1")
        # format=1 returns comma-separated text; skip its first line
        wea <- read_csv(url, skip = 1, col_types = cols())
        wea %>%
                mutate(date = as.Date(MST),
                       yday = yday(date),
                       year = year(date)) %>%
                # keep and rename the temperature columns in one step
                select(date, year, yday,
                       max_temp  = `Max TemperatureF`,
                       mean_temp = `Mean TemperatureF`,
                       min_temp  = `Min TemperatureF`) %>%
                filter(yday < 80)
}

den_2017 <- get_yearly_weather(2017,"KDEN")
den_2017
## # A tibble: 54 × 6
##          date  year  yday max_temp mean_temp min_temp
##        <date> <dbl> <dbl>    <int>     <int>    <int>
## 1  2017-01-01  2017     1       46        32       18
## 2  2017-01-02  2017     2       52        34       15
## 3  2017-01-03  2017     3       32        21       10
## 4  2017-01-04  2017     4       12         8        3
## 5  2017-01-05  2017     5        5         2        0
## 6  2017-01-06  2017     6       19         6       -7
## 7  2017-01-07  2017     7       26        14        2
## 8  2017-01-08  2017     8       48        30       12
## 9  2017-01-09  2017     9       62        47       31
## 10 2017-01-10  2017    10       52        42       32
## # ... with 44 more rows
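Because get_yearly_weather() keeps only rows with `yday < 80`, every comparison below covers roughly the same early-year window. A quick base-R check of which calendar dates that cutoff corresponds to (this only illustrates the date arithmetic and doesn’t touch the downloaded data):

```r
# get_yearly_weather() keeps rows with yday < 80, i.e. days of year 1 to 79.
# lubridate::yday() is 1 on January 1; base R's POSIXlt$yday counts from 0, hence the + 1.
jan1      <- as.Date("2017-01-01")
last_kept <- jan1 + 78                   # day of year 79
print(last_kept)                         # "2017-03-20"
print(as.POSIXlt(last_kept)$yday + 1)    # 79

# In a leap year the same cutoff falls one calendar day earlier
print(as.Date("2016-01-01") + 78)        # "2016-03-19"
```

So the historical record used here spans about January 1 through March 20 of each year, shifted by a day in leap years.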

Now we can plot the maximum temperature for each day so far this year:

# Plot the max temps at DIA this year
ggplot(den_2017,aes(yday,max_temp))+
        geom_line()+
        geom_point()+
        ggtitle("Max Temperature (F) at DIA in 2017")
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).

We see that we’ve had some very warm days, including many over 60 F and even three above 70 F! But just how unusual is this? Next we’ll get weather data for previous years and see how this year compares to the historical record.

# Years to download data for
years <- seq(1970, 2017)

# Download each year for KDEN and stack the results into one data frame
den <- bind_rows(lapply(years, get_yearly_weather, st_code = "KDEN"))

# Missing/bad values are coded as -99999; recode them as NA in all temperature columns
den <- den %>%
        mutate(across(c(max_temp, mean_temp, min_temp), ~ na_if(.x, -99999)))
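Recoding the -99999 sentinel matters because daily averages are computed next: a single unrecoded value would drag that day’s mean down by thousands of degrees. A small illustration with made-up numbers, in base R:

```r
# Made-up daily max temps (F); the third reading is missing, coded as -99999
x <- c(45, 52, -99999, 60)
bad_mean <- mean(x)                  # sentinel included: -24960.5
x[x == -99999] <- NA                 # recode the sentinel as NA
good_mean <- mean(x, na.rm = TRUE)   # (45 + 52 + 60) / 3, about 52.3
```

Once the sentinel is NA, `na.rm = TRUE` in the averaging step simply skips it.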

I’ll plot all the historical data as grey dots and the 2017 values in black so we can compare them. To get a better idea of how unusual the 2017 temperatures are, I’ll also compute the average maximum temperature for each day of the year from 1970 to 2016 (red line).

Comparing them, the three days over 70 F were the warmest days since at least 1970. In January we had a number of days both above and below the historical average. Almost all of the days in February were above the historical average, with many of them close to the 1970-2016 maximum for that day.

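# Average max temperature for each day of the year over 1970-2016 (2017 excluded)
avg_by_yday <- den %>%
        filter(year < 2017) %>%
        group_by(yday) %>%
        summarise(tavg = mean(max_temp, na.rm = TRUE))

# All years in grey, 2017 in black on top, and the 1970-2016 daily average in red
ggplot(den, aes(yday, max_temp)) +
        geom_point(color = "grey") +
        geom_point(data = filter(den, year == 2017), color = "black") +
        geom_line(data = avg_by_yday, aes(yday, tavg), color = "red") +
        ggtitle("Daily Max Temperature (F) at DIA, 1970-2017")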
## Warning: Removed 159 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
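avg_by_yday averages each day of the year across years. To check the "close to the maximum for that day" observation numerically, the per-day record high can be computed the same way (in dplyr, by adding `tmax = max(max_temp, na.rm = TRUE)` to the summarise() call). A base-R sketch of both on a tiny made-up data set; `toy_hist` and `cur` are illustrative names, not part of the analysis above:

```r
# Made-up history: three years x two days of year, one missing reading
toy_hist <- data.frame(
  year     = c(2014, 2014, 2015, 2015, 2016, 2016),
  yday     = c(1, 2, 1, 2, 1, 2),
  max_temp = c(40, 50, 44, NA, 48, 56)
)

# Per-day-of-year average and record high across years, ignoring missing values
tavg <- tapply(toy_hist$max_temp, toy_hist$yday, mean, na.rm = TRUE)
tmax <- tapply(toy_hist$max_temp, toy_hist$yday, max, na.rm = TRUE)

# Made-up current-year max temps for yday 1 and 2
# (tapply sorts its groups, so as.numeric() gives values in yday order 1, 2)
cur <- c(50, 54)
above_avg    <- cur > as.numeric(tavg)   # warmer than the historical average?
above_record <- cur > as.numeric(tmax)   # warmer than any earlier year on that day?
```

With real data, a day whose readings are all missing would make `max(..., na.rm = TRUE)` return `-Inf` with a warning, so it may be worth dropping such days first.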

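To see this more clearly, I’ll subtract the 1970-2016 average for each day of the year from the 2017 maximum temperature and plot the difference:

# 2017 max temp minus the 1970-2016 average for the same day of year
dat_joined <- den %>%
        filter(year == 2017) %>%
        select(yday, max_temp) %>%
        left_join(avg_by_yday, by = "yday") %>%
        mutate(tdiff = max_temp - tavg)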

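# Bar chart of the daily differences, colored by whether 2017 was warmer than average
ggplot(dat_joined, aes(yday, tdiff)) +
        geom_col(aes(fill = tdiff > 0)) +
        theme(legend.position = "none") +
        labs(x = "Day of year", y = "Difference (F)") +
        ggtitle("2017 Max Temp. minus 1970-2016 Average Max Temp. (F)")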
## Warning: Removed 1 rows containing missing values (position_stack).
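The left_join() step matches each 2017 day to the historical average for the same day of year. A missing 2017 value gives an NA difference, which is presumably the row ggplot drops in the "Removed 1 rows" warning. The same join-and-subtract with base R’s merge(), on made-up values (`toy_2017` and `toy_avg` are illustrative names):

```r
# Made-up 2017 values (the third day is missing) and historical averages
toy_2017 <- data.frame(yday = c(1, 2, 3), max_temp = c(46, 52, NA))
toy_avg  <- data.frame(yday = c(1, 2, 3), tavg = c(40, 45, 41))

# Left join on day of year: all.x = TRUE keeps every 2017 row
joined <- merge(toy_2017, toy_avg, by = "yday", all.x = TRUE)
joined$tdiff  <- joined$max_temp - joined$tavg   # 6, 7, NA
joined$warmer <- joined$tdiff > 0                # the fill variable: TRUE, TRUE, NA
```

A left join (rather than an inner join) keeps 2017 days even if a day of year had no historical average, so nothing silently disappears from the bar chart.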

Conclusions

  • The majority of days in February were warmer than the 1970-2016 average.
  • Three days in February had maximum temperatures warmer than any historical (1970-2016) values.
  • January had periods that were both warmer and cooler than average.
  • It’s easy to get weather data and analyze it in R!
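The first conclusion is read off the bar chart. A sketch of how it could be checked numerically, assuming a data frame with a date column and a tdiff column (dat_joined as built above keeps yday rather than date; in 2017, February is yday 32 to 59). The numbers here are made up:

```r
# Made-up differences (2017 max temp minus historical average, F) around the Jan/Feb boundary
anom <- data.frame(
  date  = as.Date(c("2017-01-30", "2017-01-31", "2017-02-01",
                    "2017-02-02", "2017-02-03", "2017-02-04")),
  tdiff = c(-5, 3, 8, 12, -2, NA)
)

# Keep February rows only, then count days warmer than average (ignoring missing days)
feb <- anom[format(anom$date, "%m") == "02", ]
n_warm    <- sum(feb$tdiff > 0, na.rm = TRUE)    # 2
n_days    <- sum(!is.na(feb$tdiff))              # 3
frac_warm <- n_warm / n_days                     # 2/3
```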
Written on February 22, 2017