Weather Cleaning

This page describes the cleaning process for the data derived from visualcrossing's Weather API.

The cleaning module Python script can be found here (Cleaning - Weather Cleaning).

Weather Data

The weather API provided a few years of daily weather observations from thousands of stations. Each API call aggregated these into generalized weather observations for the coordinates associated with a ski resort. The weather variables were the primary goal, but the calls also returned data about the contributing weather stations as a by-product.

Snippet of Initial Weather Data

datetime datetimeEpoch tempmax tempmin temp feelslikemax feelslikemin feelslike dew humidity precip precipprob precipcover preciptype snow snowdepth windgust windspeed winddir pressure cloudcover visibility solarradiation solarenergy uvindex sunrise sunriseEpoch sunset sunsetEpoch moonphase conditions description icon stations source resort tzoffset severerisk
2019-01-01 1546326000 16.4 2.0 7.0 10.2 -13.3 -0.6 -1.2 69.1 0.008 100.0 20.83 ['snow'] 0.0 20.7 18.3 10.6 4.9 1014.5 59.0 8.6 116.8 9.9 5.0 07:26:20 1546352780 16:51:51 1546386711 0.85 Snow, Partially cloudy Partly cloudy throughout the day with snow. snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-02 1546412400 24.3 -0.9 11.4 21.9 -11.9 5.4 -10.5 39.9 0.004 100.0 4.17 ['snow'] 0.0 20.8 NaN 8.7 353.1 1021.4 0.0 9.9 121.6 10.7 5.0 07:26:27 1546439187 16:52:41 1546473161 0.89 Snow Clear conditions throughout the day with morning snow. snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-03 1546498800 29.0 5.3 17.6 21.9 -4.0 8.6 4.1 56.1 0.004 100.0 4.17 ['rain', 'snow'] 0.2 20.8 32.2 9.8 328.9 1024.7 0.0 9.8 123.3 10.6 5.0 07:26:31 1546525591 16:53:33 1546559613 0.92 Snow, Rain Clear conditions throughout the day with afternoon rain or snow. snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-04 1546585200 34.0 11.9 23.4 28.7 3.4 17.1 7.0 50.4 0.001 100.0 4.17 ['rain', 'snow'] 0.1 20.8 20.8 9.0 311.0 1025.5 0.0 9.9 123.7 10.7 5.0 07:26:34 1546611994 16:54:26 1546646066 0.96 Snow, Rain Clear conditions throughout the day with afternoon rain or snow. snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-05 1546671600 34.1 14.3 27.1 29.4 4.3 20.1 1.9 33.9 0.001 100.0 4.17 ['rain', 'snow'] 0.0 20.4 20.8 10.1 243.5 1022.2 19.4 9.7 110.3 9.6 5.0 07:26:34 1546698394 16:55:20 1546732520 0.00 Snow, Rain Clear conditions throughout the day with afternoon rain or snow. rain ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-06 1546758000 29.9 18.5 25.9 22.4 5.1 16.1 18.1 72.5 0.035 100.0 58.33 ['rain', 'snow'] 0.6 20.6 33.3 16.9 266.7 1009.3 78.7 6.3 47.3 4.1 2.0 07:26:32 1546784792 16:56:16 1546818976 0.02 Snow, Rain, Partially cloudy Partly cloudy throughout the day with rain or snow. snow ['72467523063', '72206103038', 'CACMC', '72038500419', 'DYGC2', 'KCCU', 'KEGE', 'A0000594076', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-07 1546844400 24.8 14.7 20.3 12.8 2.5 6.5 13.7 75.2 0.004 100.0 8.33 ['snow'] 0.4 21.3 45.7 27.9 271.2 1015.6 83.7 4.8 35.8 3.0 2.0 07:26:27 1546871187 16:57:13 1546905433 0.06 Snow, Partially cloudy Partly cloudy throughout the day with snow in the morning and afternoon. snow ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-08 1546930800 34.6 17.2 25.1 34.6 5.0 17.8 12.0 59.5 0.013 100.0 8.33 ['rain', 'snow'] 0.0 21.3 27.7 15.2 312.1 1029.4 34.5 9.5 122.9 10.5 5.0 07:26:21 1546957581 16:58:11 1546991891 0.09 Snow, Rain, Partially cloudy Clearing in the afternoon with afternoon rain or snow. rain ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-09 1547017200 38.3 23.0 28.6 38.3 13.6 22.6 9.9 45.4 0.000 0.0 0.00 NaN 0.0 21.2 23.0 13.0 142.9 1029.6 1.0 9.9 114.0 9.8 5.0 07:26:12 1547043972 16:59:11 1547078351 0.12 Clear Clear conditions throughout the day. clear-day ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN
2019-01-10 1547103600 33.7 17.0 26.4 33.7 9.8 22.6 14.3 60.6 0.026 100.0 12.50 ['rain', 'snow'] 0.8 21.4 17.2 8.8 323.7 1023.3 39.9 8.3 75.9 6.6 4.0 07:26:01 1547130361 17:00:11 1547164811 0.16 Snow, Rain, Partially cloudy Partly cloudy throughout the day with late afternoon rain or snow. snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] obs Vail NaN NaN

Null Values of Initial Weather Data

Column Null Count
datetime 0
datetimeEpoch 0
tempmax 0
tempmin 0
temp 0
feelslikemax 0
feelslikemin 0
feelslike 0
dew 0
humidity 0
precip 0
precipprob 0
precipcover 0
preciptype 305269
snow 0
snowdepth 0
windgust 1550
windspeed 0
winddir 0
pressure 0
cloudcover 0
visibility 37524
solarradiation 10
solarenergy 10
uvindex 10
sunrise 0
sunriseEpoch 0
sunset 0
sunsetEpoch 0
moonphase 0
conditions 0
description 0
icon 0
stations 0
source 0
resort 0
tzoffset 276275
severerisk 421370
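
A null-count table like the one above can be produced with pandas. This is a minimal sketch rather than the script's actual code; the small `weather` frame is made-up sample data holding only three of the columns listed above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the raw weather data (three of the real column names).
weather = pd.DataFrame({
    "tempmax": [16.4, 24.3, 29.0],
    "windgust": [18.3, np.nan, 32.2],
    "severerisk": [np.nan, np.nan, np.nan],
})

# Count missing values per column, mirroring the "Column / Null Count" layout.
null_counts = weather.isna().sum().rename("Null Count")
print(null_counts)
```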

Snippet of Initial Station Data

distance latitude longitude useCount id name quality contribution resort
33491.0 39.560 -105.986 0 SODC2 SODA CREEK CO US 0 0.0 Vail
26493.0 39.467 -106.150 0 72206103038 RED CLIFF PASS, CO US 99 0.0 Vail
35635.0 39.927 -106.545 0 DYGC2 DRY GULCH CO US 0 0.0 Vail
26228.0 39.470 -106.150 0 KCCU KCCU 99 0.0 Vail
39498.0 39.891 -106.037 0 KSEC2 KEYSER RIDGE CO US 0 0.0 Vail
63875.0 39.220 -106.870 0 KASE KASE 100 0.0 Vail
63123.0 39.230 -106.871 0 72467693073 ASPEN PITKIN CO AIRPORT SARDY FIELD, CO US 100 0.0 Vail
44625.0 40.040 -106.370 0 K20V K20V 98 0.0 Vail
47529.0 39.650 -106.917 0 72467523063 EAGLE CO AIRPORT, CO US 100 0.0 Vail
53458.0 39.790 -105.770 0 K0CO K0CO 100 0.0 Vail

Null Values of Initial Station Data

Column Null Count
distance 0
latitude 0
longitude 0
useCount 0
id 0
name 6
quality 0
contribution 0
resort 0

Understanding the Weather Results

Understanding the variables within the weather data is essential to not only applying the data, but also dealing with null values and outliers. Thankfully, visualcrossing provides a breakdown of the variables. The details are as follows:



Note also that the data was requested for observations and hourly records, so not all of the documented variables appear in the data.

Dropping Columns & Searching for Outliers

An initial review of the data identified columns that were either outside the scope of this timeframe or essentially duplicated, in inferior form, information held in other columns. These columns were dropped.
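
Comparing the initial and final snippets on this page, the columns datetimeEpoch, sunriseEpoch, sunsetEpoch, conditions, description and source no longer appear in the final data. A drop of that kind might look like the sketch below. The frame name `weather` and the one-row sample are assumptions for illustration, not the script's own code.

```python
import pandas as pd

# One-row sample using column names and values from the initial snippet.
weather = pd.DataFrame({
    "datetime": ["2019-01-01"],
    "datetimeEpoch": [1546326000],
    "sunrise": ["07:26:20"],
    "sunriseEpoch": [1546352780],
    "sunset": ["16:51:51"],
    "sunsetEpoch": [1546386711],
    "conditions": ["Snow, Partially cloudy"],
    "description": ["Partly cloudy throughout the day with snow."],
    "icon": ["snow"],
    "source": ["obs"],
})

# Columns absent from the final snippet: epoch duplicates of the time columns,
# free-text summaries, and the data source flag.
columns_to_drop = [
    "datetimeEpoch", "sunriseEpoch", "sunsetEpoch",
    "conditions", "description", "source",
]
weather = weather.drop(columns=columns_to_drop)
print(list(weather.columns))
```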

Outliers were then searched for in some of the remaining numeric columns.
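
Boxplots conventionally flag points that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The same rule can be applied numerically, as in this sketch. The helper name `iqr_outliers` is hypothetical, and the final value in the sample series (500.0) is a made-up outlier appended to visibility values taken from the snippet.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside the k * IQR fences."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Visibility values from the snippet, plus one made-up extreme value.
visibility = pd.Series(
    [8.6, 9.9, 9.8, 9.9, 9.7, 6.3, 4.8, 9.5, 9.9, 8.3, 500.0],
    name="visibility",
)

flags = iqr_outliers(visibility)
print(visibility[flags])  # rows a boxplot would draw as individual points
```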

"Prior null value processing for numerical weather data."

Boxplot Analysis

Two main phenomena were illuminated by this visualization. Namely:

Dealing with Null Values

The visibility outlier was replaced with a null value rather than dropping the entire row. The remaining null values were then handled with methods that retained rows as well.
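
The specific methods are not reproduced on this page. The final snippet further down does show that the missing windgust value for 2019-01-02 became 29.77377, which is consistent with an average-type fill, and that tzoffset and severerisk became 0.0. The sketch below illustrates that pattern under those assumptions. The visibility threshold, the choice of the column mean, and all sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up sample; 9999.0 stands in for the visibility outlier.
weather = pd.DataFrame({
    "visibility": [8.6, 9.9, 9999.0, 9.8],
    "windgust": [18.3, np.nan, 32.2, 20.8],
    "tzoffset": [np.nan, np.nan, np.nan, np.nan],
    "severerisk": [np.nan, 10.0, np.nan, np.nan],
})

# Replace the outlier with a null instead of dropping its row.
VISIBILITY_MAX = 100.0  # hypothetical cut-off, not taken from the script
weather["visibility"] = weather["visibility"].mask(
    weather["visibility"] > VISIBILITY_MAX
)

# Assumption: fill measurement gaps with the column mean.
for col in ["visibility", "windgust"]:
    weather[col] = weather[col].fillna(weather[col].mean())

# Assumption: treat missing tzoffset and severerisk as zero.
weather[["tzoffset", "severerisk"]] = weather[["tzoffset", "severerisk"]].fillna(0.0)

print(weather)
```

Every step keeps the row count unchanged, which matches the row-retaining approach described above.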

Unpacking Lists

Two list-type columns required unpacking: preciptype and stations.

For preciptype, MultiLabelBinarizer() from scikit-learn was used to create one binary (0/1) indicator column per precipitation type. In other words, the column was multi-hot encoded into the type_ columns seen in the final data.
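
A minimal sketch of this encoding follows, using three rows from the snippet. The `type_` prefix matches the final column names. Labelling null rows as 'none' is an assumption inferred from the type_none column, and the frame name `weather` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# preciptype values for three dates from the initial snippet.
weather = pd.DataFrame({
    "datetime": ["2019-01-01", "2019-01-03", "2019-01-09"],
    "preciptype": [["snow"], ["rain", "snow"], np.nan],
})

# Assumption: days without precipitation (null) get an explicit 'none' label.
labels = weather["preciptype"].apply(
    lambda v: v if isinstance(v, list) else ["none"]
)

# One 0/1 column per label; classes_ holds the sorted label names.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(labels),
    columns=[f"type_{label}" for label in mlb.classes_],
    index=weather.index,
)

# Swap the list column for its encoded columns.
weather = weather.drop(columns="preciptype").join(encoded)
print(weather)
```

If the data is read back from a CSV file, the lists arrive as strings such as "['rain', 'snow']" and would first need to be parsed, for example with ast.literal_eval.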

For stations, which covers thousands of distinct station IDs, encoding each ID into its own column may not be the best approach at this stage, so the column was left in its list format. However, a basket-style dataframe was also built from it and saved for possible later use in an apriori method or model.
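
A basket-style dataframe typically has one row per transaction (here, one daily observation) and one boolean column per item (here, a station ID). The script's exact layout is not shown on this page, so the following is only a sketch under that assumption. The rows are made up from station IDs that appear in the snippets.

```python
import pandas as pd

# Made-up rows; station IDs are taken from the snippets above.
weather = pd.DataFrame({
    "datetime": ["2019-01-04", "2019-01-05", "2019-01-06"],
    "stations": [
        ["KCCU", "KEGE"],
        ["DYGC2", "KCCU", "KEGE"],
        ["KEGE"],
    ],
})

# One row per (observation, station) pair; the original row index is repeated.
exploded = weather["stations"].explode()

# One indicator column per station, collapsed back to one row per observation.
basket = pd.get_dummies(exploded).groupby(level=0).max().astype(bool)
print(basket)
```

A boolean frame of this shape is the input format expected by common apriori implementations, for example the one in mlxtend.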

The Final Weather Data

Once all of the steps in the script had been applied, the result was an acceptable weather dataset with no remaining null values.

Snippet of Final Weather Data

datetime tempmax tempmin temp feelslikemax feelslikemin feelslike dew humidity precip precipprob precipcover snow snowdepth windgust windspeed winddir pressure cloudcover visibility solarradiation solarenergy uvindex sunrise sunset moonphase icon stations resort tzoffset severerisk type_freezingrain type_ice type_none type_rain type_snow
2019-01-01 16.4 2.0 7.0 10.2 -13.3 -0.6 -1.2 69.1 0.008 100.0 20.83 0.0 20.7 18.30000 10.6 4.9 1014.5 59.0 8.6 116.8 9.9 5.0 07:26:20 16:51:51 0.85 snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 0 1
2019-01-02 24.3 -0.9 11.4 21.9 -11.9 5.4 -10.5 39.9 0.004 100.0 4.17 0.0 20.8 29.77377 8.7 353.1 1021.4 0.0 9.9 121.6 10.7 5.0 07:26:27 16:52:41 0.89 snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 0 1
2019-01-03 29.0 5.3 17.6 21.9 -4.0 8.6 4.1 56.1 0.004 100.0 4.17 0.2 20.8 32.20000 9.8 328.9 1024.7 0.0 9.8 123.3 10.6 5.0 07:26:31 16:53:33 0.92 snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1
2019-01-04 34.0 11.9 23.4 28.7 3.4 17.1 7.0 50.4 0.001 100.0 4.17 0.1 20.8 20.80000 9.0 311.0 1025.5 0.0 9.9 123.7 10.7 5.0 07:26:34 16:54:26 0.96 snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1
2019-01-05 34.1 14.3 27.1 29.4 4.3 20.1 1.9 33.9 0.001 100.0 4.17 0.0 20.4 20.80000 10.1 243.5 1022.2 19.4 9.7 110.3 9.6 5.0 07:26:34 16:55:20 0.00 rain ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1
2019-01-06 29.9 18.5 25.9 22.4 5.1 16.1 18.1 72.5 0.035 100.0 58.33 0.6 20.6 33.30000 16.9 266.7 1009.3 78.7 6.3 47.3 4.1 2.0 07:26:32 16:56:16 0.02 snow ['72467523063', '72206103038', 'CACMC', '72038500419', 'DYGC2', 'KCCU', 'KEGE', 'A0000594076', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1
2019-01-07 24.8 14.7 20.3 12.8 2.5 6.5 13.7 75.2 0.004 100.0 8.33 0.4 21.3 45.70000 27.9 271.2 1015.6 83.7 4.8 35.8 3.0 2.0 07:26:27 16:57:13 0.06 snow ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 0 1
2019-01-08 34.6 17.2 25.1 34.6 5.0 17.8 12.0 59.5 0.013 100.0 8.33 0.0 21.3 27.70000 15.2 312.1 1029.4 34.5 9.5 122.9 10.5 5.0 07:26:21 16:58:11 0.09 rain ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1
2019-01-09 38.3 23.0 28.6 38.3 13.6 22.6 9.9 45.4 0.000 0.0 0.00 0.0 21.2 23.00000 13.0 142.9 1029.6 1.0 9.9 114.0 9.8 5.0 07:26:12 16:59:11 0.12 clear-day ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 1 0 0
2019-01-10 33.7 17.0 26.4 33.7 9.8 22.6 14.3 60.6 0.026 100.0 12.50 0.8 21.4 17.20000 8.8 323.7 1023.3 39.9 8.3 75.9 6.6 4.0 07:26:01 17:00:11 0.16 snow ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] Vail 0.0 0.0 0 0 0 1 1

Null Values of Final Weather Data

Column Null Count
datetime 0
tempmax 0
tempmin 0
temp 0
feelslikemax 0
feelslikemin 0
feelslike 0
dew 0
humidity 0
precip 0
precipprob 0
precipcover 0
snow 0
snowdepth 0
windgust 0
windspeed 0
winddir 0
pressure 0
cloudcover 0
visibility 0
solarradiation 0
solarenergy 0
uvindex 0
sunrise 0
sunset 0
moonphase 0
icon 0
stations 0
resort 0
tzoffset 0
severerisk 0
type_freezingrain 0
type_ice 0
type_none 0
type_rain 0
type_snow 0