This page describes the cleaning process for the data derived from visualcrossing's Weather API.
The cleaning module Python script can be found here (Cleaning - Weather Cleaning).
The weather API provided a few years of daily weather observations drawn from thousands of stations, which the API call aggregated into generalized weather observations for the coordinates associated with each ski resort. The weather variables were the primary goal, but the calls also returned data about the contributing weather stations.
Snippet of Initial Weather Data
datetime | datetimeEpoch | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | precip | precipprob | precipcover | preciptype | snow | snowdepth | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | sunrise | sunriseEpoch | sunset | sunsetEpoch | moonphase | conditions | description | icon | stations | source | resort | tzoffset | severerisk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019-01-01 | 1546326000 | 16.4 | 2.0 | 7.0 | 10.2 | -13.3 | -0.6 | -1.2 | 69.1 | 0.008 | 100.0 | 20.83 | ['snow'] | 0.0 | 20.7 | 18.3 | 10.6 | 4.9 | 1014.5 | 59.0 | 8.6 | 116.8 | 9.9 | 5.0 | 07:26:20 | 1546352780 | 16:51:51 | 1546386711 | 0.85 | Snow, Partially cloudy | Partly cloudy throughout the day with snow. | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-02 | 1546412400 | 24.3 | -0.9 | 11.4 | 21.9 | -11.9 | 5.4 | -10.5 | 39.9 | 0.004 | 100.0 | 4.17 | ['snow'] | 0.0 | 20.8 | NaN | 8.7 | 353.1 | 1021.4 | 0.0 | 9.9 | 121.6 | 10.7 | 5.0 | 07:26:27 | 1546439187 | 16:52:41 | 1546473161 | 0.89 | Snow | Clear conditions throughout the day with morning snow. | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-03 | 1546498800 | 29.0 | 5.3 | 17.6 | 21.9 | -4.0 | 8.6 | 4.1 | 56.1 | 0.004 | 100.0 | 4.17 | ['rain', 'snow'] | 0.2 | 20.8 | 32.2 | 9.8 | 328.9 | 1024.7 | 0.0 | 9.8 | 123.3 | 10.6 | 5.0 | 07:26:31 | 1546525591 | 16:53:33 | 1546559613 | 0.92 | Snow, Rain | Clear conditions throughout the day with afternoon rain or snow. | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-04 | 1546585200 | 34.0 | 11.9 | 23.4 | 28.7 | 3.4 | 17.1 | 7.0 | 50.4 | 0.001 | 100.0 | 4.17 | ['rain', 'snow'] | 0.1 | 20.8 | 20.8 | 9.0 | 311.0 | 1025.5 | 0.0 | 9.9 | 123.7 | 10.7 | 5.0 | 07:26:34 | 1546611994 | 16:54:26 | 1546646066 | 0.96 | Snow, Rain | Clear conditions throughout the day with afternoon rain or snow. | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-05 | 1546671600 | 34.1 | 14.3 | 27.1 | 29.4 | 4.3 | 20.1 | 1.9 | 33.9 | 0.001 | 100.0 | 4.17 | ['rain', 'snow'] | 0.0 | 20.4 | 20.8 | 10.1 | 243.5 | 1022.2 | 19.4 | 9.7 | 110.3 | 9.6 | 5.0 | 07:26:34 | 1546698394 | 16:55:20 | 1546732520 | 0.00 | Snow, Rain | Clear conditions throughout the day with afternoon rain or snow. | rain | ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-06 | 1546758000 | 29.9 | 18.5 | 25.9 | 22.4 | 5.1 | 16.1 | 18.1 | 72.5 | 0.035 | 100.0 | 58.33 | ['rain', 'snow'] | 0.6 | 20.6 | 33.3 | 16.9 | 266.7 | 1009.3 | 78.7 | 6.3 | 47.3 | 4.1 | 2.0 | 07:26:32 | 1546784792 | 16:56:16 | 1546818976 | 0.02 | Snow, Rain, Partially cloudy | Partly cloudy throughout the day with rain or snow. | snow | ['72467523063', '72206103038', 'CACMC', '72038500419', 'DYGC2', 'KCCU', 'KEGE', 'A0000594076', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-07 | 1546844400 | 24.8 | 14.7 | 20.3 | 12.8 | 2.5 | 6.5 | 13.7 | 75.2 | 0.004 | 100.0 | 8.33 | ['snow'] | 0.4 | 21.3 | 45.7 | 27.9 | 271.2 | 1015.6 | 83.7 | 4.8 | 35.8 | 3.0 | 2.0 | 07:26:27 | 1546871187 | 16:57:13 | 1546905433 | 0.06 | Snow, Partially cloudy | Partly cloudy throughout the day with snow in the morning and afternoon. | snow | ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-08 | 1546930800 | 34.6 | 17.2 | 25.1 | 34.6 | 5.0 | 17.8 | 12.0 | 59.5 | 0.013 | 100.0 | 8.33 | ['rain', 'snow'] | 0.0 | 21.3 | 27.7 | 15.2 | 312.1 | 1029.4 | 34.5 | 9.5 | 122.9 | 10.5 | 5.0 | 07:26:21 | 1546957581 | 16:58:11 | 1546991891 | 0.09 | Snow, Rain, Partially cloudy | Clearing in the afternoon with afternoon rain or snow. | rain | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-09 | 1547017200 | 38.3 | 23.0 | 28.6 | 38.3 | 13.6 | 22.6 | 9.9 | 45.4 | 0.000 | 0.0 | 0.00 | NaN | 0.0 | 21.2 | 23.0 | 13.0 | 142.9 | 1029.6 | 1.0 | 9.9 | 114.0 | 9.8 | 5.0 | 07:26:12 | 1547043972 | 16:59:11 | 1547078351 | 0.12 | Clear | Clear conditions throughout the day. | clear-day | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
2019-01-10 | 1547103600 | 33.7 | 17.0 | 26.4 | 33.7 | 9.8 | 22.6 | 14.3 | 60.6 | 0.026 | 100.0 | 12.50 | ['rain', 'snow'] | 0.8 | 21.4 | 17.2 | 8.8 | 323.7 | 1023.3 | 39.9 | 8.3 | 75.9 | 6.6 | 4.0 | 07:26:01 | 1547130361 | 17:00:11 | 1547164811 | 0.16 | Snow, Rain, Partially cloudy | Partly cloudy throughout the day with late afternoon rain or snow. | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | obs | Vail | NaN | NaN |
Null Values of Initial Weather Data
Column | Null Count |
---|---|
datetime | 0 |
datetimeEpoch | 0 |
tempmax | 0 |
tempmin | 0 |
temp | 0 |
feelslikemax | 0 |
feelslikemin | 0 |
feelslike | 0 |
dew | 0 |
humidity | 0 |
precip | 0 |
precipprob | 0 |
precipcover | 0 |
preciptype | 305269 |
snow | 0 |
snowdepth | 0 |
windgust | 1550 |
windspeed | 0 |
winddir | 0 |
pressure | 0 |
cloudcover | 0 |
visibility | 37524 |
solarradiation | 10 |
solarenergy | 10 |
uvindex | 10 |
sunrise | 0 |
sunriseEpoch | 0 |
sunset | 0 |
sunsetEpoch | 0 |
moonphase | 0 |
conditions | 0 |
description | 0 |
icon | 0 |
stations | 0 |
source | 0 |
resort | 0 |
tzoffset | 276275 |
severerisk | 421370 |
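A null-count table like the one above can be produced with a single pandas call. The sketch below is illustrative only: the `weather` frame is a hypothetical miniature with made-up rows, not the project's data, and the actual script may compute the counts differently.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the weather table; values are illustrative only.
weather = pd.DataFrame({
    "datetime": ["2019-01-01", "2019-01-02", "2019-01-09"],
    "windgust": [18.3, np.nan, 23.0],
    "preciptype": [["snow"], ["snow"], np.nan],
    "severerisk": [np.nan, np.nan, np.nan],
})

# Count the missing entries in each column.
null_counts = weather.isna().sum().rename("Null Count")
print(null_counts)
```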
Snippet of Initial Station Data
distance | latitude | longitude | useCount | id | name | quality | contribution | resort |
---|---|---|---|---|---|---|---|---|
33491.0 | 39.560 | -105.986 | 0 | SODC2 | SODA CREEK CO US | 0 | 0.0 | Vail |
26493.0 | 39.467 | -106.150 | 0 | 72206103038 | RED CLIFF PASS, CO US | 99 | 0.0 | Vail |
35635.0 | 39.927 | -106.545 | 0 | DYGC2 | DRY GULCH CO US | 0 | 0.0 | Vail |
26228.0 | 39.470 | -106.150 | 0 | KCCU | KCCU | 99 | 0.0 | Vail |
39498.0 | 39.891 | -106.037 | 0 | KSEC2 | KEYSER RIDGE CO US | 0 | 0.0 | Vail |
63875.0 | 39.220 | -106.870 | 0 | KASE | KASE | 100 | 0.0 | Vail |
63123.0 | 39.230 | -106.871 | 0 | 72467693073 | ASPEN PITKIN CO AIRPORT SARDY FIELD, CO US | 100 | 0.0 | Vail |
44625.0 | 40.040 | -106.370 | 0 | K20V | K20V | 98 | 0.0 | Vail |
47529.0 | 39.650 | -106.917 | 0 | 72467523063 | EAGLE CO AIRPORT, CO US | 100 | 0.0 | Vail |
53458.0 | 39.790 | -105.770 | 0 | K0CO | K0CO | 100 | 0.0 | Vail |
Null Values of Initial Station Data
Column | Null Count |
---|---|
distance | 0 |
latitude | 0 |
longitude | 0 |
useCount | 0 |
id | 0 |
name | 6 |
quality | 0 |
contribution | 0 |
resort | 0 |
Understanding the variables within the weather data is essential to not only applying the data, but also dealing with null values and outliers. Thankfully, visualcrossing provides a breakdown of the variables. The details are as follows:
Metrics:
Descriptions:
It should also be noted that the data was retrieved for observations and on an hourly basis, so not all of the documented variables appear in the data.
An initial review of the data identified columns that were either not applicable within the scope of this timeframe or essentially duplicated, in inferior form, the information in other columns. Such columns (for example the epoch timestamps, conditions, description, and source) do not appear in the final dataset.
Outliers were then searched for in some of the remaining numeric columns.
Two main phenomena were illuminated by this visualization. Namely:
After replacing the visibility outlier with a null value instead of dropping the entire row, the remaining null values were handled with methods that also retained rows:
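As a rough illustration of row-preserving cleanup, the sketch below masks an outlier to a null value and then fills the gaps. The threshold `VISIBILITY_MAX`, the mean fill, and the zero fill are assumptions made for this example. The final snippet further down shows a filled `windgust` value and `0.0` for `severerisk`, which is consistent with this kind of approach, but the actual script may use different methods.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; 500.0 stands in for an implausible visibility reading.
weather = pd.DataFrame({
    "resort": ["Vail"] * 4,
    "visibility": [8.6, 9.9, 500.0, np.nan],
    "windgust": [18.3, np.nan, 32.2, 20.8],
    "severerisk": [np.nan] * 4,
})

VISIBILITY_MAX = 100.0  # assumed cutoff, chosen only for this example

# Replace the outlier with NaN rather than dropping the whole row.
weather["visibility"] = weather["visibility"].mask(
    weather["visibility"] > VISIBILITY_MAX
)

# Assumed strategy: fill numeric gaps with the column mean.
for col in ["visibility", "windgust"]:
    weather[col] = weather[col].fillna(weather[col].mean())

# Assumed strategy: a missing severe-risk score is treated as zero.
weather["severerisk"] = weather["severerisk"].fillna(0.0)

print(weather)
```

Every row survives the cleanup; only individual cells are changed.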
There were 2 list type columns which required unpacking.
For preciptype, MultiLabelBinarizer() from scikit-learn was used to create numeric representations of booleans in their own columns. In other words, it was encoded.
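A minimal sketch of this encoding is shown below, using a hypothetical three-row frame. Mapping a missing `preciptype` to a `none` label is an assumption inferred from the `type_none` column in the final data, and the `type_` prefix mirrors the final column names. The actual script may handle these details differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical rows mirroring the list-type preciptype column.
weather = pd.DataFrame({
    "datetime": ["2019-01-01", "2019-01-03", "2019-01-09"],
    "preciptype": [["snow"], ["rain", "snow"], np.nan],
})

# Assumption: a missing list means no precipitation, labelled 'none'.
labels = weather["preciptype"].apply(
    lambda v: v if isinstance(v, list) else ["none"]
)

# One 0/1 column per distinct label found in the lists.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(labels),
    columns=[f"type_{c}" for c in mlb.classes_],
    index=weather.index,
)

# Swap the list column for its encoded columns.
weather = weather.drop(columns="preciptype").join(encoded)
print(weather)
```

Because a single day can carry several precipitation types, each row may have more than one indicator set, which is why a multi-label encoder suits this column better than a one-hot encoder.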
For stations, given there were thousands, an encoding approach might not be the best at this point. The column was left in, in list format. However, a basket-type dataframe was created from it and saved for possible later use in an Apriori method or model.
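One common way to build a basket-type dataframe from a list column is sketched below. The layout (one row per observation, one station ID per cell, shorter rows padded with missing values) is an assumption about what "basket type" means here, and the rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows mirroring the list-type stations column.
weather = pd.DataFrame({
    "stations": [
        ["72467523063", "KCCU", "KEGE"],
        ["72467523063", "DYGC2", "KCCU", "KEGE"],
    ],
})

# One row per observation (transaction), one station ID per cell.
# Rows with fewer stations are padded with missing values on the right.
basket = pd.DataFrame(weather["stations"].tolist(), index=weather.index)
print(basket)
```

A frame in this shape could then be saved with `basket.to_csv(path, index=False, header=False)` so that each line of the file lists one observation's stations.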
After the steps throughout the script were taken, an acceptable Weather dataset was formed.
Snippet of Final Weather Data
datetime | tempmax | tempmin | temp | feelslikemax | feelslikemin | feelslike | dew | humidity | precip | precipprob | precipcover | snow | snowdepth | windgust | windspeed | winddir | pressure | cloudcover | visibility | solarradiation | solarenergy | uvindex | sunrise | sunset | moonphase | icon | stations | resort | tzoffset | severerisk | type_freezingrain | type_ice | type_none | type_rain | type_snow |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2019-01-01 | 16.4 | 2.0 | 7.0 | 10.2 | -13.3 | -0.6 | -1.2 | 69.1 | 0.008 | 100.0 | 20.83 | 0.0 | 20.7 | 18.30000 | 10.6 | 4.9 | 1014.5 | 59.0 | 8.6 | 116.8 | 9.9 | 5.0 | 07:26:20 | 16:51:51 | 0.85 | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 |
2019-01-02 | 24.3 | -0.9 | 11.4 | 21.9 | -11.9 | 5.4 | -10.5 | 39.9 | 0.004 | 100.0 | 4.17 | 0.0 | 20.8 | 29.77377 | 8.7 | 353.1 | 1021.4 | 0.0 | 9.9 | 121.6 | 10.7 | 5.0 | 07:26:27 | 16:52:41 | 0.89 | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 |
2019-01-03 | 29.0 | 5.3 | 17.6 | 21.9 | -4.0 | 8.6 | 4.1 | 56.1 | 0.004 | 100.0 | 4.17 | 0.2 | 20.8 | 32.20000 | 9.8 | 328.9 | 1024.7 | 0.0 | 9.8 | 123.3 | 10.6 | 5.0 | 07:26:31 | 16:53:33 | 0.92 | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
2019-01-04 | 34.0 | 11.9 | 23.4 | 28.7 | 3.4 | 17.1 | 7.0 | 50.4 | 0.001 | 100.0 | 4.17 | 0.1 | 20.8 | 20.80000 | 9.0 | 311.0 | 1025.5 | 0.0 | 9.9 | 123.7 | 10.7 | 5.0 | 07:26:34 | 16:54:26 | 0.96 | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
2019-01-05 | 34.1 | 14.3 | 27.1 | 29.4 | 4.3 | 20.1 | 1.9 | 33.9 | 0.001 | 100.0 | 4.17 | 0.0 | 20.4 | 20.80000 | 10.1 | 243.5 | 1022.2 | 19.4 | 9.7 | 110.3 | 9.6 | 5.0 | 07:26:34 | 16:55:20 | 0.00 | rain | ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
2019-01-06 | 29.9 | 18.5 | 25.9 | 22.4 | 5.1 | 16.1 | 18.1 | 72.5 | 0.035 | 100.0 | 58.33 | 0.6 | 20.6 | 33.30000 | 16.9 | 266.7 | 1009.3 | 78.7 | 6.3 | 47.3 | 4.1 | 2.0 | 07:26:32 | 16:56:16 | 0.02 | snow | ['72467523063', '72206103038', 'CACMC', '72038500419', 'DYGC2', 'KCCU', 'KEGE', 'A0000594076', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
2019-01-07 | 24.8 | 14.7 | 20.3 | 12.8 | 2.5 | 6.5 | 13.7 | 75.2 | 0.004 | 100.0 | 8.33 | 0.4 | 21.3 | 45.70000 | 27.9 | 271.2 | 1015.6 | 83.7 | 4.8 | 35.8 | 3.0 | 2.0 | 07:26:27 | 16:57:13 | 0.06 | snow | ['72467523063', '72206103038', 'CACMC', 'DYGC2', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 0 | 1 |
2019-01-08 | 34.6 | 17.2 | 25.1 | 34.6 | 5.0 | 17.8 | 12.0 | 59.5 | 0.013 | 100.0 | 8.33 | 0.0 | 21.3 | 27.70000 | 15.2 | 312.1 | 1029.4 | 34.5 | 9.5 | 122.9 | 10.5 | 5.0 | 07:26:21 | 16:58:11 | 0.09 | rain | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
2019-01-09 | 38.3 | 23.0 | 28.6 | 38.3 | 13.6 | 22.6 | 9.9 | 45.4 | 0.000 | 0.0 | 0.00 | 0.0 | 21.2 | 23.00000 | 13.0 | 142.9 | 1029.6 | 1.0 | 9.9 | 114.0 | 9.8 | 5.0 | 07:26:12 | 16:59:11 | 0.12 | clear-day | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 1 | 0 | 0 |
2019-01-10 | 33.7 | 17.0 | 26.4 | 33.7 | 9.8 | 22.6 | 14.3 | 60.6 | 0.026 | 100.0 | 12.50 | 0.8 | 21.4 | 17.20000 | 8.8 | 323.7 | 1023.3 | 39.9 | 8.3 | 75.9 | 6.6 | 4.0 | 07:26:01 | 17:00:11 | 0.16 | snow | ['72467523063', '72206103038', 'CACMC', 'KCCU', 'KEGE', 'KLXV', 'DJTC2', 'K20V', '72467393009'] | Vail | 0.0 | 0.0 | 0 | 0 | 0 | 1 | 1 |
Null Values of Final Weather Data
Column | Null Count |
---|---|
datetime | 0 |
tempmax | 0 |
tempmin | 0 |
temp | 0 |
feelslikemax | 0 |
feelslikemin | 0 |
feelslike | 0 |
dew | 0 |
humidity | 0 |
precip | 0 |
precipprob | 0 |
precipcover | 0 |
snow | 0 |
snowdepth | 0 |
windgust | 0 |
windspeed | 0 |
winddir | 0 |
pressure | 0 |
cloudcover | 0 |
visibility | 0 |
solarradiation | 0 |
solarenergy | 0 |
uvindex | 0 |
sunrise | 0 |
sunset | 0 |
moonphase | 0 |
icon | 0 |
stations | 0 |
resort | 0 |
tzoffset | 0 |
severerisk | 0 |
type_freezingrain | 0 |
type_ice | 0 |
type_none | 0 |
type_rain | 0 |
type_snow | 0 |