Modeling - Association Rule Mining (ARM)


How Does Association Rule Mining Work?

Association rule mining (ARM) is a technique used to find and quantify relationships within sets, specifically how often events occur together. A common use of ARM, and a colloquial name for it, is market basket analysis, where items purchased together by customers are examined. The goal of that analysis is to answer the question “Given a limited number of items at a location, what items are most associated with each other?” More generally, does one set or subset imply another set or subset? This is the idea of a rule. ARM can also be used for purposes other than customer-based studies, and this analysis focuses on one such purpose.

Components of ARM


Association rule mining measures co-occurrence, not causality, and several key components help in analyzing co-occurrence findings.
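The metric columns that appear in the rule tables on this page can be written down compactly. The following is a sketch of the standard definitions for a rule $A \Rightarrow C$ over $N$ transactions $t_1, \dots, t_N$. The symbols $A$, $C$, $N$, and $\operatorname{supp}$ are notation introduced here for illustration and do not come from the original analysis.

```latex
\begin{align*}
\operatorname{supp}(A) &= \frac{|\{\, i : A \subseteq t_i \,\}|}{N} \\
\operatorname{supp}(A \Rightarrow C) &= \operatorname{supp}(A \cup C) \\
\operatorname{conf}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)} \\
\operatorname{lift}(A \Rightarrow C) &= \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)} \\
\operatorname{leverage}(A \Rightarrow C) &= \operatorname{supp}(A \cup C) - \operatorname{supp}(A)\,\operatorname{supp}(C) \\
\operatorname{conviction}(A \Rightarrow C) &= \frac{1 - \operatorname{supp}(C)}{1 - \operatorname{conf}(A \Rightarrow C)}
\end{align*}
```

A lift of 1 and a leverage of 0 both correspond to the antecedent and consequent occurring independently of each other. Conviction is infinite when confidence equals 1, because the denominator is then 0.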


The Transaction Type Dataset

The following mimics a transaction-type dataset: movie watchlists for 10 users.
['Interstellar', 'Click', 'The Lord of the Rings', 'Up', 'Scarface']
['The Martian', 'Die Hard', 'Beerfest']
['Dune', 'Scarface', 'Forest Gump']
['ET', 'Toy Story', 'Beerfest', 'Inception', 'Click']
['Interstellar', 'Inception', 'Raiders of the Lost Ark', 'Toy Story', 'Fight Club']
['The Martian', 'The Matrix']
['Shawshank Redemption', 'The Martian', 'Die Hard']
['Shawshank Redemption', 'The Martian']
['Up', 'ET']
['Toy Story', 'ET', 'Scarface', 'The Matrix', 'Inception']
Apriori Pruning Principle

Minimum support threshold of 0.20 for frequent itemsets.
support itemsets
0 0.2 frozenset({'Beerfest'})
1 0.2 frozenset({'Click'})
2 0.2 frozenset({'Die Hard'})
3 0.3 frozenset({'ET'})
4 0.3 frozenset({'Inception'})
5 0.2 frozenset({'Interstellar'})
6 0.3 frozenset({'Scarface'})
7 0.2 frozenset({'Shawshank Redemption'})
8 0.4 frozenset({'The Martian'})
9 0.2 frozenset({'The Matrix'})
10 0.3 frozenset({'Toy Story'})
11 0.2 frozenset({'Up'})
12 0.2 frozenset({'The Martian', 'Die Hard'})
13 0.2 frozenset({'Inception', 'ET'})
14 0.2 frozenset({'Toy Story', 'ET'})
15 0.3 frozenset({'Toy Story', 'Inception'})
16 0.2 frozenset({'The Martian', 'Shawshank Redemption'})
17 0.2 frozenset({'Toy Story', 'Inception', 'ET'})
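The level-wise search and the pruning principle behind the table above can be sketched in plain Python. This is a minimal illustration using only the standard library, not the library call that produced the table; the function name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

# The ten watchlists from the example above.
watchlists = [
    ['Interstellar', 'Click', 'The Lord of the Rings', 'Up', 'Scarface'],
    ['The Martian', 'Die Hard', 'Beerfest'],
    ['Dune', 'Scarface', 'Forest Gump'],
    ['ET', 'Toy Story', 'Beerfest', 'Inception', 'Click'],
    ['Interstellar', 'Inception', 'Raiders of the Lost Ark', 'Toy Story', 'Fight Club'],
    ['The Martian', 'The Matrix'],
    ['Shawshank Redemption', 'The Martian', 'Die Hard'],
    ['Shawshank Redemption', 'The Martian'],
    ['Up', 'ET'],
    ['Toy Story', 'ET', 'Scarface', 'The Matrix', 'Inception'],
]


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {frozenset: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            current[single] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: a candidate can only be frequent if every one of its
        # (k-1)-item subsets is frequent, so the others are never counted.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        scored = {c: support(c) for c in candidates}
        current = {c: s for c, s in scored.items() if s >= min_support}
    return frequent


result = frequent_itemsets(watchlists, min_support=0.20)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(round(s, 2), sorted(itemset))
```

With a minimum support of 0.20, the candidate {'The Martian', 'Die Hard', 'Shawshank Redemption'} is discarded in the prune step without being counted, because its subset {'Die Hard', 'Shawshank Redemption'} is not frequent.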
Apriori Based Rule Generation

Minimum confidence threshold of 0.20 for association rules.
antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'The Martian'}) frozenset({'Die Hard'}) 0.4 0.2 0.2 0.500000 2.500000 0.12 1.6 1.000000
1 frozenset({'Die Hard'}) frozenset({'The Martian'}) 0.2 0.4 0.2 1.000000 2.500000 0.12 inf 0.750000
2 frozenset({'Inception'}) frozenset({'ET'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
3 frozenset({'ET'}) frozenset({'Inception'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
4 frozenset({'Toy Story'}) frozenset({'ET'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
5 frozenset({'ET'}) frozenset({'Toy Story'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
6 frozenset({'Toy Story'}) frozenset({'Inception'}) 0.3 0.3 0.3 1.000000 3.333333 0.21 inf 1.000000
7 frozenset({'Inception'}) frozenset({'Toy Story'}) 0.3 0.3 0.3 1.000000 3.333333 0.21 inf 1.000000
8 frozenset({'The Martian'}) frozenset({'Shawshank Redemption'}) 0.4 0.2 0.2 0.500000 2.500000 0.12 1.6 1.000000
9 frozenset({'Shawshank Redemption'}) frozenset({'The Martian'}) 0.2 0.4 0.2 1.000000 2.500000 0.12 inf 0.750000
10 frozenset({'Toy Story', 'Inception'}) frozenset({'ET'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
11 frozenset({'Toy Story', 'ET'}) frozenset({'Inception'}) 0.2 0.3 0.2 1.000000 3.333333 0.14 inf 0.875000
12 frozenset({'Inception', 'ET'}) frozenset({'Toy Story'}) 0.2 0.3 0.2 1.000000 3.333333 0.14 inf 0.875000
13 frozenset({'Toy Story'}) frozenset({'Inception', 'ET'}) 0.3 0.2 0.2 0.666667 3.333333 0.14 2.4 1.000000
14 frozenset({'Inception'}) frozenset({'Toy Story', 'ET'}) 0.3 0.2 0.2 0.666667 3.333333 0.14 2.4 1.000000
15 frozenset({'ET'}) frozenset({'Toy Story', 'Inception'}) 0.3 0.3 0.2 0.666667 2.222222 0.11 2.1 0.785714
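To make the metric columns above concrete, the sketch below recomputes them for a single rule directly from the watchlists. It uses only the standard library; `rule_metrics` is a hypothetical helper, and the Zhang's metric formula is the commonly cited one, assumed here to be what the table reports. Returning 0 when its denominator is 0 is also an assumption.

```python
import math

# The ten watchlists from the example above.
watchlists = [
    ['Interstellar', 'Click', 'The Lord of the Rings', 'Up', 'Scarface'],
    ['The Martian', 'Die Hard', 'Beerfest'],
    ['Dune', 'Scarface', 'Forest Gump'],
    ['ET', 'Toy Story', 'Beerfest', 'Inception', 'Click'],
    ['Interstellar', 'Inception', 'Raiders of the Lost Ark', 'Toy Story', 'Fight Club'],
    ['The Martian', 'The Matrix'],
    ['Shawshank Redemption', 'The Martian', 'Die Hard'],
    ['Shawshank Redemption', 'The Martian'],
    ['Up', 'ET'],
    ['Toy Story', 'ET', 'Scarface', 'The Matrix', 'Inception'],
]
baskets = [frozenset(w) for w in watchlists]


def support(itemset):
    """Fraction of watchlists that contain every item of the itemset."""
    return sum(frozenset(itemset) <= basket for basket in baskets) / len(baskets)


def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction is undefined (reported as infinity) when confidence is 1.
    conviction = math.inf if confidence >= 1 else (1 - s_c) / (1 - confidence)
    # Zhang's metric; a zero denominator is mapped to 0.0 (assumption).
    denominator = max(s_ac * (1 - s_a), s_a * (s_c - s_ac))
    zhang = leverage / denominator if denominator else 0.0
    return {
        'support': s_ac,
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
        'zhangs_metric': zhang,
    }


metrics = rule_metrics({'The Martian'}, {'Die Hard'})
print({name: round(value, 6) for name, value in metrics.items()})
```

Up to floating-point rounding, this should agree with row 0 of the table: confidence 0.5, lift 2.5, leverage 0.12, conviction 1.6, and a Zhang's metric of 1.0.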


Frequency Count for Each Movie
Frequency Count for the Mimic Watchlist.
Association Rules Visualization
Association Rules for the Mimic Watchlist.


Applying Association Rule Mining

This analysis will focus on finding associations between categories returned by the Google Places API. The API itself returns a list of categories associated with each business.
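The categories themselves will be analyzed; however, a few labels can also be applied to the transaction-type data to help identify associations, namely the label columns of the merged dataset shown under Data Preparation: Call Category, Resort, Country, Pass, and Region.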


Call Category is of most interest, as it could provide insight into the efficacy of the Google Places API. The other labels could produce interesting relationships of their own, revealing associations that did not seem relevant when the project began.

A script for the detailed functions required for preparing data, applying apriori, and illustrating the results with a network can be found here. The application process script can be found here.


Data Preparation

Preparing data for this type of analysis consists of creating the initial transaction-type data, and then allowing for expansion into labels.
The general preparation process:

  1. Obtain the initial dataset with the transaction type data.
  2. Clean the initial dataset (namely steps that were previously applied to the main Google dataset):
    • Drop Duplicates
    • Use ast.literal_eval() to Ensure List Type.
    • Create Keys to Merge back with the Google Data for Label Expansion.
      • Merge back with the Google dataset.
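The preparation steps above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the project's actual script: the rows are represented as dictionaries, the merge key (name, latitude, longitude) is a guess at what uniquely identifies a business, duplicates are detected on that key, and `make_key` and `prepare` are hypothetical names.

```python
import ast

# Raw rows shaped like the initial dataset. The 'types' column arrives as a
# string that looks like a list, which is why ast.literal_eval() is needed.
raw_rows = [
    {'latitude': 39.639411, 'longitude': -106.367836, 'name': 'Manor Vail Lodge',
     'types': "['bar', 'lodging', 'restaurant', 'food', 'point_of_interest', 'establishment']"},
    # An exact duplicate of the first row, to be dropped.
    {'latitude': 39.639411, 'longitude': -106.367836, 'name': 'Manor Vail Lodge',
     'types': "['bar', 'lodging', 'restaurant', 'food', 'point_of_interest', 'establishment']"},
    {'latitude': 39.642639, 'longitude': -106.377803, 'name': 'Leonora',
     'types': "['restaurant', 'food', 'point_of_interest', 'establishment']"},
]

# Labels from the cleaned Google dataset, indexed by the same key (assumption).
google_labels = {
    ('Manor Vail Lodge', 39.639411, -106.367836): {'Call Category': 'Restaurants', 'Resort': 'Vail'},
    ('Leonora', 39.642639, -106.377803): {'Call Category': 'Restaurants', 'Resort': 'Vail'},
}


def make_key(row):
    """Hypothetical merge key: business name plus coordinates."""
    return (row['name'], row['latitude'], row['longitude'])


def prepare(rows, labels):
    """Drop duplicates, parse the list column, and attach the labels."""
    seen = set()
    prepared = []
    for row in rows:
        key = make_key(row)
        if key in seen:
            continue  # duplicate business, already handled
        seen.add(key)
        types = row['types']
        if isinstance(types, str):
            # Safely turn "['bar', 'food']" into ['bar', 'food'].
            types = ast.literal_eval(types)
        # Left-join style merge: keep the row even if no labels are found.
        prepared.append({'types': types, **labels.get(key, {})})
    return prepared


prepared = prepare(raw_rows, google_labels)
for row in prepared:
    print(row)
```

`ast.literal_eval()` only evaluates Python literals, so it is a safer choice than `eval()` for turning stringified lists back into list objects.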


Initial Dataset with Transaction Type Data

The initial dataset contains a column of list values, which supplies the transaction data. The cleaning steps mentioned above prepare this data for merging with the cleaned Google Places data.


latitude longitude name rating types total_ratings vicinity resort call_category price_level
39.639411 -106.367836 Manor Vail Lodge 4.7 ['bar', 'lodging', 'restaurant', 'food', 'point_of_interest', 'establishment'] 370.0 595 Vail Valley Drive, Vail Vail Restaurants NaN
39.641578 -106.371678 Gravity Haus Vail 4.4 ['gym', 'spa', 'lodging', 'restaurant', 'food', 'point_of_interest', 'health', 'establishment'] 256.0 352 East Meadow Drive, Vail Vail Restaurants NaN
39.642639 -106.377803 Leonora 4.3 ['restaurant', 'food', 'point_of_interest', 'establishment'] 167.0 16 Vail Road, Vail Vail Restaurants 3.0
39.638962 -106.369379 Larkspur Events & Dining 4.5 ['restaurant', 'food', 'point_of_interest', 'establishment'] 198.0 458 Vail Valley Drive, Vail Vail Restaurants 3.0
39.630370 -106.418694 Subway 2.7 ['meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment'] 105.0 2161 North Frontage Road West #11-12, Vail Vail Restaurants 1.0
39.640861 -106.374665 Sweet Basil 4.4 ['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] 838.0 193 Gore Creek Drive, Vail Vail Restaurants 3.0
39.640228 -106.374381 Elway's 4.3 ['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] 385.0 Located Upstairs in The Lodge at Vail, 174 Gore Creek Drive, Vail Vail Restaurants 4.0
39.643914 -106.390088 The Little Diner 4.7 ['restaurant', 'food', 'point_of_interest', 'store', 'establishment'] 1390.0 616 West Lionshead Circle, Vail Vail Restaurants 2.0
39.640248 -106.373333 Red Lion 3.9 ['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] 740.0 304 Bridge Street St.1, Vail Vail Restaurants 2.0
39.641490 -106.397471 Chicago Pizza 3.9 ['meal_delivery', 'meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment'] 216.0 1031 South Frontage Road West, Vail Vail Restaurants 1.0

Google Places Full Data

The cleaned, final Google Places data used across this project. The transaction data is merged into this dataset.


Latitude Longitude Name rating total_ratings Resort Call Category Initial Category Secondary Category Tertiary Category
39.639411 -106.367836 Manor Vail Lodge 4.7 370.0 Vail Restaurants bar lodging restaurant
39.641578 -106.371678 Gravity Haus Vail 4.4 256.0 Vail Restaurants gym spa lodging
39.642639 -106.377803 Leonora 4.3 167.0 Vail Restaurants restaurant food point_of_interest
39.638962 -106.369379 Larkspur Events & Dining 4.5 198.0 Vail Restaurants restaurant food point_of_interest
39.630370 -106.418694 Subway 2.7 105.0 Vail Restaurants meal_takeaway restaurant food
39.640861 -106.374665 Sweet Basil 4.4 838.0 Vail Restaurants bar restaurant food
39.640228 -106.374381 Elway's 4.3 385.0 Vail Restaurants bar restaurant food
39.643914 -106.390088 The Little Diner 4.7 1390.0 Vail Restaurants restaurant food point_of_interest
39.640248 -106.373333 Red Lion 3.9 740.0 Vail Restaurants bar restaurant food
39.641490 -106.397471 Chicago Pizza 3.9 216.0 Vail Restaurants meal_delivery meal_takeaway restaurant

Google Merged Data

The ARM-ready dataset. The main transaction data exists in one of the columns while the labels exist for use in expansion functions available in the functions script.


types Call Category Resort Country Pass Region
['bar', 'lodging', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['gym', 'spa', 'lodging', 'restaurant', 'food', 'point_of_interest', 'health', 'establishment'] Restaurants Vail United States Epic West
['restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['restaurant', 'food', 'point_of_interest', 'store', 'establishment'] Restaurants Vail United States Epic West
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West
['meal_delivery', 'meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment'] Restaurants Vail United States Epic West


Transaction-Type Data Isolated

A snippet of the transaction-type data isolated.


['bar', 'lodging', 'restaurant', 'food', 'point_of_interest', 'establishment']
['gym', 'spa', 'lodging', 'restaurant', 'food', 'point_of_interest', 'health', 'establishment']
['restaurant', 'food', 'point_of_interest', 'establishment']
['restaurant', 'food', 'point_of_interest', 'establishment']
['meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment']
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment']
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment']
['restaurant', 'food', 'point_of_interest', 'store', 'establishment']
['bar', 'restaurant', 'food', 'point_of_interest', 'establishment']
['meal_delivery', 'meal_takeaway', 'restaurant', 'food', 'point_of_interest', 'establishment']


Results

Using just the main transaction-type data (i.e. no labels included), the Apriori algorithm was run to find frequent itemsets, and then Apriori-based rule generation was run to find association rules.

Because the dataset is large, a low support threshold was used for the initial algorithm and a low confidence threshold for the secondary algorithm, in order to capture as many frequent itemsets and association rules as possible. The final association rules can always be reduced later by filtering on different thresholds. The size of the dataset matters for support because support measures an itemset's proportion of the entire dataset; with a higher threshold, rarer occurrences would be pruned.


Because the rules were created with low thresholds, they can now be sorted by metric to examine the most important findings.

antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'point_of_interest'}) frozenset({'establishment'}) 1.000000 1.000000 1.000000 1.000000 1.0 0.0 inf 0.0
1 frozenset({'establishment'}) frozenset({'point_of_interest'}) 1.000000 1.000000 1.000000 1.000000 1.0 0.0 inf 0.0
2 frozenset({'point_of_interest'}) frozenset({'food'}) 1.000000 0.371505 0.371505 0.371505 1.0 0.0 1.0 0.0
3 frozenset({'food'}) frozenset({'establishment', 'point_of_interest'}) 0.371505 1.000000 0.371505 1.000000 1.0 0.0 inf 0.0
4 frozenset({'establishment'}) frozenset({'food', 'point_of_interest'}) 1.000000 0.371505 0.371505 0.371505 1.0 0.0 1.0 0.0
5 frozenset({'point_of_interest'}) frozenset({'establishment', 'food'}) 1.000000 0.371505 0.371505 0.371505 1.0 0.0 1.0 0.0
6 frozenset({'establishment', 'point_of_interest'}) frozenset({'food'}) 1.000000 0.371505 0.371505 0.371505 1.0 0.0 1.0 0.0
7 frozenset({'food', 'point_of_interest'}) frozenset({'establishment'}) 0.371505 1.000000 0.371505 1.000000 1.0 0.0 inf 0.0
8 frozenset({'establishment', 'food'}) frozenset({'point_of_interest'}) 0.371505 1.000000 0.371505 1.000000 1.0 0.0 inf 0.0
9 frozenset({'food'}) frozenset({'establishment'}) 0.371505 1.000000 0.371505 1.000000 1.0 0.0 inf 0.0
10 frozenset({'establishment'}) frozenset({'food'}) 1.000000 0.371505 0.371505 0.371505 1.0 0.0 1.0 0.0
11 frozenset({'food'}) frozenset({'point_of_interest'}) 0.371505 1.000000 0.371505 1.000000 1.0 0.0 inf 0.0
12 frozenset({'point_of_interest', 'store'}) frozenset({'establishment'}) 0.331246 1.000000 0.331246 1.000000 1.0 0.0 inf 0.0
13 frozenset({'establishment'}) frozenset({'store'}) 1.000000 0.331246 0.331246 0.331246 1.0 0.0 1.0 0.0
14 frozenset({'store'}) frozenset({'establishment', 'point_of_interest'}) 0.331246 1.000000 0.331246 1.000000 1.0 0.0 inf 0.0

antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'establishment', 'point_of_interest', 'restaurant'}) frozenset({'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
1 frozenset({'establishment', 'restaurant'}) frozenset({'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
2 frozenset({'food'}) frozenset({'point_of_interest', 'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
3 frozenset({'food'}) frozenset({'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
4 frozenset({'restaurant'}) frozenset({'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
5 frozenset({'restaurant'}) frozenset({'food', 'point_of_interest'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
6 frozenset({'establishment', 'point_of_interest', 'food'}) frozenset({'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
7 frozenset({'food', 'point_of_interest'}) frozenset({'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
8 frozenset({'restaurant'}) frozenset({'establishment', 'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
9 frozenset({'food'}) frozenset({'establishment', 'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
10 frozenset({'point_of_interest', 'restaurant'}) frozenset({'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
11 frozenset({'establishment', 'food'}) frozenset({'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
12 frozenset({'establishment', 'food'}) frozenset({'point_of_interest', 'restaurant'}) 0.371505 0.241776 0.241776 0.650801 2.691751 0.151955 2.171324 1.000000
13 frozenset({'establishment', 'restaurant'}) frozenset({'food', 'point_of_interest'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904
14 frozenset({'restaurant'}) frozenset({'establishment', 'point_of_interest', 'food'}) 0.241776 0.371505 0.241776 1.000000 2.691751 0.151955 inf 0.828904

antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'establishment', 'point_of_interest', 'store', 'supermarket'}) frozenset({'grocery_or_supermarket'}) 0.026149 0.043170 0.026149 1.0 23.164454 0.025020 inf 0.982522
1 frozenset({'food', 'convenience_store', 'drugstore'}) frozenset({'health'}) 0.011377 0.208704 0.011377 1.0 4.791464 0.009002 inf 0.800401
2 frozenset({'food', 'convenience_store', 'drugstore'}) frozenset({'establishment'}) 0.011377 1.000000 0.011377 1.0 1.000000 0.000000 inf 0.000000
3 frozenset({'finance', 'food', 'store'}) frozenset({'establishment', 'point_of_interest'}) 0.012611 1.000000 0.012611 1.0 1.000000 0.000000 inf 0.000000
4 frozenset({'convenience_store', 'drugstore'}) frozenset({'establishment', 'food'}) 0.011377 0.371505 0.011377 1.0 2.691751 0.007150 inf 0.635727
5 frozenset({'finance', 'food', 'point_of_interest', 'store'}) frozenset({'establishment'}) 0.012611 1.000000 0.012611 1.0 1.000000 0.000000 inf 0.000000
6 frozenset({'finance', 'food', 'store', 'establishment'}) frozenset({'point_of_interest'}) 0.012611 1.000000 0.012611 1.0 1.000000 0.000000 inf 0.000000
7 frozenset({'establishment', 'convenience_store', 'drugstore'}) frozenset({'health'}) 0.011377 0.208704 0.011377 1.0 4.791464 0.009002 inf 0.800401
8 frozenset({'convenience_store', 'drugstore', 'health'}) frozenset({'establishment'}) 0.011377 1.000000 0.011377 1.0 1.000000 0.000000 inf 0.000000
9 frozenset({'convenience_store', 'drugstore'}) frozenset({'establishment', 'health'}) 0.011377 0.208704 0.011377 1.0 4.791464 0.009002 inf 0.800401
10 frozenset({'drugstore', 'pharmacy'}) frozenset({'point_of_interest', 'store', 'health'}) 0.011465 0.037878 0.011465 1.0 26.400466 0.011031 inf 0.973280
11 frozenset({'establishment', 'convenience_store', 'drugstore'}) frozenset({'point_of_interest'}) 0.011377 1.000000 0.011377 1.0 1.000000 0.000000 inf 0.000000
12 frozenset({'convenience_store', 'point_of_interest', 'drugstore'}) frozenset({'establishment'}) 0.011377 1.000000 0.011377 1.0 1.000000 0.000000 inf 0.000000
13 frozenset({'point_of_interest', 'drugstore', 'pharmacy'}) frozenset({'store', 'health'}) 0.011465 0.037878 0.011465 1.0 26.400466 0.011031 inf 0.973280
14 frozenset({'store', 'drugstore', 'pharmacy'}) frozenset({'point_of_interest', 'health'}) 0.011465 0.208704 0.011465 1.0 4.791464 0.009072 inf 0.800473

antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'grocery_or_supermarket', 'store', 'supermarket'}) frozenset({'food', 'establishment'}) 0.026149 0.371505 0.026149 1.0 2.691751 0.016434 inf 0.645370
1 frozenset({'finance', 'convenience_store', 'atm'}) frozenset({'establishment', 'store'}) 0.011641 0.331246 0.011641 1.0 3.018903 0.007785 inf 0.676631
2 frozenset({'cafe', 'bakery'}) frozenset({'point_of_interest', 'store'}) 0.011156 0.331246 0.011156 1.0 3.018903 0.007461 inf 0.676299
3 frozenset({'convenience_store', 'atm'}) frozenset({'finance', 'food', 'establishment'}) 0.011641 0.012744 0.011641 1.0 78.470588 0.011493 inf 0.998885
4 frozenset({'cafe', 'point_of_interest', 'bakery'}) frozenset({'store'}) 0.011156 0.331246 0.011156 1.0 3.018903 0.007461 inf 0.676299
5 frozenset({'establishment', 'convenience_store', 'point_of_interest', 'atm'}) frozenset({'finance'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
6 frozenset({'establishment', 'convenience_store', 'atm'}) frozenset({'finance', 'point_of_interest'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
7 frozenset({'convenience_store', 'point_of_interest', 'atm'}) frozenset({'finance', 'establishment'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
8 frozenset({'convenience_store', 'atm'}) frozenset({'finance', 'establishment', 'point_of_interest'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
9 frozenset({'restaurant', 'cafe', 'point_of_interest', 'bakery'}) frozenset({'food', 'store'}) 0.010495 0.172017 0.010495 1.0 5.813381 0.008689 inf 0.836765
10 frozenset({'finance', 'establishment', 'convenience_store', 'atm'}) frozenset({'store'}) 0.011641 0.331246 0.011641 1.0 3.018903 0.007785 inf 0.676631
11 frozenset({'establishment', 'convenience_store', 'store', 'atm'}) frozenset({'finance'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
12 frozenset({'establishment', 'convenience_store', 'atm'}) frozenset({'finance', 'store'}) 0.011641 0.012964 0.011641 1.0 77.136054 0.011490 inf 0.998662
13 frozenset({'food', 'convenience_store', 'atm'}) frozenset({'finance', 'establishment'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171
14 frozenset({'convenience_store', 'store', 'atm'}) frozenset({'finance', 'establishment'}) 0.011641 0.013449 0.011641 1.0 74.354098 0.011485 inf 0.998171

antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
0 frozenset({'food', 'drugstore'}) frozenset({'point_of_interest', 'convenience_store', 'store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
1 frozenset({'food', 'store', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
2 frozenset({'food', 'point_of_interest', 'store', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
3 frozenset({'food', 'drugstore', 'establishment'}) frozenset({'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
4 frozenset({'food', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
5 frozenset({'food', 'point_of_interest', 'drugstore'}) frozenset({'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
6 frozenset({'food', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'point_of_interest', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
7 frozenset({'food', 'drugstore'}) frozenset({'convenience_store', 'point_of_interest', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
8 frozenset({'food', 'store', 'drugstore', 'establishment'}) frozenset({'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
9 frozenset({'food', 'drugstore', 'establishment'}) frozenset({'convenience_store', 'store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
10 frozenset({'food', 'drugstore', 'establishment'}) frozenset({'point_of_interest', 'convenience_store', 'store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
11 frozenset({'food', 'store', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'point_of_interest', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
12 frozenset({'food', 'point_of_interest', 'drugstore'}) frozenset({'establishment', 'convenience_store', 'store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
13 frozenset({'food', 'establishment', 'store', 'point_of_interest', 'drugstore'}) frozenset({'convenience_store', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727
14 frozenset({'food', 'store', 'drugstore', 'establishment'}) frozenset({'convenience_store', 'point_of_interest', 'health'}) 0.011597 0.011641 0.011377 0.980989 84.268406 0.011242 51.987671 0.999727


Without a lift filter, the top rules show that Establishment and Point of Interest are very common and appear in nearly every rule. Keeping only rules with lift greater than 1 surfaces more meaningful associations. In fact, sorting by descending lift illustrates some of the strongest associations.

To further illustrate these associations, network visualizations were created. Note that these networks are interactive and contain hover information.
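The filter-then-sort step described here can be sketched as follows. The rows are a handful of rules copied from the tables above, reduced to the columns needed; representing rules as dictionaries and the helper name `top_rules` are assumptions made for this illustration.

```python
# A few rules copied from the result tables, keeping only the needed columns.
rules = [
    {'antecedents': frozenset({'point_of_interest'}),
     'consequents': frozenset({'establishment'}),
     'support': 1.000000, 'confidence': 1.000000, 'lift': 1.000000},
    {'antecedents': frozenset({'establishment'}),
     'consequents': frozenset({'store'}),
     'support': 0.331246, 'confidence': 0.331246, 'lift': 1.000000},
    {'antecedents': frozenset({'restaurant'}),
     'consequents': frozenset({'food'}),
     'support': 0.241776, 'confidence': 1.000000, 'lift': 2.691751},
    {'antecedents': frozenset({'convenience_store', 'atm'}),
     'consequents': frozenset({'finance', 'food', 'establishment'}),
     'support': 0.011641, 'confidence': 1.000000, 'lift': 78.470588},
    {'antecedents': frozenset({'food', 'drugstore'}),
     'consequents': frozenset({'establishment', 'convenience_store', 'health'}),
     'support': 0.011377, 'confidence': 0.980989, 'lift': 84.268406},
]


def top_rules(rules, metric, k=15, min_lift=None):
    """Return the k rules with the largest value of `metric`.

    If min_lift is given, only rules with lift strictly above it are kept.
    """
    if min_lift is not None:
        rules = [rule for rule in rules if rule['lift'] > min_lift]
    return sorted(rules, key=lambda rule: rule[metric], reverse=True)[:k]


unfiltered = top_rules(rules, 'support')
by_support = top_rules(rules, 'support', min_lift=1.0)
by_lift = top_rules(rules, 'lift', min_lift=1.0)

print('top by support, no filter:', sorted(unfiltered[0]['antecedents']))
print('top by support, lift > 1: ', sorted(by_support[0]['antecedents']))
print('top by lift:              ', sorted(by_lift[0]['antecedents']))
```

On this small sample, the unfiltered ranking is led by a Point of Interest and Establishment rule with lift 1, while the filtered rankings are led by the Restaurant and Food rule (by support) and by the Drugstore rule (by lift).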



Label Expansion - Call Categories

Associations between the returned categories are interesting in their own right. However, insight into the efficacy of the Google Places API can be gained by appending the call category label to the datasets. In other words, when the API was called with a specific business category in mind, what was actually returned?

For this process, the label was appended to the transaction-type data, and association rules were generated again with the same low thresholds to capture as many associations as possible. Once the rules were created, they were reduced to only those with a call category as the single antecedent.
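A minimal sketch of this label expansion and antecedent filter is shown below. The `call_` prefix follows the labels visible in the rule snippet (for example `call_restaurants`), but the exact normalization, the helper names, and the third sample rule (marked hypothetical) are assumptions.

```python
def call_label(call_category):
    """Turn a call category such as 'Restaurants' into 'call_restaurants'."""
    return 'call_' + call_category.strip().lower().replace(' ', '_')


def expand(transactions, call_categories):
    """Append each row's call category label to its list of returned types."""
    return [
        list(items) + [call_label(category)]
        for items, category in zip(transactions, call_categories)
    ]


def single_call_antecedent(rules):
    """Keep rules whose only antecedent is a call category label."""
    kept = []
    for rule in rules:
        antecedents = rule['antecedents']
        if len(antecedents) == 1 and next(iter(antecedents)).startswith('call_'):
            kept.append(rule)
    return kept


# Two transactions from the merged dataset, both called as 'Restaurants'.
transactions = [
    ['bar', 'restaurant', 'food', 'point_of_interest', 'establishment'],
    ['restaurant', 'food', 'point_of_interest', 'establishment'],
]
expanded = expand(transactions, ['Restaurants', 'Restaurants'])
print(expanded[0])

rules = [
    {'antecedents': frozenset({'call_bars'}), 'consequents': frozenset({'bar'})},
    {'antecedents': frozenset({'restaurant'}), 'consequents': frozenset({'food'})},
    # Hypothetical rule, included only to show a multi-item antecedent.
    {'antecedents': frozenset({'call_bars', 'food'}), 'consequents': frozenset({'bar'})},
]
kept = single_call_antecedent(rules)
print(kept)
```

Prefixing the label keeps a call category such as Bars distinct from the returned type `bar`, so a rule like `call_bars` to `bar` can be read as "what was asked for" implying "what came back".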

Label Expansion - Call Categories Association Rules Snippet (not sorted)

This is a snippet of the association rules resulting from this process (note that it is not sorted in any manner):
antecedents consequents antecedent support consequent support support confidence lift leverage conviction zhangs_metric
frozenset({'call_grocery'}) frozenset({'atm'}) 0.150631 0.012435 0.010980 0.072892 5.861883 0.009107 1.065211 0.976497
frozenset({'call_restaurants'}) frozenset({'bakery'}) 0.198033 0.020945 0.012788 0.064574 3.082947 0.008640 1.046640 0.842473
frozenset({'call_bars'}) frozenset({'bar'}) 0.067863 0.115001 0.067863 1.000000 8.695552 0.060059 inf 0.949430
frozenset({'call_restaurants'}) frozenset({'bar'}) 0.198033 0.115001 0.045507 0.229793 1.998176 0.022733 1.149040 0.622898
frozenset({'call_spas'}) frozenset({'beauty_salon'}) 0.075756 0.029897 0.028839 0.380675 12.732968 0.026574 1.566388 0.996992
frozenset({'call_restaurants'}) frozenset({'cafe'}) 0.198033 0.032058 0.023768 0.120018 3.743829 0.017419 1.099957 0.913871
frozenset({'call_bars'}) frozenset({'establishment'}) 0.067863 1.000000 0.067863 1.000000 1.000000 0.000000 inf 0.000000
frozenset({'call_bars'}) frozenset({'food'}) 0.067863 0.371505 0.038540 0.567901 1.528649 0.013328 1.454516 0.371005
frozenset({'call_bars'}) frozenset({'point_of_interest'}) 0.067863 1.000000 0.067863 1.000000 1.000000 0.000000 inf 0.000000
frozenset({'call_bars'}) frozenset({'restaurant'}) 0.067863 0.241776 0.034174 0.503574 2.082810 0.017766 1.527364 0.557729


Additionally, a network illustration was created with the same interactivity and hover information as the previous visuals. The call categories are colored differently, and when a return category is associated with multiple call categories, its node takes the color of the call category that appears as its antecedent most often.



As seen with the unlabeled networks, Point of Interest and Establishment are central returns for the majority of the categories. However, this also illustrates that there are notable associations between what was called within the API and the categories that could be expected as returns.

In other words, the Google Places API performed well at returning the business types that a call asked for.


Association Rule Mining Insights

Several results were found within the Google data via Association Rule Mining. Most notably:

  • Lift needed to be filtered due to the common presence of Establishment and Point of Interest. Given that the search parameters within the API calls were essentially for these overarching categories, this is not surprising, but it also doesn't reveal patterns within the data.
  • After filtering to rules with lift greater than 1 (i.e. a positive association, where the items co-occur more often than independence would predict), patterns could be seen. Notably, the top 15 rules changed depending on the metric used for sorting.
  • The Top 15 Rules by Support mainly contained rules between the categories Food and Restaurant.
    • This reveals that the categories of Food and Restaurant have a high proportion throughout the data.
  • The Top 15 Rules by Confidence mainly contained rules between categories involving the Store category.
    • This reveals that the Store category has a high conditional occurrence in the rules throughout the data.
  • Top 15 Rules by Lift mainly contained rules between categories involving the Convenience Store and Health categories.
    • This reveals that the categories of Convenience Store and Health have highly significant association rules throughout the data.
  • Expanding the transaction-type data with the Call Category label supported the efficacy of the Google Places API, showing notable associations between call categories and their expected return categories. It also revealed associations between different call categories and return categories: multiple return categories had associations stemming from more than one call category.
    • The hover information in this network confirms that the nodes have an acceptable average lift.
    • Again, average support is not displayed, because support is a proportion of the entire dataset and is therefore small for most rules in a dataset of this size.


Conclusion

Association Rule Mining, although more commonly applied in market basket analysis with transaction-specific data, can be quite useful for finding associations and relationships across many applications. One example is categorizing businesses near ski resorts using the Google Places API. Several main business categories surround ski resorts, such as Restaurants, Bars, Shopping Centers, and Medical Services, and the actual businesses may carry different subcategories within Google Places. By looking at the associations between main categories and subcategories of businesses surrounding ski resorts, patterns begin to emerge. For instance, businesses categorized as Food and Restaurant establishments are highly prevalent within this area. In general, Stores will be in the area given that other businesses are nearby. Additionally, where there are Food and Stores, there is a general trend of Convenience Stores and Health centers associated with the ski resort location. In summary, multiple businesses offering general amenities are almost certain to exist in these locations. Where there is one, there are likely many.