Association Rule Mining

Companies Know What You Buy

French toast is one of America’s favorite breakfast foods. It’s delicious and can be easily prepared at home using a variety of techniques and toppings. Even though it can be prepared a number of ways, almost all French toast recipes call for at least three things:

  1. bread
  2. milk
  3. eggs

If you’re going to make French toast, you’re going to need bread, you’re going to need milk, and you’re going to need eggs. What does French toast have to do with big data?

Association Rule Mining

An association rule is a link between one set of items and another. Specifically, association rules identify instances in which the appearance of one set of items (the antecedent) implies that another set of items (the consequent) will also appear.

For example:

{X, Y} ⇒ {Z}

This rule can be read as, “If the antecedents (X and Y) appear, then it is likely that the consequent (Z) will also appear.”

By using association rules, we can group items together logically and attempt to make predictions. Suppose a store records each customer’s purchase as a transaction, with one column per product. By tabulating these transactions and then discovering which pairs (or larger groups) of columns frequently occur together, we can generate association rules that capture these correlations in the data. The same idea applies to the ingredients for French toast.
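The tabulation step described above can be sketched in a few lines of Python. This is only an illustration, not a full mining algorithm: the receipts are invented for the example, and `count_pairs` is a hypothetical helper name. It counts how many transactions contain each pair of items, which is the raw material for spotting candidate rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical receipts, invented for illustration; each set is one transaction.
receipts = [
    {"milk", "bread", "eggs", "syrup"},
    {"milk", "bread", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs", "syrup"},
    {"milk", "bread", "eggs", "syrup"},
]

def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions:
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts

pair_counts = count_pairs(receipts)

# Show the pairs that occur together most often.
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are candidates for rules; larger groups can be counted the same way by changing the `2` passed to `combinations`.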

For example:

  • If most people who buy milk, bread, and eggs also buy maple syrup, then association rule mining might turn up the following rule:
    • {milk, bread, eggs} ⇒ {syrup}
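One common way to judge how strong such a rule is uses two measures: support (the fraction of all transactions that contain every item in the rule) and confidence (of the transactions that contain the antecedent, the fraction that also contain the consequent). The sketch below assumes a small set of made-up receipts; the function names `support` and `confidence` are chosen for this example only.

```python
# Hypothetical receipts, invented for illustration.
receipts = [
    {"milk", "bread", "eggs", "syrup"},
    {"milk", "bread", "eggs", "syrup"},
    {"milk", "bread", "eggs"},
    {"milk", "cereal"},
    {"bread", "jam"},
]

def support(transactions, itemset):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction
    that also contain the consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_antecedent = [t for t in transactions if antecedent <= t]
    if not with_antecedent:
        return 0.0  # the rule never applies, so report zero confidence
    both = sum(1 for t in with_antecedent if consequent <= t)
    return both / len(with_antecedent)

antecedent = {"milk", "bread", "eggs"}
consequent = {"syrup"}

rule_support = support(receipts, antecedent | consequent)
rule_confidence = confidence(receipts, antecedent, consequent)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

In this invented data, three receipts contain milk, bread, and eggs, and two of those also contain syrup, so the rule holds for two out of three matching shoppers.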

A retailer such as Walmart can now target store patrons who purchase milk, bread, and eggs and gently suggest that they might also like to buy syrup. The computerized storefront (or a physical storefront with a layout determined by computational data mining) does not know that these patrons may be making French toast; it has merely developed association rules to guide product placement. Association rule mining is essentially the process described in “How Target Figured Out a Teen Girl Was Pregnant.”

Instructions

Your group has been hired by Data Market, a corporation seeking to open a new chain of stores in your region. Their goal is to provide customers with optimal arrangements of store products, in an attempt to minimize the time and effort required to shop.

You will design a mock store product placement scheme driven by data collected from competitors’ stores in the area. Use the receipts provided by your teacher to (1) generate association rules that map potentially correlated products and then (2) sketch an endcap (the display at the end of an aisle) for data-driven product placement targeting potential shoppers in the area.

As you extract data from the receipts, consider the following guiding questions:

  1. What is the best way to use the provided table to organize your data collection?
  2. What trends do you find in the data?
  3. Are there any negative associations between products?
  4. What is the ideal size for sets of antecedents/consequents?
  5. What additional information might be helpful?
  6. Can you imagine scenarios in which sets of products are grouped together?
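If your group wants to check its hand counts for question 3, one possible measure is lift, which compares how often two products actually appear together with how often they would appear together if purchases were independent. A lift below 1 hints at a negative association and a lift above 1 at a positive one. The sketch below is one way to compute it, using made-up receipts and a hypothetical `lift` helper. It is not the only way to answer the question.

```python
# Hypothetical receipts, invented for illustration.
receipts = [
    {"coffee", "creamer"},
    {"coffee", "creamer"},
    {"coffee", "sugar"},
    {"coffee", "tea"},
    {"tea", "honey"},
    {"tea", "honey"},
    {"tea", "lemon"},
    {"bread"},
]

def lift(transactions, item_a, item_b):
    """Ratio of the observed co-occurrence rate of two items to the rate
    expected if they were bought independently."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if item_a in t) / n
    p_b = sum(1 for t in transactions if item_b in t) / n
    p_both = sum(1 for t in transactions if item_a in t and item_b in t) / n
    if p_a == 0 or p_b == 0:
        return 0.0  # an item that never appears cannot be associated
    return p_both / (p_a * p_b)

print("coffee and tea:", lift(receipts, "coffee", "tea"))
print("coffee and creamer:", lift(receipts, "coffee", "creamer"))
```

In this invented data, coffee and tea appear together less often than chance would suggest, while coffee and creamer appear together more often.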