A Quick Association Rules Example within R

Association rules are used to identify which items tend to be purchased together, that is, which items in a basket make the purchase of other items more likely.

The practice is commonly known as market basket analysis because it has so often been applied to supermarket data.

The dataset used here was adapted from the Groceries dataset in the arules R package.
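
Before turning to the real data, the three quality measures reported later (support, confidence and lift) can be illustrated by hand. The following is a minimal base R sketch on made-up baskets; the basket contents and the helper name support() are assumptions for illustration only and are not part of arules.

```r
# Toy baskets (made-up data, not taken from the Groceries dataset).
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("butter", "milk"),
  c("bread", "butter"),
  c("milk")
)

# Hypothetical helper: fraction of baskets that contain every item in `itemset`.
support <- function(itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Measures for the rule {bread} => {milk}.
sup_rule   <- support(c("bread", "milk"))   # share of baskets with both items
confidence <- sup_rule / support("bread")   # share of bread baskets that also hold milk
lift       <- confidence / support("milk")  # above 1 means bought together more often than chance

print(c(support = sup_rule, confidence = confidence, lift = lift))
```

In this toy data the lift is below 1, so bread buyers are slightly less likely than average to buy milk; the rules mined below are mostly on the other side of 1.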


library(arules)

## Loading required package: Matrix

##
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
##
##     abbreviate, write

items <- read.transactions("groceries.csv", sep = ",")
# use summary() to understand the data: the number of transactions and items,
# the most frequent items, and the distribution of transaction sizes
# (how many transactions contain 1, 2, 3, ... items)
summary(items)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146
##
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda
##             2513             1903             1809             1715
##           yogurt          (Other)
##             1372            34055
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.409   6.000  32.000
##
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
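
The density figure in the summary can be cross-checked with a little arithmetic. This is a small sketch that uses only numbers copied from the output above; the variable names are my own. Density is the share of non-empty cells in the transactions-by-items matrix, so multiplying it by the matrix size should give the total number of items purchased.

```r
# Values copied from the summary(items) output.
n_transactions <- 9835
n_items        <- 169
density        <- 0.02609146

# Total number of (transaction, item) pairs in the sparse matrix.
total_items <- n_transactions * n_items * density
print(round(total_items))

# Average basket size; this should agree with the Mean in the size summary.
print(round(total_items / n_transactions, 3))
```

The total comes out at about 43367, which is also the length of the i slot in the str() output further down, and the average agrees with the reported mean of 4.409.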

# transactions are stored in a special sparse-matrix class provided by arules
str(items)

## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
##   .. .. ..@ p       : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
##   .. .. ..@ Dim     : int [1:2] 169 9835
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame':  169 obs. of  1 variable:
##   .. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
##   ..@ itemsetInfo:'data.frame':  0 obs. of  0 variables
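
To make the sparse format less opaque, the sketch below decodes the first three transactions from the prefixes of the i and p slots shown above. It assumes the usual compressed-column layout of the Matrix package (0-based indices, one column per transaction); the function name decode_transaction() is hypothetical.

```r
# Prefixes of the @i and @p slots, copied from the str() output.
i <- c(29, 88, 118, 132, 33, 157, 167, 166)  # 0-based item (row) indices
p <- c(0, 4, 7, 8)                           # 0-based start offset of each transaction (column)

# Hypothetical helper: items of transaction k are the entries of i between
# offsets p[k] and p[k + 1]; add 1 to move from 0-based to R's 1-based indexing.
decode_transaction <- function(k) {
  idx <- p[k] + seq_len(p[k + 1] - p[k])
  i[idx] + 1  # positions in itemInfo$labels
}

first_three <- lapply(1:3, decode_transaction)
print(first_three)
```

The three decoded transactions contain 4, 3 and 1 items, which matches the sizes of the first three baskets printed by inspect() in the next step.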

# use inspect() to display individual transactions (here the first five)
inspect(items[1:5,])

##     items
## [1] {citrus fruit,
##      margarine,
##      ready soups,
##      semi-finished bread}
## [2] {coffee,
##      tropical fruit,
##      yogurt}
## [3] {whole milk}
## [4] {cream cheese,
##      meat spreads,
##      pip fruit,
##      yogurt}
## [5] {condensed milk,
##      long life bakery product,
##      other vegetables,
##      whole milk}

# use itemFrequency() to see the proportion of transactions that contain
# each item, i.e. the item's support
# for instance, view the support level for the first three items
# in the grocery data
itemFrequency(items[, 1:3])

## abrasive cleaner artif. sweetener   baby cosmetics
##     0.0035587189     0.0032536858     0.0006100661
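
The proportions above are easier to read as transaction counts. A quick sketch, using only the values printed above and the 9835 transactions from the summary (variable names are mine):

```r
n_transactions <- 9835  # from summary(items)

# Relative supports copied from the itemFrequency() output.
rel_support <- c("abrasive cleaner" = 0.0035587189,
                 "artif. sweetener" = 0.0032536858,
                 "baby cosmetics"   = 0.0006100661)

# Support is count / n_transactions, so multiplying back recovers the counts.
counts <- round(rel_support * n_transactions)
print(counts)
```

These three items show up in only 35, 32 and 6 baskets respectively, which is why rare items like them drop out once a minimum support is set.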

itemFrequencyPlot(items, support = 0.08)


# the thresholds may need some trial and error to tune
# decide on the smallest number of transactions a rule should cover, then
# calculate the support level needed to find only the rules matching at least
# that many transactions (0.006 * 9835 is about 59 transactions)
myrule <- apriori(items, parameter = list(support = 0.006,
                                          confidence = 0.25, minlen = 2))

## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.25    0.1    1 none FALSE            TRUE       5   0.006      2
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 59
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [109 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [463 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
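
One way to arrive at a support threshold like the one used above is to start from a minimum number of transactions. This is a sketch under an assumption: the minimum count of 60 is a hypothetical choice for illustration, not something stated in the output.

```r
n_transactions <- 9835  # from summary(items)
min_count      <- 60    # hypothetical: a rule should cover at least 60 baskets

# Support is a proportion, so divide the count by the number of transactions.
min_support <- min_count / n_transactions
print(round(min_support, 4))
```

Rounding this down to 0.006 gives the value passed to apriori(), and the log reports the corresponding absolute minimum support count of 59.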

summary(myrule)

## set of 463 rules
##
## rule length distribution (lhs + rhs):sizes
##   2   3   4
## 150 297  16
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.000   2.000   3.000   2.711   3.000   4.000
##
## summary of quality measures:
##     support           confidence          lift
##  Min.   :0.006101   Min.   :0.2500   Min.   :0.9932
##  1st Qu.:0.007117   1st Qu.:0.2971   1st Qu.:1.6229
##  Median :0.008744   Median :0.3554   Median :1.9332
##  Mean   :0.011539   Mean   :0.3786   Mean   :2.0351
##  3rd Qu.:0.012303   3rd Qu.:0.4495   3rd Qu.:2.3565
##  Max.   :0.074835   Max.   :0.6600   Max.   :3.9565
##
## mining info:
##   data ntransactions support confidence
##  items          9835   0.006       0.25

# inspect the six rules with the highest lift; higher lift means a stronger association
inspect(sort(myrule,by="lift")[1:6])

##     lhs                   rhs                      support confidence     lift
## [1] {herbs}            => {root vegetables}    0.007015760  0.4312500 3.956477
## [2] {berries}          => {whipped/sour cream} 0.009049314  0.2721713 3.796886
## [3] {other vegetables,
##      tropical fruit,
##      whole milk}       => {root vegetables}    0.007015760  0.4107143 3.768074
## [4] {beef,
##      other vegetables} => {root vegetables}    0.007930859  0.4020619 3.688692
## [5] {other vegetables,
##      tropical fruit}   => {pip fruit}          0.009456024  0.2634561 3.482649
## [6] {beef,
##      whole milk}       => {root vegetables}    0.008032537  0.3779904 3.467851

# when odd-looking rules appear, decide whether each one is actionable, trivial, or inexplicable
# subset the rules that contain berries (here berries always end up on the lhs)
berryrules <- subset(myrule, items %ain% c("berries") )
inspect(berryrules)

##     lhs          rhs                  support     confidence lift
## [1] {berries} => {whipped/sour cream} 0.009049314 0.2721713  3.796886
## [2] {berries} => {yogurt}             0.010574479 0.3180428  2.279848
## [3] {berries} => {other vegetables}   0.010269446 0.3088685  1.596280
## [4] {berries} => {whole milk}         0.011794611 0.3547401  1.388328
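
The lift column can be reproduced from the other numbers in this post. The sketch below checks rule [4], {berries} => {whole milk}, using the whole milk count from summary(items); lift is taken here as confidence divided by the support of the right-hand side, and the variable names are mine.

```r
n_transactions   <- 9835       # from summary(items)
whole_milk_count <- 2513       # "whole milk" in the most frequent items
confidence       <- 0.3547401  # confidence of {berries} => {whole milk}

# Support of the rhs: the share of all baskets that contain whole milk.
support_rhs <- whole_milk_count / n_transactions

# Lift compares the rule's confidence with that baseline share.
lift <- confidence / support_rhs
print(round(lift, 4))
```

The result is about 1.388, in line with the lift reported for this rule: berry buyers are roughly 39% more likely than the average shopper to buy whole milk.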

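# save the rules to a CSV file, then convert them to a data frame for further analysis
write(myrule, file = "myrules.csv", sep = ",", quote = TRUE, row.names = FALSE)
myruledf <- as(myrule, "data.frame")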
str(myruledf)

## 'data.frame':    463 obs. of  4 variables:
##  $ rules     : Factor w/ 463 levels "{baking powder} => {other vegetables}",..: 340 302 207 206 208 341 402 21 139 140 ...
##  $ support   : num  0.00691 0.0061 0.00702 0.00773 0.00773 ...
##  $ confidence: num  0.4 0.405 0.431 0.475 0.475 ...
##  $ lift      : num  1.57 1.59 3.96 2.45 1.86 ...
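
Once the rules are in a data frame, ordinary base R subsetting and sorting apply. The sketch below rebuilds a small data frame from the four berry rules printed earlier so that it runs on its own; the name berry_df and the lift cut-off of 2 are assumptions for illustration, and in practice the same steps would be applied to myruledf.

```r
# Rows copied from the inspect(berryrules) output.
berry_df <- data.frame(
  rules = c("{berries} => {whipped/sour cream}", "{berries} => {yogurt}",
            "{berries} => {other vegetables}", "{berries} => {whole milk}"),
  support    = c(0.009049314, 0.010574479, 0.010269446, 0.011794611),
  confidence = c(0.2721713, 0.3180428, 0.3088685, 0.3547401),
  lift       = c(3.796886, 2.279848, 1.596280, 1.388328),
  stringsAsFactors = FALSE
)

# Keep rules with lift above 2 (hypothetical cut-off), most confident first.
strong <- berry_df[berry_df$lift > 2, ]
strong <- strong[order(-strong$confidence), ]
print(strong)
```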
