Weka: An Association Rule Mining Experiment with the Apriori Algorithm

Published: 2020-12-14 03:33:54. Category: Big Data. Source: compiled from the web.


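I. Meaning of the Apriori algorithm parameters

A total of nine experiments were run on the contact-lenses.arff dataset, which ships in the data folder of the Weka installation directory.

Opening contact-lenses via Tools → ArffViewer shows that the dataset has 24 records and 5 attributes.

The 12 parameters below are explained with reference to the experimental results.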

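1. car: if set to true, class association rules are mined instead of general association rules.

2. classIndex: index of the class attribute. If set to -1, the last attribute is treated as the class attribute.

3. delta: the amount by which the minimum support is reduced in each iteration. Support keeps decreasing until the lower bound is reached or the required number of rules has been generated.

4. lowerBoundMinSupport: lower bound for the minimum support.

5. metricType: the metric by which rules are ranked. It can be confidence (class association rules can only be mined with confidence), lift, leverage, or conviction.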

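Besides confidence, Weka provides several similar metrics for measuring how strongly the two sides of a rule are associated:

a) Lift: P(A,B)/(P(A)P(B)). Lift = 1 means A and B are independent. The further the value rises above 1, the less likely it is that A and B appear in the same basket by chance, i.e. the stronger the association.

b) Leverage: P(A,B)-P(A)P(B). Leverage = 0 means A and B are independent; the larger the leverage, the more closely A and B are related.

c) Conviction: P(A)P(!B)/P(A,!B), where !B means that B does not occur. Conviction also measures the independence of A and B. Its relation to lift (negate B, substitute into the lift formula, then take the reciprocal) shows that the larger the value, the stronger the association between A and B.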

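6. minMetric: minimum value of the metric.

7. numRules: number of rules to find.

8. outputItemSets: if set to true, the item sets are included in the output.

9. removeAllMissingCols: removes columns in which every value is missing.

10. significanceLevel: significance level for the significance test (used with confidence only).

11. upperBoundMinSupport: upper bound for the minimum support. The minimum support is decreased iteratively starting from this value.

12. verbose: if set to true, the algorithm runs in verbose mode.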
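Parameters 3, 4, 7 and 11 work together: the minimum support starts high and is lowered by delta until the lower bound is reached or enough rules have been found. The following is a minimal Python sketch of that loop, not Weka's implementation. It assumes that the first threshold tried is upperBoundMinSupport minus delta, which is consistent with the 10 cycles reported for the run analysed in section II. `run_schedule` and the `rules_at` callback are hypothetical names; `rules_at` stands in for the actual rule mining.

```python
def run_schedule(upper, lower, delta, num_rules, rules_at):
    """Lower the minimum support step by step.

    rules_at(support) is a hypothetical callback that returns how many
    rules are found at the given minimum support.
    Returns (cycles performed, last minimum support tried).
    """
    cycles = 0
    support = None
    k = 1
    while True:
        # Assumption: the first threshold tried is upper - delta.
        candidate = round(upper - k * delta, 10)
        if candidate < lower:
            break  # the lower bound has been passed
        support = candidate
        cycles += 1
        if rules_at(support) >= num_rules:
            break  # enough rules found, stop early
        k += 1
    return cycles, support


# Stub miner for illustration only: one rule once support reaches 0.5.
def stub_rules_at(support):
    return 1 if support <= 0.5 else 0


cycles, final_support = run_schedule(1.0, 0.5, 0.05, 10, stub_rules_at)
print(cycles, final_support)
```

With an upper bound of 1.0, a lower bound of 0.5, a delta of 0.05 and a miner that never reaches 10 rules, the loop runs through every threshold from 0.95 down to 0.5.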

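II. Experimental results and analysis

1. Detailed analysis of one experiment

The parameter settings for this run appear in the Scheme line of the output below.

Complete output with detailed analysis: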

=== Run information ===     // run information
Scheme:       weka.associations.Apriori -I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
Relation:     contact-lenses       // name of the dataset: contact-lenses
Instances:    24         // number of records: 24
Attributes:   5          // number of attributes (5), followed by their names
              age
              spectacle-prescrip
              astigmatism
              tear-prod-rate
              contact-lenses

=== Associator model (full training set) ===

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

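%  Scheme: the association rule mining scheme selected, here the Apriori algorithm
%  Parameter settings: -I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
%  The options mean, in order:
%  I: outputItemSets is true (the flag is omitted when it is false)
%  N 10: the number of rules is 10
%  T 0: the metric is confidence (T 1: lift, T 2: leverage, T 3: conviction)
%  C 0.9: the minimum metric value is 0.9
%  D 0.05: the decrement per iteration (delta) is 0.05
%  U 1.0: the upper bound for minimum support is 1.0
%  M 0.5: the lower bound for minimum support is 0.5
%  S -1.0: the significance level is -1.0
%  c -1: the class index is -1
%  (car, removeAllMissingCols and verbose were left at their default value False, so they do not appear in the option string. When set to True they appear as A, R and V respectively.)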

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
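The mapping between option flags and the parameter names from section I can be written down as a small lookup table. The Python sketch below only illustrates how the Scheme string reads; it is not part of Weka, and `parse_apriori_options` is a hypothetical helper. The flag-to-name pairs are the ones given in the annotations above.

```python
# Flag -> parameter name, as described in the annotations of this article.
VALUE_FLAGS = {
    "N": "numRules",
    "T": "metricType",
    "C": "minMetric",
    "D": "delta",
    "U": "upperBoundMinSupport",
    "M": "lowerBoundMinSupport",
    "S": "significanceLevel",
    "c": "classIndex",
}
BOOLEAN_FLAGS = {
    "I": "outputItemSets",
    "A": "car",
    "R": "removeAllMissingCols",
    "V": "verbose",
}


def parse_apriori_options(option_string):
    """Turn an option string into a {parameter name: value} dict."""
    tokens = option_string.split()
    # Boolean parameters are False unless their flag is present.
    settings = {name: False for name in BOOLEAN_FLAGS.values()}
    i = 0
    while i < len(tokens):
        flag = tokens[i][1:]  # drop the leading "-"; flags are case-sensitive
        if flag in BOOLEAN_FLAGS:
            settings[BOOLEAN_FLAGS[flag]] = True
            i += 1
        elif flag in VALUE_FLAGS:
            # The next token is the value, even if it starts with "-".
            settings[VALUE_FLAGS[flag]] = float(tokens[i + 1])
            i += 2
        else:
            raise ValueError(f"unknown option: {tokens[i]}")
    return settings


settings = parse_apriori_options(
    "-I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1"
)
for name, value in settings.items():
    print(name, "=", value)
```

Note that `-C` (minMetric) and `-c` (classIndex) differ only in case, which is why the lookup keeps the flags case-sensitive.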

Apriori  // results of the Apriori run
=======
Minimum support: 0.5 (12 instances)  // minimum support 0.5, i.e. at least 12 instances
Minimum metric <confidence>: 0.9   // minimum metric (confidence): 0.9
Number of cycles performed: 10    // 10 search cycles were performed
Generated sets of large itemsets:     // the frequent itemsets generated
Size of set of large itemsets L(1): 7     // frequent 1-itemsets: 7
Large Itemsets L(1):    // the itemsets (listed because outputItemSets is True)

spectacle-prescrip=myope 12

spectacle-prescrip=hypermetrope 12

astigmatism=no 12

astigmatism=yes 12

tear-prod-rate=reduced 12

tear-prod-rate=normal 12

contact-lenses=none 15

%%%%%%%%%%%%%%%%%%%%%%%%

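In the ArffViewer data view, clicking the column headers spectacle-prescrip, astigmatism, tear-prod-rate and contact-lenses sorts each column by value, which makes it easy to check the results above. Clicking the age header sorts its values into pre-presbyopic, presbyopic and young. Each of these values has 8 records, which is fewer than 12 and therefore below the minimum support, so none of the age values appear in the list above.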
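The filtering step described here can be reproduced from the counts alone. The sketch below is a minimal Python illustration that uses only the value counts quoted in this article (counts for the other contact-lenses values are not stated and are left out); the variable names are illustrative.

```python
import math

NUM_INSTANCES = 24
MIN_SUPPORT = 0.5

# Value counts quoted in the text and in the L(1) listing above.
item_counts = {
    "age=young": 8,
    "age=pre-presbyopic": 8,
    "age=presbyopic": 8,
    "spectacle-prescrip=myope": 12,
    "spectacle-prescrip=hypermetrope": 12,
    "astigmatism=no": 12,
    "astigmatism=yes": 12,
    "tear-prod-rate=reduced": 12,
    "tear-prod-rate=normal": 12,
    "contact-lenses=none": 15,
}

# A minimum support of 0.5 over 24 instances means at least 12 instances.
min_count = math.ceil(MIN_SUPPORT * NUM_INSTANCES)

# Keep only the items that reach the minimum count.
frequent_items = {
    item: count for item, count in item_counts.items() if count >= min_count
}

for item, count in frequent_items.items():
    print(item, count)
```

The three age values fall short of the threshold and drop out, which leaves the seven items listed under L(1).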

Size of set of large itemsets L(2): 1   // frequent 2-itemsets: 1
Large Itemsets L(2):
tear-prod-rate=reduced contact-lenses=none 12
// 12 records have tear-prod-rate = reduced and contact-lenses = none
Best rules found:    // best association rules
1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1)
// If tear-prod-rate is reduced, it follows that contact-lenses is none; the confidence of this rule is 100%

2. Other experimental settings and selected results

1. With all other parameters left at their defaults, setting the lower bound for minimum support to 0.8 produces the message "No large itemsets and rules found!", i.e. no association rules satisfy the conditions.

2. With all other parameters left at their defaults, a lower bound for minimum support of 0.25, an upper bound of 0.8, and confidence as the metric with a minimum value of 0.8, the run finds 10 frequent 1-itemsets, 18 frequent 2-itemsets and 4 frequent 3-itemsets. The best rules found are:

1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1)
2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
4. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
5. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1)
6. spectacle-prescrip=myope contact-lenses=none 7 ==> tear-prod-rate=reduced 6    conf:(0.86)
7. astigmatism=no contact-lenses=none 7 ==> tear-prod-rate=reduced 6    conf:(0.86)
8. contact-lenses=none 15 ==> tear-prod-rate=reduced 12    conf:(0.8)

3. With all other parameters left at their defaults, a lower bound for minimum support of 0.25, an upper bound of 0.8, and lift (P(A,B)/(P(A)P(B))) as the metric with a minimum value of 1.1, the run finds 10 best rules. The first three are:

1. tear-prod-rate=reduced 12 ==> spectacle-prescrip=myope contact-lenses=none 6    conf:(0.5) < lift:(1.71)> lev:(0.1) [2] conv:(1.21)
2. spectacle-prescrip=myope contact-lenses=none 7 ==> tear-prod-rate=reduced 6    conf:(0.86) < lift:(1.71)> lev:(0.1) [2] conv:(1.75)
3. tear-prod-rate=reduced 12 ==> astigmatism=no contact-lenses=none 6    conf:(0.5) < lift:(1.71)> lev:(0.1) [2] conv:(1.21)

4. With all other parameters left at their defaults, a lower bound for minimum support of 0.25, an upper bound of 0.8, and leverage (P(A,B)-P(A)P(B)) as the metric with a minimum value of 0.1, the run finds 6 best rules. In the first rule below, [4] is the number of instances corresponding to lev:(0.19). The first three rules are:

1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1) lift:(1.6) < lev:(0.19) [4]> conv:(4.5)
2. contact-lenses=none 15 ==> tear-prod-rate=reduced 12    conf:(0.8) lift:(1.6) < lev:(0.19) [4]> conv:(1.88)
3. tear-prod-rate=reduced 12 ==> spectacle-prescrip=myope contact-lenses=none 6

5. With all other parameters left at their defaults, a lower bound for minimum support of 0.25, an upper bound of 0.8, and conviction (P(A)P(!B)/P(A,!B)) as the metric with a minimum value of 1.1, the run finds 10 best rules. The first three are:

1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12    conf:(1) lift:(1.6) lev:(0.19) [4] < conv:(4.5)>
2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1) lift:(1.6) lev:(0.09) [2] < conv:(2.25)>
3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6    conf:(1) lift:(1.6) lev:(0.09) [2] < conv:(2.25)>
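The metric values printed next to each rule can be recomputed from the counts shown in the rule itself. The Python sketch below does this for a rule A ==> B; `rule_metrics` is a hypothetical helper, not a Weka function. Two points are inferences from the numbers in this article rather than statements about Weka's source code: the conviction values (for example 4.5 for a rule with confidence 1, where P(A,!B) is 0) are reproduced only if 1 is added to the count of instances with A but without B, and the bracketed number after lev matches the co-occurrence count minus the count expected under independence, truncated to an integer.

```python
def rule_metrics(n, count_a, count_b, count_ab):
    """Metrics for a rule A ==> B, computed from raw instance counts.

    n        -- total number of instances
    count_a  -- instances matching the premise A
    count_b  -- instances matching the consequence B
    count_ab -- instances matching both A and B
    """
    p_a = count_a / n
    p_b = count_b / n
    p_ab = count_ab / n
    # Co-occurrences expected if A and B were independent.
    expected_ab = count_a * count_b / n
    return {
        "confidence": count_ab / count_a,
        "lift": p_ab / (p_a * p_b),
        "leverage": p_ab - p_a * p_b,
        # Assumption: the bracketed number is this difference, truncated.
        "leverage_instances": int(count_ab - expected_ab),
        # Assumption: +1 in the denominator, inferred from the printed values.
        "conviction": (count_a * (n - count_b) / n) / (count_a - count_ab + 1),
    }


# tear-prod-rate=reduced 12 ==> contact-lenses=none 12 (none occurs 15 times)
print(rule_metrics(24, 12, 15, 12))

# tear-prod-rate=reduced 12 ==> spectacle-prescrip=myope contact-lenses=none 6
# (the consequence itemset occurs 7 times)
print(rule_metrics(24, 12, 7, 6))
```

Because leverage is symmetric in A and B, a rule and its reverse share the same lift and leverage, while confidence and conviction depend on the direction of the rule.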

(Editor: 李大同)
