Artificial Intelligence and Data Mining Course Slides: lect-5-13.ppt
Chapter 3 Basic Data Mining Techniques
3.3 The K-Means Algorithm (for cluster analysis)
4/14/2020, AI&DM BUPT

1. What is Cluster Analysis (clustering)?
- Cluster: a collection of data objects that are
  - similar to one another within the same cluster
  - dissimilar to the objects in other clusters
- High-quality clusters: high intra-class similarity, low inter-class similarity
- Cluster analysis: grouping a set of data objects into clusters
- Clustering is unsupervised learning (unsupervised classification): there are no predefined classes. It is a form of learning by observation rather than learning by examples.
- Typical applications:
  - as a stand-alone tool to get insight into the data distribution
  - as a preprocessing step for other algorithms
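
Examples of Clustering Applications (I)
Cluster analysis in customer segmentation
When consuming the same kind of goods or services, different customers show different consumption patterns. By studying these patterns, a company can design different marketing mixes and thereby capture the maximum consumer surplus; this is the main purpose of customer segmentation. There are three main kinds of customer classification methods:
- Empirical description: decision makers divide customers into classes based on experience.
- Traditional statistics: customers are divided into classes using simple statistics of their attributes.
- Non-traditional statistical methods, i.e. clustering: methods based on artificial intelligence techniques.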

Examples of Clustering Applications (II)
- Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
- Insurance: identify groups of motor insurance policy holders with a high average claim cost
- City planning: identify groups of houses according to their house types, values, and geographical locations

[Figure: Example]

2. The K-Means Algorithm
1. Choose a value for K, the total number of clusters.
2. Randomly choose K points as the initial cluster centers.
3. Assign the remaining instances to their closest cluster center.
4. Calculate a new cluster center for each cluster.
5. Repeat steps 3-4 until the cluster centers do not change.

[Figure: The K-Means Clustering Algorithm, Example]
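
The K-means steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's reference implementation: the function names, the `seed`, `max_iter` and `restarts` parameters, the use of squared Euclidean distance, and the handling of empty clusters are all choices made here for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the closest center for every point."""
    return [
        min(range(len(centers)), key=lambda c: squared_distance(p, centers[c]))
        for p in points
    ]


def kmeans(points, k, seed=None, max_iter=100):
    """One K-means run; returns (centers, labels, squared_error)."""
    rng = random.Random(seed)
    # Step 2: randomly choose K instances as the initial cluster centers.
    centers = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 3: assign every instance to its closest center.
        labels = assign(points, centers)
        # Step 4: recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if not members:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[c])
                continue
            dim = len(members[0])
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        # Step 5: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


def best_of_restarts(points, k, restarts=10):
    """Run K-means with several seeds and keep the lowest squared error."""
    runs = (kmeans(points, k, seed=s) for s in range(restarts))
    return min(runs, key=lambda run: run[2])


if __name__ == "__main__":
    data = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
    centers, labels, sse = best_of_restarts(data, k=2)
    print(centers, labels, sse)
```

Because the initial centers are chosen at random, separate runs can end in different configurations. `best_of_restarts` is one simple way to compare runs by their total squared error.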

Problem: We may see a different final cluster configuration for each alternative choice of the initial centers.
Solution: Try different initial centers, but set a maximum acceptable squared error.

3. General Considerations
- Requires real-valued data.
- The number of clusters present in the data is selected by a human.
- Works best when the clusters in the data are of approximately equal size.
- Attribute significance cannot be determined.
- Lacks explanation capabilities.

4. Types of data in clustering analysis
- 4.1 Interval-scaled variables
- 4.2 Binary variables
- 4.3 Nominal and ordinal variables
- 4.4 Variables of mixed types
Distance is normally used to measure the similarity or dissimilarity between two data objects.

4.1 Interval-scaled variables
Some popular distance measures include the Minkowski distance:
$$d(i,j) = \left(|x_{i1}-x_{j1}|^q + |x_{i2}-x_{j2}|^q + \cdots + |x_{ip}-x_{jp}|^q\right)^{1/q}$$
where $i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ and $j = (x_{j1}, x_{j2}, \ldots, x_{jp})$ are two p-dimensional data objects, and q is a positive integer.

4.1 Interval-scaled variables (cont. 1)
- If q = 1, d is the Manhattan distance: $d(i,j) = |x_{i1}-x_{j1}| + |x_{i2}-x_{j2}| + \cdots + |x_{ip}-x_{jp}|$
- If q = 2, d is the Euclidean distance: $d(i,j) = \sqrt{|x_{i1}-x_{j1}|^2 + |x_{i2}-x_{j2}|^2 + \cdots + |x_{ip}-x_{jp}|^2}$
Requirements for a distance function:
- $d(i,j) \ge 0$
- $d(i,i) = 0$
- $d(i,j) = d(j,i)$
- $d(i,j) \le d(i,k) + d(k,j)$
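
A short Python sketch of the Minkowski distance described above, assuming the standard definition d(i,j) = (sum over f of |x_if - x_jf|^q)^(1/q). The function name and the input checks are illustrative choices, not part of the slides.

```python
def minkowski_distance(i, j, q):
    """Minkowski distance between two p-dimensional objects i and j.

    q = 1 gives the Manhattan distance, q = 2 the Euclidean distance.
    """
    if len(i) != len(j):
        raise ValueError("objects must have the same number of dimensions")
    if q < 1:
        raise ValueError("q must be a positive integer")
    return sum(abs(a - b) ** q for a, b in zip(i, j)) ** (1.0 / q)


if __name__ == "__main__":
    x, y = (0.0, 0.0), (3.0, 4.0)
    print(minkowski_distance(x, y, 1))  # Manhattan
    print(minkowski_distance(x, y, 2))  # Euclidean
```

Checking the four requirements on a few sample points (non-negativity, zero self-distance, symmetry, triangle inequality) is a quick sanity test for any distance function plugged into a clustering routine.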

4.1 Interval-scaled variables (cont. 2)
Standardize data, for each variable f measured on n objects:
- Find the mean: $m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right)$
- Calculate the mean absolute deviation: $s_f = \frac{1}{n}\left(|x_{1f}-m_f| + |x_{2f}-m_f| + \cdots + |x_{nf}-m_f|\right)$
- Calculate the standardized measurement (z-score): $z_{if} = \frac{x_{if}-m_f}{s_f}$

4.2 Binary Variables
- A contingency table for binary data, comparing object i with object j
- Simple matching coefficient (if the binary variable is symmetric)
- Jaccard coefficient (if the binary variable is asymmetric)
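
The contingency table and both coefficients can be illustrated with a small Python sketch. The definitions used here are the commonly cited ones and are an assumption, since the formulas themselves are not present in the text above: a counts variables where both objects are 1, b where only object i is 1, c where only object j is 1, d where both are 0. Both functions return dissimilarities (0 means identical), and all function names are hypothetical.

```python
def contingency_counts(i, j):
    """Return (a, b, c, d) for two equal-length 0/1 vectors."""
    a = b = c = d = 0
    for x, y in zip(i, j):
        if x == 1 and y == 1:
            a += 1
        elif x == 1 and y == 0:
            b += 1
        elif x == 0 and y == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d


def simple_matching_dissimilarity(i, j):
    """Symmetric case: 1-1 and 0-0 matches both count as agreement."""
    a, b, c, d = contingency_counts(i, j)
    return (b + c) / (a + b + c + d)


def jaccard_dissimilarity(i, j):
    """Asymmetric case: 0-0 matches are ignored."""
    a, b, c, _ = contingency_counts(i, j)
    if a + b + c == 0:
        return 0.0  # Assumption: two all-zero objects are treated as identical.
    return (b + c) / (a + b + c)


if __name__ == "__main__":
    obj_i = [1, 0, 1, 0, 0, 0]
    obj_j = [1, 0, 1, 0, 1, 0]
    print(contingency_counts(obj_i, obj_j))
    print(simple_matching_dissimilarity(obj_i, obj_j))
    print(jaccard_dissimilarity(obj_i, obj_j))
```

The only difference between the two measures is whether the d count (both values 0) enters the denominator, which is why the asymmetric measure suits attributes where a shared 0 carries little information.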

4.2 Binary Variables (cont.)
Example:
- gender is a symmetric attribute
- the remaining attributes are asymmetric binary attributes
- let the values Y and P be set to 1, and the value N be set to 0
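
The encoding rule above can be tried out on a few records. The sketch below is illustrative only: the three records, their attribute names, and the function names are invented for this example (the slide's own data table is not available here), and the dissimilarity uses the commonly cited asymmetric form, mismatches / (mismatches + positions where both are 1).

```python
ENCODING = {"Y": 1, "P": 1, "N": 0}


def encode_asymmetric(record, symmetric=("gender",)):
    """Map Y/P to 1 and N to 0, skipping the symmetric attributes."""
    return {k: ENCODING[v] for k, v in record.items() if k not in symmetric}


def asymmetric_dissimilarity(r1, r2):
    """Dissimilarity over the asymmetric attributes only; 0-0 matches are ignored."""
    x, y = encode_asymmetric(r1), encode_asymmetric(r2)
    mismatches = sum(1 for k in x if x[k] != y[k])
    both_one = sum(1 for k in x if x[k] == 1 and y[k] == 1)
    total = mismatches + both_one
    return mismatches / total if total else 0.0


# Hypothetical records, invented for illustration.
patient_a = {"gender": "M", "fever": "Y", "cough": "N", "test_1": "P", "test_2": "N"}
patient_b = {"gender": "F", "fever": "Y", "cough": "N", "test_1": "P", "test_2": "P"}
patient_c = {"gender": "M", "fever": "Y", "cough": "Y", "test_1": "N", "test_2": "N"}

if __name__ == "__main__":
    print(asymmetric_dissimilarity(patient_a, patient_b))
    print(asymmetric_dissimilarity(patient_a, patient_c))
    print(asymmetric_dissimilarity(patient_b, patient_c))
```

Leaving `gender` out of the comparison mirrors the statement that only the remaining attributes are asymmetric. A symmetric attribute would instead be compared with a measure that also counts 0-0 agreement.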