《人工智能与数据挖掘教学课件》lect-3-12.ppt (Artificial Intelligence & Data Mining teaching courseware, lecture 3-12)
Chapter 3 Basic Data Mining Techniques
3.1 Decision Trees (for classification)

Introduction: Classification as a Two-Step Process

1. Model construction: build a model that can describe a set of predetermined classes
- Preparation: each tuple/sample is assumed to belong to a predefined class, given by the output attribute (class label attribute)
- The set of examples used for model construction is called the training set
- The model can be represented as classification rules, decision trees, or mathematical formulae
- Estimating the accuracy of the model:
  - the known label of each test sample is compared with the class predicted by the model
  - the accuracy rate is the percentage of test-set samples correctly classified by the model
  - note: the test set must be independent of the training set, otherwise over-fitting will occur
2. Model usage: use the model to classify future or unknown objects (see the sketch below)
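As a concrete illustration of this two-step process, here is a minimal sketch using scikit-learn; the tiny dataset and all names in it are invented for illustration, not taken from the lecture.

# Minimal sketch of the two-step classification process (illustrative only;
# the dataset and feature values are invented, not from the lecture).
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Preparation: each sample already carries a class label.
X = [[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]]   # input attributes
y = [1, 1, 0, 0, 1, 0]                                  # class label attribute

# Keep the test set independent of the training set, so the accuracy
# estimate does not reward over-fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# Step 1 (model construction): learn a decision tree from the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)

# Estimate accuracy: compare known test labels with the model's predictions.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Step 2 (model usage): classify a future / unseen object.
print("prediction for unseen sample:", model.predict([[1, 0]]))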
Classification Process (1): Model Construction
- Training Data -> Classification Algorithm -> Classifier (Model)
- Example of a learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
- The classifier is first checked against Testing Data, then applied to Unseen Data
- Example query: (Jeff, Professor, 4) -> Tenured? (see the sketch below)
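The model on these two slides is a single rule, so it can be written out directly; a minimal sketch (the function name and signature are mine, following the slide's (name, rank, years) tuples):

# The classifier learned on the slide, written out as a plain rule.
def tenured(rank: str, years: int) -> str:
    # IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
    return "yes" if rank.lower() == "professor" or years > 6 else "no"

# Model usage: classify the unseen tuple (Jeff, Professor, 4).
print(tenured("Professor", 4))  # -> "yes" (the rank clause fires)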
1 Example (1): Training Dataset
- An example from Quinlan's ID3 (1986)

1 Example (2): Output: a Decision Tree for "buys_computer"
- Root test: age?
  - age <= 30: test student? (student = no -> no; student = yes -> yes)
  - age 31..40: yes
  - age > 40: test credit_rating? (excellent -> no; fair -> yes)
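The tree on this slide reads directly as nested tests; a minimal sketch (the function name and the encoding of age as an integer are mine):

# The "buys_computer" decision tree from the slide, written as nested tests.
def buys_computer(age: int, student: str, credit_rating: str) -> str:
    if age <= 30:
        # Left branch: the decision depends on the 'student' attribute.
        return "yes" if student == "yes" else "no"
    elif age <= 40:
        # Middle branch (31..40): always 'yes'.
        return "yes"
    else:
        # Right branch (>40): the decision depends on 'credit_rating'.
        return "yes" if credit_rating == "fair" else "no"

print(buys_computer(28, "no", "fair"))        # -> "no"
print(buys_computer(35, "no", "excellent"))   # -> "yes"
print(buys_computer(50, "yes", "excellent"))  # -> "no"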
2 Algorithm for Decision Tree Building

Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all the training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping the partitioning (a sketch of the whole procedure follows this list):
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is employed to classify the leaf)
- There are no samples left
- The pre-set accuracy is reached
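A compact sketch of this greedy, top-down, divide-and-conquer procedure, using information gain (defined on the next slide) as the selection measure. All function and variable names are mine, and the pre-set-accuracy stopping test is omitted for brevity.

# Sketch of the greedy top-down decision-tree builder described above.
# Samples are (dict_of_categorical_attributes, class_label) pairs.
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(samples, attr):
    """Entropy reduction from partitioning `samples` on attribute `attr`."""
    labels = [label for _, label in samples]
    remainder = 0.0
    for value in {s[attr] for s, _ in samples}:
        subset = [label for s, label in samples if s[attr] == value]
        remainder += len(subset) / len(samples) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(samples, attrs):
    labels = [label for _, label in samples]
    # Stop: all samples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -> leaf labeled by majority vote.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: split on the attribute with the highest information gain.
    # (Branches are only created for values that occur, so the "no samples
    # left" condition is handled implicitly.)
    best = max(attrs, key=lambda a: info_gain(samples, a))
    tree = {}
    for value in {s[best] for s, _ in samples}:
        subset = [(s, label) for s, label in samples if s[best] == value]
        tree[value] = build_tree(subset, [a for a in attrs if a != best])
    return (best, tree)

data = [({"student": "no"}, "no"), ({"student": "yes"}, "yes")]
print(build_tree(data, ["student"]))  # -> ('student', {...}), one branch per value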
Information Gain (信息增益) (ID3/C4.5)
- Select the attribute with the highest information gain
- Assume there are two classes, P and N
- Let the set of examples S contain p elements of class P and n elements of class N
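For reference, the standard ID3 definition that this setup leads to (a reconstruction, not verbatim from the deck): the expected information needed to decide whether an arbitrary example of S belongs to P or N is

I(p, n) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n))

A small check of this definition in code (the function name I is mine):

# Two-class information measure I(p, n) from the ID3 setup above.
from math import log2

def I(p: int, n: int) -> float:
    """Expected bits needed to classify one example of a set with
    p elements of class P and n elements of class N."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # 0 * log2(0) is taken as 0
            result -= (count / total) * log2(count / total)
    return result

print(I(9, 5))   # classic buys_computer root (9 yes / 5 no) -> about 0.940
print(I(7, 7))   # even split -> 1.0 (maximum uncertainty)
print(I(14, 0))  # pure set  -> 0.0 (no information needed)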