Advanced Topics on Classification (lecture slides)
Advanced Topics on Classification
Quan Zou (邹权), Ph.D., Assistant Professor
2019/4/7
http://datamining.xmu.edu.cn

Outline
- Imbalanced binary classification
- Multi-class and multi-label classification
- Multi-instance classification
- Semi-supervised and transductive classification
- Ensemble learning
- Others

Imbalanced binary classification
Applications:
- Credit card fraud detection
- Spam identification
- Finding oil
- Bioinformatics

Imbalanced binary classification
Sampling strategies:
- Over-sampling
- Under-sampling
- Random sampling
- Special-purpose sampling (SMOTE)
Cost-based strategies:
- Equivalent in effect to the sampling strategies above
One-class learning

Multi-class, multi-label classification
Multi-class:
- One vs. One (time-consuming)
- One vs. All (imbalanced)
- Tree-based
Multi-label:
- JRS contest (http://tunedit.org/challenge/JRS12Contest)
- Text and image classification
- KNN
- Tools: mulan, meka

mulan (Java library for multi-label learning)

meka (multi-label extension of WEKA)

Multi-instance classification
- Applications: drug design, image understanding
- Bags and instances
- Diverse Density (DD)

Semi-supervised and transductive classification
Semi-supervised classification:
- Unlabeled samples are important
- Co-training and tri-training

Transductive classification

Ensemble learning
- Bagging
- Boosting
- Random forest

Ensemble learning for the class-imbalance problem
Strategy: first, the negative set is divided randomly into several equal-sized subsets.
Each subset, together with the positive set, forms a class-balanced training set. Several different classifiers are then selected and trained on these balanced training sets, and they vote for the final prediction when facing new samples. Samples that are misclassified are added to the training sets of the next two classifiers.

References
- Q. Zou, M. Guo, Y. Liu, and J. Wang. Classification methods for class-imbalanced data and their applications in bioinformatics. Journal of Computer Research and Development, 2010, 47(8): 1407-1414 (in Chinese).
- X.-Y. Liu, J. Wu, and Z.-H. Zhou. Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(2): 539-550.

Others
- Active learning
- Lazy learning
- Parallel learning (Mahout)
- Optimization:
  - Feature selection (GA)
  - Parameter tuning (grid search, PSO)

Thanks for your patience
Email: zouquan@xmu.edu.cn
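Backup: SMOTE, listed among the sampling strategies, over-samples the minority class by interpolating between a minority point and one of its nearest minority neighbours. A minimal sketch under simplifying assumptions (brute-force neighbour search, function name illustrative, not a library API):

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, seed=0):
    """SMOTE-style over-sampling: pick a random minority point, pick one of
    its k nearest minority neighbours, and create a synthetic sample at a
    random position on the line segment between the two."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]   # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.asarray(synthetic)
```

Because each synthetic point is a convex combination of two minority points, the new samples stay inside the minority class's region of the feature space, unlike naive random over-sampling, which only duplicates existing points.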
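Backup: the One-vs-All scheme from the multi-class slide trains one binary classifier per class, separating that class (the "one") from all remaining classes (the "rest"); the imbalance noted on the slide comes from the "rest" side being much larger. A minimal sketch using nearest-centroid base models (all names illustrative):

```python
import numpy as np

def one_vs_rest_fit(X, y):
    """One-vs-All: one binary nearest-centroid model per class, each
    separating its own class from the pooled remaining classes."""
    return {c: (X[y == c].mean(axis=0),      # "one" centroid
                X[y != c].mean(axis=0))      # "rest" centroid
            for c in np.unique(y)}

def one_vs_rest_predict(models, X):
    """Score each class by how much closer a point is to its "one" centroid
    than to its "rest" centroid, then pick the best-scoring class."""
    classes = sorted(models)
    scores = np.stack([np.linalg.norm(X - models[c][1], axis=1)
                       - np.linalg.norm(X - models[c][0], axis=1)
                       for c in classes], axis=1)
    return np.array(classes)[scores.argmax(axis=1)]
```

Taking the argmax over per-class scores, rather than each binary model's hard decision, resolves the ambiguity when several (or none) of the binary classifiers claim a sample.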
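Backup: the balanced-subset strategy described above can be sketched in a few lines of Python. This is an illustration, not the implementation from the references: each per-subset classifier is a plain nearest-centroid rule standing in for any real learner, the cascading step that feeds misclassified samples into the next two classifiers' training sets is omitted, and all function names are made up for this sketch.

```python
import numpy as np

def train_balanced_ensemble(X_pos, X_neg, n_subsets=5, seed=0):
    """Split the large negative class into equal random subsets, pair each
    subset with the full positive set, and train one simple classifier
    (nearest centroid) on each class-balanced pair."""
    rng = np.random.default_rng(seed)
    c_pos = X_pos.mean(axis=0)          # centroid of the small positive class
    models = []
    for chunk in np.array_split(rng.permutation(len(X_neg)), n_subsets):
        # Each balanced training set = all positives + one negative subset.
        models.append((c_pos, X_neg[chunk].mean(axis=0)))
    return models

def predict(models, X):
    """Majority vote of the per-subset classifiers (1 = positive class)."""
    votes = np.zeros(len(X), dtype=int)
    for c_pos, c_neg in models:
        d_pos = np.linalg.norm(X - c_pos, axis=1)
        d_neg = np.linalg.norm(X - c_neg, axis=1)
        votes += (d_pos < d_neg).astype(int)
    return (votes * 2 > len(models)).astype(int)
```

Every classifier sees all of the scarce positive samples but only a slice of the negatives, so no negative information is discarded overall, yet each individual training set is balanced.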