1、NN查找问题n给定空间中的一点q,从数据库中找出离q最近的点n例如:例如:n找出离某家剧院最近的饭店n找出离空中某点最近的星球n找出和某张照片最相似的照片1简单方法:n逐点比较nmin=n从集合中取任一点p,求p和q间的距离n If,小于min,则更新minn缺点:效率低2改进策略n对数据库中的数据建索引,如R-tree.n采用排除法,排除掉一些根本不可能成为NN的点。3R-Tree:一种多维空间索引技术n空间数据库n多媒体数据库图象、文本、视频等特征向量 相似性查询n数据挖掘n聚类、异常检测等n空间数据挖掘n多媒体数据挖掘nCAD、分子生物学等4传统的索引方法n哈希表 n数值的精确匹配 n不
2、能进行范围查询 nB-Treesn键值的一维排序 n不能搜索多维空间5Tree-Structured IndexingnTree-structured indexing techniques support both range searches and equality searchesnindex file may still be quite large.But we can apply the idea repeatedly!Data pages6R-Tree Index StructurenA spatial database consists of a collection of
3、objects representing spatial objects,where each object has a unique identifier,which can be used to retrieve it.nAn R-tree is a height-balanced tree similar to a B-tree,with index records in its leaf nodes containing pointers to data objects.nNodes correspond to disk pages if the index is disk-resid
4、ent.7We will transverse through the R-Tree to give some idea of its structureR-tree:Structure(I)8R1We first move to the first layer of the R-tree which consist of the two nodes R1 and R2.R2R-tree:Structure(II)9R1Then we move to the second layer under node R1 which consist of node R3,R4 and R5R2R3R4R
5、5R-tree:Structure(III)10R1Then we move to the third layer under node R3 which consist of node R8,R9 and R10R2R3R4R5R-tree:Structure(III)11Assuming we want to find all objects lying within the yellow query box Q1R-tree:Searching(I)Q112R1We first move to the first layer of the R-tree and observe that
6、the Q1 intercept both R1 and R2.R2R-tree:Search(II)Q113R1Then we move to the second layer of both R1 and R2 and observe that the query box only intercept node R3,R4 and R5.Thus we will not access the children of R6 and R7.R2R3R4R5R-tree:Search(III)Q114R1Then we move to the third layer under R3,R4,R5
7、 and find the only R10,R11,R12,R13 and R14 intercept Q1.Thus we will only access the data objects under these five nodesR2R3R4R5R-tree:Search(IV)Q115Maintenance of R-treenVery much like the maintenance of B+Tree.Take average O(log n)to search,insert and delete.nConcern when splitting nodes:Need to e
8、nsure that the bounding box have the smallest area16使用R-tree进行NN查找nConstruct a index such as R-tree for the datanTo prune off those MBRs which can never contain a nearest neighbor.17R-tree:Finding Nearest NeighbornFirst problem to be solve:nGiven a query point p and a minimum bounding rectangle(MBR)
9、find an upper bound and lower bound for the distance r such that there exists at least one point in the MBR that is within a distance of r from p.nWhy?nTo facilitate some form of ranking to prune off uninteresting MBRsnEg.we can prune off MBR3 and MBR4.MBRprdistanceMBR1MBR3MBR2MBR418R-tree:Finding
10、Lower Bound(MINDIST)MBR(R)pmindistpmindistpmindistt1s1s2t2Note:p is located at(p1,p2)i.e p1 and p2 are the x-y coordinates!19R-tree:Finding Upper Bound(I)nMost trivial solution:Find the point in MBR which is furthest awaynBut we can do better with a certain observation!MBR(R)pmaxdistt1s1s2t220R-tree
11、Finding Upper Bound(II)nImportant observation:Every face of any MBR should contains at least one point in the DB.MBR(R)assuming this is the leftmost pointthen left border can be shifted here21R-tree:Finding Upper Bound(II)nImportant observation:Every face of any MBR should contains at least one poi
12、nt in the DB.MBR(R)assuming this is the leftmost point222324R-tree:Finding Upper Bound(III)nWe can now find a better upper bound based on this property.MBR(R)pt1s1We know that there is at least one point,q,along this face.How do we obtain the maximum distance between p and q?s2t2Answer:Assume q is h
13、ereq25R-tree:Finding Upper Bound(IV)nFind the upper bound MINMAXDIST by performing the following steps:nFor each face of the MBR,pick the location q that is furthest away from p and insert dist(p,q)into a set QSETnPick the minimum out of QSET to be MINMAXDIST,the upper bound for r such that there ex
14、ist at least one point q from the MBR such that dist(p,q)MINMAXDIST(P,R)nDownward pruning:An object O is discarded if there exists an R such that ACTUAL-DIST(P,O)MINIMAXDIST(P,R)nUpward pruning:An MBR R is discarded if an object O is found such that MINDIST(P,R)ACTUAL_DIST(P,O)1-NN Algorithm for R-t
15、ree(I)291-NN Algorithm for R-tree(II)nbest first traversal of the nodes in the R-tree.nA heap is maintained for storing every MBR301-NN Algorithm for R-tree(III)nThe algorithm first starts from the root MBR of the R-tree and places its children MBRs into the heap nWithin the heap,the MBRs are ordere
16、d based on ascending MinDist311-NN Algorithm for R-tree(IV)nthen repeated recursively beginning from the MBR at the top of the heap again by taking its children MBRs and inserting them into the heap.nThe algorithm maintains a variable Best which is initially set to and are updated later32Input:An R-
17、tree R,query point pOutput:NN of p1.Best:=;E=2.Insert the root MBR of R-tree in the heap;3.WHILE exist MBRs in heap DO4.Extract an MBR R in heap;5.IF MinDist(R;p)Best 6.THEN Exit;/Handle next R on heap7.ELSE IF MinMaxDist(R;p)Best8.THEN Best=MinMaxDist(R;p);9.IF R is not a leaf MBR THEN10.Extract al
18、l children of R and add to heap,sort the heap;11.ELSE add all points in R to E12.Compute nearest neighbor q among points in E;13.Output q;33k-Nearest Neighbors nDefinition:Given a query point p,and a distance function dist(),let qk be a point in the database such that count(q|dist(p,q)dist(p,qk),q D
19、)=k-1 The k-nearest neighbors of p are all points q such that dist(p,q)most dense pointnWhat does it mean if NN(p)is not in RNN(p)?What does it mean if NN(p)is in RNN(p)?38数据集n采用数据生成器,产生三种类型数据集:nCorrelatednIndependentnanti-correlated,39数据集n参数:n维数:n数据分布:E,C,An数据大小:40作业步骤n将数据集作为输入,构造R-treen将R-tree和随机查询点p作为输入,查找NN(10分)、查找k-NN(15分)、查找满足某些限制条件的K-NN,如找出离地球至少10光年远的4个最近的星球(20分)41n下载R-tree源代码:http:/ 在该页面上找到在该页面上找到C&C+Code,再,再找到Download R-Tree Templated C+version n注意:要装MSVC 6 SP5 或 MSVC.Net 2003n介绍R-tree的文章:R-Tree:A dynamic index structure for spatial searching,sigmod 198442