Overview of Parallel Computing System Architecture.ppt
Overview of Parallel Computing System Architecture
Pingpeng Yuan
Service Computing Technology and System Lab
Cluster and Grid Computing Lab
5/20/2019

Contents
- Parallel computer systems and architecture models
- Contemporary parallel machine systems
- Performance evaluation of parallel computing

1 Parallel Computer Systems and Architecture Models
- 1.1 Demand for parallel computing
- 1.2 Interconnection of parallel computer systems
  - 1.2.1 System interconnect
  - 1.2.2 Static interconnection networks
  - 1.2.3 Dynamic interconnection networks
  - 1.2.4 Standard interconnection networks
- 1.3 Parallel computer system architecture
  - 1.3.1 Parallel computer architecture models
  - 1.3.2 Parallel computer memory access models

Drivers of Parallel Computing
- Application Needs: our insatiable need for computing cycles
  - Scientific computing: CFD, Biology, Chemistry, Physics, ...
  - General-purpose computing: Video, Graphics, CAD, Databases, TP ...
  - Internet applications: Search, e-Commerce ...
- Technology Trends

Scientific Computing Demand
- Ever-increasing demand due to the need for more accuracy, higher-level modeling and knowledge, and analysis of exploding amounts of data
- Example area: Climate and Ecological Modeling goals
  - Resolution, simulated time, and improved physics alone lead to increased requirements by factors of 10^4 to 10^7. Then:
  - Reliable global warming, natural disaster and weather prediction
  - Predictive models of rainforest destruction, forest sustainability, effects of climate change on ecosystems and on food webs, global health trends
  - Verifiable global ecosystem and epidemic models
  - Integration of macro-effects with localized and then micro-effects
  - Predictive effects of human activities on Earth's life support systems
  - Understanding Earth's life support systems

Engineering Computing Demand
- Large parallel machines are a mainstay in many industries
  - Petroleum (reservoir analysis)
  - Automotive (crash simulation, drag analysis, combustion efficiency)
  - Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  - Computer-aided design
  - Pharmaceuticals (molecular modeling)
  - Visualization: in all of the above, entertainment (movies), architecture (walk-throughs, rendering)
  - Financial modeling (yield and derivative analysis)
  - etc.

Commercial Computing
- Also relies on parallelism for the high end
  - Scale not so large, but use much more widespread
  - Computational power determines the scale of business that can be handled
- Databases, online transaction processing, decision support, data mining, data warehousing ...
- E-commerce, search and other scalable internet services
  - Parallel applications running on clusters
  - Developing new parallel software models and primitives
- Insight from automated analysis of large disparate data

Drivers of Parallel Computing
- Application Needs
- Technology Trends

Technology Trends: Rise of the Micro
- The natural building block for multiprocessors is now also about the fastest!

General Technology Trends
- Microprocessor performance increases 50% - 100% per year
- Clock frequency doubles every 3 years
- Transistor count quadruples every 3 years
  - Moore's law: xtors per chip = 1.59^(year-1959) (originally 2^(year-1959))
- Huge investment per generation is carried by huge commodity market

Clock Frequency Growth Rate (Intel family)
- 30% per year
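
The Moore's law expression on the General Technology Trends slide can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function names `transistors_per_chip`, `doubling_time_years`, and `annual_factor` are hypothetical helpers introduced for illustration. It shows that "quadruples every 3 years" corresponds to an annual factor of about 1.59, the base used in the slide's formula.

```python
import math


def transistors_per_chip(year, base=1.59, origin=1959):
    """Evaluate the slide's expression: base ** (year - origin)."""
    return base ** (year - origin)


def doubling_time_years(base):
    """Years for a quantity that grows by `base` per year to double."""
    return math.log(2) / math.log(base)


def annual_factor(total_factor, years):
    """Per-year growth factor implied by `total_factor` over `years` years."""
    return total_factor ** (1.0 / years)


if __name__ == "__main__":
    # Quadrupling every 3 years gives roughly 1.59x per year.
    print("annual factor, 4x per 3 years:", round(annual_factor(4, 3), 3))
    # Doubling every 3 years (clock frequency) gives roughly 1.26x per year.
    print("annual factor, 2x per 3 years:", round(annual_factor(2, 3), 3))
    # A base of 1.59 doubles in about a year and a half; a base of 2 in one year.
    print("doubling time at 1.59x/year:", round(doubling_time_years(1.59), 2))
    print("doubling time at 2x/year:", round(doubling_time_years(2.0), 2))
```

Note that doubling every 3 years works out to about 26% per year, which is close to but not identical to the 30% per year quoted for the Intel family; the slides present both figures as approximate.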

Transistor Count Growth Rate (Intel family)
- Transistor count grows much faster than clock rate
  - 40% per year, order of magnitude more contribution in 2 decades
- Width/space has greater potential than per-unit speed

How to Use More Transistors
- Improve single-threaded performance via architecture
  - Not keeping up with the potential given by technology (next)
- Use transistors for memory structures to improve data locality
  - Doesn't give as high returns (2x for 4x cache size, to a point)
- Use parallelism
  - Instruction-level
  - Thread-level
- Bottom line: not that single-threaded performance has plateaued, but that parallelism is the natural way to stay on a better curve

Microprocessor Performance

Similar Story for Storage (Transistor Count)

Similar Story for Storage (DRAM Capacity)

Similar Story for Storage
- Divergence between memory capacity and speed more pronounced
  - Capacity increased by 1000x from 1980-95, and increases 50% per year
  - Latency reduces only 3% per year (only 2x from 1980-95)
  - Bandwidth per memory chip increases 2x as fast as latency reduces
- Larger memories are slower, while processors get faster
  - Need to transfer more data in parallel
  - Need deeper cache hierarchies
  - How to organize caches?

Similar Story for Storage
- Parallelism increases effective size of each level of hierarchy, without increasing access time
- Parallelism and locality within memory systems too
  - New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface
  - Buffer caches most recently accessed data
- Disks too: parallel disks plus caching
- Overall, dramatic growth of processor speed, storage capacity and bandwidths relative to latency (especially) and clock speed point toward parallelism as the desirable architectural direction
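
The capacity versus latency divergence described on the storage slides is a compound-growth argument. The sketch below is illustrative only and not from the original deck; `implied_annual_rate` and `compound` are hypothetical helper names. It converts the quoted 1980-95 totals (1000x capacity, 2x latency) into per-year rates and compounds the quoted per-year rates (50% and 3%) over 15 years. The slide's totals and per-year figures are rounded and may describe different periods, so the two views are not expected to agree exactly.

```python
def implied_annual_rate(total_factor, years):
    """Annual growth rate (e.g. 0.5 for 50%) implied by a total factor over `years`."""
    return total_factor ** (1.0 / years) - 1.0


def compound(annual_rate, years):
    """Total factor after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    years = 1995 - 1980  # the 15-year window quoted on the slide

    # View 1: from the quoted totals to implied per-year rates.
    cap_rate = implied_annual_rate(1000, years)
    lat_rate = implied_annual_rate(2, years)
    print(f"capacity: 1000x in {years} years -> {cap_rate:.1%} per year")
    print(f"latency:  2x in {years} years    -> {lat_rate:.1%} per year")

    # View 2: from the quoted per-year rates to totals over the same window.
    cap_total = compound(0.50, years)
    lat_total = 1.0 / (1.0 - 0.03) ** years  # latency shrinks 3% per year
    print(f"capacity at 50%/year: {cap_total:.0f}x")
    print(f"latency improvement at 3%/year: {lat_total:.2f}x")

    # Either way, capacity outgrows latency improvement by a factor in the hundreds.
    print(f"gap (view 2): {cap_total / lat_total:.0f}x")
```

Under either reading the gap between capacity growth and latency improvement is two to three orders of magnitude over the period, which is the point the slide makes in favor of deeper hierarchies and parallel transfers.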

Top 10 Fastest Computers (Linpack)

| Rank | Site | Computer | Processors | Year | Rmax |
|------|------|----------|------------|------|------|
| 1 | DOE/NNSA/LLNL, USA | IBM BlueGene | 131072 | 2005 | 280600 |
| 2 | NNSA/Sandia Labs, USA | Cray Red Storm, Opteron | 26544 | 2006 | 101400 |
| 3 | IBM Research, USA | IBM Blue Gene Solution | 40960 | 2005 | 91290 |
| 4 | DOE/NNSA/LLNL, USA | ASCI Purple - IBM eServer p5 | 12208 | 2006 | 75760 |
| 5 | Barcelona Center, Spain | IBM JS21 Cluster, PPC 970 | 10240 | 2006 | 62630 |
| 6 | NNSA/Sandia Labs, USA | Dell Thunderbird Cluster | 9024 | 2006 | 53000 |
| 7 | CEA, France | Bull Tera-10 Itanium2 Cluster | 9968 | 2006 | 52840 |
| 8 | NASA/Ames, USA | SGI Altix 1.5 GHz, Infiniband | 10160 | 2004 | 51870 |
| 9 | GSIC Center, Japan | NEC/Sun Grid Cluster (Opteron) | 11088 | 2006 | 47380 |
| 10 | Oak Ridge Lab, USA | Cray Jaguar XT3, 2.6 GHz dual | 10424 | 2006 | 43480 |

- NEC Earth Simulator (top for 5 lists) moves down to #14
- #10 system has doubled in performance since last year
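
One way to read the Top 10 list is to divide Rmax by the processor count, which separates machines built from many modest processors from those built from fewer, faster ones. The sketch below is an illustration added here, not part of the slides. It copies the processor counts and Rmax values from the list above and assumes Rmax is reported in GFlop/s, as in the TOP500 lists of that period; the slide itself does not state the unit. The names `TOP10` and `rmax_per_processor` are hypothetical.

```python
# (computer, processors, Rmax) copied from the Top 10 list.
# Assumption: Rmax is in GFlop/s; the slide does not give the unit.
TOP10 = [
    ("IBM BlueGene", 131072, 280600),
    ("Cray Red Storm, Opteron", 26544, 101400),
    ("IBM Blue Gene Solution", 40960, 91290),
    ("ASCI Purple - IBM eServer p5", 12208, 75760),
    ("IBM JS21 Cluster, PPC 970", 10240, 62630),
    ("Dell Thunderbird Cluster", 9024, 53000),
    ("Bull Tera-10 Itanium2 Cluster", 9968, 52840),
    ("SGI Altix 1.5 GHz, Infiniband", 10160, 51870),
    ("NEC/Sun Grid Cluster (Opteron)", 11088, 47380),
    ("Cray Jaguar XT3, 2.6 GHz dual", 10424, 43480),
]


def rmax_per_processor(systems):
    """Return (computer, Rmax / processors) pairs, highest per-processor rate first."""
    ratios = [(name, rmax / procs) for name, procs, rmax in systems]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rmax_per_processor(TOP10):
        print(f"{name:32s} {ratio:6.2f} per processor")
```

The top-ranked BlueGene system has one of the lowest per-processor rates in the list and reaches first place through processor count, which fits the deck's argument that width has more potential than per-unit speed.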

Top 500: Architectural Styles

Top 500: Processor Type

System Interconnect
- Interconnect technologies for different bandwidths and distances: buses, SAN
Link: https://www.31doc.com/p-2809398.html