Novel strategies for identifying molecular markers associated with complex diseases based on IBD profiles and genomic structure
Original Article

Novel strategies to identify relevant molecular signatures for complex human diseases based on data of identical-by-descent profiles and genomic context

Chuan-xing LI1, Lei DU1, Xia LI1,2*, Bin-sheng GONG1, Jie ZHANG1, Shao-qi RAO1,3*

(1. Department of Bioinformatics, Harbin Medical University, Harbin 150086, China; 2. Department of Computer Science, Harbin Institute of Technology; 3. Departments of Cardiovascular Medicine and Molecular Cardiology, Cleveland Clinic Foundation, Cleveland, Ohio 44195, USA)

JOURNAL OF PEKING UNIVERSITY (HEALTH SCIENCES) Vol. 38 No. 1 Feb. 2006

ABSTRACT Objective: To develop novel strategies to identify relevant molecular signatures for complex human diseases based on data of identical-by-descent (IBD) profiles and genomic context. Methods: In the proposed strategies, we define four relevancy criteria for mapping SNP-phenotype relationships: point-wise IBD mean difference, averaged IBD difference for window, Z curve and averaged slope for window. Results: Applying these criteria and a permutation test to 100 simulated replicates for two hypothetical American populations, in order to extract the SNPs relevant to alcoholism from the sib-pair IBD profiles of pedigrees, showed that the proposed strategies identified most of the simulated true loci. Conclusion: This data mining exercise implies that the IBD statistic and genomic context can be used as information sources for locating the genes underlying complex human diseases. Compared with the classical Haseman-Elston sib-pair regression method, the proposed strategies are more efficient for large-scale genomic mining.

KEY WORDS Polymorphism, single nucleotide; Medical informatics; Multifactorial inheritance; Genome

Fund project: Supported by the National Natural Sciences Foundation of China (30170515, 30370798, 30570424 and 30571034); the National High Technology Research and Development Program of China (2003AA2Z2051); 211 Project; the Tenth "Five-year" Plan; Harbin Medical University and Heilongjiang Province Science and Technology Key Project (GB03C602-4 and 1055HG009)
Corresponding author's e-mail: Lixia@ems.hrbmu.edu.cn
* These authors contributed equally to this work

Single-nucleotide polymorphism (SNP) is the most widespread form of DNA polymorphism in the human genome, thus permitting large-scale and high-density genome-wide profiling. SNPs are generally considered to be ideal genetic markers for genetic investigations, as they are common, stable and increasingly amenable to automated mining methods. Searching for disease-relevant SNPs as landmarks to locate disease genes is a critical step in the positional cloning of the molecular determinants underlying complex human traits. Many statistical methods have been developed for identifying disease-relevant SNPs from either population-based or pedigree-based data, yet no optimal method for analyzing high-dimensional SNP data has been found so far [1].

Many complex human diseases, such as the alcoholism-related behaviors investigated by the Genetic Analysis Workshop 14 (GAW14, http://www.gaworkshop.org/), are not simple Mendelian disorders. Instead, they may have mixed contributions from genes, environments and their interactions. Sophisticated mathematical models are thus desirable to capture this epidemiological complexity, but they can be prohibitively complex. Recent advances in IBD linkage analysis, chromosome structure analysis (the Z curve method for computing the G + C content) [2], disease gene mining [3, 4], discovery of adjacent and co-expressed genes along chromosomes (the sliding window method) [5], and the permutation test [6] give us insights and alternative methods for large-scale association studies. In principle, an ideal information measure or statistic for the correlation between molecular signatures and disease phenotypes should capture both the marginal effect of a signature and its interactions with other feature variables, such as nearby SNPs and environmental risk factors.

In this study, we define and evaluate several information criteria using both the IBD statistic and genomic context. We then apply these criteria and a permutation test to extract the SNPs relevant to alcoholism from the simulated pedigree data for GAW14.

1 Materials and Methods

Virtually all pedigree-based genetic analysis methods rely on one concept: resemblance between relatives. The degree of relation between phenotypic resemblance (e.g. as defined below for alcoholism) and genetic resemblance (e.g. IBD sharing) provides a means of estimating the strength of association of a SNP (or other genetic variant) with the studied trait. We start with definitions of the resemblance measures for a sib pair.

1.1 Defining the phenotypic resemblance attribute of a sib pair

First, we define the phenotypic attribute of a sib pair: the affection status of the pair for alcoholism. For this binary trait there are three possible attributes, of which two are chosen as the phenotypes for learning: concordant affected, where both sibs in a pair are affected; and concordant unaffected, where neither sib in a pair is affected.

1.2 Defining the genetic features (genetic resemblance measure) to be mined

The genetic features are defined to be the estimated proportions of alleles shared IBD by the sib pair at the SNP positions, computed by the GENIBD program of the SAGE package [7]. Because our main interest is to explore the utility of the proposed analysis strategies for extracting useful genetic information from large-scale SNP data, we did not model the second-moment quantities of clinical covariates for the sib pairs.

1.3 Four statistics for association between molecular signatures and phenotypes

The IBD values reflect the proportion of alleles identical by descent at the putative locus for sibling pairs. The larger the difference in a SNP's IBD values between concordant affected and concordant unaffected sib pairs, the stronger the association between the SNP and the disease. Here, we define four criteria to measure the association of molecular signatures with phenotypes.

1.3.1 IBD difference

The IBD difference (DF) of a single marker (i) is the difference between its mean IBD value over all concordant affected sib pairs and its mean IBD value over all concordant unaffected sib pairs. It is determined by the equation:

$$DF_i = \mathrm{mean}\left(IBD_i^{\mathrm{disease}}\right) - \mathrm{mean}\left(IBD_i^{\mathrm{normal}}\right) \qquad (1)$$

1.3.2 Averaged IBD differences for window

In genetic studies, it is well known that nearby SNP markers are not independent, owing to close linkage or linkage disequilibrium. Furthermore, increasing experimental evidence suggests that adjacent, co-expressed and functionally associated genes tend to cluster along the chromosome. The averaged IBD differences for window (ADF) measure these IBD-based genomic contexts by taking into account both the association of the signature with disease and its interaction effects with adjacent SNPs. The ADF of the ith signature (ADFi) is the mean of the IBD differences of the SNPs within a window that contains w markers and is centered on the ith signature. The ADF profile for sign