Chapter 12
Studying Genomes

CHAPTER CONTENTS
12.1 Genome annotation
12.2 Studies of the transcriptome and proteome

Gene Cloning and DNA Analysis: An Introduction. 6th edition. By T.A. Brown. Published 2010 by Blackwell Publishing.

At the start of the 21st century the emphasis in molecular biology shifted from the study of individual genes to the study of entire genomes. This change in emphasis was prompted by the development during the 1990s of methods for sequencing large genomes. Genome sequencing predates the 1990s (we saw in Chapter 10 how the first genome, that of the phage φX174, was completed in 1975), but it was not until 20 years later, in 1995, that the first genome of a free-living organism, the bacterium Haemophilus influenzae, was completely sequenced. The next five years were a watershed, with the genome sequences of almost 50 other bacteria published, along with complete sequences for the much larger genomes of yeast, fruit fly, Caenorhabditis elegans (a nematode worm), Arabidopsis thaliana (a plant), and humans. Today, the sequencing of bacterial genomes has become routine, with over 900 completed, and almost 100 eukaryotic genomes have also been sequenced.

Genome sequencing has led to the development of a new area of DNA research, loosely called post-genomics or functional genomics. Post-genomics includes the use of computer systems in genome annotation, the process by which the genes, control sequences, and other interesting features are identified in a genome sequence, as well as computer-based and experimental techniques aimed at determining the functions of any unknown genes that are discovered. Post-genomics also encompasses techniques designed to identify which genes are expressed in a particular type of cell or tissue, and how this pattern of genome expression changes over time.

12.1 Genome annotation

Once a genome sequence has been completed, the next step is to locate all the genes and determine their functions. It is in this area that bioinformatics, sometimes referred to as molecular biology in silico, is proving of major value as an adjunct to conventional experiments.

Figure 12.1 A double-stranded DNA molecule has six reading frames.

Genome annotation is a far from trivial process, even with genomes that have been extensively studied by genetic analysis and gene cloning techniques prior to complete sequencing. For example, the sequence of the yeast Saccharomyces cerevisiae, one of the best studied of all organisms, revealed that this genome contains about 6000 genes. Of these, some 3600 could be assigned a function either on the basis of previous studies that had been carried out with yeast or because the yeast gene had a similar sequence to a gene that had been studied in another organism. This left 2400 genes whose functions were not known. Despite a massive amount of work since the yeast genome was completed in 1996, the functions of many of these orphans have still not been determined.

12.1.1 Identifying the genes in a genome sequence

Locating a gene in a genome sequence is easy if the amino acid sequence of the protein product is known, allowing the nucleotide sequence of the gene to be predicted, or if the corresponding cDNA has been previously sequenced. But for many genes there is no prior information that enables the correct DNA sequence to be recognized. How can these genes be located in a genome sequence?

Searching for open reading frames

The DNA sequence of a gene is an open reading frame (ORF), a series of nucleotide triplets beginning with an initiation codon (usually but not always ATG) and ending in a termination codon (TAA, TAG, or TGA in most genomes). Searching a genome sequence for ORFs, by eye or more usually by computer, is therefore the first step in gene location.
When carrying out the search it is important to remember that each DNA sequence has six reading frames, three in one direction and three in the reverse direction on the complementary strand (Figure 12.1).

The key to the success of ORF scanning is the frequency with which termination codons appear in the DNA sequence. If the DNA has a random sequence and a GC content of 50%, then each of the three termination codons will appear, on average, once every 4^3 = 64 bp. This means that there should not be many ORFs longer than 30–40 codons in random DNA, and not all of these ORFs will start with ATG. Most genes are much longer than this: the average lengths are 317 codons for Escherichia coli, 483 codons for S. cerevisiae, and approximately 450 codons for humans. ORF scanning, in its simplest form, therefore takes a figure of 100 codons as the shortest length of a putative gene and records positive hits for all ORFs longer than this.

With bacterial genomes, simple ORF scanning is an effective way of locating most of the genes in a DNA sequence. Most bacterial genes are much longer than 100 codons
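
The simplest form of ORF scanning described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used by any particular annotation program. It assumes ATG as the only initiation codon, the three termination codons TAA, TAG, and TGA, an input containing only the letters A, C, G, and T, and a threshold of 100 codons. The names find_orfs, scan_strand, reverse_complement, and Orf are hypothetical, chosen for this example.

```python
from typing import Iterator, List, NamedTuple, Tuple

# Assumptions for this sketch: ATG is the only initiation codon and the
# termination codons are TAA, TAG, and TGA.
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


class Orf(NamedTuple):
    frame: int   # +1 to +3 on the given strand, -1 to -3 on the complementary strand
    start: int   # 0-based position of the ATG, counted along the strand that was scanned
    codons: int  # length in codons, not counting the termination codon


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in its own 5' to 3' direction."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_strand(seq: str, min_codons: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (frame offset, start, codons) for ORFs in the three frames of one strand."""
    for offset in range(3):
        start = None  # position of the ATG that opened the current ORF, if any
        for i in range(offset, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START_CODON:
                    start = i
            elif codon in STOP_CODONS:
                length = (i - start) // 3
                if length >= min_codons:
                    yield offset, start, length
                start = None
        # An ORF still open at the end of the sequence is not reported.


def find_orfs(seq: str, min_codons: int = 100) -> List[Orf]:
    """Scan all six reading frames and return ORFs of at least min_codons codons."""
    seq = seq.upper()
    hits = []
    for sign, strand in ((1, seq), (-1, reverse_complement(seq))):
        for offset, start, length in scan_strand(strand, min_codons):
            hits.append(Orf(sign * (offset + 1), start, length))
    return hits
```

Calling find_orfs(sequence) returns one Orf record per hit. The 100-codon default can be motivated under the same random-sequence assumption made in the text: 61 of the 64 possible triplets are not termination codons, so the chance that 100 consecutive codons in one frame contain no termination codon is (61/64)^100, which is under 1%.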