DIVYA SINGHAL, POOJA SHARMA, MOKSHA SHANDILYA
ABSTRACT
Gene finding typically refers to the area of computational biology concerned with algorithmically identifying stretches of sequence, usually genomic DNA, that are biologically functional. This especially includes protein-coding genes but may also include other functional elements such as RNA genes and regulatory regions. Gene finding is one of the first and most important steps in understanding the genome of a species once it has been sequenced. Gene prediction programs are bioinformatics tools that predict the gene structure of a given sequence in FASTA format. Gene prediction involves determining the number and location of exons (initial, intermediate or terminal), the number and location of introns, the CDS region, and the location of promoter and terminal (polyA) regions. In this study, various Windows-based online gene prediction programs were compared against GenBank sequences for 5 different sequences. The programs used were HMMgene, EMBOSS, FGENESH, GENMARK and GENSCAN. Results were analyzed by calculating specificity and sensitivity at the nucleotide and exon levels. The correlation coefficient, average conditional probability and approximate correlation were calculated and compared to determine the most efficient software. FGENESH was found to be the best of the eukaryotic gene prediction programs compared.
Keywords: gene prediction, eukaryotes, software, exons, introns
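The nucleotide-level measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas used, so the definitions below are an assumption: they follow the convention common in gene-prediction evaluation, where sensitivity is TP/(TP+FN), specificity is TP/(TP+FP), and the approximate correlation is a rescaling of the average conditional probability. The function name `nucleotide_metrics` and the boolean-mask input format are hypothetical, chosen only for illustration.

```python
from math import sqrt


def _ratio(num, den):
    """Return num/den, or None when the denominator is zero (undefined)."""
    return num / den if den else None


def nucleotide_metrics(actual, predicted):
    """Compare two equal-length coding masks (truthy = coding nucleotide).

    Assumed definitions (not taken from the abstract itself):
      Sn  = TP / (TP + FN)
      Sp  = TP / (TP + FP)
      CC  = (TP*TN - FN*FP) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
      ACP = mean of the defined terms among
            TP/(TP+FN), TP/(TP+FP), TN/(TN+FP), TN/(TN+FN)
      AC  = (ACP - 0.5) * 2
    """
    if len(actual) != len(predicted):
        raise ValueError("masks must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # coding, predicted coding
    tn = sum(1 for a, p in pairs if not a and not p)  # non-coding, predicted non-coding
    fp = sum(1 for a, p in pairs if not a and p)      # non-coding, predicted coding
    fn = sum(1 for a, p in pairs if a and not p)      # coding, predicted non-coding

    sn = _ratio(tp, tp + fn)
    sp = _ratio(tp, tp + fp)

    cc_den = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = _ratio(tp * tn - fn * fp, cc_den)

    # Average only the conditional probabilities that are defined.
    terms = [sn, sp, _ratio(tn, tn + fp), _ratio(tn, tn + fn)]
    defined = [t for t in terms if t is not None]
    acp = sum(defined) / len(defined) if defined else None
    ac = (acp - 0.5) * 2 if acp is not None else None

    return {"Sn": sn, "Sp": sp, "CC": cc, "ACP": acp, "AC": ac}


# Toy example: 8 nucleotides, 4 truly coding, one missed and one over-predicted.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
print(nucleotide_metrics(actual, predicted))
```

In practice the masks would be derived from the GenBank CDS coordinates and from each program's predicted exon coordinates. Exon-level sensitivity and specificity would instead count exons whose two boundaries both match exactly.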