Abstract:
Analyzing a gene sequence, which is transcribed into RNA and then translated
into protein, is a difficult task. If this could be achieved, it would make it
possible to better understand how organisms develop from DNA information.
The behavior of a gene is highly influenced by promoter sequences residing
upstream or downstream of the Transcription Start Site (TSS). The promoter
recognition process is part of the complex process in which genes interact with
each other over time and which regulates the whole working process of a cell.
This paper attempts to develop an efficient algorithm that can successfully
distinguish promoters from non-promoters by analyzing statistical data. A
learning model is developed from the known dataset to predict the unknown
ones.
Results: We have developed an efficient algorithm that can successfully
distinguish genes from non-gene sequences by analyzing statistical data. A
learning model is first developed to train the Support Vector Machine
(SVM) to identify distinctive features that separate gene from non-gene
sequences. The trained SVM is then used to classify other, previously unseen
sequences. Our system has been tested using standard plant prom sequence data
from the EMBL, achieving a sensitivity of 0.86 and a specificity of 0.90.
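For readers unfamiliar with the two reported measures, sensitivity is conventionally defined as TP / (TP + FN) and specificity as TN / (TN + FP), where TP, FN, TN and FP count true positives, false negatives, true negatives and false positives. The following is a minimal sketch of that calculation, assuming binary labels (1 for a positive sequence, 0 for a negative one). The function name and the toy labels are hypothetical and are not taken from the thesis or its dataset.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumption: 1 marks a positive sequence, 0 a negative one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # negatives accepted

    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy labels for illustration only (not the thesis data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In practice the predicted labels would come from the trained SVM applied to a held-out test set, and the same two ratios would be computed from its predictions.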
Identification
Description:
This thesis was submitted in partial fulfillment of the requirements for the degree of Masters of Science in Computer Science and Engineering of East West University, Dhaka, Bangladesh.