| Bioinformatics Tools |
Both standard and customized products are available to meet the requirements
of particular projects. Data-mining software retrieves data from genomic
sequence databases, and visualization tools help analyze and retrieve
information from proteomic databases. These tools can be classified
as homology and similarity tools, protein functional analysis tools,
sequence analysis tools and miscellaneous tools; a brief description
of a few of them follows. Everyday bioinformatics is done with sequence search
programs like BLAST, sequence analysis programs like the EMBOSS and
Staden packages, structure prediction programs like THREADER or PHD,
or molecular imaging/modelling programs like RasMol and WHAT IF.
|
Homology and Similarity Tools:
|
Homologous sequences are sequences that are related by divergence from a common
ancestor. Homology is therefore an either/or property: two sequences are
either homologous or they are not. Similarity, by contrast, is a quantity
that can be measured, for example as percent identity over an alignment.
This set of tools can be used to identify similarities between novel
query sequences of unknown structure and function and database sequences
whose structure and function have been elucidated.
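To make the distinction concrete, similarity can be reduced to a number such as percent identity. The following is a minimal sketch in Python, assuming the two sequences have already been aligned (gaps written as `-`); the function name is illustrative and does not come from any of the tools described here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of two aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    # Avoid division by zero when every column contains a gap
    return 100.0 * identical / compared if compared else 0.0


print(percent_identity("ACGT-A", "ACCTGA"))
```

A high percent identity is evidence for homology, but the number itself only measures similarity; whether two sequences share an ancestor remains a yes/no inference.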
|
Protein Function Analysis:
|
This group of programs allows you to compare your protein sequence to the
secondary (or derived) protein databases that contain information on
motifs, signatures and protein domains. Highly significant hits against
these different pattern databases allow you to approximate the biochemical
function of your query protein.
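As a rough illustration of a motif search, the sketch below scans a protein sequence for the N-glycosylation site pattern N-{P}-[ST]-{P} (PROSITE PS00001) using a regular expression. The function name and the example sequence are made up for illustration; real pattern databases hold many such signatures and attach significance estimates that this sketch does not attempt.

```python
import re

# PROSITE-style pattern N-{P}-[ST]-{P} rewritten as a regular expression.
# The lookahead makes overlapping occurrences visible.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_motif(protein, pattern=N_GLYCOSYLATION):
    """Return (1-based position, matched residues) for every motif occurrence."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


# Hypothetical query sequence: the site at N3 matches, the one at N8 is
# rejected because it is followed by proline.
print(find_motif("MKNGSAANPTQ"))
```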
|
Structural Analysis:
|
This set of tools allows you to compare structures with the known structure
databases. The function of a protein is more directly a consequence
of its structure than of its sequence, and structural homologs tend
to share functions. Determining a protein's 2D/3D structure
is therefore crucial to the study of its function.
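One common way to quantify how closely two structures agree is the root-mean-square deviation (RMSD) between matching atoms. The sketch below is a simplified Python example that assumes the two coordinate sets are already superimposed and listed in the same atom order; structure comparison programs normally perform the superposition step themselves, which is not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```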
|
Sequence Analysis:
|
This set of tools allows you to carry out further, more detailed analysis
on your query sequence, including evolutionary analysis and the identification
of mutations, hydropathy regions, CpG islands and compositional biases.
The identification of these and other biological properties provides
clues that aid the search to elucidate the specific function of your
sequence.
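Two of the properties mentioned above, compositional bias and CpG islands, come down to simple counting. The following Python sketch computes GC content and the observed/expected CpG ratio, then flags windows that exceed commonly quoted thresholds (GC above 50%, ratio above 0.6). The function names, the default window size and the thresholds are assumptions chosen for illustration, not the settings of any particular tool.

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def cpg_candidate_windows(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Start offsets of windows whose GC content and CpG ratio exceed the thresholds."""
    starts = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        if gc_content(chunk) > min_gc and cpg_obs_exp(chunk) > min_ratio:
            starts.append(start)
    return starts


toy = "AT" * 10 + "CG" * 10 + "AT" * 10
print(cpg_candidate_windows(toy, window=20, step=10))
```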
|
Examples of Bioinformatics Tools:
|
BLAST:
|
BLAST (Basic Local Alignment Search Tool) comes under the category of
homology and similarity tools. It is a set of search programs, available
through the NCBI web server and as standalone programs for several platforms
including UNIX and Windows, used to perform fast similarity searches
regardless of whether the query is protein or DNA. A nucleotide query
can be compared against the nucleotide sequences in a database, and a protein
database can be searched for matches to a protein query
sequence. NCBI has also introduced a queuing system for BLAST (QBLAST)
that allows users to retrieve results at their convenience and
format their results multiple times with different formatting options.
|
blastp
compares an amino acid query sequence against a protein sequence database
|
blastn
compares a nucleotide query sequence against a nucleotide sequence database
|
blastx
compares a nucleotide query sequence translated in all reading frames
against a protein sequence database
|
tblastn
compares a protein query sequence against a nucleotide sequence database
dynamically translated in all reading frames
|
tblastx
compares the six-frame translations of a nucleotide query sequence against
the six-frame translations of a nucleotide sequence database.
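The "six-frame translation" used by blastx, tblastn and tblastx means translating a nucleotide sequence in three reading frames on each strand. The sketch below shows the idea in Python with the standard genetic code; it is not BLAST's own implementation, and the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    forward = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


print(six_frame_translations("ATGGCC"))
```

Because tblastx translates both the query and every database sequence this way, it performs far more comparisons than the other programs.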
|
FASTA:
|
FASTA (FAST-All, a fast homology search for all sequence types) is an
alignment program for protein and nucleotide sequences created by Pearson
and Lipman in 1988. The program is one
of the many heuristic algorithms proposed to speed up sequence comparison.
The basic idea is to add a fast prescreen step to locate the highly
matching segments between two sequences, and then extend these matching
segments to local alignments using more rigorous algorithms such as
Smith-Waterman.
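The prescreen idea can be sketched as a word (k-tuple) lookup: index every word of length k in one sequence, look up the words of the other, and count hits per diagonal. Diagonals with many hits mark the segments worth passing to a rigorous aligner. This Python sketch illustrates only that first step under simplified assumptions (exact word matches, no scoring matrix, no extension); it is not the FASTA program's actual code.

```python
from collections import Counter, defaultdict


def ktuple_diagonal_hits(query, subject, k=4):
    """Count exact k-tuple matches per diagonal (subject offset minus query offset)."""
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    hits = Counter()
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], ()):
            hits[j - i] += 1
    return hits


def best_diagonal(query, subject, k=4):
    """Return (diagonal, hit count) for the densest diagonal, or None if no hits."""
    hits = ktuple_diagonal_hits(query, subject, k)
    if not hits:
        return None
    return max(hits.items(), key=lambda item: item[1])


print(best_diagonal("ACGTACGT", "TTACGTACGTTT"))
```

Only the region around the best diagonals then needs the expensive dynamic-programming alignment, which is where the speed-up comes from.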
|
EMBOSS:
|
EMBOSS (European Molecular Biology Open Software Suite) is a software-analysis
package. It can work with data in a range of formats and also retrieve
sequence data transparently from the Web. Extensive libraries are also
provided with this package, allowing other scientists to release their
software as open source. It provides a set of sequence-analysis programs,
and also supports all UNIX platforms.
|
ClustalW:
|
ClustalW is a fully automated multiple sequence alignment tool for DNA and protein
sequences. It produces a global alignment, returning the best match over the
total length of the input sequences, whether protein or nucleic acid.
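"Best match over the total length" describes a global alignment, as opposed to the local alignments produced by BLAST and FASTA. The sketch below computes a global alignment score for two sequences with the Needleman-Wunsch recurrence. It covers only the pairwise case with assumed scores (match +1, mismatch -1, gap -2) and does not reproduce how ClustalW builds a multiple alignment.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best end-to-end alignment of a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATACA"))
```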
|
RasMol:
|
It is a powerful research tool to display the structure of DNA, proteins,
and smaller molecules. Protein Explorer, a derivative of RasMol, is
an easier-to-use program.
|
PROSPECT:
|
PROSPECT (PROtein Structure Prediction and Evaluation Computer ToolKit)
is a protein-structure prediction system that employs a computational
technique called protein threading to construct a protein's 3-D model.
|
PatternHunter:
|
PatternHunter, written in Java, can identify all approximate repeats in
a complete genome in a short time using little memory on a desktop computer.
Its distinguishing features are its patented algorithm and data structures
and its implementation in Java. The Java version of
PatternHunter is just 40 KB, only 1% the size of BLAST, while offering
a large portion of its functionality.
|
COPIA:
|
COPIA (COnsensus Pattern Identification and Analysis) is a protein structure
analysis tool for discovering motifs (conserved regions) in a family
of protein sequences. Such motifs can then be used to determine whether
new protein sequences belong to the family, to predict the secondary and tertiary
structure and function of proteins, and to study the evolutionary history of the
sequences.
|
Application of Programmes in Bioinformatics:
|
Java in Bioinformatics:
|
Since research centers are scattered all around the globe, ranging from
private to academic settings, and use a range of hardware and operating
systems, the platform-independent Java language is emerging as a key player
in bioinformatics. Physiome Sciences'
computer-based biological simulation technologies and Bioinformatics
Solutions' PatternHunter are two examples of the growing adoption of
Java in bioinformatics.
|
Perl in Bioinformatics:
|
String manipulation, regular expression matching, file parsing and data format
interconversion are common text-processing tasks in
bioinformatics. Perl excels at such tasks and is used by many
developers. Perl has no standard modules designed specifically
for bioinformatics; however, developers have written several
modules of their own for the purpose, which have become quite
popular and are coordinated by the BioPerl project.
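The kind of text processing described here can be illustrated with a small parser that reads FASTA-formatted text and converts it to a tab-delimited table. The sketch is written in Python rather than Perl, and the function names and output columns are illustrative assumptions; BioPerl provides its own, far more complete parsers.

```python
import re

# A FASTA header line: '>' followed by an identifier and an optional description
HEADER = re.compile(r"^>(\S+)\s*(.*)$")


def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = HEADER.match(line)
        if header:
            records.append([header.group(1), header.group(2), []])
        elif records:
            records[-1][2].append(line)  # sequence lines may be wrapped
    return [(ident, desc, "".join(chunks)) for ident, desc, chunks in records]


def fasta_to_tab(text):
    """Convert FASTA text to 'identifier<TAB>length<TAB>sequence' lines."""
    return "\n".join(f"{ident}\t{len(seq)}\t{seq}"
                     for ident, _, seq in parse_fasta(text))


example = ">seq1 test protein\nMKV\nLAA\n>seq2\nACGT\n"
print(fasta_to_tab(example))
```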
|
Bioinformatics Projects:
|
BioJava:
|
The BioJava Project is dedicated to providing Java tools for processing
biological data, including objects for manipulating sequences, dynamic
programming, file parsers, simple statistical routines, etc.
|
BioPerl:
|
The BioPerl project is an international association of developers of
Perl tools for bioinformatics and provides an online resource for modules,
scripts and web links for developers of Perl-based software.
|
BioXML:
|
A part of the BioPerl project, this is a resource to gather XML documentation,
DTDs and XML aware tools for biology in one location.
|
Biocorba:
|
Interface objects have facilitated interoperability between bioperl
and other perl packages such as Ensembl and the Annotation Workbench.
However, interoperability between bioperl and packages written in other
languages requires additional support software. CORBA is one such framework
for interlanguage support, and the biocorba project is currently implementing
a CORBA interface for bioperl. With biocorba, objects written within
bioperl will be able to communicate with objects written in biopython
and biojava (see the Biopython and biojava section below). For more
information, see the biocorba project website at http://biocorba.org/.
The Bioperl BioCORBA
server and client bindings are available in the bioperl-corba-server
and bioperl-corba-client bioperl CVS repositories, respectively (see
http://cvs.bioperl.org/ for more information).
|
Ensembl:
|
Ensembl is an ambitious automated-genome-annotation project at EBI. Much of
Ensembl's code is based on bioperl, and Ensembl developers, in turn,
have contributed significant pieces of code to bioperl. In particular,
the bioperl code for automated sequence annotation has been largely
contributed by Ensembl developers. Describing Ensembl and its capabilities
is far beyond the scope of this tutorial; the interested reader is referred
to the Ensembl website at http://www.ensembl.org/.
|
bioperl-db:
|
Bioperl-db is a relatively new project intended to transfer some of Ensembl's capability
of integrating bioperl syntax with a standalone MySQL database (http://www.mysql.com)
to the bioperl code-base. More details on bioperl-db can be found in
the bioperl-db CVS directory at http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-db/?cvsroot=bioperl.
It is worth mentioning that most of the bioperl objects mentioned above
map directly to tables in the bioperl-db schema. Therefore object data
such as sequences, their features, and annotations can be easily loaded
into the databases, as in $loader->store($newid,$seqobj). Similarly,
one can query the database in a variety of ways and retrieve arrays
of Seq objects. See biodatabases.pod, Bio::DB::SQL::SeqAdaptor, Bio::DB::SQL::QueryConstraint,
and Bio::DB::SQL::BioQuery for examples.
|
Biopython and biojava:
|
Biopython and biojava are open source projects with very similar goals to bioperl.
However, their code is implemented in Python and Java, respectively.
With the development of interface objects and biocorba, it is possible
to write Java or Python objects which can be accessed by a bioperl script,
or to call bioperl objects from Java or Python code. Since biopython
and biojava are more recent projects than bioperl, most effort to date
has been to port bioperl functionality to biopython and biojava rather
than the other way around. However, in the future, some bioinformatics
tasks may prove to be more effectively implemented in Java or Python,
in which case being able to call them from within bioperl will become
more important. For more information, go to the biojava (http://biojava.org/)
and biopython (http://biopython.org/) websites.
|