Multiple Sequence Alignment
Discuss the difference between the alignments for these two multiple alignment methods (ClustalW+ and PileUp).
ClustalW+ is a program that makes a multiple sequence alignment from a group of similar sequences using progressive and pairwise alignments.
" The simultaneous alignment of many nucleotide or amino acid sequences is now an essential tool in molecular biology. Multiple alignments are used to find diagnostic patterns to characterize protein families; to detect or demonstrate homology between new sequences and existing families of sequences; to help predict the secondary and tertiary structures of new sequences; to suggest oligonucleotide primers for PCR; as an essential prelude to molecular evolutionary analysis. The rate of appearance of new sequence data is steadily increasing and the development of efficient and accurate automatic methods for multiple alignments are, therefore, of major importance.
The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment.
Before alignment, the sequences are first clustered by similarity to produce a dendrogram, or tree representation of clustering relationships. It is this dendrogram that directs the order of the subsequent pairwise alignments."
Source may be found here.
PileUp is a program that creates a multiple sequence alignment from a group of similar sequences using progressive, pairwise alignments.
" The multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster can then be aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences can be aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments that include increasingly dissimilar sequences and clusters, until all sequences have been included in the final pairwise alignment.
Before alignment, the sequences are first clustered by similarity to produce a dendrogram, or tree representation of clustering relationships. It is this dendrogram that directs the order of the subsequent pairwise alignments. PileUp can plot this dendrogram so that you can see the order of the pairwise alignments that created the final alignment.
As a general rule, PileUp can align up to 500 sequences, with any single sequence in the final alignment restricted to a maximum length of 7,000 characters (including gap characters inserted into the sequence by PileUp to create the alignment). However, if you include long sequences in the alignment, the number of sequences PileUp can align decreases. See the RESTRICTIONS topic, below, for a more complete discussion of sequence number and size limitations."
Source may be found here.
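Both quoted descriptions share the same skeleton: cluster the sequences by similarity first, then align in that order. The ordering step might be sketched in Python as follows. This is only an illustration, not the code of ClustalW+ or PileUp. The names percent_identity and merge_order are hypothetical, the similarity is a crude position-by-position identity on unaligned sequences (the real programs score pairwise alignments), and average linkage between clusters is an assumption.

```python
from itertools import combinations

def percent_identity(a, b):
    """Crude similarity: matching positions over the shorter length (assumption)."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def merge_order(seqs):
    """Return the order in which clusters are joined, most similar pair first.

    seqs maps a sequence name to its residues. Each step joins the two
    clusters with the highest average pairwise similarity, which is the
    role the dendrogram plays in directing the pairwise alignments.
    """
    def similarity(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(percent_identity(seqs[a], seqs[b]) for a, b in pairs) / len(pairs)

    clusters = [(name,) for name in seqs]
    order = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: similarity(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)   # the joined cluster replaces its two parts
        order.append((c1, c2))
    return order

example = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGTTCAA", "D": "TTGTTCAA"}
for left, right in merge_order(example):
    print(left, "+", right)
```

In this toy input A and B are joined first, then C and D, and the two clusters are joined last, so the most dissimilar material enters the alignment at the end.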
How do the alignments change when you alter the default parameters (matrices, gap penalties)?
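Altering the default parameters changes which alignment scores best, so the result can become either more or less accurate for the question being asked. Raising the gap penalties generally yields fewer gaps, while lowering them allows more gaps to be inserted; changing the scoring matrix changes which substitutions are treated as similar. An understanding of how closely related the sequences are expected to be helps in choosing the settings that give the best possible results.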
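The effect of a gap penalty can be seen on a single pairwise alignment. The sketch below is a plain Needleman-Wunsch global alignment with a linear gap penalty. It is not how ClustalW+ or PileUp are implemented; the function name global_align and the match, mismatch, and gap values are assumptions chosen for illustration, standing in for a real scoring matrix and separate gap-opening and gap-extension penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score). The scoring values are
    illustrative assumptions, not program defaults.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of a substitution or a gap.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

# Same sequences, two gap penalties.
print(global_align("ACGTTT", "CGTTTA", gap=-1))
print(global_align("ACGTTT", "CGTTTA", gap=-5))
```

With the mild penalty the two sequences are shifted against each other by one position at the cost of two gaps; with the harsh penalty the gaps are no longer worth their cost and the sequences are aligned without any.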
What happens if you remove a sequence from your alignments?
When a sequence is removed from the alignment, the programs are often able to align the remaining sequences better. With the extra sequence clipped out, the conserved regions and the consensus are easier to identify across the organisms, which can in turn reveal any evolutionary relationships that may exist.
Do the trees generated make any biological sense to you? Why or why not?
The trees make little biological sense as generated. Reviewing the taxonomy of the source organisms would be useful in identifying the relationships between the various sequences.
Compare CLUSTALW from the internet site http://www.ebi.ac.uk/clustalw/ with CLUSTALW+ output generated by SeqWeb. Which do you prefer? Why?
The CLUSTALW site offers overwhelmingly more options, so the CLUSTALW+ interface in SeqWeb seems more user-friendly, especially for those who are not familiar with the program.
What is the advantage of the consensus sequence? How is this approach similar to the PSI-BLAST you performed earlier?
A consensus sequence summarizes an alignment by showing, at each position, the residue that most of the aligned sequences share. It highlights the regions that are conserved across all of the sequences in the query. This is advantageous because conserved regions may point to evolutionary relationships between the various organisms.
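A simple majority-rule consensus can be sketched as below. This is not the method used by Pretty or any other GCG program; the function name consensus, the threshold parameter, and the "x" placeholder for positions without agreement are assumptions made for the example.

```python
from collections import Counter

def consensus(alignment, threshold=0.5, unknown="x"):
    """Majority-rule consensus of equal-length aligned sequences.

    A column contributes its most common residue when that residue occurs
    in at least `threshold` of the sequences; otherwise `unknown` is written.
    Gap characters ("-") are not counted as residues.
    """
    out = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            out.append("-")          # column made entirely of gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        out.append(residue if count / len(column) >= threshold else unknown)
    return "".join(out)

aligned = ["ACGT-A",
           "ACGTTA",
           "AGGT-C",
           "ACCTTA"]
print(consensus(aligned))                 # lenient threshold
print(consensus(aligned, threshold=0.8))  # only strongly conserved columns
```

Raising the threshold leaves only the strongly conserved columns as letters, which is one way to make the conserved regions stand out.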
PSI-BLAST is an iterative search that looks for similar protein sequences in various databases. It is more sensitive than BLAST, which means that it is capable of finding distantly related sequences that BLAST may not have found. It finds similarities between the query and the database sequences, produces a gapped alignment of the matching regions, and then calculates a position-specific profile from that alignment for use in the next round. This is similar to the consensus approach, because both summarize what is conserved at each position of a multiple alignment.
An expectation (E-value) threshold is selected for each round of PSI-BLAST. The E-value of a hit is the number of alignments with an equal or better score that would be expected to occur by chance in the search, so it is not strictly a probability.
What is the advantage to the multiple sequence alignment over the approach from pairwise comparisons? Any disadvantages?
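PSI-BLAST is generally described as turning the alignment of each round's hits into a position-specific scoring matrix, which is what links it to a consensus. A much simplified version of that idea is sketched below. It is not PSI-BLAST's actual calculation, which weights sequences and derives pseudocounts differently; the function names build_profile and profile_score, the uniform background frequency, and the flat pseudocount are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """Per-column log-odds scores (base 2) against a uniform background.

    A flat pseudocount keeps unseen residues from scoring minus infinity.
    """
    background = 1.0 / len(alphabet)
    rows = []
    for column in zip(*alignment):
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        rows.append({r: math.log2(((counts[r] + pseudocount) / total) / background)
                     for r in alphabet})
    return rows

def profile_score(rows, seq):
    """Score an ungapped sequence of the same length against the profile."""
    return sum(row[residue] for row, residue in zip(rows, seq))

hits = ["MKV", "MKI", "MRV"]
rows = build_profile(hits)
print(profile_score(rows, "MKV") > profile_score(rows, "WWW"))
```

Where a consensus keeps only the single most common residue per column, a profile keeps a score for every residue, so a sequence that differs from the consensus in a tolerated way is still scored favorably.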
The advantages of multiple sequence alignment are considerable. It is often preferred over pairwise alignment because it considers more than two sequences at a time: the method tries to align all of the sequences given in a query and to identify regions that are conserved across sequences thought to be related. The alignments are then used to build phylogenetic trees that help establish the relationships among the sequences.
Multiple alignments, however, are computationally difficult to produce. Finding an optimal multiple alignment is generally regarded as an NP-complete combinatorial optimization problem, where NP stands for "non-deterministic polynomial time." In practice, approximation, probabilistic, and heuristic methods must be used instead.
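A rough sense of the cost comes from counting the work an exact method would need. Extending pairwise dynamic programming directly to N sequences fills a lattice with one cell for every combination of positions, while a progressive method performs only N - 1 pairwise-style alignment steps. The helper names below are hypothetical and the figures are cell and step counts only, not running times.

```python
def exact_dp_cells(lengths):
    """Cells in the dynamic programming lattice for aligning all sequences at once."""
    cells = 1
    for n in lengths:
        cells *= n + 1   # positions 0..n along each sequence
    return cells

def progressive_steps(num_sequences):
    """Pairwise alignment steps needed to join all sequences progressively."""
    return max(num_sequences - 1, 0)

for count in (2, 3, 5, 10):
    lengths = [100] * count
    print(count, exact_dp_cells(lengths), progressive_steps(count))
```

For sequences of length 100, two sequences need about ten thousand cells, but ten sequences need more than 10^20, which is why heuristic methods such as progressive alignment are used.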
ClustalW+
Creates a multiple alignment by progressively adding sequences to an alignment.
PileUp
Creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments.
PlotSimilarity
Plots the running average of the similarity among the sequences in a multiple sequence alignment.
PlotSimilarity - Altered Parameter
Pretty
Displays a multiple sequence alignment and calculates a consensus sequence.
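The running average described for PlotSimilarity above can be sketched as follows. This is an assumption-laden illustration rather than the program's own calculation: similarity here is simply the fraction of identical residue pairs in a column, where the real program would draw on a scoring matrix, and the names column_similarity and running_average and the window size are hypothetical.

```python
from itertools import combinations

def column_similarity(column):
    """Fraction of sequence pairs that carry the same residue in this column.

    Needs at least two sequences; a pair of gaps does not count as a match.
    """
    pairs = list(combinations(column, 2))
    same = sum(1 for x, y in pairs if x == y and x != "-")
    return same / len(pairs)

def running_average(values, window=3):
    """Average each value with its neighbours; the window shrinks at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

aligned = ["AAAT",
           "AAAC",
           "AAAG"]
per_column = [column_similarity(col) for col in zip(*aligned)]
print(running_average(per_column, window=3))
```

Plotting the smoothed values against alignment position gives a curve whose peaks mark conserved regions and whose dips mark variable ones; a wider window gives a smoother curve.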