IST 444- FADD

 

BLAST

Learning to use BLAST with Your Gene

http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html

http://www.ncbi.nlm.nih.gov/Class/BLAST/blast_course.short.html

http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&PAGE_TYPE=BlastTips#2

Using the MyNCBI service you can register for a private account and each time you log in you can have access to custom searches for BLAST as well as PubMed and other Entrez services.

Nucleotide BLAST:

What is the difference between the megablast, discontiguous megablast and blastn algorithms? When should you consider using each one?

Megablast is faster and more memory-efficient than the standard blastn algorithm. It takes DNA query sequences in FASTA format and works only with nucleotide sequences, so it is available only within the blastn program. It is best suited to highly similar sequences, such as those from the same species.
Discontiguous megablast is a version of megablast designed for comparing diverged sequences, especially those from different organisms. Consider it when comparing sequences from different organisms that share relatively little identity.
The blastn algorithm compares a nucleotide query sequence against a nucleotide sequence database. It is slower than megablast but more sensitive, because it uses a shorter word size.
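
For readers who run these searches with the standalone NCBI BLAST+ tools instead of the web page, the three algorithms correspond to values of the `-task` option of the `blastn` program. The sketch below only builds the command lines and does not run them. The file name `fadd_cdna.fasta`, the database name `nt`, and the helper `build_blastn_command` are assumptions made for illustration.

```python
# Default word sizes documented for each blastn task in BLAST+.
DEFAULT_WORD_SIZE = {
    "megablast": 28,     # long exact seeds: fast, for highly similar sequences
    "dc-megablast": 11,  # discontiguous seeds: tolerates mismatches across species
    "blastn": 11,        # short contiguous seeds: slowest, most sensitive
}


def build_blastn_command(task, query="fadd_cdna.fasta", db="nt"):
    """Return the argument list for one blastn run (hypothetical file and db names)."""
    if task not in DEFAULT_WORD_SIZE:
        raise ValueError(f"unknown blastn task: {task}")
    # -outfmt 6 requests tabular output, which is easy to count hits from.
    return ["blastn", "-task", task, "-query", query, "-db", db, "-outfmt", "6"]


for task in DEFAULT_WORD_SIZE:
    print(" ".join(build_blastn_command(task)))
```

Running the same query with all three tasks and counting the lines of tabular output would be one way to reproduce a hit-count comparison like the one reported further down.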

What are the choices you can make within the nucleotide collection? When would you try each one of these choices?

All GenBank + RefSeq Nucleotides + EMBL + DDBJ + PDB sequences (excluding HTGS0,1,2, EST, GSS, STS, PAT, WGS). No longer "non-redundant".

The remaining databases can be found here:

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

Perform a BLASTN search with your gene of interest cDNA using the nucleotide collection.  Save the output for the webpage.

BLASTN search of FADD cDNA using nucleotide collection

Perform a megablast and save the output for your webpage.

Megablast of FADD cDNA

Perform a discontiguous megablast and save the output for your webpage.

Discontiguous Megablast of FADD cDNA

How many hits did you get?
BLASTN: 111 hits
Megablast: 47 hits
Discontiguous Megablast: 68 hits


Aside from the human genes, what is the most significant hit?

The most significant hit came from the house mouse (Mus musculus)

What is its E-value?
9E-124

Which is the first non-human gene you located?
For the discontiguous megablast, the first non-human hits were from the rhesus monkey (Macaca mulatta, similar to FADD protein (FAS-associating death domain)) and the pig (Sus scrofa mRNA, clone:OVRM10189E12, expressed in ovary).

What % identity did it have?
The rhesus monkey had 91% identity and the pig had 84% identity.
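
As a rough illustration of what a percent identity figure measures, the sketch below counts identical columns in a gapped pairwise alignment and divides by the alignment length. The function name `percent_identity` and the toy sequences are made up for this example and are not taken from the FADD results.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"  # a gap never counts as an identity
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignment (not FADD sequence): 4 of 6 columns match.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))  # prints 66.7
```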

What does an Expect (E) value mean?
The Expect (E) value is the expectation value. It represents the number of different alignments with scores equal to or better than the score S that are expected to occur in a database search by chance. The higher the E value, the less significant the match; the lower the E value, the more significant the match.
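
The relationship between score and E value can be sketched with the standard bit-score formula E = m × n × 2^(−S'), where S' is the bit score, m is the query length and n is the total database length. This is a simplified sketch: it ignores the effective-length corrections that BLAST applies, and the numbers used below are hypothetical, not taken from the FADD search.

```python
def expect_value(bit_score, query_length, database_length):
    """E = m * n * 2**(-S'), with S' the bit score and m * n the search space."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search space chosen only to show the trend.
m, n = 1_000, 1_000_000_000
for bits in (30, 50, 100):
    print(bits, expect_value(bits, m, n))
```

Each additional bit of score halves the E value, which is why strong hits report values such as 9E-124 while weak hits have E values near or above 1.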

How should you interpret these results?
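Many of the hits with E values close to zero came from other animals such as the rhesus monkey, house mouse, pig, cattle, and Norway rat. E values this low mean the alignments are very unlikely to have arisen by chance, so these sequences are likely related to human FADD.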

Perform a BLASTN search using the human database alone. How many hits did you get this time? Link the results to your webpage.

BLASTN search using Human Database

Try this search with the Arabidopsis database (a model plant). The option to limit a search to organism and even taxonomic classification is part of the "Limit by Entrez Search" option on most standard BLAST search pages. There is a pull down menu to select the most common organisms found in GenBank and also a field to input the species name, or classification. Link the results to your webpage. Did you find any related plant genes? Why or why not, do you think?

Arabidopsis Database Search

Perform a refseq RNA BLAST search. What is the advantage of performing this search? Link the results to your webpage.

The benefit of performing the refseq RNA BLAST search is that refseq contains data that has already been curated. This means that the results obtained from the BLAST search will be fairly accurate.

Refseq RNA BLAST search Results

Use your sequence to perform a BLASTX search. Did you obtain other sequence matches than using the nucleotide sequence alone?

BLASTX sequence

Because BLASTX translates the nucleotide query in all six reading frames and searches the translations against a protein database, one should get more results.
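
The six reading frames can be illustrated with a short sketch: three offsets on the given strand and three on its reverse complement, each translated with the standard genetic code. This is not BLASTX itself, only the translation step; the function names and the toy sequence are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unrecognised codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels +1..+3 and -1..-3 to their protein translations."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy sequence, not FADD: frame +1 reads ATG GCC.
for frame, protein in six_frame_translations("ATGGCC").items():
    print(frame, protein)
```

Only one of the six frames normally corresponds to the real protein; the other five usually contain frequent stop codons (`*`) and match nothing significant.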

Use your sequence to perform a Genomic BLAST. This allows the genomic context of a BLAST search to be displayed in the Map Viewer. Did your discontiguous (cross-species) MegaBLAST against the human RefSeq transcript identify any homologs in the rat genome?

 

Use the patent database to search with your sequence. Describe something interesting you learned about commercial applications for your gene.

My gene, FADD, is used as a biomarker for cyclin dependent kinase modulation and is highly involved with detecting various types of cancer. It is also involved in an oligonucleotide library that detects RNA transcripts.

Patent Sequence

Try using Tree View to examine relationships between sequences you have identified.

Tree View of Sequences

Proteins:

Perform a BLASTP search with your gene of interest using the default BLAST parameters and the nonredundant database. Save the output for the webpage. You can right click the output and choose Print. This will give you a document in Microsoft image writer which you can save for the internet.

BLASTP search

How many hits did you get? How many were human genes?
123 hits. About 75% were human genes.

Which is the first non-human gene you located? What % similarity and what % identity did it have?
First gene was from Pan troglodytes (chimpanzee)
98% identity.
E: 1e-102

Pull down the format menu. How could you select different formats for your output?
You can select to show the alignment or the bioseq.
There are also different alignment views such as pairwise, pairwise with dots for identities, query-anchored with dots for identities, query-anchored with letters for identities, flat query-anchored with dots for identities, flat query-anchored with letters for identities, and hit table.

Click Algorithm Parameters. What are the matrix choices you have in performing a BLASTP search? What are the advantages of each choice? What happens when you change matrices?
Several scoring matrices are available for a BLASTP search, including BLOSUM62, PAM30, PAM70, BLOSUM80 and BLOSUM45. When you change matrices, the gap cost values (existence and extension) change as well.

Check out the tree view. Try some of the options to view your tree. You can save the output from the tree by right clicking within the tree and then printing it to Microsoft image writer. This makes a colored page you can post to the web, sort of like Adobe pdf. Link an easily interpreted option to your webpage. What do you think this tree indicates?

The tree shows the location of the gene and what is included in the same family.

Perform a PSI-BLAST search using your gene.
123 hits
How many more hits did you get on the second round of searching?
116 hits
The third round?
131 hits
Post your output to the web for the third search.

PSI-BLAST search for FADD gene

Perform your BLASTP search using the RefSeq proteins.

BLASTP with RefSeq Proteins
115 hits.

Please identify 6-8 orthologs from different species from this RefSeq BLAST search. Try to get some related ones like humans and monkeys and some that are more distant, like a bird or fish. We want good alignment across the sequence (no gaps) but that are not identical. Start with proteins and then find the mRNA sequence that goes with it. We are looking for sequences that are <90% identical if possible. We will use these for multiple alignments.

Protein ID number | Nucleic Acid ID number | Genus/Species | Gene name | % identity
GI:4505229 | GI:22219473 | Homo sapiens (human) | FAS-associated via death domain | 100
GI:6753812 | GI:158508678 | Mus musculus (house mouse) | FAS (TNFRSF6) associated via death domain | 68
GI:147898837 | GI:147898836 | Xenopus laevis (African clawed frog) | Fas-associated death domain containing protein | 39
GI:157278311 | GI:157278310 | Oryzias latipes (Japanese medaka) | Fas-associating death domain-containing protein | 37
GI:157824041 | GI:157824040 | Rattus norvegicus (Norway rat) | receptor (TNFRSF)-interacting serine-threonine kinase 1 | 36
GI:45383358 | GI:45383357 | Gallus gallus (chicken) | receptor (TNFRSF)-interacting serine-threonine kinase 1 | 30
GI:21355807 | GI:24648890 | Drosophila melanogaster (fruit fly) | BG4 CG12297-PA | 28
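
The assignment's "<90% identical" criterion can be checked against the list above with a few lines of code. The species names and identity values are copied from the table; the variable and function names are assumptions made for this sketch.

```python
# (species, percent identity to the human FADD protein) as listed in the table above.
ORTHOLOG_CANDIDATES = [
    ("Homo sapiens", 100),
    ("Mus musculus", 68),
    ("Xenopus laevis", 39),
    ("Oryzias latipes", 37),
    ("Rattus norvegicus", 36),
    ("Gallus gallus", 30),
    ("Drosophila melanogaster", 28),
]


def below_identity_cutoff(candidates, cutoff=90):
    """Keep only the candidates whose identity is strictly below the cutoff."""
    return [(species, identity) for species, identity in candidates if identity < cutoff]


for species, identity in below_identity_cutoff(ORTHOLOG_CANDIDATES):
    print(f"{species}: {identity}%")
```

Only the human reference sequence is excluded, leaving six sequences for the multiple alignment.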