Local BLAST

BLAST is one of the most widely used (and cited!) sequence similarity search tools. It can be applied to a wide variety of tasks, such as gene annotation, the search for orthologs and paralogs, phylogenetic inference…

Web BLAST:
https://blast.ncbi.nlm.nih.gov/Blast.cgi

The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of each alignment.

Basically, BLAST breaks your sequence of interest (the query) into short words, looks for high-scoring matches to these words in the database, and then extends those hits into longer local alignments.

BLAST has an excellent web interface that supports a wide range of analyses. However, if you have more specific questions to answer or need to run computationally expensive analyses, you should consider running BLAST locally.

Below, I go carefully through the command lines required to perform a local BLAST. The general idea of running BLAST is simple: 1) you have a query sequence that is 2) compared to a database. If the database contains any sequence similar to your query, BLAST returns those sequences together with the pairwise local alignment statistics. Following this idea, to run a local BLAST you first have to:

  1. Create a FASTA file with your query sequence(s);
  2. Build a local BLAST database.

See the link above for more detailed information if you need it.
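Step 1 can be done in any text editor. As a minimal sketch, the snippet below writes a two-sequence query file with a shell heredoc and counts the '>' header lines as a quick sanity check. The file name query.fasta, the sequence IDs and the sequences themselves are made-up placeholders.

```shell
#!/usr/bin/env bash
# Minimal sketch: write a small query FASTA file by hand.
# The file name, sequence IDs and sequences are made-up placeholders.
set -euo pipefail

cat > query.fasta <<'EOF'
>query_seq_1 made-up example
ATGGCTAGCTAGGCTAACGTTAGC
>query_seq_2 made-up example
ATGCCGTTAGGATCCAAGTCTGA
EOF

# Every FASTA record starts with a '>' header line, so counting those
# lines gives the number of sequences in the file.
n_seqs=$(grep -c '^>' query.fasta)
echo "query.fasta contains ${n_seqs} sequence(s)"
```

Replace the placeholder records with your real sequences; each record is one header line followed by one or more sequence lines.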

Installing the BLAST+ package on Linux (Debian/Ubuntu/Mint):

sudo apt-get install ncbi-blast+

BLAST+ packages for Mac and Windows can also be downloaded from the NCBI website.

makeblastdb: Creating a local BLAST database

The first thing you need for a local BLAST is a local BLAST database. Its sequences are the subjects against which your query is compared: you want to know whether that database contains any sequence similar to your query. The input for building a local BLAST database is a multi-sequence FASTA file.

makeblastdb -in file.fasta -dbtype nucl -out file.blastdb

Note that in the previous example we set -dbtype to ‘nucl’ because the input contains nucleotide sequences. Use -dbtype ‘prot’ if you have amino acid sequences.
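If you build databases from many FASTA files, a small helper can choose between ‘nucl’ and ‘prot’ for you. The sketch below is only a heuristic of this example: a file whose sequence lines contain nothing but A/C/G/T/U/N or '-' is treated as nucleotide, and anything else as protein. The function names guess_dbtype and build_db are hypothetical, so check the guess on your own data.

```shell
#!/usr/bin/env bash
# Sketch: pick makeblastdb's -dbtype from the residues in a FASTA file.
# The rule "only A/C/G/T/U/N or '-' => nucl, anything else => prot" is a
# simplifying assumption of this example, not part of BLAST+.

guess_dbtype() {
  awk '
    { sub(/\r$/, "") }                   # tolerate Windows line endings
    /^>/ { next }                        # skip FASTA header lines
    toupper($0) ~ /[^ACGTUN-]/ { found = 1; exit }
    END { print (found ? "prot" : "nucl") }
  ' "$1"
}

build_db() {
  # $1 = input FASTA (e.g. file.fasta), $2 = database name (e.g. file.blastdb)
  local dbtype
  dbtype=$(guess_dbtype "$1") || return 1
  makeblastdb -in "$1" -dbtype "$dbtype" -out "$2"
}
```

With BLAST+ installed, `build_db file.fasta file.blastdb` runs the same makeblastdb command as above, with -dbtype filled in by the guess.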

BLAST search

Now, to search the assembled database for sequences similar to your query, run the appropriate BLAST program (blastn, blastp, blastx…) with the following search parameters:

  • your query FASTA file (-query),
  • your database name, the same name given to -out in makeblastdb (-db),
  • an E-value cutoff (-evalue),
  • the number of CPU threads to use (-num_threads, optional),
  • the output format (-outfmt, optional),
  • and a name for the output file (-out).
blastn -db file.blastdb -query query.fasta -num_threads 4 -evalue 1e-20 -outfmt 6 -out result.txt
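Which program is "appropriate" depends on whether the query and the database hold nucleotide or protein sequences. The sketch below encodes the standard BLAST+ pairings in a shell function (tblastx, which translates both sides, is another option for nucleotide versus nucleotide). The function names pick_blast_program and run_search are hypothetical; the search options simply mirror the blastn command above.

```shell
#!/usr/bin/env bash
# Sketch: map the query type and database type to the matching BLAST+ program.
# Arguments are "nucl" or "prot", the same words makeblastdb uses for -dbtype.

pick_blast_program() {
  local query_type=$1 db_type=$2
  case "${query_type}:${db_type}" in
    nucl:nucl) echo blastn ;;   # nucleotide query vs nucleotide database
    prot:prot) echo blastp ;;   # protein query vs protein database
    nucl:prot) echo blastx ;;   # translated nucleotide query vs protein database
    prot:nucl) echo tblastn ;;  # protein query vs translated nucleotide database
    *) echo "unknown combination: ${query_type}:${db_type}" >&2; return 1 ;;
  esac
}

run_search() {
  # $1 = query FASTA, $2 = database name, $3 = query type, $4 = database type
  local program
  program=$(pick_blast_program "$3" "$4") || return 1
  "$program" -db "$2" -query "$1" -num_threads 4 -evalue 1e-20 -outfmt 6 -out result.txt
}
```

For example, `run_search query.fasta file.blastdb nucl nucl` runs a blastn search like the one above, with query.fasta as a placeholder query file.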


The -outfmt 6 parameter writes the BLAST output as a tab-separated table with 12 columns (the keyword in parentheses is the column's name in BLAST+):

  1. query (e.g., gene) sequence ID (qseqid)
  2. subject (e.g., reference genome) sequence ID (sseqid)
  3. percentage of identical matches (pident)
  4. alignment length (length)
  5. number of mismatches (mismatch)
  6. number of gap openings (gapopen)
  7. start of alignment in query (qstart)
  8. end of alignment in query (qend)
  9. start of alignment in subject (sstart)
  10. end of alignment in subject (send)
  11. expect value (evalue)
  12. bit score (bitscore)
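Because the table is plain tab-separated text, standard Unix tools are enough to post-process it. The sketch below shows two common filters, using the column numbers listed above ($3 = percent identity, $4 = alignment length, $12 = bit score). The function names and the 90% / 100-position thresholds are arbitrary example choices.

```shell
#!/usr/bin/env bash
# Sketch: post-process a BLAST "-outfmt 6" table with awk and sort.
# Thresholds (90% identity, 100 aligned positions) are example values only.

# Keep hits with >= 90% identity over an alignment of >= 100 positions.
filter_hits() {
  awk -F'\t' '$3 >= 90 && $4 >= 100' "$1"
}

# Keep only the highest-scoring hit (largest bit score) for each query:
# sort by query ID, then by bit score in descending numeric order,
# and print the first line seen for each query ID.
best_hit_per_query() {
  sort -t $'\t' -k1,1 -k12,12gr "$1" | awk -F'\t' '!seen[$1]++'
}
```

For example, `best_hit_per_query result.txt > best_hits.txt` keeps one line per query from the search above.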

There are more formatting options (-outfmt) for the alignment view:

0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = BLAST XML,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)
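Formats 6, 7 and 10 also accept a list of column keywords after the number, so you can ask for only the columns you need. The sketch below requests six of the standard columns and writes them with a header row of the same keywords; format 7 gives a similar result with '#'-prefixed comment lines instead. The array name fields, the function run_with_header and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: request selected tabular columns and add a header row.
# The keywords are standard -outfmt 6 column names; the selection is an example.

fields=(qseqid sseqid pident length evalue bitscore)

# Header row: the same keywords joined with tabs.
header=$(IFS=$'\t'; echo "${fields[*]}")

run_with_header() {
  # $1 = query FASTA, $2 = database name, $3 = output file
  {
    printf '%s\n' "$header"
    # "${fields[*]}" joins the keywords with spaces: "6 qseqid sseqid ..."
    blastn -db "$2" -query "$1" -evalue 1e-20 -outfmt "6 ${fields[*]}"
  } > "$3"
}
```

With BLAST+ installed, `run_with_header query.fasta file.blastdb result_with_header.tsv` produces a table that opens cleanly in a spreadsheet or in R.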

For the full list of options, type “blastn -help” in the terminal.

This entry was posted in Local Tools.