BLAST is arguably the most widely used (and cited!) sequence similarity search tool. It serves a wide variety of purposes, such as gene annotation, searching for orthologs and paralogs, and phylogenetic inference, among others.
The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of each alignment.
In short, BLAST breaks your sequence of interest (the query) into short words, looks for matches to these words in the database, and then extends those short hits into longer local alignments.
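As a toy illustration of the word idea, the POSIX shell sketch below performs only the very first step: cutting a query into overlapping words of a fixed size. The function name query_words, the sequence ACGTTGCA and the word size of 4 are made up for illustration. Real BLAST chooses its own default word size (adjustable with -word_size), and it then scans the database for matching words and extends them, which this sketch does not do.

```shell
# Illustrative only: list every overlapping word of length k in a sequence,
# mimicking the seeding step of BLAST. No database scan, no extension.
# Usage: query_words SEQUENCE WORD_SIZE   (query_words is a hypothetical name)
query_words() {
  printf '%s\n' "$1" | awk -v k="$2" '{
    n = length($0)
    # slide a window of width k one position at a time
    for (i = 1; i <= n - k + 1; i++) print substr($0, i, k)
  }'
}

# Made-up toy query split into words of size 4
query_words ACGTTGCA 4
```

An 8-letter query with words of size 4 yields 8 - 4 + 1 = 5 words: ACGT, CGTT, GTTG, TTGC and TGCA.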
BLAST has an excellent web interface that offers a wide range of analyses. However, if you have more specific questions to answer or need to run computationally expensive analyses, you should consider running BLAST locally.
Below, I go carefully through the command lines required to perform a local BLAST. The general idea of running BLAST is simple: 1) a query sequence is 2) compared against a database. If the database contains any sequence similar to your query, BLAST returns those sequences together with the pairwise local alignment statistics. Following this idea, running a local BLAST involves three steps: installing the BLAST+ package, building a local database from your sequences, and searching that database with your query.
Installing the BLAST+ package on Linux (Debian/Ubuntu/Mint):
sudo apt-get install ncbi-blast+
You can also download the BLAST+ executables for Mac and Windows from the NCBI website.
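After installation, the BLAST+ programs should be available on your PATH. A small sketch to confirm this, assuming a POSIX shell; check_blast_tools is a hypothetical helper, not part of BLAST+, and it only looks for makeblastdb, blastn and blastp.

```shell
# Hypothetical helper: report which BLAST+ executables are on the PATH.
# Returns the number of missing programs (0 means all were found).
check_blast_tools() {
  missing=0
  for tool in makeblastdb blastn blastp; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_blast_tools || echo "some BLAST+ programs are missing; install them first"
# When everything is found, "blastn -version" prints the installed version.
```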
makeblastdb: Creating a local BLAST database
The first thing you need in order to run a local BLAST is a local BLAST database. This database is the subject your query is compared against: you want to know whether it contains any sequence similar to your query. The input used to build a local BLAST database is a multi-sequence FASTA file.
makeblastdb -in file.fasta -dbtype nucl -out file.blastdb
Note that the example above uses -dbtype nucl because the FASTA file contains nucleotide sequences. Use -dbtype prot if you have amino acid sequences.
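Before running makeblastdb, it can help to sanity-check the multi-FASTA input. Duplicate sequence IDs make hits hard to interpret and, when makeblastdb is run with -parse_seqids, can make it stop with an error. The sketch below is a hypothetical helper (fasta_summary is not part of BLAST+), and example.fasta is made-up toy data.

```shell
# Hypothetical helper (not part of BLAST+): summarise a multi-FASTA file
# before building a database. Prints the number of sequences and lists any
# ID (first word after '>') that occurs more than once; returns 1 if so.
fasta_summary() {
  # count header lines, i.e. sequences
  n=$(grep -c '^>' "$1")
  echo "sequences: $n"
  # first word of each header without the '>', then keep repeated IDs only
  dups=$(grep '^>' "$1" | awk '{ print substr($1, 2) }' | sort | uniq -d)
  if [ -n "$dups" ]; then
    echo "duplicate IDs:"
    echo "$dups"
    return 1
  fi
  return 0
}

# Made-up toy input in which the ID seq1 is used twice
cat > example.fasta <<'EOF'
>seq1 first sequence
ACGTACGTAC
>seq2
GGGTTTAAAC
>seq1 repeated ID
TTTTCCCCGG
EOF

fasta_summary example.fasta || echo "fix the duplicate IDs before running makeblastdb"
# Once the check passes, build the database as shown above, e.g.:
# makeblastdb -in example.fasta -dbtype nucl -out example.blastdb
```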
Next, to search the database for sequences similar to your query, run the appropriate BLAST program (blastn, blastp, blastx…) and specify the search parameters:
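- your query FASTA file (-query),
- your database name, i.e. the name given to -out in makeblastdb (-db),
- an E-value cutoff (-evalue),
- the number of CPU threads to use (-num_threads, optional),
- the output format (-outfmt, optional),
- and a name for the output file (-out).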
blastn -db file.blastdb -query file.fas -num_threads 4 -evalue 1e-20 -outfmt 6 -out result.txt
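Which program is "appropriate" depends on whether the query and the database contain nucleotides or proteins. The hypothetical helper below (blast_program, not part of BLAST+) encodes the four common combinations using the same nucl/prot words as makeblastdb -dbtype. It leaves out tblastx, which compares a translated nucleotide query against a translated nucleotide database.

```shell
# Hypothetical helper: pick the BLAST+ program matching the query and
# database types ("nucl" or "prot", as in makeblastdb -dbtype).
blast_program() {
  case "$1:$2" in
    nucl:nucl) echo blastn ;;   # nucleotide query vs nucleotide database
    prot:prot) echo blastp ;;   # protein query vs protein database
    nucl:prot) echo blastx ;;   # translated nucleotide query vs protein database
    prot:nucl) echo tblastn ;;  # protein query vs translated nucleotide database
    *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
  esac
}

blast_program nucl prot   # prints blastx
# The result can replace blastn in the command above, e.g.:
# "$(blast_program nucl nucl)" -db file.blastdb -query file.fas -evalue 1e-20 -outfmt 6 -out result.txt
```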
*The -outfmt 6 parameter writes the BLAST output as a tab-separated table with one line per alignment and these 12 columns:
- query (e.g., gene) sequence id
- subject (e.g., reference genome) sequence id
- percentage of identical matches
- alignment length
- number of mismatches
- number of gap openings
- start of alignment in query
- end of alignment in query
- start of alignment in subject
- end of alignment in subject
- expect value
- bit score
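Because -outfmt 6 is plain tab-separated text, standard Unix tools are enough to post-process it. The sketch below, assuming the 12 default columns listed above, keeps for each query only the hit with the highest bit score (column 12) among hits whose percentage identity (column 3) reaches a chosen threshold. The function name best_hits, the file result_example.txt and all its values are made up for illustration.

```shell
# Sketch for -outfmt 6 tables (12 tab-separated columns: query id first,
# percent identity third, bit score twelfth). For every query, print the
# hit with the highest bit score among hits with identity >= MIN_IDENTITY.
# Usage: best_hits FILE MIN_IDENTITY   (best_hits is a hypothetical name)
best_hits() {
  awk -F '\t' -v min="$2" '
    ($3 + 0) >= (min + 0) {
      score = $12 + 0
      if (!($1 in best) || score > best[$1]) { best[$1] = score; line[$1] = $0 }
    }
    END { for (q in line) print line[q] }
  ' "$1" | sort -k1,1
}

# Made-up example table: two hits for gene1, two for gene2
printf 'gene1\tchr1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-95\t350\n'  > result_example.txt
printf 'gene1\tchr2\t99.33\t150\t1\t0\t1\t150\t501\t650\t1e-70\t270\n'   >> result_example.txt
printf 'gene2\tchr1\t85.00\t300\t45\t0\t1\t300\t20\t319\t1e-60\t240\n'   >> result_example.txt
printf 'gene2\tchr3\t92.50\t120\t9\t0\t5\t124\t10\t129\t1e-40\t180\n'    >> result_example.txt

best_hits result_example.txt 90
```

With a 90% threshold, gene1 keeps its chr1 hit (bit score 350 beats 270), and gene2 keeps only chr3 because its chr1 hit falls below 90% identity.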
*** More formatting options (-outfmt) are available for the alignment view:
0 = pairwise,
1 = query-anchored showing identities,
2 = query-anchored no identities,
3 = flat query-anchored, show identities,
4 = flat query-anchored, no identities,
5 = XML Blast output,
6 = tabular,
7 = tabular with comment lines,
8 = Text ASN.1,
9 = Binary ASN.1,
10 = Comma-separated values,
11 = BLAST archive format (ASN.1)
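Format 7 is the same table as format 6 plus comment lines starting with #, one of which ("# Fields:") names the columns. The sketch below turns such output into a tab-separated file with a single header row, which is convenient for spreadsheets or R. The function outfmt7_to_tsv and the input file are hypothetical; the sample comment lines are abbreviated, and their exact wording differs between BLAST+ versions.

```shell
# Sketch: convert -outfmt 7 output into a plain tab-separated table whose
# first line is a header built from the first "# Fields:" comment line.
outfmt7_to_tsv() {
  awk '
    # first "# Fields:" line -> header row (", " separators become tabs)
    /^# Fields: / && !seen { sub(/^# Fields: /, ""); gsub(/, /, "\t"); print; seen = 1; next }
    # every other comment line (including repeated Fields lines) is dropped
    /^#/ { next }
    { print }
  ' "$1"
}

# Made-up, abbreviated input resembling format 7
printf '%s\n' \
  '# Query: gene1' \
  '# Database: file.blastdb' \
  '# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score' \
  '# 1 hits found' > example_fmt7.txt
printf 'gene1\tchr1\t98.50\t200\t3\t0\t1\t200\t1001\t1200\t1e-95\t350\n' >> example_fmt7.txt

outfmt7_to_tsv example_fmt7.txt
```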
For the full list of options, type "blastn -help" in the terminal.