NJ trees for multiple FASTA files using the Phangorn R package

This script iterates over the multiple-sequence alignment (MSA) FASTA files in a directory and builds a Neighbor-Joining (NJ) tree for each file. It uses R and the Phangorn package.

Phangorn is described as a package for phylogenetic analysis in R. It contains methods for estimating phylogenetic trees and networks using Maximum Likelihood, Maximum Parsimony, distance methods and Hadamard conjugation. It also provides tools for comparing trees and selecting models, as well as visualizations for trees and split networks.
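As a small illustration of the tree-comparison side of the package, the sketch below builds two distance-based trees (NJ and UPGMA) from the same toy protein alignment and measures how different their topologies are with the Robinson-Foulds distance. The five sequences are made-up data written to a temporary file, and the choice of UPGMA as the second tree is only an assumption for the demo; `read.phyDat`, `dist.ml`, `NJ`, `upgma` and `RF.dist` are Phangorn functions.

```r
library(phangorn)

# Hypothetical toy protein alignment, written to a temporary FASTA file
fa <- tempfile(fileext = ".fasta")
writeLines(c(">A", "MKVLAAGIVGLLLA",
             ">B", "MKVLAAGIVGLLLS",
             ">C", "MKVLTAGIVGLFLS",
             ">D", "MRVLTSGIVGLFLS",
             ">E", "MRVLTSGLVGAFLS"), fa)

aln <- read.phyDat(fa, format = "fasta", type = "AA")
dm  <- dist.ml(aln)

tree_nj    <- NJ(dm)     # distance-based, unrooted
tree_upgma <- upgma(dm)  # distance-based, rooted (assumes a molecular clock)

# Robinson-Foulds distance: number of splits present in one tree but not the other
# (trees are compared unrooted by default)
rf <- RF.dist(tree_nj, tree_upgma)
print(rf)
```

A value of 0 means both methods recovered the same unrooted topology; for 5 taxa the maximum is 4.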

The R function list.files returns a character vector with the names of the files in the directory. Inside a loop over these names, two variables build the input file path and the output file name, which are then used by three Phangorn functions (read.phyDat, dist.ml and NJ) to generate an NJ tree. Finally, the script writes each tree in Newick format.
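To see the three Phangorn steps in isolation, here is a minimal sketch that runs them on a single toy alignment instead of a directory. The four sequences are hypothetical data written to a temporary file; the variable names mirror the script, so `mytree` holds the alignment (a `phyDat` object), not a tree.

```r
library(phangorn)  # also attaches ape, which provides write.tree()

# Hypothetical toy protein alignment: four aligned sequences of equal length
fa <- tempfile(fileext = ".fasta")
writeLines(c(">A", "MKVLAAGIVGLLLA",
             ">B", "MKVLAAGIVGLLLS",
             ">C", "MKVLTAGIVGLFLS",
             ">D", "MRVLTSGIVGLFLS"), fa)

mytree <- read.phyDat(fa, format = "fasta", type = "AA")  # 1. read the MSA as phyDat
dm     <- dist.ml(mytree)                                 # 2. pairwise ML distances (default model "JC69")
treeNJ <- NJ(dm)                                          # 3. Neighbor-Joining tree (class "phylo")

print(as.matrix(dm))
nwk <- write.tree(treeNJ)  # with the default file = "", the Newick string is returned
print(nwk)
```

Printing the distance matrix before building the tree is a quick sanity check: identical sequences give 0, and saturated (too divergent) pairs would show up as very large values.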


```r
library(phangorn)  # also attaches ape, which provides write.tree()

indir <- "/path/to/msa/fasta/files/"
myfiles <- list.files(path = indir, full.names = FALSE)

for (fastafile in myfiles) {
  infile <- paste0(indir, fastafile)     # full path to the alignment
  outfile <- paste0(fastafile, ".nwk")   # e.g. gene1.fasta.nwk, written to the working directory
  print(infile)

  mytree <- read.phyDat(infile, format = "fasta", type = "AA")  # read the protein MSA
  dm <- dist.ml(mytree)                                         # ML distance matrix
  treeNJ <- NJ(dm)                                              # Neighbor-Joining tree

  # write tree in Newick format
  write.tree(treeNJ, file = outfile)  # fastafile.nwk
}
```

The full script, fasta2NJnewick.R, is available on Git.
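The loop above processes every file in the directory and stops at the first file that cannot be read as an alignment. One possible variant, sketched below under my own assumptions, wraps the same three steps in a function that only picks up common FASTA extensions, writes the trees next to the inputs (or to another directory), and skips files that fail instead of aborting. The function name `fasta2NJ`, its arguments and the extension pattern are hypothetical, not part of the original script or of Phangorn.

```r
library(phangorn)

# Hypothetical helper: build one NJ tree per FASTA alignment found in `indir`.
# Returns the paths of the written Newick files (NA for files that failed).
fasta2NJ <- function(indir, outdir = indir,
                     pattern = "\\.(fa|fas|fasta|faa)$", type = "AA") {
  infiles <- list.files(path = indir, pattern = pattern,
                        full.names = TRUE, ignore.case = TRUE)
  outfiles <- character(0)
  for (infile in infiles) {
    outfile <- file.path(outdir, paste0(basename(infile), ".nwk"))
    result <- tryCatch({
      aln <- read.phyDat(infile, format = "fasta", type = type)
      treeNJ <- NJ(dist.ml(aln))
      write.tree(treeNJ, file = outfile)
      outfile
    }, error = function(e) {
      message("Skipping ", infile, ": ", conditionMessage(e))
      NA_character_
    })
    outfiles <- c(outfiles, result)
  }
  outfiles
}

# Usage on a temporary directory with two toy alignments (hypothetical data)
indir <- file.path(tempdir(), "msa_demo")
dir.create(indir, showWarnings = FALSE)
writeLines(c(">A", "MKVLAAG", ">B", "MKVLTAG", ">C", "MRVLTSG", ">D", "MRVLTSA"),
           file.path(indir, "gene1.fasta"))
writeLines(c(">A", "MAVLKLG", ">B", "MAVLKIG", ">C", "MSVIKIG", ">D", "MSVIRIG"),
           file.path(indir, "gene2.fasta"))
writeLines("not an alignment", file.path(indir, "notes.txt"))  # ignored by `pattern`

res <- fasta2NJ(indir)
print(res)
```

Using `full.names = TRUE` avoids gluing the directory and file name together by hand, and `basename()` keeps the original naming scheme (fastafile.nwk) for the output.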

This entry was posted in Local Tools.
