“OrthoMCL is an algorithm for grouping proteins into ortholog groups based on their sequence similarity.”
With more than 3,000 citations, OrthoMCL elegantly finds orthologs, co-orthologs, and in-paralogs in protein FASTA files. If all you need is to find in-paralogs in a set of sequences from a single target species, you only need to provide one FASTA file as input. To find orthologs and co-orthologs, you provide one FASTA file per species.
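Before running the pipeline, it can help to confirm what the input directory actually holds. The following is a minimal sketch, not part of the repository: the function names `count_seqs` and `inspect_fasta_dir` are hypothetical, and it assumes one FASTA file per species with a `.fasta`, `.fa` or `.faa` extension.

```shell
# Hypothetical helper: count the sequences (header lines starting with '>')
# in one FASTA file.
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical helper: print each file's sequence count, then report which
# analysis the input supports (one file = in-paralogs only).
# Assumption: one FASTA file per species, extension .fasta, .fa or .faa.
inspect_fasta_dir() {
    dir=$1
    nfiles=0
    for f in "$dir"/*.fasta "$dir"/*.fa "$dir"/*.faa; do
        [ -f "$f" ] || continue    # skip glob patterns that matched nothing
        nfiles=$((nfiles + 1))
        echo "$(basename "$f"): $(count_seqs "$f") sequences"
    done
    if [ "$nfiles" -eq 0 ]; then
        echo "no FASTA files found"
    elif [ "$nfiles" -eq 1 ]; then
        echo "1 file: in-paralogs only"
    else
        echo "$nfiles files: orthologs, co-orthologs and in-paralogs"
    fi
}
```

For example, `inspect_fasta_dir /path/to/fasta/dir/` prints one line per file followed by a summary line.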
The OrthoMCL user guide describes thirteen steps, from installing the software dependencies to running the algorithm and obtaining the four output files: 1) coorthologs.txt, 2) inparalogs.txt, 3) orthologs.txt, and 4) groups.txt.
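A quick way to check a finished run is to count the entries in each output file. The hypothetical `summarize_outputs` function below is only a sketch. It assumes that all four files sit in one directory (adjust the path if your run writes them elsewhere), that the three pair files hold one tab-separated pair per line, and that each line of groups.txt starts with a group label followed by its member IDs. Verify these formats against your own output before relying on the numbers.

```shell
# Hypothetical helper: summarise OrthoMCL output files in one directory.
# Assumed formats (check against your own files):
#   orthologs.txt, coorthologs.txt, inparalogs.txt: one pair per line,
#       tab-separated: seqA  seqB  score
#   groups.txt: one group per line, e.g. "CLU1000: hsa|P1 mmu|Q1"
summarize_outputs() {
    dir=$1
    for name in orthologs coorthologs inparalogs; do
        f="$dir/$name.txt"
        if [ -f "$f" ]; then
            # tr strips the padding some wc implementations add
            echo "$name: $(wc -l < "$f" | tr -d ' ') pairs"
        else
            echo "$name: missing"
        fi
    done
    g="$dir/groups.txt"
    if [ -f "$g" ]; then
        # The first field is the group label, so NF - 1 is the member count.
        awk '{ n = NF - 1; total += n; if (n > max) max = n }
             END { printf "groups: %d (largest has %d members, %d sequences in total)\n", NR, max, total }' "$g"
    else
        echo "groups: missing"
    fi
}
```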
The following pipeline installs and configures all the software and dependencies (MCL, Perl, CPAN, MySQL…) needed to set up an Ubuntu environment for running OrthoMCL v2.0.9.
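If you install the dependencies yourself instead of letting the pipeline do it, a small check of your `PATH` can catch gaps early. This is a sketch only. The function `check_deps` is hypothetical, and the command names passed to it (`mcl`, `perl`, `cpan`, `mysql`) are assumptions based on the dependency list above, so adjust them to your setup.

```shell
# Hypothetical helper: report which commands are available on PATH.
# Returns success only if every command was found.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# Example (assumed command names):
#   check_deps mcl perl cpan mysql || echo "install the missing tools first"
```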
After cloning the Git repository,
git clone https://github.com/hugorody/orthomcl/
all you need to do is set the variables at the beginning of the file:
#!/usr/bin/sh
#CONFIGURE VARIABLES
mysqlpass="user123" # SET root password
dependenciesinstall="no" # SET yes to install softwares and dependencies
installorthomcl="no" # SET yes to install MCL software
fastainput="/path/to/fasta/dir/" # SET your input directory with n FASTA files
clusteracro="CLU" #SET an acronym for the groups
blastAVAfile="" #LEAVE empty if you don't have the BLASTp all-vs-all file (will run STEP 7)
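A typo in these variables only shows up once the pipeline is already running, so a quick pre-flight check can save time. The sketch below is not part of the repository. It uses the variable names from the header above, but the function `validate_config` and its rules (yes/no flags, an existing input directory, a non-empty acronym, an existing BLAST file when one is given) are assumptions.

```shell
# Hypothetical pre-flight check for the configuration variables above.
# The validation rules are assumptions, not taken from the repository.
validate_config() {
    rc=0
    for flag in "$dependenciesinstall" "$installorthomcl"; do
        case $flag in
            yes|no) ;;
            *) echo "error: install flags must be 'yes' or 'no' (got '$flag')"; rc=1 ;;
        esac
    done
    if [ ! -d "$fastainput" ]; then
        echo "error: fastainput directory not found: $fastainput"
        rc=1
    fi
    if [ -z "$clusteracro" ]; then
        echo "error: clusteracro must not be empty"
        rc=1
    fi
    if [ -z "$blastAVAfile" ]; then
        echo "note: blastAVAfile is empty, so the pipeline will run STEP 7"
    elif [ ! -f "$blastAVAfile" ]; then
        echo "error: blastAVAfile not found: $blastAVAfile"
        rc=1
    fi
    return "$rc"
}
```

Placed after the variable block, `validate_config || exit 1` would stop the script before any installation or BLAST step starts.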
Then, to run the pipeline: