******************************************************
* The package contains the data and the tools        *
* required to run the comparison experiment          *
*                                                    *
******************************************************

To run the experiment, you need the following software and environment:

- CCP4 version 7
- PHENIX 1.14
- g++ compiler and libraries
- SPSS for generating the plots
- A cluster server that uses the Slurm workload manager (please read the section
  "Modifying the comparison tool for other cluster workload managers" if your cluster
  uses a different workload manager). Running the comparison experiment on a PC/laptop
  is not recommended.
- A LaTeX editor.

*******************************************************
* Running the comparison experiment on cluster server *
*******************************************************

Follow these steps to run the comparison experiment.

1- Change to the Experiment folder on the command line.
2- Run prepare_experiment.sh to prepare the datasets and create the required scripts.
   (sh prepare_experiment.sh)
3- Use All_Qsub.sh to start the pipelines and build the datasets.
   (sh All_Qsub.sh)
4- Once all the jobs from step 3 are complete, run All_Qsub_ArpB.sh to start ARP/wARP
   after Buccaneer.
   (sh All_Qsub_ArpB.sh)
5- Some of the pipelines might fail to build some of the datasets for different reasons,
   such as insufficient memory, needing more time, or crashing. These datasets need to be
   re-run. To run the failed datasets again, run rerun_failed_datasets.sh
   (sh rerun_failed_datasets.sh), then repeat steps 3 and 4. You might need to repeat
   this step several times until the same datasets fail each time; those datasets cannot
   be built because of an error in the pipeline.
6- Once all the jobs from steps 3, 4 and 5 are complete, run All_Analysers.sh to create
   the Excel files that contain the comparison results.
7- Once the jobs from step 6 are done, use Tables.sh to produce the comparison tables.
8- Use any LaTeX editor to compile these tables and produce a PDF. Go to the latex
   folder and use the command pdflatex Tables.tex, or compile through the LaTeX
   editor GUI.
9- To generate the plots, run the SPSS code (Plots_Completness_Rwork_Rfree_IncorrectlyBuilt.sps
   and Plots_Fmap_executionTimes.sps) from the SPSS GUI and use the CSV dataset files
   from the Experiment folder.
   9.1- Use the CSV in AllExFaliedCasesExcludedBuccaneerDevSet for
        Plots_Completness_Rwork_Rfree_IncorrectlyBuilt.sps
   9.2- Use the CSV in OrginalBuccEx54ExFaliedCases for Plots_Fmap_executionTimes.sps

All these scripts were tested on Linux. Steps 2 and 5 might take a long time; instead of
running them directly from the command line, you may put them in a script and submit it
as a job to your cluster.

A script example for step 2 (and step 3):

#!/bin/bash
#SBATCH --time=8:00:00     # Time limit hrs:min:sec
#SBATCH --mem=1000         # Total memory limit
module load chem/ccp4/7.0.066
module load chem/phenix/1.14-3260
sh prepare_experiment.sh
sh All_Qsub.sh

A script example for step 5 (and step 3):

#!/bin/bash
#SBATCH --time=48:00:00    # Time limit hrs:min:sec
#SBATCH --mem=1000         # Total memory limit
module load chem/ccp4/7.0.066
module load chem/phenix/1.14-3260
sh rerun_failed_datasets.sh
sh All_Qsub.sh

Then submit the script as a job to your cluster using the sbatch command. You can use
the script code above and modify it as you need.
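For example, a minimal submission could look like the following (run_experiment.sh is
only an illustrative name for whichever of the scripts above you saved; it is not a file
shipped with the package):

sbatch run_experiment.sh    # submit the job script to Slurm
squeue -u $USER             # optionally, check the status of your submitted jobs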
*****************************************
* Scripts that you might need to change *
*****************************************

1- Loading modules might differ from one cluster to another because of differences in
   module names. The module names on our cluster are chem/ccp4/7.0.066 and
   chem/phenix/1.14-3260. If the CCP4 or PHENIX modules are not loaded correctly because
   their names differ from ours, set the correct names in the slurm.config file.
2- If you want to increase the number of CPUs to speed up the Analysers, you need to
   modify (PipelineName)Analyser.sh for each pipeline.
3- Cluster servers with limited resources might face issues when running all jobs at the
   same time; in this situation, Manager(PipelineName).sh is recommended.
   Manager(PipelineName).sh submits one job at a time and waits until that job is running
   before submitting the next, instead of Qsub.sh, which submits all jobs at once.
   (This works only on Grid Engine.)

*****************************************************
* Building a single dataset or running one pipeline *
*****************************************************

1- Run prepare_experiment.sh
2- To run one pipeline, such as Buccaneer i1 for the NO-NCS datasets:
   cd nonncsJobs/Buccaneeri1
   sh Qsub.sh   >> this submits all jobs to the cluster server for Buccaneer i1 only
3- To build a single dataset:
   cd nonncsJobs/Buccaneeri1
   On a cluster server:      sbatch J1o6a-1.9-parrot-noncs.sh
   Not on a cluster server:  sh J1o6a-1.9-parrot-noncs.sh

*******************************************
* Modifying the comparison tool for other *
* cluster workload managers               *
*******************************************

The default workload manager in the comparison tool is Slurm, but the tool also supports
Grid Engine. To use Grid Engine, first download the source code, then remove the
ClusterServerGrid value ("") in RunningParameter.java and export the project as a Jar
file. If your cluster workload manager is not one of these, you need to modify the
generated scripts manually.

***************************************
* Important notes                     *
*                                     *
***************************************

1- Running on only a sample of the datasets will lead to incorrect analysis results. For
   example, if you have the PDB entry 1o6a with its original resolution of 1.9 and
   synthetic resolutions of 3.2 and 3.4, and your datasets include only the synthetic
   resolutions, the analysis will treat 3.2 as the original resolution because it is the
   lowest resolution in the datasets, which is incorrect here. To solve this, add the
   original resolution to the Excel file and set the Built column to F before running
   Tables.sh.
2- Reducing the memory for the Analysers will sometimes lead to incomplete Excel files.
   The default is 10 GB.
3- The structure size groups in the execution times plot need to be sorted from the plot
   properties in SPSS.

***************************************
* Comparison tool source code         *
*                                     *
***************************************

The source code of the comparison tool is available at
https://github.com/E-Alharbi/PipelinesComparisonTool
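As a minimal sketch (assuming git is installed on your machine), the source code can be
obtained with:

git clone https://github.com/E-Alharbi/PipelinesComparisonTool.git
cd PipelinesComparisonTool
# Edit RunningParameter.java as described in the section on other cluster workload
# managers above, then export the project as a Jar file.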