ChromoMapper
Usage
ChromoMapper is a command-line tool with four main commands (import, calc, plot and report) for dataset analysis and plot production.
Create a new dataset
The import command converts the input data into a pre-processed dataset. A full QUAST output directory or a single QUAST or nucmer alignment file can be imported directly (see the Input section for more information about the required file formats).
Command:
      chromomap -c import -i inputDir -r datasetRepository (Import your data in order to be able to analyze it)
Options:
	[-A --assemblyName]   string - Assembly name. 
	[-D --date]           string - Experiment date. 
	[-e --exp]            string or integer - Experiment to be imported (default=1). 
	[-E --expName]        string - Experiment name. 
	[-G --genomeChrs]     integer or string  - Number of genomic chromosomes (1-100|guess). 
	[-i --inputDir]       string - Input alignment directory or file.
	[-p --pathToReport]   string - Path to report file in quast result folder.
	[-P --pathToConv]     string - Path to conversion file specifying the name and length of the 
				reference genome chromosomes (incompatible with the -G option).
	[-r --repository]     string - Path to dataset repository directory.
	[-R --reference]      string - Name of the Reference genome.
	[-S --source]         string - Alignment source type (quast|quastalign|nucmer|quastcustomdir|nucmercustomdir; 
				default=quast).
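
When imports are scripted, the command line can be assembled programmatically. The following is a minimal Python sketch and not part of ChromoMapper: the helper name build_import_command and the example paths are hypothetical, and only flags from the option list above are used. It also applies the documented rule that -P cannot be combined with -G.

```python
import shlex


def build_import_command(input_path, repository, source="quast", options=None):
    """Assemble the argv list for a `chromomap -c import` call (hypothetical helper)."""
    options = dict(options or {})
    # The option list states that -P is incompatible with -G.
    if "-P" in options and "-G" in options:
        raise ValueError("-P (conversion file) cannot be combined with -G")
    argv = ["chromomap", "-c", "import", "-i", input_path,
            "-r", repository, "-S", source]
    for flag, value in options.items():
        argv += [flag, str(value)]
    return argv


# Example paths are placeholders, not real files.
command = build_import_command("quast_results", "datasets", options={"-G": "guess"})
print(shlex.join(command))
```

Assuming chromomap is reachable through $PATH, the resulting list could be passed to subprocess.run(command, check=True).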
Make a report
The report command produces complete reports in the form of a folder containing an HTML file with links to tables and plots. Four report types are available, differing in their level of detail.
Commands:
       chromomap -c report -d dataset -o outputDir    (Make a report containing all global tables)
       chromomap -c reportcalc -d dataset -o outputDir   (Make a report containing all tables)
       chromomap -c reportplot -d dataset -o outputDir   (Make a report containing all plots)
       chromomap -c reportfull -d dataset -o outputDir    (Make a report containing all tables and plots, plus single chromosome plots)

Calculate a table
Starting from an imported dataset, the calc command produces results in the form of tables, extended with additional calculated parameters. Results include: blocks, i.e., the imported alignment blocks; blockstats, cntgstats and chromostats, which compute statistics on aligned blocks, contigs and chromosomes; cntg, chromo and exchromo, which provide a rapid view of the assembly contigs, the reference chromosomes, or the sequences not assembled into chromosomes in the reference genome.
Commands:
       chromomap -c calc_blocks -d dataset -o outputDir   (Calculate blocks table)
       chromomap -c calc_blockstats -d dataset -o outputDir   (Calculate block statistics table)
       chromomap -c calc_chromostats -d dataset -o outputDir   (Calculate chromosome statistics table)
       chromomap -c calc_chromo -d dataset -o outputDir   (Calculate chromosome table)
       chromomap -c calc_cntg -d dataset -o outputDir   (Calculate contig table)
       chromomap -c calc_cntgstats -d dataset -o outputDir   (Calculate contig statistics table)
       chromomap -c calc_conversion -d dataset -o outputDir   (Calculate conversion table)
       chromomap -c calc_exchromo -d dataset -o outputDir   (Calculate extrachromosome table)
Options:
	[-f --outputFormat]   string - Output format for tables (tsv|csv|html|txt; default=tsv). 
	[-k --skipRows]       integer - Number of rows to skip at the start of the output table (default=0).
	[-n --nRows]          integer - Number of rows in output table.
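
The combined effect of -k and -n can be pictured as a row window over the output table. The sketch below is an illustration in Python and not ChromoMapper code; it assumes that -k skips data rows (not the header) and that -n counts rows after the skip, which the help text does not state explicitly. The function name window_rows and the sample values are invented; only the column names come from the blocks table.

```python
import csv
import io
from itertools import islice


def window_rows(tsv_text, skip_rows=0, n_rows=None):
    """Return the header plus a window of data rows from a TSV table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    # islice(start, stop): skip `skip_rows` rows, then keep at most `n_rows`.
    stop = None if n_rows is None else skip_rows + n_rows
    return header, list(islice(reader, skip_rows, stop))


# Invented sample data for illustration only.
sample = (
    "ctgID\trefID\tLength\n"
    "1\t1\t52000\n"
    "2\t1\t18000\n"
    "3\t2\t74000\n"
    "4\t2\t11000\n"
)
header, rows = window_rows(sample, skip_rows=1, n_rows=2)
print(header)
print(rows)
```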
Build a plot
The plot command uses the available data to calculate the results necessary to produce the various plots, which include: chromolen, i.e., a bar plot representing chromosome lengths; contigsonchrs, blocksonchrs and gapsonchrs, which plot contigs, alignment blocks or gaps on all reference chromosomes; blocktotlen and nblock, bar plots representing the distribution of total length and number of blocks; chromomap and contigsonchr, which plot the contigs mapping onto the selected chromosome; mappedblocks, which represents the assembly/reference alignment in the form of bubble plots.
Commands:
       chromomap -c plot_blocksonchrs -d dataset -o outputDir   (Build blocks on chromosomes plot)
       chromomap -c plot_blocktotlen -d dataset -o outputDir   (Build block total length plot)
       chromomap -c plot_chromolen -d dataset -o outputDir   (Build bar plot representing chromosome lengths)
       chromomap -c plot_chromomap -d dataset -o outputDir   (Build chromomap plot)
       chromomap -c plot_contigsonchr -d dataset -o outputDir   (Build contigs on chromosome plot)
       chromomap -c plot_contigsonchrs -d dataset -o outputDir   (Build contigs on genome plot)
       chromomap -c plot_gapsonchrs -d dataset -o outputDir   (Build gaps on genome plot)
       chromomap -c plot_mappedblocks -d dataset -o outputDir   (Build bubble plot with mapped blocks)
       chromomap -c plot_nblock -d dataset -o outputDir   (Build number of blocks plot)
Options:
	[-a --above]          		integer - Minimum threshold length used to filter blocks in mappedblocks.
	[-b --below]          		integer - Maximum threshold length used to filter blocks in mappedblocks.
	[-B --bgColor]          	string - Plot background color for mappedblocks 
						(darker|standard|lighter|lightest; default=standard).
	[-z --noZero]         		string - Use a minimum size to draw bubbles that would otherwise be too small in mappedblocks.
	[-T --colorPalette]         	string or integer - Color palette to use for contigsonchrs, blocksonchrs, 
						contigsonchr and chromomap plots (std|gs|prot|deut|trit; default=std).
Further options used by more than one command:
Options for calc, plot and report:
	[-d --dataset]        string - Path to dataset to be analysed. 
	[-C --chr]            string or integer - Chromosome to be analysed for chromomap and contigsonchr plots. 
	[-e --exp]            string or integer - Experiment to be analysed (default=1). 
	[-g --sizeGroup]      string - Group of alignment block to be used for block statistics 
				(above|in|below; default=above). 
	[-l --blockMinLen]    integer - Minimum length of alignment blocks (default=10000).
	[-o --outDir]         string - Path to output directory.
	[-O --blockOn]        string - Reference sequences whose blocks are to be used
				(all|chromosomes|extraChromo; default=all).
	[-t --blockType]      string - Block type (all|unique|repeated|alternative|main; 
				default=all).
Global options:
	[-h --help]           Show help 	
	[-v --version]        Display version 
	[-V --verbose]        string or integer - Run the program in verbose mode (true/false or 1/0).


Install the program
Dependencies
ChromoMapper is written in PHP, so you need to have PHP installed on your machine before using it.
The program runs on PHP 8 or later.
To check whether PHP is already installed on your machine, run the command php -v in your terminal. If PHP is not found, you have to install it. You will find full installation instructions in the PHP manual.

Note: Remember the path where php is installed on your machine because you will need it to configure ChromoMapper.
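
The check described above can also be automated. The following Python sketch is not part of ChromoMapper and its function names are hypothetical; it locates php through $PATH, reads the first line of php -v (which begins with "PHP" followed by the version number) and compares the major version with the documented minimum of 8.

```python
import re
import shutil
import subprocess


def php_major_version(version_output):
    """Extract the major version from the first line of `php -v` output."""
    match = re.match(r"PHP (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def find_supported_php(executable="php"):
    """Return the path to a PHP 8+ interpreter, or None if none is found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    major = php_major_version(result.stdout)
    return path if major is not None and major >= 8 else None


# Prints the interpreter path to remember for configuration, or None.
print(find_supported_php())
```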
Install ChromoMapper
- Download ChromoMapper folder here.
- Unzip the folder (if not already unzipped after download).
- You can test whether the installation was successful by running a command such as
     /path/to/php/php /path/to/chromomapper/chromomap -h
     or
     /path/to/php/php /path/to/chromomapper/chromomap -v
- Optionally, if it is not already there, add the PHP path to the $PATH environment variable, so that PHP can be used without specifying the full path each time.
- Optionally, to run ChromoMapper directly as an executable, edit the chromomap executable file by changing the path to php in its first line.
- Finally, you can test the program by downloading the sample data (see below for more about the samples), or you can use your own data directly.

Sample commands
- chromomap -h
     Read complete program help.
- chromomap -c import -i inputDir -r datasetRepository
     Import your data in order to be able to analyze it.
- chromomap -c import -i inputDir -r datasetRepository -P /path/to/conversion.tsv
     You can provide a conversion.tsv file specifying the length correspondence between the aligned genome chromosomes and the reference ones.
- chromomap -c calc_conversion -d test_bombus -o outputDir
    Obtain the conversion file from test_bombus; it can be used as a template. (Alternatively, it can be downloaded from this link)
- chromomap -c calc_chromo -d test_bombus -o outputDir
     Build the chromosome table.
- chromomap -c plot_mappedblocks -d test_bombus -o outputDir
     Build the mappedblocks plot.
- chromomap -c reportcalc -d test_homo14 -o outputDir
     Build a report comprising all the obtainable tables.
- chromomap -c reportplot -d test_homo14 -o outputDir
     Build a report comprising all the obtainable plots.
- chromomap -c reportfull -d test_homo14 -o outputDir
     Build a report comprising all the obtainable outputs.

Using one of the keywords 'test_bombus', 'test_homo14' or 'test_staphylo' as the argument to option -d (dataset) makes the command work on sample data, without having to specify the complete path. An active internet connection is required for this feature to work.

Sample data
The sample experiment list was generated by using assemblies from Bombus impatiens, human chromosome 14 and Staphylococcus aureus, produced by Salzberg et al. (doi: 10.1101/gr.131383.111), and comparing them with QUAST against the corresponding reference genomes available on NCBI (GCA_043295415.1, GCF_000001405.40, GCF_000013425.1).
test_bombus (B. impatiens)
test_homo14 (H. sapiens chr14)
test_staphylo (S. aureus)

Input
The following arguments to the -S option are used to specify the organization of the input files/folder:
- quast: a standard QUAST output directory containing one or more all_alignments_name_.tsv files. report.tsv is also read to extract assembly stats.
- quastalign: a single QUAST alignment file.
- nucmer: a single nucmer alignment file.
- quastcustomdir: a custom directory containing QUAST alignment files.
- nucmercustomdir: a custom directory containing nucmer alignment files.
A conversion file may be specified with the -P option to provide the name and length of the reference genome chromosomes (see the Sample commands section for how to obtain a conversion file template).

Output
The results are output as reports, tables and plots.

Reports
A report output is a folder containing tables, plots, a log file and an HTML file providing information about the analyzed dataset and links to the tables and plots generated by the analysis. Different report types may be produced:
- report: contains all tables and genome plots.
- reportcalc: contains only tables.
- reportplot: contains only plots.
- reportfull: contains all tables and plots, plus single chromosome plots (chromomap and contigsonchr, see below).

Tables
Tables are used to describe alignment blocks, contigs, chromosomes and extra-chromosomes:

blocks - provides for each block of the imported alignment:
- the contig and reference sequence ID (ctgID and refID);
- whether contig block has a reverse orientation compared to reference (Rev);
- whether block is positioned at the contig start (cntSt) or end (cntEnd);
- new start and end block position in contig coordinates according to 5'-3' reference direction (S3 and E3);
- contig block length (Length);
- reference block length (LenOnChr);
- difference and ratio between block length on contigs and on reference (LenDiff and LenExcess);
- block-end annotations: translocation (Transloc), indel (Indel), local misassembly (LocMis);
- related block parameters: number of unrelated blocks, i.e. those mapping on the same reference contig/scaffold but at different positions (nUnrel); number of blocks mapping at exactly the same reference position (nSame); number of alternative blocks completely contained within the analysed block (nAlts); number of larger blocks that contain the analysed block (nLarger); number of blocks whose alignment overlaps the alignment of the analysed block on the left or on the right (n_ovLeft and n_ovRight);
- start and end positions on reference (S1, E1), as well as on contig (S2, E2);
- reference and contig names (Reference, Contig);
- identity percent between contig and reference sequence (IDY);
- additional block-end annotations obtained from result of the alignment tool (Ambiguous, Best_group, BlockEnd, ContigName, ContigLength, Comment).
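
The relationship between Length, LenOnChr, LenDiff and LenExcess can be illustrated with a short Python sketch. The sign of the difference and the direction of the ratio are assumptions here (contig minus reference, and contig over reference); the description above only states that they are the difference and ratio between block length on contigs and on reference. The function name and the numbers are invented.

```python
def length_comparison(length, len_on_chr):
    """Return (LenDiff, LenExcess) for one block under the assumed definitions."""
    len_diff = length - len_on_chr    # assumed sign: contig minus reference
    len_excess = length / len_on_chr  # assumed ratio: contig over reference
    return len_diff, len_excess


# Invented example: 10500 bases on the contig aligned to 10000 on the reference.
print(length_comparison(10500, 10000))  # (500, 1.05)
```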

blockstats - provides statistics about alignment blocks, organized in groups according to their length. The following parameters are calculated: total bases (totLength) with the corresponding genome fraction (genomeFrac); number of all (nBlocks), unique (nUnique) and repeated (nRepeated) blocks; average block length (avgLength); N50, L50, N90 and L90 values; number of assembly (nContigs) and reference (nRef) sequences on which blocks map, with the corresponding average number of blocks (avgNBlockPerContig and avgNBlockPerRef); identity percentage between the assembly and the reference (IDY).
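
For readers unfamiliar with N50, L50, N90 and L90, the sketch below computes them in Python using the usual definition: sort the lengths in decreasing order and find the point at which the running sum reaches x% of the total. Nx is the length at that point and Lx is the number of sequences needed to get there. Whether ChromoMapper computes the total over the selected blocks or over some other reference length is not stated here, so this is an illustration under that assumption; the function name and the lengths are invented.

```python
def nx_lx(lengths, percent):
    """Return (Nx, Lx) for a list of lengths and an integer percentage x."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length, count
    return None, 0  # empty input


block_lengths = [80, 70, 50, 40, 30, 20]  # invented lengths, total 290
print(nx_lx(block_lengths, 50))  # (70, 2)
print(nx_lx(block_lengths, 90))  # (30, 5)
```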

cntg - lists the contigs. Each contig has a name (Contig), an ID and a length (LenCntg), as well as the sum of the lengths of all blocks used to generate it (Length), the percent identity with the reference sequence (IDY), the number of alternative blocks (nAlt) and of larger contigs which contain that block (nLarger), the number of blocks used to generate the contig (nBlocks), how many of them are in reverse orientation (nRev), the start and end positions in contig coordinates (S3 and E3), and the contig-end reason (Type).

cntgstats - provides a statistical evaluation of contig features. Average, min and max values are reported for contig length (Length), contig length difference between the assembly and reference sequences (LenDiff), number of blocks employed to build the contigs (nBlocks), and identity percent between the assembly and reference sequences (IDY).

chromostats - provides a statistical evaluation of reference chromosome features. Average, min and max values are reported for chromosome coverage percentage by contigs (Coverage), number of mapped contigs (nContigs), identity percent between the assembly and reference sequences (IDY), number of mapping blocks (nBlocks), and total length of the mapping blocks (Length).

chromo - lists all reference chromosomes and, for each of them, provides parameters describing their coverage by assembly contigs:
- reference sequence ID (refID), which also links to the Chromomap table for the selected chromosome;
- leftmost block start (S1) and rightmost block end (E1);
- fraction of chromosome covered by contigs (Coverage);
- number of contigs mapping on each chromosome (nContigs) and their L90 and L50 (L90 and L50);
- percentage identity between contig and chromosome sequences (IDY);
- number of aligned blocks (nBlocks);
- total aligned block length (Length);
- reference chromosome length (RefLen).
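
One way to picture the Coverage value is as the merged extent of all blocks on a chromosome divided by RefLen. The Python sketch below illustrates this idea and is not ChromoMapper's own code: it assumes 1-based inclusive reference coordinates with start <= end, and counts overlapping blocks only once. The function name and the coordinates are invented.

```python
def coverage_fraction(blocks, ref_len):
    """Fraction of a reference chromosome covered by at least one block.

    `blocks` is a list of (S1, E1) pairs, assumed 1-based and inclusive.
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(blocks):
        if current_end is None or start > current_end + 1:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent block: extend the current interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered / ref_len


# Invented blocks: the first two overlap, so 250 of 500 bases are covered.
print(coverage_fraction([(1, 100), (50, 150), (201, 300)], 500))  # 0.5
```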

exchromo - describes coverage of reference sequences other than chromosomes. For each of them, the table reports the corresponding reference sequence (refID), how many blocks compose the alignment, start and stop coordinates on reference (S1 and E1) and the percent of identity with contigs (IDY), the length of mapped sequence (length) and the difference between it and the corresponding reference sequence (lenDiff). Other annotations are provided: translocation (nTransloc), local misassembly (nLocMis), indel, number of alternative blocks completely contained within the alignment (nAlts), number of larger blocks overlapping the same reference sequence (nLarger).

Plots
Plots provide graphical representations of the mapped assembly. They are based on optional parameters provided as input to the program during their generation. The following plots are available:

chromolen - bar plot displaying the cumulated length of the reference chromosome regions covered by assembly contigs.

contigsonchrs - describes the chromosomes (y axis) by means of the ordered sequence of alignment blocks, reported as rectangles at the chromosome position (x axis) at which they map. The colours of the blocks correspond to the contig to which they belong. Horizontal lines connect two contiguous blocks from the same contig; interruptions are regions of the reference genome not covered by contigs.

blocksonchrs - same as above, except that the blocks are highlighted by vertical segments representing their start and stop positions.

gapsonchrs - displays the distribution of the gaps in each chromosome (y axis). Gaps are plotted as black rectangles located at the position they map on the reference chromosome (Mbases).
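
The gaps shown in this plot can be thought of as the stretches of a chromosome not covered by any alignment block. The following Python sketch derives them from block coordinates; it is an illustration only, assuming 1-based inclusive coordinates with start <= end, and the function name and values are invented.

```python
def find_gaps(blocks, ref_len):
    """Return uncovered (start, end) intervals on a reference of length ref_len."""
    gaps = []
    next_uncovered = 1  # first position not yet covered by any block
    for start, end in sorted(blocks):
        if start > next_uncovered:
            gaps.append((next_uncovered, start - 1))
        next_uncovered = max(next_uncovered, end + 1)
    if next_uncovered <= ref_len:
        gaps.append((next_uncovered, ref_len))
    return gaps


# Invented blocks on a 500-base chromosome.
print(find_gaps([(50, 150), (1, 100), (201, 300)], 500))
# [(151, 200), (301, 500)]
```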

blocktotlen - bar plot describing the total length of blocks. The lengths are calculated by limiting the analysis to blocks longer than, shorter than or between the thresholds defined by the corresponding options (-a and -b) during plot generation.

nblock - bar plot describing the number of blocks. The numbers are calculated by limiting the analysis to blocks as before.

contigsonchr - a plot displaying the contigs (y axis) on a chromosome through the corresponding alignment blocks. The blocks are reported as coloured rectangles located at the reference chromosome position (Mbases) to which they map. Thinner lines connect non-contiguous blocks of the same contig.

chromomap - dotplot-like representation of contigs along the chromosome. Segments tagged with start (circles) and stop (triangles) are blocks located in the plot according to their position on contig (y axis) and chromosome (x axis). Dotted lines connect non-contiguous blocks on the reference chromosome.

mappedblocks - customized bubble plot where contigs are represented as bubbles with size depending on the number of bases involved in the contig/chromosome alignment. A white-red-blue-black gradient indicates highly to lowly integrated alignments.

Contact
Giovanni Paolella (giovanni.paolella@unina.it)

ChromoMapper 1.5.4 CEINGE-Biotecnologie Avanzate Franco Salvatore Università degli Studi di Napoli Federico II