© 2022 Janghyun Choi
Docer CC BY-NC-SA 4.0

Efficient Analysis and Visualization of ChIP-seq Data via Genomic Databases

The ‘ngs.plot’ tool is a specialized resource for visualizing and analyzing next-generation sequencing (NGS) data. It quickly accesses and integrates large genomic databases to plot dense NGS data against a backdrop of functional genomic elements. One of its primary advantages is that it handles complex datasets with ease and precision: users can rapidly generate high-quality plots of genomic regions of interest, which is critical for interpreting interactions and expression levels in genomic studies such as ChIP-seq and RNA-seq analyses. This protocol is based on ngs.plot version 2.61 running on a system equipped with an Intel 10th-generation i9-10910 processor and 48 GB of memory. The test environment was macOS 12.4 with Python 3.8.5, Perl 5.26.2, and R 4.4.0.

Step-by-Step Installation


1. Download the ngs.plot package from GDrive (Official) to a desired folder and extract it as follows.

$ cd downloads
$ tar xvfz ngsplot-2.61.tar.gz
$ mv ngsplot-2.61 /Users/jchoi/ngsplot

2. Add the ngs.plot executables to your PATH in ~/.zshrc.

$ nano ~/.zshrc

# Add the following lines:
export PATH="/your/path/to/ngsplot/bin:$PATH"
export NGSPLOT="/your/path/to/ngsplot"

# For example
export PATH="/Users/Desktop/ChIP_seq/ngsplot/bin:$PATH"
export NGSPLOT="/Users/Desktop/ChIP_seq/ngsplot"

3. Save the file and reload the shell environment as follows.

$ source ~/.zshrc

4. Install ngs.plot dependent libraries in R.

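# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
install.packages("doMC", dependencies = TRUE)
install.packages("caTools", dependencies = TRUE)
install.packages("utils", dependencies = TRUE)  # utils already ships with base R
BiocManager::install("BSgenome")
BiocManager::install("Rsamtools")
BiocManager::install("ShortRead")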

5. Prepare Genome Database.

  • Download the desired genome database from GDrive (Official).
  • Install this file as follows:
$ ngsplotdb.py list # check current list
ID       Assembly     Species      EnsVer   NPVer    InstalledFeatures
hg19	GRCh37	homo_sapiens	75.0	3.0	cgi,exon,genebody,tss,tes
IRGSP-1	IRGSP-1.0	oryza_sativa	21.0	3.0	exon,genebody,tss,tes

$ ngsplotdb.py install /Users/jchoi/Downloads/ngsplotdb_mm10_75_3.00.tar.gz # install database

$ ngsplotdb.py list # check the installed database
ID       Assembly     Species      EnsVer   NPVer    InstalledFeatures
hg19	GRCh37	homo_sapiens	75.0	3.0	cgi,exon,genebody,tss,tes
IRGSP-1	IRGSP-1.0	oryza_sativa	21.0	3.0	exon,genebody,tss,tes
mm10	GRCm38	mus_musculus	75.0	3.0	cgi,exon,genebody,tss,tes
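
After the steps above, a short shell check can confirm that the environment variables took effect and that the genome you need is installed before the first run. The following is a minimal sketch and not part of ngs.plot: the function names check_ngsplot_env and genome_installed are hypothetical, and the second one assumes that the output of ngsplotdb.py list carries the genome ID in its first column, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with ngs.plot).

# Succeeds if NGSPLOT points to a directory and ngs.plot.r is found on PATH.
check_ngsplot_env() {
    if [ -z "${NGSPLOT:-}" ] || [ ! -d "$NGSPLOT" ]; then
        echo "NGSPLOT is unset or not a directory" >&2
        return 1
    fi
    if ! command -v ngs.plot.r >/dev/null 2>&1; then
        echo "ngs.plot.r not found on PATH" >&2
        return 1
    fi
    return 0
}

# Reads the output of 'ngsplotdb.py list' on stdin and succeeds if the
# given genome ID appears in the first column.
genome_installed() {
    awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# Usage (uncomment after installation):
# check_ngsplot_env && ngsplotdb.py list | genome_installed mm10 && echo "ready"
```

If either check fails, revisit step 2 (PATH and NGSPLOT) or step 5 (genome database) before running ngs.plot.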

This tool may occasionally malfunction. Most failures stem from Python syntax issues, which the author usually resolves quickly by releasing updated source code. If the tool suddenly stops working, check the creator’s GitHub repository.

Running ngs.plot

  • Use the following command to visualize enriched peaks from ChIP-seq data with ngs.plot:

      $ ngs.plot.r -G <genome> -R <region> -C <IndexedInput.bam> -O <outputFolder> <OptionalParameters>
    
  • In this command:

    Parameter              Description
    -G <genome>            Specifies the genome name. Use ngsplotdb.py list to show available genomes. See the Step-by-Step Installation section.
    -R <region>            Specifies the genomic regions to plot. Possible values are tss, tes, genebody, exon, cgi, and enhancer.
    -C <IndexedInput.bam>  Specifies the input file: an indexed BAM file for a single plot, or a configuration file for a multiplot (see below).
    -O <outputFolder>      Specifies the output name. Several files will be generated with this name as a prefix.
  • For a multiplot, prepare a configuration file and save it as plain text (name.txt) as follows:

      # To use the whole genome as the gene list, enter "-1" in the second column.
      # Use TAB to separate the three columns: coverage file<TAB>gene list<TAB>title
      # The title is shown in the figure's legend.
      hesc.H3k4me3.rmdup.sort.bam     high_expressed_genes.txt         "High"
      hesc.H3k4me3.rmdup.sort.bam     medium_expressed_genes.txt       "Med"
      hesc.H3k4me3.rmdup.sort.bam     low_expressed_genes.txt          "Low"
    
  • Major Optional Parameters (<OptionalParameter>)

    Parameter    Description
    -T <Title>   Specifies the image title (Default: NoName).
    -L <int>     Specifies the flanking region size in bp. By default, -L 2000 when -R is tss, tes, or genebody; -L 500 when -R is exon or cgi.
    -P <int>     Specifies the number of CPUs to use; 0 uses all CPUs detected on your machine.
    -FL <int>    Specifies the fragment length used to calculate physical coverage instead of read coverage, which gives a more accurate representation of ChIP enrichment. Set this value to the average fragment length of your sequencing library (Default: 150).
    -LEG <int>   Controls the display of the legend: 1 (default) or 0 (no legend).
    -VLN <int>   Controls the display of vertical lines: 1 (default) or 0 (no lines).
  • For more detailed parameter options, see ProgramArguments101.
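
Because the configuration file described above must be TAB-separated, typing it in an editor that converts tabs to spaces can silently break it. One way around this, sketched here under assumptions, is to generate the file with printf. The helper name add_config_row and the output name config.txt are hypothetical; the BAM and gene-list names are taken from the configuration example above.

```shell
#!/bin/sh
# Hypothetical helper: append one row (coverage file, gene list, title)
# to a configuration file, separated by literal TAB characters.
add_config_row() {
    # $1 = config file, $2 = coverage file, $3 = gene list or -1, $4 = title
    printf '%s\t%s\t"%s"\n' "$2" "$3" "$4" >> "$1"
}

config=config.txt   # assumed output name
: > "$config"       # start from an empty file

add_config_row "$config" hesc.H3k4me3.rmdup.sort.bam high_expressed_genes.txt High
add_config_row "$config" hesc.H3k4me3.rmdup.sort.bam medium_expressed_genes.txt Med
add_config_row "$config" hesc.H3k4me3.rmdup.sort.bam low_expressed_genes.txt Low

# Sanity check: every row should have exactly three TAB-separated fields.
awk -F '\t' 'NF != 3 { bad = 1 } END { exit bad }' "$config" && echo "config OK"
```

The resulting file can then be passed to -C in place of a single BAM file.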

Example Code

  • Here are example commands for plotting peak distributions:

      # Single plot
      $ ngs.plot.r -G mm10 -R genebody -C /Users/jchoi/Desktop/R_FINAL.bam -O /Users/jchoi/Desktop/NFIB \
      -T NFIB -L 2000 -SE 0 -FL 100 -LEG 0 -VLN 0
    
      # Multiplot
      ## Configuration file, H3K4_K27.txt
      Final.sort.H2O2_H3K27me3.bam:Final.sort.Unt_IPT.bam	-1	"H3K27me3"
      Final.sort.H2O2_H3K4me3.bam:Final.sort.Unt_IPT.bam	-1	"H3K4me3"
      ## Command
      $ ngs.plot.r -G hg19 -R genebody -C H3K4_K27.txt -O /Users/jchoi/Desktop/Bivalency \
      -L 3000 -SE 0 -FL 100 -LEG 0 -VLN 0 -YAS -0.4,1.5 -SC -2,4
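
Since -C expects indexed BAM files, a multiplot run can fail when one of the BAMs named in the configuration file lacks an index. The sketch below is one possible pre-flight check, not part of ngs.plot. The function names list_config_bams and check_bam_indexes are hypothetical, and the code assumes that paired entries are written as a.bam:b.bam in the first column (as in H3K4_K27.txt above) and that an index sits next to its BAM as either file.bam.bai or file.bai.

```shell
#!/bin/sh
# Hypothetical helper: print every BAM named in the first column of a
# configuration file, splitting "a.bam:b.bam" pairs into separate lines.
# Lines starting with '#' and blank lines are skipped.
list_config_bams() {
    awk -F '\t' '!/^#/ && NF {
        n = split($1, parts, ":")
        for (i = 1; i <= n; i++) print parts[i]
    }' "$1"
}

# Succeeds only if each BAM exists and has an index beside it.
check_bam_indexes() {
    bams=$(list_config_bams "$1") || return 1
    rc=0
    while IFS= read -r bam; do
        [ -n "$bam" ] || continue
        if [ ! -f "$bam" ]; then
            echo "missing BAM: $bam" >&2
            rc=1
        elif [ ! -f "$bam.bai" ] && [ ! -f "${bam%.bam}.bai" ]; then
            echo "missing index for $bam (try: samtools index $bam)" >&2
            rc=1
        fi
    done <<EOF
$bams
EOF
    return "$rc"
}

# Usage: run from the directory that holds the BAM files.
# check_bam_indexes H3K4_K27.txt && echo "all BAMs indexed"
```

The check resolves paths relative to the current directory, so run it from the same directory you would run ngs.plot.r from.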
    

Example Plot

The graph was created with the second (multiplot) example above and refined using Illustrator 2024.

[Figure: ngs.plot multiplot output]

Citation

ngsplot
  1. Shen, L., Shao, N., Liu, X., & Nestler, E. (2014). ngs.plot: Quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genomics, 15, 1-14.