http://sites.google.com/site/nextgengenomics/ingap
inGAP is an integrated platform for next-generation sequencing projects. Its core function is to detect SNPs and indels using a Bayesian algorithm.
(1) It has no read length restriction. It can handle 454, Illumina Solexa, and Sanger sequencing data sets, alone or in combination.
(2) It can detect most small indels in either single-end or paired-end data sets. On simulated data sets, inGAP identified 85%-98% of small indels with high accuracy (>99%).
(3) It can identify variants against a relatively divergent reference genome, which extends its use well beyond resequencing projects.
(4) It provides a user-friendly graphical interface, through which users can browse, search, check, classify, and even edit the identified variants.
(5) It can be used to detect intraspecific polymorphisms (including SNPs and indels) based on pairwise comparisons of multiple whole genomes.
(6) It employs a global heuristic search approach to lay out contigs based on one or more reference genomes.
(7) It also provides a handful of bioinformatic tools for read simulation, mutation incorporation, format conversion, etc.
(8) inGAP-sv (structural variation detection) will come soon.
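To make the idea of Bayesian variant calling in the feature list concrete, here is a minimal sketch of a two-hypothesis posterior at a single haploid site. It is not inGAP's actual algorithm, which this page does not describe; the function name `variant_posterior`, the error rate, and the prior are all assumptions chosen for illustration.

```python
from math import comb


def variant_posterior(alt_count, depth, error_rate=0.01, prior_variant=0.001):
    """Posterior probability that a haploid site carries the alternate allele.

    Toy model (an assumption, not inGAP's method): under the 'reference'
    hypothesis, alternate bases arise only from sequencing error; under the
    'variant' hypothesis, reference bases arise only from sequencing error.
    """
    if not 0 <= alt_count <= depth:
        raise ValueError("alt_count must be between 0 and depth")
    ref_count = depth - alt_count
    n_choose_k = comb(depth, alt_count)
    # Binomial likelihood of the observed base counts under each hypothesis.
    like_ref = n_choose_k * error_rate**alt_count * (1 - error_rate)**ref_count
    like_var = n_choose_k * (1 - error_rate)**alt_count * error_rate**ref_count
    # Bayes' rule: weight each likelihood by its prior and normalise.
    numerator = like_var * prior_variant
    evidence = numerator + like_ref * (1 - prior_variant)
    return numerator / evidence if evidence > 0 else 0.0
```

With no covering reads the posterior simply equals the prior; as more reads support the alternate base, the posterior moves toward 1.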
• Installation
inGAP is distributed under the GNU General Public License. The latest version can be downloaded at http://sourceforge.net/projects/ingap/.
inGAP has been tested on PC, Mac, and Linux systems. To install inGAP, unpack the archive with the following command:
% tar xvfz inGAP_*.tar.gz
• Demo
We provide a demo for the three applications of inGAP. Please click the "Demo" button.
• Getting started
1. Double click the icon “inGAP” at inGAP_HOME or start it from a command line:
% inGAP_HOME/inGAP
or
% java -mx2000m -jar inGAP_HOME/inGAP.jar
Then you will see the following panel:
2. To start a new project, click on “Read mapping and SNP detection”.
Then you will see the following panel:
2.1 Click on “create project” to create a new project. Then select a folder as the working directory. All generated files will be stored in this folder.
2.2 Import the read file or files (required), the reference genome file (required), and the annotation file for the reference genome (optional).
Note: The read file may be in the FASTQ format. If the read file is in the FASTA format, make sure that the sequence file and the quality file use the same name. The reference sequence should be in the FASTA format. The reference gene table should be in the PTT format.
If users input multiple read files, inGAP will automatically detect whether these read files are paired-end data. Moreover, users are not required to provide the insert length for paired-end data; inGAP determines it dynamically during the mapping process.
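One way an insert length can be inferred from the data rather than supplied by the user is to take a robust average of the spans covered by mapped read pairs. The sketch below illustrates that idea only; it is not taken from inGAP, and the function name `estimate_insert_length` and the input layout are hypothetical.

```python
from statistics import median


def estimate_insert_length(pairs):
    """Estimate the insert length from mapped read pairs.

    Each pair is (start_of_first_mate, end_of_second_mate) in reference
    coordinates (a hypothetical layout). The median is used so that a few
    mis-mapped pairs with extreme spans do not distort the estimate.
    """
    spans = [end - start for start, end in pairs if end > start]
    if not spans:
        raise ValueError("no usable read pairs")
    return median(spans)
```

For example, `estimate_insert_length([(100, 300), (500, 710), (1000, 1195), (50, 5000)])` is barely affected by the last, implausibly long pair.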
2.3 Set parameters for short read mapping.
Two mapping options are provided: one is BLASTN (slower but more accurate); the other is BLAT (faster but may miss divergent reads).
min contig length: the minimum length of a reference sequence. If users want to map reads to an unfinished genome (e.g. a collection of contigs), this parameter is used to discard short contigs.
min (match_len/read_len): minimum matching length divided by the read length
min alignment identity: minimum matching identity
Then you will see the progress bar for the running process.
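The three thresholds in step 2.3 can be read as a simple filter on each alignment. The following is a minimal sketch of that reading, not inGAP's code: the `Hit` record, the default threshold values, and the choice to express identity as a fraction between 0 and 1 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_len: int    # length of the read
    match_len: int   # length of the aligned part of the read
    identity: float  # fraction of identical positions in the alignment (0-1)
    contig_len: int  # length of the reference sequence that was hit


def passes_filters(hit, min_contig_len=500, min_match_ratio=0.9, min_identity=0.9):
    """Return True if the hit satisfies all three thresholds.

    The default values are placeholders, not inGAP's defaults.
    """
    if hit.contig_len < min_contig_len:
        return False  # reference contig too short ("min contig length")
    if hit.match_len / hit.read_len < min_match_ratio:
        return False  # too little of the read aligned ("min (match_len/read_len)")
    return hit.identity >= min_identity  # "min alignment identity"
```

A 36 bp read aligned over its full length at 97% identity to a 10 kb contig passes; the same read aligned over only 20 bp does not.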
2.4 After the run finishes, a pop-up window will remind you to click on “display”.
Then the AlignViewer will be loaded.
2’. To load an old project, click on “Open”.
Then select your log file “PROJECT.log”.
2’’. To load a demo project, click on “Demo”.
• Application 1: SNP detection and viewer
In the AlignViewer, you can browse, classify, and edit the identified SNPs or Indels.
A: find a short sequence in the mapping reads; go to a certain position; zoom in; zoom out
B: ORFs in the reference genome
C: mapping reads
D: tandem repeat
E: SNPs and Indels, indicated by red triangles
F: position on the reference genome
G: the sliding window displayed in H
H: sequence alignment between the mapping reads and the reference sequence. SNPs and Indels are marked by red triangles.
I: overview of the mapping results; open and export files; the reference genome list.
J: options for display
K: SNPs summary
L: circular display for the read mapping
• Application 2: Multiple genome mapping
Input: multiple FASTA-formatted genome files
Output: The identified SNPs and indels will be used to build a phylogenetic tree.
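A common first step from identified SNPs to a phylogenetic tree is a pairwise distance matrix. The sketch below counts differing positions between already-aligned sequences; it illustrates the general idea only and is not inGAP's tree-building procedure. The function name `snp_distance_matrix` and the use of "-" for gaps are assumptions.

```python
from itertools import combinations


def snp_distance_matrix(aligned):
    """Count SNP differences between every pair of aligned sequences.

    `aligned` maps a genome name to its aligned sequence; all sequences must
    have the same length. Positions where either sequence has a gap ('-')
    are skipped, so only substitutions are counted.
    """
    names = sorted(aligned)
    if len({len(aligned[name]) for name in names}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    distances = {}
    for a, b in combinations(names, 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(aligned[a], aligned[b])
            if x != y and x != "-" and y != "-"
        )
    return distances
```

The resulting distances could then be passed to any standard distance-based tree method, such as neighbor joining.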
• Application 3: Comparative genome assembly
Input: a FASTA-formatted contig file and a single FASTA-formatted reference genome
Output: Possible assemblies of these contigs using genetic algorithm optimization (Zhao et al. 2008. Nucl. Acids Res. 36, 3455-3462). Users can switch among five possible assemblies in the “Navigation” panel.
• Tools
inGAP provides a handful of bioinformatic tools for read simulation, mutation incorporation, format conversion, etc.
• Comparison between inGAP v1.9.3 and MAQ v0.7.1