inGAP－La Folie, Quo Vadis, Evo-Devo?

http://sites.google.com/site/nextgengenomics/ingap

inGAP is an integrated platform for next-generation sequencing projects. Its core function is to detect SNPs and indels using a Bayesian algorithm.
(1) It has no read length restriction. It can handle 454, Illumina/Solexa, and Sanger sequencing data sets, alone or in combination.
(2) It can detect most small indels in either single-end or paired-end data sets. On simulated data sets, inGAP identified 85%-98% of small indels with high accuracy (>99%).
(3) It can identify variants against a relatively divergent reference genome, which extends its use well beyond resequencing projects.
(4) It provides a user-friendly graphic interface, through which users can browse, search, check, classify, and even edit the identified variants.
(5) It can be used to detect intraspecific polymorphisms (including SNPs and indels) based on a pairwise comparison of multiple whole genomes.
(6) It employs a global heuristic search approach to lay out contigs based on one or more reference genomes.
(7) It also provides a handful of bioinformatic tools for read simulation, mutation incorporation, format conversion, etc.
(8) inGAP-sv (structural variation detection) will come soon.

•    Installation

inGAP is distributed under the GNU General Public License. The latest version can be downloaded from http://sourceforge.net/projects/ingap/
inGAP has been tested on PC, Mac, and Linux systems. To install inGAP, unpack the downloaded archive with the following command.
% tar xvfz inGAP_*.tar.gz

•    Demo

We provide a demo for the three applications of inGAP. Please click the "Demo" button.

•    Getting started

1.    Double-click the “inGAP” icon in inGAP_HOME or start it from a command line:
% inGAP_HOME/inGAP
or
% java -mx2000m -jar inGAP_HOME/inGAP.jar

Then you will see the following panel:

2. To start a new project, click on “Read mapping and SNP detection”.

Then you will see the following panel:

2.1 Click on “create project” to create a new project, then select a folder as the working directory. All generated files will be stored in this folder.

2.2 Import the read file or files (required), the reference genome file (required), and the annotation file for the reference genome (optional).

Note: The read file may be in the FASTQ format. If the read file is in the FASTA format, then make sure both the sequence file and the quality file use the same name. The reference sequence should be in the FASTA format. The reference gene table should be in the PTT format.
If users input multiple read files, inGAP will automatically detect whether they are paired-end data. Users are not required to provide the insert length for paired-end data; inGAP determines it dynamically during the mapping process.
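To illustrate what determining the insert length from the data can look like, here is a minimal sketch that takes the median span of already-mapped read pairs. This is not inGAP's own procedure, which is not described here; the function name and the input layout (one pair of reference coordinates per read pair) are assumptions made for the example.

```python
from statistics import median


def estimate_insert_length(pairs):
    """Estimate the insert length from mapped read pairs.

    `pairs` is an iterable of (start_of_read1, end_of_read2) reference
    coordinates for pairs whose two reads mapped to the same reference
    sequence. This layout is hypothetical, chosen for illustration only.
    """
    spans = [abs(end - start) for start, end in pairs]
    if not spans:
        raise ValueError("no mapped read pairs to estimate from")
    # The median is less sensitive than the mean to a few mis-mapped pairs.
    return median(spans)


if __name__ == "__main__":
    mapped_pairs = [(100, 400), (1000, 1310), (5000, 5290)]
    print(estimate_insert_length(mapped_pairs))
```

The median is used here so that a small number of chimeric or mis-mapped pairs does not skew the estimate.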

2.3 Set parameters for short read mapping.
Two mapping options are provided: one is BLASTN (slower but more accurate); the other is BLAT (faster but may miss divergent reads).
min contig length: the minimum length of a reference sequence. If users want to map reads to an unfinished genome (e.g. a collection of contigs), this parameter is used to discard short contigs.
min (match_len/read_len): minimum matching length divided by the read length
min alignment identity: minimum matching identity
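The three thresholds above can be read as a simple filter on each candidate alignment. The sketch below shows that reading; the `Alignment` record, the function name, and the default values are hypothetical and are not taken from inGAP.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    read_len: int    # length of the read
    match_len: int   # number of read bases covered by the alignment
    identity: float  # fraction of identical bases in the aligned region (0 to 1)
    ref_len: int     # length of the reference sequence (contig) that was hit


def passes_filters(aln, min_contig_len=500, min_match_ratio=0.9, min_identity=0.9):
    """Return True if the alignment satisfies all three thresholds.

    The default values are placeholders for illustration, not inGAP defaults.
    """
    # min contig length: ignore hits on reference sequences that are too short
    if aln.ref_len < min_contig_len:
        return False
    # min (match_len/read_len): enough of the read must be aligned
    if aln.match_len / aln.read_len < min_match_ratio:
        return False
    # min alignment identity: the aligned part must be similar enough
    return aln.identity >= min_identity


if __name__ == "__main__":
    good = Alignment(read_len=100, match_len=95, identity=0.98, ref_len=10000)
    partial = Alignment(read_len=100, match_len=50, identity=0.98, ref_len=10000)
    print(passes_filters(good), passes_filters(partial))
```

Lowering the identity threshold is what allows reads to be mapped against a more divergent reference, at the cost of admitting more spurious alignments.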

Then you will see the progress bar for the running process.

2.4 After the run finishes, a pop-up window will prompt you to click on “display”.
The AlignViewer will then be loaded.

2’.    To load an old project, click on “Open”.
Then select your log file “PROJECT.log”.

2’’.    To load a demo project, click on “Demo”.

•    Application 1: SNP detection and viewer

In the AlignViewer, you can browse, classify, and edit the identified SNPs or Indels.

A: find a short sequence in the mapping reads; go to a certain position; zoom in; zoom out
B: ORFs in the reference genome
C: mapping reads
D: tandem repeat
E: SNPs and Indels, indicated by red triangles
F: position on the reference genome
G: the sliding window displayed in H
H: sequence alignment between the mapping reads and the reference sequence. SNPs and Indels are marked by red triangles.
I: overview of the mapping results; open and export files; the reference genome list.
J: options for display
K: SNPs summary
L: circular display for the read mapping

• Application 2: Multiple genome mapping

Input: multiple FASTA-formatted genome files
Output: The identified SNPs and indels will be used to build a phylogenetic tree.
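As a rough illustration of how identified SNPs can feed a tree-building step, the sketch below counts pairwise allele differences between genomes at shared SNP sites, which is the kind of distance matrix a distance-based tree method takes as input. How inGAP itself builds the tree is not described here; the function name and input layout are assumptions.

```python
from itertools import combinations


def pairwise_snp_distances(alleles):
    """Count differing alleles for every pair of genomes.

    `alleles` maps a genome name to a string holding its allele at each SNP
    site, with sites in the same order for every genome (hypothetical layout).
    """
    names = sorted(alleles)
    if len({len(alleles[name]) for name in names}) > 1:
        raise ValueError("all genomes must cover the same SNP sites")
    distances = {}
    for a, b in combinations(names, 2):
        # Number of SNP sites at which the two genomes carry different alleles
        distances[(a, b)] = sum(x != y for x, y in zip(alleles[a], alleles[b]))
    return distances


if __name__ == "__main__":
    snps = {"g1": "ACGT", "g2": "ACGA", "g3": "TCGA"}
    for pair, d in pairwise_snp_distances(snps).items():
        print(pair, d)
```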

• Application 3: Comparative genome assembly

Input: a FASTA-formatted contig file and a single FASTA-formatted reference genome
Output: Possible assemblies of these contigs using genetic algorithm optimization (Zhao et al. 2008. Nucl. Acids Res. 36, 3455-3462). Users can switch among five possible assemblies in the “Navigation” panel.