OTU finder (cd-hit-otu-miseq)

cd-hit (http://cd-hit.org/) is a widely used program for clustering and comparing large sets of protein or nucleotide sequences. cd-hit-otu-miseq clusters MiSeq reads into Operational Taxonomic Units (OTUs) through multiple rounds of clustering and filtering that remove sequences with high sequencing error rates.

The input must be processed, high-quality reads (single-end) or merged sequences from paired-end reads.

Inputs:
DNA FASTA file (required); may be gzip-compressed (.gz)
Outputs:
output.zip, which includes a README file describing the output files and their formats
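Because the abundance cutoff is relative to the total number of input sequences, it can help to count the records in the FASTA file before submitting. The following is a minimal sketch, not part of cd-hit-otu-miseq itself; the function name count_fasta_records is hypothetical, and it assumes that a file name ending in .gz means the file is gzip-compressed.

```python
import gzip


def count_fasta_records(path: str) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    # Assumption: a ".gz" suffix means the file is gzip-compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For example, count_fasta_records("input.fasta") returns the number of sequences in an uncompressed file, and the same call works for "input.fasta.gz".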
Sequence file to upload (required):
Email (optional):
Parameters
    -c OTU cutoff, default 0.97
    -m whether to perform chimera checking (true/false), default true
    -a abundance cutoff, default 0.00005
       clusters smaller than this fraction of the total input sequences are considered noise and removed
       e.g. if the input contains 50,000 sequences, clusters < 2 (i.e. singletons) are removed
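The arithmetic behind the -a example can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: it assumes the size threshold is the total sequence count multiplied by the cutoff and truncated to an integer, which matches the 50,000-sequence example above (50,000 x 0.00005 = 2.5, truncated to 2). The function names are hypothetical, and the real program may round differently.

```python
from decimal import Decimal


def min_cluster_size(total_sequences: int, abundance_cutoff: str = "0.00005") -> int:
    """Smallest cluster size kept, assuming total * cutoff truncated to an integer."""
    # The cutoff is passed as a string so Decimal does the multiplication exactly,
    # avoiding binary floating-point rounding near integer boundaries.
    return int(Decimal(total_sequences) * Decimal(abundance_cutoff))


def filter_clusters(sizes, total_sequences, abundance_cutoff="0.00005"):
    """Drop clusters whose size is below the assumed threshold."""
    threshold = min_cluster_size(total_sequences, abundance_cutoff)
    return [size for size in sizes if size >= threshold]
```

Under these assumptions, min_cluster_size(50000) gives 2, so filter_clusters([1, 2, 3, 120], 50000) drops only the singleton.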
EXAMPLE
Sequence file to upload (required): input.fasta
Email (optional): you@example.com
Parameters -c 0.97 -a 0.00005 -m true
Program/Database References
1. "Ultrafast Clustering Algorithms for Metagenomic Sequence Analysis", W. Li, L. Fu, B. Niu, S. Wu & J. Wooley. Briefings in Bioinformatics (2012) 13(6):656-668. doi: 10.1093/bib/bbs035
2. "WebMGA: a Customizable Web Server for Fast Metagenomic Sequence Analysis", S. Wu, Z. Zhu, L. Fu, B. Niu & W. Li. BMC Genomics (2011) 12:444.
Program/Database Version
Program: cd-hit-otu-miseq 1.0.0
Database: N/A