cdhit_454: Identify artificial duplicates from metagenomic samples
454 metagenomic reads file
In the current version, the maximum size of an uploaded file is 50 MB
Upload a FASTA or FASTQ file of 454 metagenomic reads:
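A minimal sketch of how a file could be checked locally before uploading, assuming "50MB" means 50 * 1024 * 1024 bytes and that the format can be guessed from the first non-blank line (">" for FASTA, "@" for FASTQ). The function names are hypothetical and are not part of the server.

```python
import os

# Assumption: the page's "50MB" limit is read as 50 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def within_size_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True when a non-empty file of size_bytes fits under the limit."""
    return 0 < size_bytes <= limit


def detect_read_format(first_line):
    """Guess the format from the first non-blank line of the file."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        return "fastq"
    return None


def check_upload(path):
    """Raise ValueError if the file is too large or not FASTA/FASTQ."""
    if not within_size_limit(os.path.getsize(path)):
        raise ValueError("file is empty or larger than the upload limit")
    with open(path, "r") as handle:
        # Take the first line that is not blank, or "" if there is none.
        first_line = next((line for line in handle if line.strip()), "")
    fmt = detect_read_format(first_line)
    if fmt is None:
        raise ValueError("file does not look like FASTA or FASTQ")
    return fmt
```

Calling `check_upload("reads.fa")` on a local file returns the guessed format or raises `ValueError`; the file name here is only an example.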
Sequence Identity Parameters
Sequence identity cut-off
The sequence identity cut-off should be a float between 0.9 and 1; 0.98 means 98% identity
Algorithm Parameters
-g: sequence is clustered to the first cluster that meets the threshold
No
Yes
By default, cd-hit assigns a sequence to the first cluster that meets the threshold (fast mode). If -g is set to 1, the program assigns it to the most similar cluster that meets the threshold (accurate but slow mode)
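The difference between the two modes can be illustrated with a small sketch. It is not cd-hit's implementation: `toy_identity` is a hypothetical stand-in that compares positions without any alignment, and `assign_cluster` is an illustrative name.

```python
def toy_identity(a, b):
    """Fraction of matching positions over the shorter length (no alignment).

    Assumption: a toy measure for illustration only, not cd-hit's identity.
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def assign_cluster(seq, representatives, cutoff, accurate=False):
    """Return the index of the cluster seq joins, or None if none qualifies."""
    best_index, best_score = None, 0.0
    for index, rep in enumerate(representatives):
        score = toy_identity(seq, rep)
        if score < cutoff:
            continue
        if not accurate:
            # Fast mode: stop at the first cluster that meets the cutoff.
            return index
        # Accurate mode: keep scanning and remember the most similar cluster.
        if best_index is None or score > best_score:
            best_index, best_score = index, score
    return best_index


reps = ["A" * 20, "A" * 19 + "C"]
read = "A" * 18 + "CC"
fast_choice = assign_cluster(read, reps, cutoff=0.9)
accurate_choice = assign_cluster(read, reps, cutoff=0.9, accurate=True)
```

In this toy example the read meets the cut-off for both representatives, so fast mode picks the first cluster while accurate mode picks the second, more similar one.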
-b: bandwidth of alignment
Bandwidth of the alignment; must be a positive integer
-D: max size per indel
Maximum size per indel; default 1
-n: word_length
Word length; default 10. See the user's guide for advice on choosing it
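For readers who run the program locally, the form fields above could map to a command line roughly as follows. This is a sketch under assumptions: the executable name `cd-hit-454` and the `-i`, `-o` and `-c` options (input, output, identity cut-off) do not appear on this page, and `build_cdhit454_args` is a hypothetical helper.

```python
def build_cdhit454_args(input_path, output_path, identity,
                        accurate=False, bandwidth=None,
                        max_indel=1, word_length=10):
    """Validate the form values and return an argument list.

    Assumptions: the executable name and the -i/-o/-c options are not
    stated on this page; -g, -b, -D and -n are the options it lists.
    """
    if not 0.9 <= identity <= 1.0:
        raise ValueError("identity cut-off must be between 0.9 and 1")
    args = ["cd-hit-454", "-i", input_path, "-o", output_path,
            "-c", str(identity), "-g", "1" if accurate else "0"]
    if bandwidth is not None:
        # The page only says the bandwidth must be a positive integer,
        # so the option is omitted unless a value is given.
        if not isinstance(bandwidth, int) or bandwidth <= 0:
            raise ValueError("bandwidth must be a positive integer")
        args += ["-b", str(bandwidth)]
    args += ["-D", str(max_indel), "-n", str(word_length)]
    return args


example_args = build_cdhit454_args("reads.fa", "reads_out", identity=0.98)
```

The resulting list can be passed to `subprocess.run` if the program is installed; the file names are placeholders.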
Alignment Coverage Parameters
aL = R_a / R, AL = R - R_a
aS = S_a / S, AS = S - S_a
s = S_a / R_a, S = R - S
-aL: minimal alignment coverage (fraction) for the longer sequence
-AL: maximum unaligned part (amino acids/bases) for the longer sequence
-aS: minimal alignment coverage (fraction) for the shorter sequence
-AS: maximum unaligned part (amino acids/bases) for the shorter sequence
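The four coverage quantities can be worked through with a short sketch. It assumes R and S are the lengths of the longer and shorter sequences and R_a and S_a are their aligned lengths, so that aL = R_a / R, AL = R - R_a, aS = S_a / S and AS = S - S_a. The function names are hypothetical, and treating an unset maximum as "no limit" is an assumption.

```python
def coverage_metrics(R, S, R_a, S_a):
    """Compute aL, AL, aS and AS from sequence and aligned lengths.

    Assumption: R/S are the longer/shorter sequence lengths and
    R_a/S_a the aligned lengths on each.
    """
    return {
        "aL": R_a / R,   # aligned fraction of the longer sequence
        "AL": R - R_a,   # unaligned residues on the longer sequence
        "aS": S_a / S,   # aligned fraction of the shorter sequence
        "AS": S - S_a,   # unaligned residues on the shorter sequence
    }


def passes_coverage(metrics, aL=0.0, AL=None, aS=0.0, AS=None):
    """Check the metrics against minimum fractions and maximum unaligned sizes.

    Assumption: None for AL or AS means no limit is applied.
    """
    if metrics["aL"] < aL or metrics["aS"] < aS:
        return False
    if AL is not None and metrics["AL"] > AL:
        return False
    if AS is not None and metrics["AS"] > AS:
        return False
    return True


# Example: a 100-base read aligned over 78 bases to an 80-base read
# that is aligned over 76 bases.
example = coverage_metrics(R=100, S=80, R_a=78, S_a=76)
```

Here the longer sequence has 22 unaligned bases and the shorter has 4, so a setting such as -AS 3 would reject this pair in the sketch.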
Email address for job status
A link to your job's status will be sent to your mailbox
Enter your email address:
Reference:
Beifang Niu, Limin Fu, Shulei Sun and Weizhong Li. Artificial and natural duplicates in pyrosequencing reads of metagenomic data.
BMC Bioinformatics 2010 11:187 doi:10.1186/1471-2105-11-187.
Contact: Beifang Niu & Weizhong Li