CD-HIT Suite: Biological Sequence Clustering and Comparison
- This server at UCSD is not under regular maintenance; you may try the server if there is an issue.


Sequence file and databases.
    Load Query Fasta file from your computer:
       Incorporate annotation info at header line.

Sequence Identity Parameters
Sequence identity cut-off.
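
For readers who run the stand-alone cd-hit program instead of this form, the sketch below shows one way the cut-off could be passed on the command line. It assumes the stand-alone options -i (input FASTA), -o (output name) and -c (identity cut-off as a fraction); the function name, the file names and the `extra` dictionary are hypothetical and only serve the illustration. The command is assembled but not executed.

```python
from typing import Dict, List, Optional


def build_cd_hit_command(fasta_in: str, out_prefix: str, identity: float,
                         extra: Optional[Dict[str, object]] = None) -> List[str]:
    """Assemble an argument list for a stand-alone cd-hit run (nothing is executed)."""
    if not 0.0 < identity <= 1.0:
        raise ValueError("identity cut-off must be a fraction in (0, 1]")
    cmd = ["cd-hit", "-i", fasta_in, "-o", out_prefix, "-c", str(identity)]
    # Any further options from the form (for example -G or -b) are appended as given.
    for flag, value in (extra or {}).items():
        cmd += [flag, str(value)]
    return cmd


# Hypothetical file names; 0.9 means a 90% identity cut-off.
cmd = build_cd_hit_command("query.fasta", "clustered", 0.9, {"-G": 1, "-b": 20})
print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where cd-hit is installed.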

Algorithm Parameters
-G: use global sequence identity (No/Yes).
-g: sequence is clustered to the best cluster that meets the threshold (No/Yes).
-b: bandwidth of alignment.
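
The effect of -G and -g can be illustrated with a small sketch. It assumes the definitions given in the CD-HIT user guide as we understand them: global identity divides the number of identical residues by the full length of the shorter sequence, local identity divides by the alignment length, and with -g set to No a sequence joins the first cluster that meets the threshold instead of the best one. The function names and the numbers are hypothetical, and this is not the program's actual implementation.

```python
from typing import List, Optional


def sequence_identity(identical: int, alignment_length: int,
                      len_a: int, len_b: int, use_global: bool = True) -> float:
    """Identity as a fraction; the denominator depends on the -G choice."""
    denom = min(len_a, len_b) if use_global else alignment_length
    return identical / denom


def assign_cluster(identities: List[float], cutoff: float,
                   best: bool = False) -> Optional[int]:
    """Pick a cluster index from identities to existing representatives.

    best=False mimics -g No (first cluster that passes);
    best=True mimics -g Yes (most similar cluster that passes).
    """
    passing = [(i, v) for i, v in enumerate(identities) if v >= cutoff]
    if not passing:
        return None  # the sequence would start a new cluster
    if best:
        return max(passing, key=lambda p: p[1])[0]
    return passing[0][0]


# 80 identical residues in a 90-column alignment of a 100-residue and a 200-residue sequence.
print(sequence_identity(80, 90, 100, 200, use_global=True))
print(sequence_identity(80, 90, 100, 200, use_global=False))

print(assign_cluster([0.50, 0.91, 0.97], cutoff=0.9))             # 1
print(assign_cluster([0.50, 0.91, 0.97], cutoff=0.9, best=True))  # 2
```

With a cut-off of 0.85, the example pair would fail under global identity and pass under local identity, which is why the two settings can produce different clusters.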

Alignment Coverage Parameters.
-aL: minimal alignment coverage (fraction) for the longer sequence
-AL: maximum unaligned part (amino acids/bases) for the longer sequence
-aS: minimal alignment coverage (fraction) for the shorter sequence
-AS: maximum unaligned part (amino acids/bases) for the shorter sequence
-s: minimal length similarity (fraction)
-S: maximum length difference in amino acids/bases
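
The six coverage and length options above can be read as independent checks that a pair of sequences must all pass. The sketch below is one interpretation under stated assumptions: the class and function names are hypothetical, the default values are simply permissive placeholders chosen for this example, and the aligned lengths are taken as given instead of being computed from an alignment.

```python
from dataclasses import dataclass


@dataclass
class CoverageLimits:
    """Placeholder limits; the defaults here accept every pair."""
    aL: float = 0.0            # minimal aligned fraction of the longer sequence
    AL: float = float("inf")   # maximal unaligned residues of the longer sequence
    aS: float = 0.0            # minimal aligned fraction of the shorter sequence
    AS: float = float("inf")   # maximal unaligned residues of the shorter sequence
    s: float = 0.0             # minimal length ratio shorter / longer
    S: float = float("inf")    # maximal length difference in residues


def passes_filters(len_long: int, len_short: int,
                   aligned_long: int, aligned_short: int,
                   lim: CoverageLimits) -> bool:
    """Return True only if every coverage and length check is satisfied."""
    if len_short > len_long:
        raise ValueError("len_short must not exceed len_long")
    checks = [
        aligned_long / len_long >= lim.aL,
        len_long - aligned_long <= lim.AL,
        aligned_short / len_short >= lim.aS,
        len_short - aligned_short <= lim.AS,
        len_short / len_long >= lim.s,
        len_long - len_short <= lim.S,
    ]
    return all(checks)


# A 100-residue sequence aligned over 95 residues to a 200-residue sequence.
print(passes_filters(200, 100, 95, 95, CoverageLimits(aS=0.9)))  # True
print(passes_filters(200, 100, 95, 95, CoverageLimits(aL=0.9)))  # False
print(passes_filters(200, 100, 95, 95, CoverageLimits(s=0.8)))   # False
```

In this example the shorter sequence is 95% covered, so a limit on -aS alone passes, while the same limit on -aL fails because only 47.5% of the longer sequence is aligned.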

Mail address for job checking.
Give your mail address:
    


Reference:
  1. Ying Huang, Beifang Niu, Ying Gao, Limin Fu and Weizhong Li. CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics, 2010(26): 680-682.
  2. Weizhong Li and Adam Godzik. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics, 2006(22): 1658-1659.
  3. Weizhong Li, Lukasz Jaroszewski and Adam Godzik. Tolerating some redundancy significantly speeds up clustering of large protein databases. Bioinformatics, 2002(18): 77-82.
  4. Weizhong Li, Lukasz Jaroszewski and Adam Godzik. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics, 2001(17): 282-283.
Contact @Weizhong Li