CD-HIT downloads

CD-HIT's source code is hosted on Google Code. This site is for users who want new features and for developers.

CD-HIT has also been hosted at a separate site, which has the tested and stable releases.

CD-HIT-454 download

Please first download CD-HIT from Google Code. The CD-HIT package includes the cd-hit-454 clustering program. In addition, please download the consensus-building program, which is used after clustering. The current release is cdhit-cluster-consensus-2013-03-27.tgz.


Many users have been asking about psi-cd-hit, which can cluster proteins at a very low identity cutoff (e.g. 30%) and cluster very long DNA input (e.g. complete genomes or assembled sequences). You are now invited to test this beta software: download psi-cd-hit-2012-0914.tar.gz here. Some basic usages are listed below. Important: you need to have the BLAST programs (blastall, formatdb, etc.) installed in your $PATH.
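Before running psi-cd-hit, it may be worth confirming that the BLAST programs are actually reachable through $PATH. The following is a minimal sketch, not part of the psi-cd-hit package: the function name missing_tools is hypothetical, and it relies only on the standard shell builtin `command -v`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with psi-cd-hit).
# Prints the name of every given program that is NOT found in $PATH,
# one per line, and returns non-zero if at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return $status
}

# Example call, checking the two programs named above:
#   missing_tools blastall formatdb || echo "some BLAST programs are missing"
```

If the call prints nothing, both programs were found; any printed name needs to be installed or added to $PATH first.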

For protein sequences
(1) Please use regular cd-hit to cluster your database down to 60% identity in two steps (90%, then 60%), using commands such as:
cd-hit -i db    -o db_90 -c 0.9 -n 5 -g 1 -G 0 -aS 0.8  -d 0 -p 1 -T 16 -M 0 > db_90.log
cd-hit -i db_90 -o db_60 -c 0.6 -n 4 -g 1 -G 0 -aS 0.8  -d 0 -p 1 -T 16 -M 0 > db_60.log

(2) Then use psi-cd-hit to cluster db_60 down to 30% identity, in one of two ways:
(a) option for local execution on a single computer:
  ./ -i db_60 -o db_30 -c 0.3 -ce 1e-6 -aS 0.8 -G 0 -g 1 -exec local -core 16
  ./ -i db_60 -o db_30 -c 0.3 -ce 1e-6 -aS 0.8 -G 0 -g 1 -exec local -core 16 -restart db_30.restart (to restart if the program crashes)
  ./ -help to see all options
(b) option for running on a cluster using a queueing system (qsub):
  ./ -i db_60 -o db_30 -c 0.3 -ce 1e-6 -aS 0.8 -G 0 -g 1 -exec qsub -host 8 -core 8 -shf qsub_sh_template
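After each clustering round above, it can help to check how far the database shrank. The sketch below assumes the usual cd-hit output layout: representative sequences in the FASTA file given to -o, and cluster membership in a companion file with a .clstr suffix whose clusters each begin with a ">Cluster" line. The function names count_seqs and count_clusters are hypothetical helpers, not part of CD-HIT, and whether the psi-cd-hit output follows the same layout is an assumption to verify against your own files.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with CD-HIT).

# Count FASTA records, i.e. header lines that start with ">".
count_seqs() {
    grep -c '^>' "$1"
}

# Count clusters in a .clstr file, assuming each cluster
# begins with a line of the form ">Cluster N".
count_clusters() {
    grep -c '^>Cluster' "$1"
}

# Example calls, using the file names from the commands above:
#   count_seqs db          # sequences before clustering
#   count_seqs db_90       # representatives kept at 90%
#   count_clusters db_90.clstr
```

The number of representatives in an output file should match the number of clusters in its .clstr file, so comparing the two is a quick sanity check.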

For very long DNA sequences
  ./ -i db.fna -o db90.fna -c 0.9 -G 1 -g 1 -prog megablast -s "-F F -e 0.000001 -b 100000 -v 100000" -exec local -core 32
  ./ -i db.fna -o db90.fna -c 0.9 -G 1 -g 1 -prog blastn -circle 1 -exec local -core 32
  ./ -help to see all options