CD-HIT is a widely used program for clustering and comparing protein or nucleotide sequences. CD-HIT was originally developed by Dr. Weizhong Li in Dr. Adam Godzik's lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).

CD-HIT is very fast and can handle extremely large databases. It significantly reduces the computational and manual effort in many sequence analysis tasks, and it helps users understand the structure of their data and correct bias within a dataset.

The CD-HIT package includes CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D, CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT, CD-HIT-OTU, and over a dozen scripts.

Usage of the other programs and scripts is described in the CD-HIT user's guide.

CD-HIT is currently maintained by Dr. Li's group (http://weizhong-lab.ucsd.edu/) with support from the National Center for Research Resources (Grant # 1R01RR025030).

NEWS


(June 2011) cd-hit-otu is a special cd-hit extension for clustering rRNA tags into OTUs. It is very fast and very accurate.
(July 2010) cdhit@GoogleCode is a new Google Code project created for releasing the latest development version of CD-HIT. New minor versions will usually be released as soon as bug fixes or improvements become available.
(Oct. 2009) CD-HIT-454 is a new program to identify exact duplicates and near-identical duplicates in pyrosequencing reads: CD-HIT-454 (web server), CD-HIT-454 (standalone).
(September 2009) The CD-HIT web server is now available for running cd-hit or downloading some pre-calculated clusters.
(December 2006) I made some major updates, including several very useful new clustering options such as alignment coverage control and switching between local and global sequence identity. Please check the newest release and give it a try. Weizhong Li.
(February 2006) I recently developed several new programs based on CD-HIT's algorithm: CD-HIT-2D, CD-HIT-EST and CD-HIT-EST-2D. CD-HIT-2D compares two protein sets and reports similar matches between them. CD-HIT-EST and CD-HIT-EST-2D are the nucleotide versions. Weizhong Li.