Software and servers
CD-HIT is one of the most widely used bioinformatics programs for clustering and comparing large sets of protein or nucleotide sequences. It can be two to three orders of magnitude faster than other methods such as BLASTCLUST. It is routinely used by many popular databases, such as UniProt and PDB.
CD-HIT Web Server provides a web interface for running the CD-HIT programs and for viewing and exploring the results. It also offers pre-calculated sequence clusters for several widely used public databases (NR, Swissprot, PDB, etc.), updated on a regular basis and available for download.
WebMGA is a web server dedicated to fast metagenomic analysis. It provides a wide variety of analyses, including clustering, rRNA/tRNA prediction, ORF prediction, function/pathway annotation, sequence information, quality control, sequence filtering, taxonomy binning, OTU finder and format conversion. Users can run a single analysis through either a web browser or a script, and can use scripts for more complicated analyses.
CD-HIT-OTU is a very fast and accurate method for clustering 16S rRNA tags from either 454 sequences or Illumina reads into OTUs.
RAMMCAP (Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline) is a metagenomic analysis package developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort.
CD-HIT-454 is a program that identifies artificial duplicates in raw 454 sequencing reads, including both exact duplicates and near-identical duplicates.
MGAviewer is a graphic interface for visualization of alignment results between metagenomic samples and reference microbial genomes.
Meta-RNA is a program for the identification of rRNA genes from metagenomic fragments based on hidden Markov models (HMMs). This program provides rRNA gene predictions with high sensitivity and specificity.
CD-HIT-FP is a very fast clustering program for finding similar compounds in a very large small-molecule library. It takes only hours to handle a library with millions of compounds. Each compound in the library is represented by its 2D fingerprint, a long binary bit string of "1" or "0", with each bit representing the presence or absence of a certain 2D feature.
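The fingerprint representation described above can be illustrated with a minimal Python sketch. It is not CD-HIT-FP's implementation: the Tanimoto coefficient, the greedy assignment to the first matching representative, the 0.7 threshold, and the function names `fingerprint_from_bits`, `tanimoto`, and `greedy_cluster` are all assumptions chosen for illustration, since the description does not state which similarity measure or clustering strategy the program uses.

```python
from typing import List


def fingerprint_from_bits(bits: str) -> int:
    """Pack a string of '0'/'1' characters into an integer bit set."""
    return int(bits, 2)


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared set bits divided by bits set in either.

    Assumption: Tanimoto is a common choice for 2D fingerprints, but the
    source does not say that CD-HIT-FP uses it.
    """
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(a & b).count("1") / union


def greedy_cluster(fingerprints: List[int], threshold: float) -> List[List[int]]:
    """Assign each compound to the first representative it is similar to.

    Hypothetical greedy scheme: a compound that matches no existing
    representative becomes the representative of a new cluster.
    Returns clusters as lists of indices into `fingerprints`.
    """
    representatives: List[int] = []
    clusters: List[List[int]] = []
    for index, fp in enumerate(fingerprints):
        for cluster_id, rep in enumerate(representatives):
            if tanimoto(fp, rep) >= threshold:
                clusters[cluster_id].append(index)
                break
        else:
            representatives.append(fp)
            clusters.append([index])
    return clusters


# Toy 8-bit fingerprints; real fingerprints are much longer.
library = [fingerprint_from_bits(b)
           for b in ("11110000", "11100000", "00001111", "00001110")]
clusters = greedy_cluster(library, threshold=0.7)
```

In this toy library the first two compounds share three of four set bits (similarity 0.75), as do the last two, so a threshold of 0.7 groups them into two clusters. Storing each fingerprint as an integer keeps the comparison to a pair of bitwise operations, which is one plausible reason bit-string fingerprints suit very large libraries.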
FR-HIT is a very fast program for recruiting metagenomic reads to homologous reference genomes.
FuLinkA is a recently developed sequence assembler based on a probabilistic model.