Hydrophobic Cluster Analysis
GUIDELINES TO HYDROPHOBIC CLUSTER ANALYSIS (HCA)
This guideline has been adapted from the supplementary data of the article by Faure and Callebaut, Bioinformatics (2013) in press.

Hydrophobic Cluster Analysis (HCA) is based on a two-dimensional representation of the protein sequence, in which hydrophobic amino acids congregate into clusters (Callebaut, et al., 1997; Gaboriaud, et al., 1987; Figure 1). Statistical studies performed on experimental 3D structures have shown that hydrophobic clusters mainly correspond to regular secondary structures, and have supported the relevance of the chosen hydrophobic alphabet, as well as of the alpha-helix as the 2D support for revealing this structural information (Woodcock, et al., 1992).

Figure 1: Principle of the HCA plot, illustrated on a sequence segment of the alpha1-antitrypsin (adapted from Callebaut, et al., 1997). The protein sequence (1D), in which hydrophobic amino acids are represented as white letters, is written on an alpha-helix, displayed on a cylinder. The cylinder is cut along the horizontal axis and unrolled, in order to get the full environment of each amino acid, as it exists on the 1D sequence. Hydrophobic amino acids (V, I, L, F, M, Y, W) are encircled and their contours are joined. These form clusters, which were shown to mainly correspond to regular secondary structures (alpha helices and beta strands), as exemplified here with the 3D representation of the two clusters (only the hydrophobic side chains are represented). The shape of the clusters is often typical of the associated secondary structures. Hence, horizontal and vertical clusters are mainly associated with alpha helices and beta strands, respectively. A dictionary of hydrophobic clusters, gathering the main structural features of the most frequent hydrophobic clusters, has been published (Eudes, et al., 2007) and helps in the interpretation of HCA plots. Sequence segments separating hydrophobic clusters (at least four non-hydrophobic amino acids or a proline) mainly correspond to loops (or hinge regions between globular domains). Symbols are used to represent amino acids with peculiar structural properties (star for proline, black diamond for glycine, square and dotted square for threonine and serine, respectively, which may be either exposed or buried).

Several 2D supports have been tested, revealing that the alpha-helix provides the best correspondence between the positions of hydrophobic clusters and regular secondary structures (Woodcock, et al., 1992), as well as the maximal 2D compactness (Callebaut, et al., 1997). The two-dimensional support dictates the rules for segmenting a sequence into clusters, exactly as spaces separate words in a text. Hence, in the case of an alpha-helical support, two hydrophobic amino acids belong to two distinct clusters if they are separated by at least four non-hydrophobic amino acids or by a proline. This minimal number of non-hydrophobic amino acids is called the connectivity distance and is linked to the distance separating an amino acid from its furthest close neighbor. The 2D support is thus a convenient means of revealing the 2D neighborhood of each amino acid.

Hydrophobic clusters, which are binary patterns constrained by the connectivity distance, are much more informative than simple binary patterns, as they reveal the 2D context in which the binary pattern is embedded (Hennetin, et al., 2003).
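
The segmentation rule stated in this guideline (hydrophobic alphabet V, I, L, F, M, Y, W; a connectivity distance of four for the alpha-helical support; proline as a cluster breaker) can be illustrated with a short Python sketch. This is not the published HCA software: the function name `hydrophobic_clusters`, the constants, and the choice to report isolated hydrophobic residues as one-residue clusters are assumptions made here for illustration only.

```python
# Illustrative sketch only: names and return format are hypothetical.
HYDROPHOBIC = set("VILFMYW")   # hydrophobic alphabet given in the guideline
CONNECTIVITY_DISTANCE = 4      # alpha-helical 2D support
BREAKER = "P"                  # proline always ends a cluster


def hydrophobic_clusters(sequence, distance=CONNECTIVITY_DISTANCE):
    """Return (start, end, binary_pattern) for each cluster (0-based, inclusive)."""
    seq = sequence.upper()
    spans = []
    start = last = None  # first and last hydrophobic positions of the open cluster
    for i, aa in enumerate(seq):
        if aa == BREAKER:
            # A proline closes the open cluster, whatever the gap length.
            if start is not None:
                spans.append((start, last))
            start = last = None
        elif aa in HYDROPHOBIC:
            # A run of at least `distance` non-hydrophobic residues separates clusters.
            if start is not None and i - last - 1 >= distance:
                spans.append((start, last))
                start = None
            if start is None:
                start = i
            last = i
    if start is not None:
        spans.append((start, last))
    # Encode each cluster as a binary pattern: 1 = hydrophobic, 0 = other.
    return [
        (s, e, "".join("1" if c in HYDROPHOBIC else "0" for c in seq[s:e + 1]))
        for s, e in spans
    ]


if __name__ == "__main__":
    for start, end, pattern in hydrophobic_clusters("AVLKDGIAAAAFPLW"):
        print(start, end, pattern)
```

In this made-up example sequence, the run of four alanines separates the first cluster from the phenylalanine, and the proline separates the phenylalanine from the final two residues. The binary pattern is the 1D code of each cluster; it does not reproduce the 2D shape shown on an HCA plot.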