Heatmaps

Functions for plotting heat- and clustermaps with seaborn. They are rather specific, and there might be better options in R or with PyComplexHeatmap.

# Block that has to be executed for all.
import Heatmaps
import seaborn as sns
out_dir = 'docs/gallery/'
penguin_df = sns.load_dataset('penguins')   # Example data from seaborn.
Heatmaps.heatmap_cols(plot_df, cmap_cols, plot_out, row_label_col=None, column_labels=None, class_col=None, x_size=20, y_size=40, title='', annot_cols=None, width_ratios=None, wspace=0.4, rasterized=True, annot_s=10, ticksize=14, heat_ticksize=14, square=False, x_rotation=70, y_rotation=0, ax_fontweight='normal', row_label_first=False, formats=['pdf'])

Multiple heatmaps side-by-side but the same rows. Allows to show several metrics for the same rows with different colourmaps etc. E.g. for a list of top differential genes first a heatmap of baseline expression coloured by TPM, followed by a separate heatmap-block with the log2FC for the same genes.

Parameters:
  • cmap_cols

    Dictionary with one entry for each block. The keys don’t matter as long as they are unique. E.g. {0: {‘cols’: [‘Mean_Control_FM’, ‘Mean_FM_Mock_Ctrl’, ‘Mean_Tcf15_FM’, ‘Mean_FM_Tcf15_OE’],

    ’centre’: 0, (optional) ‘cmap’: ‘mako’, ‘cbar_label’: ‘TPM’, ‘vmax’: 200, (optional) ‘vmin’: 0, (optional) }

  • row_label_col – Column where to fetch the row-strings from.

  • column_labels – Alternative to using the column names as indicated in cmap_cols.

  • class_col – Column that should be added as separate first heatmap-block, should be categorical.

  • annot_cols – Dictionary of {“column”: “column with annotation string”} to write the strings in the value into the cells of columns.

  • width_ratios – Ratios of the widths of each heatmap-block.

  • wspace – Additional horizontal space between blocks.

  • rasterized – Whether to draw thin white lines around cells.

  • square – Whether cells should be squares.

  • row_label_first – Only write the row names for the first entry and skip for the others.

# Create a heatmap with several blocks that allow separate metrics to be shown side-by-side.
# For an example pick a few random rows and give them names so we can recognize them in the heatmaps.
sub_penguin_df = penguin_df.sample(5, random_state=12)
sub_penguin_df['Name'] = ['Pesto', 'Pickles', 'Popcorn', 'Pretzel', 'Pudding']
# Now we need to define which heatmap-blocks we want to show.
cmap_cols = {0: {'cols': ['bill_length_mm', 'bill_depth_mm'],
                 'cmap': 'mako',
                 'cbar_label': 'mm'},
             1: {'cols': ["body_mass_g"],
                 'cmap': 'viridis',  # Just for the sake of a different colour.
                 'cbar_label': 'g'}
             }
# We can additionally choose to add labels into cells, can be any other column.
annot_cols = {"body_mass_g": 'island'}
Heatmaps.heatmap_cols(sub_penguin_df, cmap_cols=cmap_cols, plot_out=out_dir+"SubPenguins", row_label_col='Name', class_col='species',
                 x_size=14, y_size=5, annot_cols=annot_cols, width_ratios=None, wspace=0.8,
                 annot_s=10, ticksize=14, heat_ticksize=14, square=False, x_rotation=0, formats=['png'])
_images/SubPenguins_MultiColHeatmap.png
Heatmaps.clustermap(plot_df, columns, row_column, cbar_label, class_col='', class_row='', title='', plot_out='', vmin=None, vmax=None, annot_cols=None, cmap='viridis', x_size=12, y_size=10, y_dendro=False, x_dendro=True, column_labels=None, row_cluster=True, col_cluster=True, centre=None, tick_size=12, mask=None, metric='euclidean', z_score=None, class_col_colour=None, formats=['pdf'])

Create a heatmap that can be additionally clustered with seaborn. CARE: the class_col_order and class_row parameters are not properly tested.

Parameters:
  • columns – List of the columns, should be present in the plot_df.

  • row_column – Column with which rows should be taken, set to ‘index’ to take the df.index.

  • class_col – Add a column into the plot with a categorical value that will be coloured and gets a separate colourbar.

  • class_col_colour – Allows a list that will be taken iteratively, or a dict with {label: colour}.

  • class_row – Same as class_col but add a row instead.

  • annot_cols – Dictionary {col: other-col} to add text into the entries from col taken from other_col.

  • y_dendro – Whether to plot the dendrogram on y.

  • x_dendro – Whether to plot the dendrogram on x.

  • column_labels – List that will replace the names from columns if given.

  • row_cluster – Whether to cluster the rows.

  • col_cluster – Whether co cluster the columns.

  • centre – Centre for the colormap, e.g. 0 for bwr.

  • mask – Must match the dimensions of the plot_df. If given will not show data where entries are True.

  • metric – Metric for clustering for the scipy function, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist.

  • z_score – Whether to do z-score normalization before clustering. If None won’t do z-scoring, otherwise takes 0 or 1 for the axis along which the normalization should be done.

# Let's cluster some penguins based on their measured features, after removing entries with NAs.
non_na_penguins = penguin_df[~penguin_df.isna().values.any(axis=1)]
# Once with the original values and once with z-scoring the columns (flag takes None or the axis).
# The row names will be the index numbers, which are not meaningful here, but suffices for illustration.
for do_z in [None, 0]:
    cmap = 'mako' if do_z is None else 'bwr'
    centre = None if do_z is None else 0
    Heatmaps.clustermap(non_na_penguins, columns=['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm'],
                        row_column='index', z_score=do_z, centre=centre, cbar_label='mm', class_col='species',
                        title="Clustered penguins", plot_out=out_dir+"Penguins_"+str(do_z)+'ZScore', cmap=cmap,
                        x_size=8, y_size=10, row_cluster=True, col_cluster=True, formats=['png'])

clustermap1 clustermap2