Classification Performance Curves

Functions to create Precision-Recall (PR) and receiver operating characteristic (ROC) curves and to compute the respective area under the curve.

ClassificationPerformanceCurves.classification_plotter(df, sig_col, score_cols, add_random=False, steps=10000, mode='pr', output_path='', colours='glasbey', no_plot=False, recall_start=0, zorder=None, title_tag='', legend_s=18, legend_out=False, x_size=8, y_size=8, font_s=14, line_styles=None, colour_by_threshold=False, threshold_cmap='viridis', formats=['pdf'])

Plots a Precision-Recall curve or a receiver operating characteristic (ROC) curve from a DataFrame and calculates the area under the curve.

Parameters:
  • df – Pandas DataFrame with each row being an entry that should be classified.

  • sig_col – Name of the column that identifies the true entries.

  • score_cols – List of column names in the DataFrame for which the curves should be plotted, all in the same plot. It is assumed that a high score means a higher predicted likelihood to be true.

  • add_random – Whether to add a curve that randomly orders the entries. To specify the colour of ‘Random’, add a colour to the colour list; otherwise it will be grey.

  • steps – Number of steps into which the range between the lowest and highest score is divided; the performance is calculated at each step.

  • mode – ‘pr’ to get a Precision-Recall curve, otherwise a ROC curve.

  • no_plot – If True, skip plotting and only return the performance values.

  • recall_start – In case it is known that a certain range of the recall is not covered, limit the whole calculation and plotting to [recall_start, 1].

  • zorder – List of integers defining the zorder of the score_cols.

  • colour_by_threshold – If True, do the plot as scatter, and colour each dot by the threshold it was taken from. Uses the range from all score_cols.

  • threshold_cmap – The colourmap which should be used for the scatter when colour_by_threshold is True.
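The role of steps and mode ‘pr’ can be illustrated with a minimal sketch. It is not the library's implementation: the helper name pr_sweep is hypothetical, plain lists stand in for the DataFrame columns sig_col and one score column, and it assumes that an entry counts as predicted true when its score is greater than or equal to the threshold.

```python
def pr_sweep(labels, scores, steps=10):
    """Hypothetical helper: [recall, precision, threshold] for evenly spaced thresholds."""
    lo, hi = min(scores), max(scores)
    total_true = sum(labels)
    points = []
    for i in range(steps + 1):
        # Evenly spaced thresholds between the lowest and highest score.
        threshold = lo + (hi - lo) * i / steps
        # Assumption: a high score means "predicted true" (score >= threshold).
        predicted = [s >= threshold for s in scores]
        n_pred = sum(predicted)
        tp = sum(1 for p, y in zip(predicted, labels) if p and y)
        if n_pred == 0 or total_true == 0:
            continue  # precision or recall is undefined at this threshold
        points.append([tp / total_true, tp / n_pred, threshold])
    return points


# Toy data: True marks the entries that are actually true.
labels = [True, True, False, True, False, False]
scores = [9, 8, 7, 6, 3, 1]
curve = pr_sweep(labels, scores, steps=4)
for recall, precision, threshold in curve:
    print(f"threshold={threshold:.1f} recall={recall:.2f} precision={precision:.2f}")
```

A larger steps value samples the score range more finely, at the cost of one pass over the entries per threshold in this sketch.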

Returns:

  • auc_output: List of the score_cols and the respective area under the curve (AUPRC for mode ‘pr’).

  • performance_dict: Dictionary of the form {score_col: values}, where values holds [Recall, Precision, threshold] for mode ‘pr’, otherwise [FPR, TPR, threshold], for all tested thresholds.

Return type:

tuple
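As a rough illustration of how the returned values relate, the sketch below integrates a hand-written performance_dict with the trapezoidal rule. The layout (one [FPR, TPR, threshold] triple per tested threshold), the column name score_a, and the helper trapezoid_auc are assumptions for this example; the integration method used by the library is not stated here.

```python
def trapezoid_auc(points):
    """Hypothetical helper: area under y over x for [x, y, threshold] triples."""
    ordered = sorted(points, key=lambda p: p[0])  # integrate along increasing x
    area = 0.0
    for (x0, y0, _), (x1, y1, _) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hand-written stand-in for the ROC-mode output: [FPR, TPR, threshold] triples.
performance_dict = {
    "score_a": [[0.0, 0.0, 1.0], [0.5, 1.0, 0.5], [1.0, 1.0, 0.0]],
}

# One [score_col, area] pair per score column, mirroring the shape of auc_output.
auc_output = [[col, trapezoid_auc(pts)] for col, pts in performance_dict.items()]
print(auc_output)
```

The same helper applies to mode ‘pr’ triples, where x is Recall and y is Precision.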

[Figures: ClassificationPerformanceCurves.classification_plotter (pic1, pic2, pic3)]