Classification Performance Curves¶

Functions to create Precision-Recall and receiver operator characteristic curves and getting the respective area under the curve.

ClassificationPerformanceCurves.classification_plotter(df, sig_col, score_cols, add_random=False, steps=10000, mode='pr', output_path='', colours='glasbey', no_plot=False, recall_start=0, zorder=None, title_tag='', legend_s=18, legend_out=False, x_size=8, y_size=8, font_s=14, line_styles=None, colour_by_threshold=False, threshold_cmap='viridis', formats=['pdf'])¶

Plots a Precision-Recall curve or a receiver operator characteristic curve based on a DataFrame and calculates the area under the curve.

Parameters:

df – Pandas DataFrame with each row being an entry that should be classified.
sig_col – Name of the column that identifies the true entries.
score_cols – List of column names in the DataFrame for which the curves should be plotted, all in the same plot. It is assumed that a high score means a higher predicted likelihood to be true.
add_random – If a curve should be added that randomly orders the entries. For specifying the colour of ‘Random’, add a colour to the colour list, otherwise it will be grey.
steps – Number of steps into which the range between the lowest and highest score will be separated, and for each the performance calculated.
mode – ‘pr’ to get a Precision-Recall curve, otherwise a ROC curve.
no_plot – To only get the list of performance values.
recall_start – In case it is known that a certain range of the recall is not covered, limit the whole calculation and plotting to [recall_start, 1].
zorder – List of integers defining the zorder of the score_cols.
colour_by_threshold – If True, do the plot as scatter, and colour each dot by the threshold it was taken from. Uses the range from all score_cols.
threshold_cmap – The colourmap which should be used for the scatter when colour_by_threshold is True.

Returns:

auc_output: List of the score_cols and the respective AUPRC.
performance_dict: Dictionary with {score_col: [Recall, Precision, threshold] for mode ‘pr’, otherwise [FPR, TPR, threshold] for all tested thresholds}.

Return type:

tuple

*ClassificationPerformanceCurves.classification_plotter*