Context
Last week, I explored how changing the decision threshold affects model behaviour.
This week, I removed the threshold entirely — and something clicked.
The Shift
The model never actually made decisions.
It only produced scores.
Each prediction is a probability: a measure of how likely an instance is to belong to the positive class.
The “decision” only appears when we impose a threshold on top of those scores.
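To make that concrete, here is a minimal sketch of where the decision actually comes from. The dataset and logistic regression are stand-ins for illustration; the predict_proba and predict calls are standard scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in data and model, purely for illustration
X, y = make_classification(n_samples=1000, random_state=0)
model = LogisticRegression().fit(X, y)

# The model only produces scores: probabilities of the positive class
scores = model.predict_proba(X)[:, 1]

# The "decision" only exists once we impose a threshold on those scores
threshold = 0.5
decisions = (scores > threshold).astype(int)

# For a binary classifier like this, model.predict applies the same
# implicit 0.5 cut-off under the hood
assert np.array_equal(decisions, model.predict(X))
```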
Why This Matters
That reframe changes how models should be evaluated.
The important question is not:
“Is the prediction correct?”
It is:
“How well does the model rank what matters?”
What I Explored
Instead of focusing on a fixed decision threshold, I looked at the model’s raw outputs.
- Generated predicted probabilities using predict_proba
- Compared binary predictions vs probability scores
- Plotted ROC curves
- Plotted Precision–Recall curves


These curves evaluate model behaviour across many possible thresholds, rather than locking evaluation to a single decision point.
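As a sketch, this is roughly what that looks like with scikit-learn and matplotlib. The metric functions are real; the dataset is the same synthetic stand-in as above, and I evaluate on the training data only to keep the example short.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_curve

# Same stand-in setup as before
X, y = make_classification(n_samples=1000, random_state=0)
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# Each curve sweeps many candidate thresholds in a single pass
fpr, tpr, _ = roc_curve(y, scores)
precision, recall, _ = precision_recall_curve(y, scores)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(fpr, tpr)
ax1.set(xlabel="False positive rate", ylabel="True positive rate", title="ROC")
ax2.plot(recall, precision)
ax2.set(xlabel="Recall", ylabel="Precision", title="Precision-Recall")
plt.tight_layout()
plt.show()
```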
Real-World Framing
In systems like fraud detection, credit risk, and affordability:
- We prioritise the highest-risk cases first
- We treat different score bands differently
- We apply actions based on levels of risk
A fraud system, for example, doesn’t decide “fraud or not.”
It ranks transactions by risk — and different actions are applied at different levels.
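A toy sketch of that banding idea. The cut-offs and actions here are hypothetical, not from any real fraud system:

```python
import numpy as np

# Hypothetical scores, band cut-offs, and actions, purely for illustration
scores = np.array([0.02, 0.35, 0.71, 0.93])
band_edges = [0.3, 0.7, 0.9]
actions = np.array(["approve", "monitor", "step-up checks", "block and review"])

# Prioritise the highest-risk cases first: rank by score, descending
review_order = np.argsort(-scores)

# Treat different score bands differently: map each score to an action
bands = np.digitize(scores, band_edges)
for i in review_order:
    print(f"score={scores[i]:.2f} -> {actions[bands[i]]}")
```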
What Became Clear
A model can look accurate at a single threshold, behave very differently at other thresholds, and still be poor at ranking.
Accuracy alone does not capture that.
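A tiny, contrived example of that gap (assumed numbers, not from my experiments): on an imbalanced dataset, a model that scores everything low predicts the majority class every time, so accuracy looks strong while the ranking is no better than chance.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# 95 negatives, 5 positives
y = np.array([0] * 95 + [1] * 5)

# A "model" that assigns random scores below 0.5 to everything
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 0.5, size=100)
decisions = (scores > 0.5).astype(int)  # always 0: the majority class

print(accuracy_score(y, decisions))  # 0.95 -- looks accurate
print(roc_auc_score(y, scores))      # around 0.5 -- ranks no better than chance
```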
What’s Next
Next, I’ll explore ROC-AUC and Precision–Recall AUC more explicitly — and how they help compare models beyond a single decision point.