ArrowModel Blog

Agile Scoring Blog



Predictive Modeling from the Trenches


Receiver Operating Characteristic
Feb 23, 2007 | /jeff

ROC curves were first used during World War II to show graphically how well radar signals could be separated from background noise. Today they are commonly used to show the added value of a predictive model. To plot the receiver operating characteristic (ROC) curve, plot B(s) on the vertical axis against G(s) on the horizontal axis for all values of the score s, where B(s) and G(s) are the cumulative fractions of bads and goods, respectively, scoring at or below s. The curve runs from (0, 0) to (1, 1). The curve of an ideal model (complete separation) passes through (0, 1), while the curve of a totally useless model (no separation) is the straight diagonal from (0, 0) to (1, 1). A typical curve bows away from the diagonal like a banana, hence the nickname "banana chart".

[Figures: two example ROC curves. One shows very strong separation (excellent model); the other shows weak separation (mediocre model).]
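
As a rough illustration of the construction described above, the following Python sketch computes the points (G(s), B(s)) from a handful of made-up scores. It assumes, as in the query later in this post, that outcome = 1 marks a bad and outcome = 0 marks a good, and that B(s) and G(s) are the fractions of bads and goods scoring at or below s. The function name roc_points and the sample data are hypothetical.

```python
def roc_points(scores, outcomes):
    """Return ROC points (G(s), B(s)) for each distinct score s, ascending.

    Assumes outcome 1 = bad, 0 = good, and that B(s) / G(s) are the
    fractions of bads / goods with score <= s.
    """
    total_bad = sum(outcomes)
    total_good = len(outcomes) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("need at least one bad and one good")
    points = [(0.0, 0.0)]  # the curve starts at the origin
    bad_seen = good_seen = 0
    pairs = sorted(zip(scores, outcomes))
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all records tied at score s before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            bad_seen += pairs[i][1]
            good_seen += 1 - pairs[i][1]
            i += 1
        points.append((good_seen / total_good, bad_seen / total_bad))
    return points


# Made-up data: all bads score below all goods (complete separation).
perfect = roc_points([1, 2, 3, 4], [1, 1, 0, 0])
# Made-up data: the score carries no information about the outcome.
useless = roc_points([1, 1, 2, 2], [1, 0, 1, 0])
```

With these inputs, the points in perfect include (0, 1), and every point in useless lies on the diagonal, matching the two extreme cases described above.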

The KS query from this post can be easily modified to return coordinates of the points on the ROC curve:

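-- One row per distinct score s: cumulative fraction of bads (b) and goods (g)
-- scoring at or below s.
SELECT cdf.s
     , cdf.b AS "Sensitivity"
     , cdf.g AS "1-Specificity"
FROM ( SELECT a.s AS s
              -- 1.0 * forces fractional division where integer / integer truncates
            , 1.0 * SUM(distr.bad_cnt) /
              ( SELECT COUNT(*) FROM t WHERE outcome = 1 ) AS b
            , 1.0 * SUM(distr.good_cnt) /
              ( SELECT COUNT(*) FROM t WHERE outcome = 0 ) AS g
       FROM ( SELECT DISTINCT s FROM t ) a
       JOIN (
              -- bad and good counts at each individual score
              SELECT s                AS s
                   , SUM(outcome)     AS bad_cnt
                   , SUM(1 - outcome) AS good_cnt
              FROM t
              GROUP BY s
            ) distr
         ON distr.s <= a.s
       GROUP BY a.s
     ) cdf
ORDER BY cdf.s
;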
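
To try the query without a database server, one option is Python's built-in sqlite3 module. The sketch below is a test harness built on assumptions, not part of the original post: the table t with columns s and outcome comes from the query, but the sample rows are made up, and the query is lightly adapted (unquoted inner aliases, 1.0 * to force fractional division, and an ORDER BY) so that it runs on SQLite.

```python
import sqlite3

ROC_QUERY = """
SELECT cdf.s, cdf.b AS "Sensitivity", cdf.g AS "1-Specificity"
FROM ( SELECT a.s AS s
            , 1.0 * SUM(distr.bad_cnt) /
              ( SELECT COUNT(*) FROM t WHERE outcome = 1 ) AS b
            , 1.0 * SUM(distr.good_cnt) /
              ( SELECT COUNT(*) FROM t WHERE outcome = 0 ) AS g
       FROM ( SELECT DISTINCT s FROM t ) a
       JOIN ( SELECT s AS s
                   , SUM(outcome)     AS bad_cnt
                   , SUM(1 - outcome) AS good_cnt
              FROM t
              GROUP BY s
            ) distr
         ON distr.s <= a.s
       GROUP BY a.s
     ) cdf
ORDER BY cdf.s
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s INTEGER, outcome INTEGER)")
# Made-up sample: four bads (outcome = 1) and four goods (outcome = 0).
conn.executemany(
    "INSERT INTO t (s, outcome) VALUES (?, ?)",
    [(10, 1), (20, 1), (20, 0), (30, 1), (30, 0), (40, 1), (40, 0), (50, 0)],
)
rows = conn.execute(ROC_QUERY).fetchall()
for s, sensitivity, one_minus_specificity in rows:
    print(s, sensitivity, one_minus_specificity)
conn.close()
```

Each returned row is one point on the curve: plot the third column on the horizontal axis and the second on the vertical axis. The query returns no row for the origin (0, 0), so add that point by hand when plotting.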

In the context of an ROC plot, B(s) is often called sensitivity or true positive fraction, and G(s) is called 1-specificity or false positive fraction.
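
One way to see why these names apply is to treat every record scoring at or below a cutoff s as a predicted bad and tabulate the results. The sketch below assumes that outcome = 1 means bad (the "positive" class) and uses made-up data; the function name rates_at_cutoff is hypothetical.

```python
def rates_at_cutoff(scores, outcomes, cutoff):
    """Classify score <= cutoff as 'bad'; return (sensitivity, 1 - specificity)."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s <= cutoff and y == 1)  # bads flagged
    fn = sum(1 for s, y in pairs if s > cutoff and y == 1)   # bads missed
    fp = sum(1 for s, y in pairs if s <= cutoff and y == 0)  # goods flagged
    tn = sum(1 for s, y in pairs if s > cutoff and y == 0)   # goods passed
    sensitivity = tp / (tp + fn)  # true positive fraction, i.e. B(cutoff)
    specificity = tn / (tn + fp)  # true negative fraction
    return sensitivity, 1 - specificity  # second value is G(cutoff)


# Made-up sample: four bads (1) and four goods (0).
scores = [10, 20, 20, 30, 30, 40, 40, 50]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]
sens, fpf = rates_at_cutoff(scores, outcomes, cutoff=30)
```

At a cutoff of 30, three of the four bads and two of the four goods are flagged, so the sensitivity is 0.75 and 1-specificity is 0.5.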