plotRisk               package:rattle               R Documentation

_P_l_o_t _a _r_i_s_k _c_h_a_r_t

_D_e_s_c_r_i_p_t_i_o_n:

     Plots a Rattle Risk Chart. The chart was developed in a practical
     context to present the performance of data mining models to
     clients. It plots caseload against performance, allowing a client
     to see the tradeoff between coverage and performance.

_U_s_a_g_e:

     plotRisk(cl, pr, re, ri = NULL, title = NULL,
         show.legend = TRUE, xleg = 60, yleg = 55,
         optimal = NULL, optimal.label = "", chosen = NULL, chosen.label = "",
         include.baseline = TRUE, dev = "", filename = "", show.knots = NULL,
         risk.name = "Revenue", recall.name = "Adjustments",
         precision.name = "Strike Rate")

_A_r_g_u_m_e_n_t_s:

      cl: a vector of caseloads corresponding to different probability
          cutoffs. Can be either percentages (between 0 and 100) or
          fractions (between 0 and 1).

      pr: a vector of precision values for each probability cutoff. Can
          be either percentages (between 0 and 100) or fractions
          (between 0 and 1).

      re: a vector of recall values for each probability cutoff. Can be
          either percentages (between 0 and 100) or fractions (between
          0 and 1).

      ri: a vector of risk values for each probability cutoff. Can be
          either percentages (between 0 and 100) or fractions (between
          0 and 1).

   title: the main title to place at the top of the plot.

show.legend: whether to display the legend in the plot.

    xleg: the x coordinate for the placement of the legend.

    yleg: the y coordinate for the placement of the legend.

 optimal: a caseload (percentage or fraction) that represents an
          optimal performance point, which is also plotted. If instead
          the value is 'TRUE' then the optimal point is identified
          internally (maximum value for
          '(recall-caseload)+(risk-caseload)') and plotted.

optimal.label: a string which is added to label the line drawn as the
          optimal point.

  chosen: a caseload (percentage or fraction) that represents a
          user-chosen optimal performance point, which is also plotted.

chosen.label: a string which is added to label the line drawn as the
          chosen point.

include.baseline: if TRUE (the default) then display the diagonal
          baseline.

     dev: a string which, if supplied, identifies a device type as the
          target for the plot. This might be one of 'wmf' (for
          generating a Windows Metafile, but only available on
          MS/Windows), 'pdf', or 'png'.

filename: a string naming a file. If 'dev' is not given then the
          filename extension is used to identify the image format as
          one of those recognised by the 'dev' argument.

show.knots: a vector of caseload values at which a vertical line should
          be drawn. These might correspond, for example, to individual
          paths through a decision tree, illustrating the impact of
          each path on the caseload and performance.

risk.name: a string used within the plot's legend that gives a name to
          the risk. Often the risk is a dollar amount at risk from a
          fraud or from a bank loan point of view, so the default is
          'Revenue'.

recall.name: a string used within the plot's legend that gives a name
          to the recall. The recall is often the percentage of cases
          that are positive hits. In practice these might correspond
          to known cases of fraud, or to reviews where some adjustment
          had to be made to, for example, an income tax return or an
          application for credit, and so the default is 'Adjustments'.

precision.name: a string used within the plot's legend that gives a
          name to the precision. A common name for precision is 'Strike
          Rate', which is the default here.
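
     The criterion given for 'optimal = TRUE' can be sketched in a few
     lines of R. This is only an illustration of the documented formula,
     not rattle's internal code: the helper name
     'find_optimal_caseload' is hypothetical, and it assumes that all
     three vectors are on the same scale (all fractions or all
     percentages).

```r
# Hypothetical helper (not part of rattle): return the caseload that
# maximises (recall - caseload) + (risk - caseload).
# Assumes cl, re and ri are on the same scale and of equal length.
find_optimal_caseload <- function(cl, re, ri) {
  stopifnot(length(cl) == length(re), length(cl) == length(ri))
  score <- (re - cl) + (ri - cl)
  cl[which.max(score)]  # first maximum if there are ties
}

# Same illustrative values as in the Examples section.
cl <- c(1.0, 0.8, 0.6, 0.4, 0.2, 0)
re <- c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0)
ri <- c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0)

opt <- find_optimal_caseload(cl, re, ri)
```

     The resulting caseload could then be supplied as the 'optimal'
     argument instead of 'TRUE'.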

_D_e_t_a_i_l_s:

     Caseload is the percentage of the entities in the dataset covered
     by the model at a particular probability cutoff. With a cutoff of
     0, all (100%) of the entities are covered by the model; with a
     cutoff of 1, none (0%) of the entities are covered. A diagonal
     line is drawn to represent baseline random performance. The
     percentage of positive cases covered at each caseload (the
     recall) is then plotted and, optionally, the percentage of the
     total risk covered at each caseload. The precision (i.e., strike
     rate) is also plotted. Such a chart allows a user to select an
     appropriate tradeoff between caseload and performance. The charts
     are similar to ROC curves.
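
     The quantities described above can be illustrated with a small
     base R sketch. It is not the code used by 'evaluateRisk': the
     helper 'risk_table' and its toy inputs are hypothetical, an
     entity is assumed to be covered when its predicted probability is
     at or above the cutoff, and the risk is assumed to be a
     non-negative amount summed over the covered entities.

```r
# Hypothetical helper (not part of rattle): tabulate caseload,
# precision, recall and risk coverage at a set of probability cutoffs.
#   prob   - predicted probabilities
#   actual - 1 for a positive case, 0 otherwise
#   risk   - amount at risk for each entity (assumed non-negative)
risk_table <- function(prob, actual, risk, cutoffs = c(0, 0.5, 1)) {
  rows <- lapply(cutoffs, function(cut) {
    covered <- prob >= cut          # assumption: covered if prob >= cutoff
    n <- sum(covered)
    hits <- sum(actual[covered])
    data.frame(
      Cutoff    = cut,
      Caseload  = n / length(prob),
      Precision = if (n > 0) hits / n else NA_real_,  # undefined at 0 cases
      Recall    = hits / sum(actual),
      Risk      = sum(risk[covered]) / sum(risk)
    )
  })
  do.call(rbind, rows)
}

# Toy data, invented purely for illustration.
prob   <- c(0.9, 0.8, 0.6, 0.4, 0.3, 0.1)
actual <- c(1, 1, 0, 1, 0, 0)
risk   <- c(500, 300, 0, 200, 0, 0)

tab <- risk_table(prob, actual, risk)
```

     Each row of 'tab' corresponds to one cutoff. Its columns line up
     with the 'cl', 'pr', 're' and 'ri' arguments, although the
     undefined precision at a caseload of zero would need handling
     before plotting.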

_A_u_t_h_o_r(_s):

     Graham.Williams@togaware.com

_R_e_f_e_r_e_n_c_e_s:

     Package home page: <URL: http://rattle.togaware.com>

_S_e_e _A_l_s_o:

     'evaluateRisk', 'genPlotTitleCmd'.

_E_x_a_m_p_l_e_s:

     ## this is usually used in the context of the evaluateRisk function
     ## Not run: ev <- evaluateRisk(predicted, actual, risk)

     ## imitate this output here
     ev <- NULL
     ev$Caseload  <- c(1.0, 0.8, 0.6, 0.4, 0.2, 0)
     ev$Precision <- c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30)
     ev$Recall    <- c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0)
     ev$Risk      <- c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0)

     ## plot the Risk Chart
     plotRisk(ev$Caseload, ev$Precision, ev$Recall, ev$Risk,
              chosen=60, chosen.label="Pr=0.45")

     ## Add a title
     eval(parse(text=genPlotTitleCmd("Sample Risk Chart")))

