Refer to the ROC curve:
As you move along the curve, what changes?
D
When mean imputation is performed on data after the data is partitioned for honest assessment,
what is the most appropriate method for handling the mean imputation?
B
An analyst generates a model using the LOGISTIC procedure. They are now interested in getting the
sensitivity and specificity statistics on a validation data set for a variety of cutoff values.
Which statement and option combination will generate these statistics?
B
In partitioning data for model assessment, which sampling methods are acceptable? (Choose two.)
A,C
Which SAS program will divide the original data set into 60% training and 40% validation data sets,
stratified by county?
C
Refer to the lift chart:
At a depth of 0.1, Lift = 3.14. What does this mean?
A
Refer to the lift chart:
What does the reference line at lift = 1 corresponds to?
B
Suppose training data are oversampled in the event group to make the number of events and non-
events roughly equal. A logistic regression is run and the probabilities are output to a data set NEW
and given the variable name PE. A decision rule considered is, "Classify data as an event if probability
is greater than 0.5." Also the data set NEW contains a variable TG that indicates whether there is an
event (1=Event, 0= No event).
The following SAS program was used.
What does this program calculate?
B
Refer to the exhibit:
The plots represent two models, A and B, being fit to the same two data sets, training and validation.
Model A is 90.5% accurate at distinguishing blue from red on the training data and 75.5% accurate at
doing the same on validation dat
a. Model B is 83% accurate at distinguishing blue from red on the training data and 78.3% accurate at
doing the same on the validation data.
Which of the two models should be selected and why?
D
Assume a $10 cost for soliciting a non-responder and a $200 profit for soliciting a responder. The
logistic regression model gives a probability score named P_R on a SAS data set called VALID. The
VALID data set contains the responder variable Pinch, a 1/0 variable coded as 1 for responder.
Customers will be solicited when their probability score is more than 0.05.
Which SAS program computes the profit for each customer in the data set VALID?
A
In order to perform honest assessment on a predictive model, what is an acceptable division
between training, validation, and testing data?
D
Refer to the exhibit:
Based upon the comparative ROC plot for two competing models, which is the champion model and
why?
B
A marketing campaign will send brochures describing an expensive product to a set of customers.
The cost for mailing and production per customer is $50. The company makes $500 revenue for each
sale.
What is the profit matrix for a typical person in the population?
C
A confusion matrix is created for data that were oversampled due to a rare target.
What values are not affected by this oversampling?
D
This question will ask you to provide missing code segments.
A logistic regression model was fit on a data set where 40% of the outcomes were events (TARGET=1)
and 60% were non-events (TARGET=0). The analyst knows that the population where the model will
be deployed has 5% events and 95% non-events. The analyst also knows that the company's profit
margin for correctly targeted events is nine times higher than the company's loss for incorrectly
targeted non-event.
Given the following SAS program:
What X and Y values should be added to the program to correctly score the data?
B