What is a key consideration when preparing a presentation intended for analysts?
A.
Describe how to implement the model
B.
Provide talking points to promote or evangelize the project
C.
Emphasize the business benefits of implementing the model
D.
Focus on clean simple-to-understand visuals
Answer:
D
Question 2
A logistic regression model is built to determine the probability of a credit card borrower defaulting on a credit loan. A threshold value of 0.3 is selected. Which statement can be used to predict a borrower will default?
A.
If probability > 0.1, then predict the borrower will default
B.
If probability < 0.1, then predict the borrower will default
C.
If probability > 0.3, then predict the borrower will default
D.
If probability < 0.3, then predict the borrower will default
Answer:
C
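The thresholding rule in option C can be sketched in Python as follows. The function name `predict_default` and the sample probabilities are illustrative assumptions, not part of the question; only the 0.3 threshold comes from it.

```python
def predict_default(probability: float, threshold: float = 0.3) -> bool:
    """Return True when the modeled default probability exceeds the threshold."""
    return probability > threshold


# Hypothetical model outputs for four borrowers.
sample_probabilities = [0.05, 0.30, 0.31, 0.90]
predictions = [predict_default(p) for p in sample_probabilities]
print(predictions)  # [False, False, True, True]
```

A probability exactly equal to the threshold is not flagged here because the rule uses a strict comparison.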
Question 3
What are the two data categories that represent qualitative data?
A.
Ordinal and interval
B.
Nominal and ordinal
C.
Ratio and interval
D.
Nominal and ratio
Answer:
B
Question 4
In hypothesis testing, when does a Type I error occur?
A.
Null hypothesis is rejected when it is actually false
B.
Null hypothesis is rejected when it is actually true
C.
Null hypothesis is accepted when it is actually false
D.
Null hypothesis is accepted when it is actually true
Answer:
B
Question 5
You have been given a task to improve sales force compensation of your organization. As a result of a study, your team decides to classify personnel as follows:
● Did not meet quota
● Met quota
● Exceeded 150% of quota
In which data analytics lifecycle phase should you define these categories for analysis purposes?
A.
Model building
B.
Communicate results
C.
Operationalize
D.
Model planning
Answer:
D
Question 6
A decision tree is being built. An internal node is being evaluated for partitioning on variables A and B. The entropy of the internal node is 0.8. The entropy for each of the variables is as follows:
● Variable A: 0.5
● Variable B: 0.4
Which variable will be used to partition the data and what is the information gain?
A.
Variable B; information gain is 0.1
B.
Variable B; information gain is 0.4
C.
Variable A; information gain is 0.1
D.
Variable A; information gain is 0.5
Answer:
B
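The arithmetic behind this answer can be sketched in Python. The sketch assumes the per-variable figures in the question are the conditional entropies after splitting on each variable, so information gain is the node entropy minus that value; the helper name `information_gain` is illustrative.

```python
def information_gain(parent_entropy: float, conditional_entropy: float) -> float:
    """Information gain = entropy of the node minus entropy after the split."""
    return parent_entropy - conditional_entropy


parent_entropy = 0.8
# Assumed to be conditional entropies after splitting on each variable.
conditional_entropies = {"A": 0.5, "B": 0.4}

gains = {
    name: information_gain(parent_entropy, entropy)
    for name, entropy in conditional_entropies.items()
}
# The split with the largest gain (lowest remaining entropy) is chosen.
best_variable = max(gains, key=gains.get)
print(best_variable, round(gains[best_variable], 2))  # B 0.4
```

Variable A would yield a gain of about 0.3, so B is the better split under this assumption.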
Question 7
In association rules, given items X and Y, what does lift measure?
A.
Percentage of transactions that contain an itemset with X
B.
Percentage of transactions with X that also contain Y
C.
Difference in the probability of X and Y appearing together compared with expectations as if they were statistically independent
D.
How many times more often X and Y occur together than expected if they were statistically independent, expressed as a ratio
Answer:
D
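As a rough illustration of lift as a ratio, the sketch below computes support and lift over a tiny, made-up set of transactions. The data and the helper names `support` and `lift` are assumptions for illustration only.

```python
# Hypothetical transactions; each is a set of purchased items.
transactions = [
    {"X", "Y"},
    {"X", "Y"},
    {"X"},
    {"Y"},
    {"Z"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(x, y, transactions):
    """Observed joint support divided by the support expected under independence."""
    joint = support(x | y, transactions)
    return joint / (support(x, transactions) * support(y, transactions))


value = lift({"X"}, {"Y"}, transactions)
print(round(value, 3))  # 1.111
```

A lift above 1 suggests X and Y occur together more often than independence would predict; a lift near 1 suggests no association.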
Question 8
Which tools are categorized as cluster and workflow management tools for Hadoop?
A.
Flume, Sqoop, and Storm
B.
Drill, Hive, and HBase
C.
Spark, Tez, and Cassandra
D.
Ambari, Oozie, and Zookeeper
Answer:
D
Question 9
Consider the following text: “Stop!” he shouted. “Don’t go there!” What set of words result from using a tokenizer for punctuation on the text?
A.
Stop, he, shouted, don, t, go, there
B.
Stop, he shouted, don, t go there
C.
Stop, he shouted, dont go there
D.
Stop, he, shouted, dont, go, there
Answer:
D
Question 10
Which phase of the data analytics lifecycle includes conducting project sponsor interviews and drafting a problem statement?
A.
Operationalize
B.
Model planning
C.
Model building
D.
Discovery
Answer:
D
Question 11
In time series analysis, which statement describes an MA(q) process?
A.
Current deviation from the time series mean depends on the q previous deviations
B.
Current deviation from the time series mean depends on the quotient q
C.
Current time series value depends on the q previous values
D.
Current time series value depends on the fitted polynomial of order q
Answer:
A
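A minimal sketch of a moving-average process may help make the statement concrete. It assumes the common textbook form in which the "previous deviations" are the q previous random shocks (error terms); the function name `ma_process`, the coefficient, and the shock values are illustrative.

```python
def ma_process(mean, thetas, shocks):
    """Generate MA(q) values: mean + current shock + weighted sum of q previous shocks."""
    q = len(thetas)
    series = []
    for t in range(q, len(shocks)):
        lagged = sum(thetas[k] * shocks[t - 1 - k] for k in range(q))
        series.append(mean + shocks[t] + lagged)
    return series


# Hypothetical shocks; in practice these would be random noise.
shocks = [1.0, -1.0, 0.5, 0.0, 2.0]
series = ma_process(mean=10.0, thetas=[0.5], shocks=shocks)
print(series)  # [9.5, 10.0, 10.25, 12.0]
```

With q = 1, each value's deviation from the mean depends only on the current shock and the single previous one.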
Question 12
After running a density plot, you realize that the data has a long tail to the right. What can you do to make the dataset more normally distributed?
A.
Use a scatter plot to obtain a better picture
B.
Use a histogram to obtain a better picture
C.
Apply a square transformation
D.
Apply a logarithmic transformation
Answer:
D
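The effect of a logarithmic transformation on a right-skewed sample can be checked with a small sketch. The data values and the `skewness` helper (a population moment-based estimate) are assumptions chosen for illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical sample with a long right tail.
data = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in data]

# The log compresses large values, pulling in the right tail.
print(skewness(logged) < skewness(data))  # True
```

The transformed sample is still somewhat skewed here, but less so than the raw values. The natural log requires strictly positive inputs.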
Question 13
Which R function plots a distribution of a single variable along two different axes?
A.
table()
B.
summary()
C.
density()
D.
rug()
Answer:
D
Question 14
What action occurs during feature selection in the model building phase of the data analytics lifecycle?
A.
Create new combinations of attributes
B.
Overfit the model to improve prediction accuracy
C.
Identify the most useful input variables
D.
Select a superset of variables to shorten training times
Answer:
C
Question 15
Which Hadoop service responds to requests for compute and memory resources?