A Data Scientist is training a multilayer perception (MLP) on a dataset with multiple classes. The target class of interest is
unique compared to the other classes within the dataset, but it does not achieve and acceptable recall metric. The Data
Scientist has already tried varying the number and size of the MLPs hidden layers, which has not significantly improved the
results. A solution to improve recall must be implemented as quickly as possible.
Which techniques should be used to meet these requirements?
C
A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The
company wants to group its customers into categories based on which customers will and will not churn within the next 6
months. The company has labeled the data available to the Specialist.
Which machine learning model type should the Specialist use to accomplish this task?
B
Explanation:
The goal of classification is to determine to which class or category a data point (customer in our case) belongs to. For
classification problems, data scientists would use historical data with predefined target variables AKA labels (churner/non-
churner) answers that need to be predicted to train an algorithm. With classification, businesses can answer the following
questions:
Will this customer churn or not?
Will a customer renew their subscription?
Will a user downgrade a pricing plan?
Are there any signs of unusual customer behavior?
Reference: https://www.kdnuggets.com/2019/05/churn-prediction-machine-learning.html
A company that manufactures mobile devices wants to determine and calibrate the appropriate sales price for its devices.
The company is collecting the relevant data and is determining data features that it can use to train machine learning (ML)
models. There are more than 1,000 features, and the company wants to determine the primary features that contribute to the
sales price.
Which techniques should the company use for feature selection? (Choose three.)
C D F
Explanation:
Reference: https://towardsdatascience.com/an-overview-of-data-preprocessing-features-enrichment-automatic-feature-
selection-60b0c12d75ad
https://towardsdatascience.com/feature-selection-using-python-for-classification-problem-
b5f00a1c7028#:~:text=Univariate%20feature%20selection%20works%20by,analysis%20of%20variance%20
(ANOVA).&text=That%20is%20why%20it%20is%20called%20'univariate' https://arxiv.org/abs/2101.04530
A telecommunications company is developing a mobile app for its customers. The company is using an Amazon SageMaker
hosted endpoint for machine learning model inferences.
Developers want to introduce a new version of the model for a limited number of users who subscribed to a preview feature
of the app. After the new version of the model is tested as a preview, developers will evaluate its accuracy. If a new version
of the model has better accuracy, developers need to be able to gradually release the new version for all users over a fixed
period of time.
How can the company implement the testing model with the LEAST amount of operational overhead?
D
A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model, which is working
appropriately for test data. However, with unknown data, it is not working as expected. The existing parameters are provided
as follows.
Which parameter tuning guidelines should the Specialist follow to avoid overfitting?
B
A company uses a long short-term memory (LSTM) model to evaluate the risk factors of a particular energy sector. The
model reviews multi-page text documents to analyze each sentence of the text and categorize it as either a potential risk or
no risk. The model is not performing well, even though the Data Scientist has experimented with many different network
structures and tuned the corresponding hyperparameters.
Which approach will provide the MAXIMUM performance boost?
C
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the
Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is
stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?
B
A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a
pizza. The Specialist is trying to build the optimal model with an ideal classification threshold.
What model evaluation technique should the Specialist use to understand how different classification thresholds will impact
the model's performance?
A
Explanation:
Reference: https://docs.aws.amazon.com/machine-learning/latest/dg/binary-model-insights.html
The chief editor for a product catalog wants the research and development team to build a machine learning system that can
be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has
a set of training data.
Which machine learning algorithm should the researchers use that BEST meets their requirements?
D
A data scientist uses an Amazon SageMaker notebook instance to conduct data exploration and analysis. This requires
certain Python packages that are not natively available on Amazon SageMaker to be installed on the notebook instance.
How can a machine learning specialist ensure that required packages are automatically available on the notebook instance
for the data scientist to use?
B
Explanation:
Reference: https://towardsdatascience.com/automating-aws-sagemaker-notebooks-2dec62bc2c84