Amazon AWS Certified Machine Learning - Specialty (MLS-C01) Practice Test

Last exam update: Apr 18, 2024
Page 1 out of 13. Viewing questions 1-15 out of 186

Question 1

A Data Scientist is training a multilayer perceptron (MLP) on a dataset with multiple classes. The target class of interest is
unique compared with the other classes in the dataset, but the model does not achieve an acceptable recall metric for it. The Data
Scientist has already tried varying the number and size of the MLP's hidden layers, which has not significantly improved the
results. A solution to improve recall must be implemented as quickly as possible.
Which techniques should be used to meet these requirements?

  • A. Gather more data using Amazon Mechanical Turk and then retrain
  • B. Train an anomaly detection model instead of an MLP
  • C. Train an XGBoost model instead of an MLP
  • D. Add class weights to the MLP’s loss function and then retrain
Answer:

D
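Adding class weights to the loss function is the usual quick fix for poor recall on a rare class: mistakes on the minority class become more expensive, with no change to the data or the architecture. A minimal, framework-free sketch of the idea (real libraries such as Keras accept a similar class-weight mapping in `fit`):

```python
import math
from collections import Counter

def class_weights(labels):
    """Weights inversely proportional to class frequency,
    normalized so a perfectly balanced dataset gets 1.0 per class."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(probs, labels, weights):
    """Class-weighted negative log-likelihood: errors on the rare
    class cost more, steering training toward higher recall."""
    total = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    return total / len(labels)

# 9:1 imbalance: the rare class 1 gets a 5x weight
labels = [0] * 90 + [1] * 10
w = class_weights(labels)
```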


Question 2

A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The
company wants to group its customers into categories based on which customers will and will not churn within the next 6
months. The company has labeled data available to the Specialist.
Which machine learning model type should the Specialist use to accomplish this task?

  • A. Linear regression
  • B. Classification
  • C. Clustering
  • D. Reinforcement learning
Answer:

B


Explanation:
The goal of classification is to determine which class or category a data point (a customer, in our case) belongs to. For
classification problems, data scientists use historical data with predefined target variables, also known as labels (churner/non-churner),
as the answers the algorithm learns to predict. With classification, businesses can answer questions such as:
Will this customer churn or not?

Will a customer renew their subscription?

Will a user downgrade a pricing plan?

Are there any signs of unusual customer behavior?

Reference: https://www.kdnuggets.com/2019/05/churn-prediction-machine-learning.html
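Because the churn labels already exist, this is supervised binary classification. As a toy illustration of the setup (invented labels, not real churn data), even the simplest labeled-data "model", a majority-class baseline, takes this shape, and any real classifier must beat it:

```python
def majority_baseline(train_labels):
    """Simplest supervised 'model': always predict the class that
    appears most often in the labeled training data."""
    return max(set(train_labels), key=train_labels.count)

history = ["stay", "churn", "stay", "stay", "churn", "stay"]
prediction = majority_baseline(history)  # "stay"
```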


Question 3

A company that manufactures mobile devices wants to determine and calibrate the appropriate sales price for its devices.
The company is collecting the relevant data and is determining data features that it can use to train machine learning (ML)
models. There are more than 1,000 features, and the company wants to determine the primary features that contribute to the
sales price.
Which techniques should the company use for feature selection? (Choose three.)

  • A. Data scaling with standardization and normalization
  • B. Correlation plot with heat maps
  • C. Data binning
  • D. Univariate selection
  • E. Feature importance with a tree-based classifier
  • F. Data augmentation
Answer:

B D E


Explanation:
Reference:
https://towardsdatascience.com/an-overview-of-data-preprocessing-features-enrichment-automatic-feature-selection-60b0c12d75ad
https://towardsdatascience.com/feature-selection-using-python-for-classification-problem-b5f00a1c7028
https://arxiv.org/abs/2101.04530
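Univariate selection scores each candidate feature in isolation against the target. A hedged sketch of the idea (the data is invented, and a one-threshold decision stump stands in for the usual statistical test such as ANOVA):

```python
def stump_score(values, labels):
    """Univariate selection: score one feature by the best accuracy
    a single-threshold rule achieves against the labels."""
    n = len(labels)
    best = 0.0
    for t in set(values):
        hits = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        best = max(best, hits / n, 1 - hits / n)  # allow the inverted rule
    return best

labels = [0, 0, 0, 0, 1, 1, 1, 1]
signal = [1, 2, 3, 4, 5, 6, 7, 8]  # separates the classes perfectly
noise  = [5, 1, 6, 2, 7, 3, 8, 4]  # weakly related to the labels
scores = {"signal": stump_score(signal, labels),
          "noise": stump_score(noise, labels)}
```

Features that score no better than chance on their own are candidates for removal before training.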


Question 4

A telecommunications company is developing a mobile app for its customers. The company is using an Amazon SageMaker
hosted endpoint for machine learning model inferences.
Developers want to introduce a new version of the model for a limited number of users who subscribed to a preview feature
of the app. After the new version of the model is tested as a preview, developers will evaluate its accuracy. If a new version
of the model has better accuracy, developers need to be able to gradually release the new version for all users over a fixed
period of time.
How can the company implement model testing with the LEAST amount of operational overhead?

  • A. Update the ProductionVariant data type with the new version of the model by using the CreateEndpointConfig operation with the InitialVariantWeight parameter set to 0. Specify the TargetVariant parameter for InvokeEndpoint calls for users who subscribed to the preview feature. When the new version of the model is ready for release, gradually increase InitialVariantWeight until all users have the updated version.
  • B. Configure two SageMaker hosted endpoints that serve the different versions of the model. Create an Application Load Balancer (ALB) to route traffic to both endpoints based on the TargetVariant query string parameter. Reconfigure the app to send the TargetVariant query string parameter for users who subscribed to the preview feature. When the new version of the model is ready for release, change the ALB's routing algorithm to weighted until all users have the updated version.
  • C. Update the DesiredWeightsAndCapacity data type with the new version of the model by using the UpdateEndpointWeightsAndCapacities operation with the DesiredWeight parameter set to 0. Specify the TargetVariant parameter for InvokeEndpoint calls for users who subscribed to the preview feature. When the new version of the model is ready for release, gradually increase DesiredWeight until all users have the updated version.
  • D. Configure two SageMaker hosted endpoints that serve the different versions of the model. Create an Amazon Route 53 record that is configured with a simple routing policy and that points to the current version of the model. Configure the mobile app to use the endpoint URL for users who subscribed to the preview feature and to use the Route 53 record for other users. When the new version of the model is ready for release, add a new model version endpoint to Route 53, and switch the policy to weighted until all users have the updated version.
Answer:

C
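Whichever option is chosen, the gradual rollout reduces to weighted traffic splitting between two production variants of a single endpoint; in SageMaker that weight is the DesiredWeight updated through UpdateEndpointWeightsAndCapacities, with preview users pinned via TargetVariant. A small, framework-free simulation of the routing rule (the variant names are illustrative):

```python
import random

def pick_variant(weights, rng=random):
    """Route one request to a variant with probability proportional
    to its weight, mimicking SageMaker's weighted traffic splitting."""
    r = rng.random() * sum(weights.values())
    for name, w in weights.items():
        r -= w
        if r < 0:
            return name
    return name  # guard against floating-point edge cases

# Preview phase: weight 0 keeps model-v2 out of normal traffic;
# preview subscribers would reach it via TargetVariant instead.
weights = {"model-v1": 1.0, "model-v2": 0.0}
# Rollout phase: gradually raise model-v2's weight, e.g. 0.1 -> 0.5 -> 1.0.
```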


Question 5

A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model that performs well
on the test data but does not perform as expected on unseen data. The existing parameters are provided
as follows.

Which parameter tuning guidelines should the Specialist follow to avoid overfitting?

  • A. Increase the max_depth parameter value.
  • B. Lower the max_depth parameter value.
  • C. Update the objective to binary:logistic.
  • D. Lower the min_child_weight parameter value.
Answer:

B
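Tree depth is the main complexity lever in gradient-boosted trees, so lowering max_depth is the standard move against overfitting. An illustrative parameter sketch showing the direction of each adjustment (the values are invented, not the question's actual settings):

```python
# Illustrative XGBoost-style hyperparameters tuned against overfitting.
params = {
    "objective": "binary:logistic",
    "max_depth": 4,          # lowered: shallower trees generalize better
    "min_child_weight": 6,   # raised (lowering it would overfit more)
    "subsample": 0.8,        # row subsampling adds regularization
    "eta": 0.1,              # smaller learning rate, more robust boosting
}
```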


Question 6

A company uses a long short-term memory (LSTM) model to evaluate the risk factors of a particular energy sector. The
model reviews multi-page text documents to analyze each sentence of the text and categorize it as either a potential risk or
no risk. The model is not performing well, even though the Data Scientist has experimented with many different network
structures and tuned the corresponding hyperparameters.
Which approach will provide the MAXIMUM performance boost?

  • A. Initialize the words by term frequency-inverse document frequency (TF-IDF) vectors pretrained on a large collection of news articles related to the energy sector.
  • B. Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation loss stops decreasing.
  • C. Reduce the learning rate and run the training process until the training loss stops decreasing.
  • D. Initialize the words by word2vec embeddings pretrained on a large collection of news articles related to the energy sector.
Answer:

D
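Pretrained in-domain word embeddings inject semantic knowledge that a small labeled dataset cannot supply on its own. A toy sketch of initializing an embedding matrix from such vectors (the vocabulary and the 2-d vectors are invented):

```python
def build_embedding_matrix(vocab, pretrained, dim):
    """One row per vocabulary word: copy the pretrained vector
    (e.g., word2vec trained on energy-sector news) when available,
    otherwise fall back to a zero vector for out-of-vocabulary words."""
    return [list(pretrained.get(word, [0.0] * dim)) for word in vocab]

pretrained = {"risk": [0.9, 0.1], "energy": [0.2, 0.8]}  # toy vectors
matrix = build_embedding_matrix(["risk", "energy", "the"], pretrained, dim=2)
```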


Question 7

A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the
Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is
stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?

  • A. Write a direct connection to the SQL database within the notebook and pull data in
  • B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.
  • C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
  • D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.
Answer:

B


Question 8

A Machine Learning Specialist is building a logistic regression model that will predict whether or not a person will order a
pizza. The Specialist is trying to build the optimal model with an ideal classification threshold.
What model evaluation technique should the Specialist use to understand how different classification thresholds will impact
the model's performance?

  • A. Receiver operating characteristic (ROC) curve
  • B. Misclassification rate
  • C. Root Mean Square Error (RMSE)
  • D. L1 norm
Answer:

A


Explanation:
Reference: https://docs.aws.amazon.com/machine-learning/latest/dg/binary-model-insights.html
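An ROC curve is precisely a sweep over classification thresholds: each threshold yields one (false positive rate, true positive rate) point. A dependency-free sketch on toy scores:

```python
def roc_points(scores, labels):
    """Sweep each predicted score as the classification threshold
    and return (false positive rate, true positive rate) pairs,
    i.e. the points that trace out the ROC curve."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.4, 0.2]  # model confidence per example
labels = [1, 1, 0, 1]          # true classes
curve = roc_points(scores, labels)
```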


Question 9

The chief editor for a product catalog wants the research and development team to build a machine learning system that can
be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has
a set of training data.
Which machine learning algorithm should the researchers use that BEST meets their requirements?

  • A. Latent Dirichlet Allocation (LDA)
  • B. Recurrent neural network (RNN)
  • C. K-means
  • D. Convolutional neural network (CNN)
Answer:

D


Question 10

A data scientist uses an Amazon SageMaker notebook instance to conduct data exploration and analysis. This requires
certain Python packages that are not natively available on Amazon SageMaker to be installed on the notebook instance.
How can a machine learning specialist ensure that required packages are automatically available on the notebook instance
for the data scientist to use?

  • A. Install AWS Systems Manager Agent on the underlying Amazon EC2 instance and use Systems Manager Automation to execute the package installation commands.
  • B. Create a Jupyter notebook file (.ipynb) with cells containing the package installation commands to execute and place the file under the /etc/init directory of each Amazon SageMaker notebook instance.
  • C. Use the conda package manager from within the Jupyter notebook console to apply the necessary conda packages to the default kernel of the notebook.
  • D. Create an Amazon SageMaker lifecycle configuration with package installation commands and assign the lifecycle configuration to the notebook instance.
Answer:

D


Explanation:
Reference: https://towardsdatascience.com/automating-aws-sagemaker-notebooks-2dec62bc2c84
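A lifecycle configuration attaches a script that runs when the notebook instance is created or started, which is the supported place for package installation. A hedged on-start sketch (the package names are examples only, and the conda environment path follows the common notebook-instance layout):

```shell
#!/bin/bash
# Hypothetical SageMaker notebook-instance lifecycle configuration
# (on-start): install extra packages into a kernel environment so
# they are available every time the instance starts.
set -e
sudo -u ec2-user -i <<'EOF'
source /home/ec2-user/anaconda3/bin/activate python3
pip install --quiet lightgbm shap   # example packages only
source /home/ec2-user/anaconda3/bin/deactivate
EOF
```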


Question 11

A manufacturing company uses machine learning (ML) models to detect quality issues. The models use images that are
taken of the company's product at the end of each production step. The company has thousands of machines at the
production site that generate one image per second on average.
The company ran a successful pilot with a single manufacturing machine. For the pilot, ML specialists used an industrial PC
that ran AWS IoT Greengrass with a long-running AWS Lambda function that uploaded the images to Amazon S3. The
uploaded images invoked a Lambda function that was written in Python to perform inference by using an Amazon
SageMaker endpoint that ran a custom model. The inference results were forwarded back to a web service that was hosted
at the production site to prevent faulty products from being shipped.
The company scaled the solution out to all manufacturing machines by installing similarly configured industrial PCs on each
production machine. However, latency for predictions increased beyond acceptable limits. Analysis shows that the internet
connection is at its capacity limit.
How can the company resolve this issue MOST cost-effectively?

  • A. Set up a 10 Gbps AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images. Increase the size of the instances and the number of instances that are used by the SageMaker endpoint.
  • B. Extend the long-running Lambda function that runs on AWS IoT Greengrass to compress the images and upload the compressed files to Amazon S3. Decompress the files by using a separate Lambda function that invokes the existing Lambda function to run the inference pipeline.
  • C. Use auto scaling for SageMaker. Set up an AWS Direct Connect connection between the production site and the nearest AWS Region. Use the Direct Connect connection to upload the images.
  • D. Deploy the Lambda function and the ML models onto the AWS IoT Greengrass core that is running on the industrial PCs that are installed on each machine. Extend the long-running Lambda function that runs on AWS IoT Greengrass to invoke the Lambda function with the captured images and run the inference on the edge component that forwards the results directly to the web service.
Answer:

D


Question 12

A retail company is selling products through a global online marketplace. The company wants to use machine learning (ML)
to analyze customer feedback and identify specific areas for improvement. A developer has built a tool that collects customer
reviews from the online marketplace and stores them in an Amazon S3 bucket. This process yields a dataset of 40 reviews.
A data scientist building the ML models must identify additional sources of data to increase the size of the dataset.
Which data sources should the data scientist use to augment the dataset of reviews? (Choose three.)

  • A. Emails exchanged by customers and the company’s customer service agents
  • B. Social media posts containing the name of the company or its products
  • C. A publicly available collection of news articles
  • D. A publicly available collection of customer reviews
  • E. Product sales revenue figures for the company
  • F. Instruction manuals for the company’s products
Answer:

A B D


Question 13

A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a machine learning
specialist will build a binary classifier based on two features: age of account, denoted by x, and transaction month, denoted
by y. The class distributions are illustrated in the provided figure. The positive class is portrayed in red, while the negative
class is portrayed in black.

Which model would have the HIGHEST accuracy?

  • A. Linear support vector machine (SVM)
  • B. Decision tree
  • C. Support vector machine (SVM) with a radial basis function kernel
  • D. Single perceptron with a Tanh activation function
Answer:

C
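A class nested inside another, as the figure suggests, is not linearly separable, but the RBF kernel handles it because similarity decays with squared distance in every direction, allowing circular decision boundaries. A minimal sketch of the kernel function itself:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2): close to 1 for nearby
    points and close to 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

center, far = [0.0, 0.0], [3.0, 4.0]
```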


Question 14

A Data Scientist needs to create a serverless ingestion and analytics solution for high-velocity, real-time streaming data.
The ingestion process must buffer and convert incoming records from JSON to a query-optimized, columnar format without
data loss. The output datastore must be highly available, and Analysts must be able to run SQL queries against the data and
connect to existing business intelligence dashboards.
Which solution should the Data Scientist build to satisfy the requirements?

  • A. Create a schema in the AWS Glue Data Catalog of the incoming data format. Use an Amazon Kinesis Data Firehose delivery stream to stream the data and transform the data to Apache Parquet or ORC format using the AWS Glue Data Catalog before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
  • B. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and writes the data to a processed data location in Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena, and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
  • C. Write each JSON record to a staging location in Amazon S3. Use the S3 Put event to trigger an AWS Lambda function that transforms the data into Apache Parquet or ORC format and inserts it into an Amazon RDS PostgreSQL database. Have the Analysts query and run dashboards from the RDS database.
  • D. Use Amazon Kinesis Data Analytics to ingest the streaming data and perform real-time SQL queries to convert the records to Apache Parquet before delivering to Amazon S3. Have the Analysts query the data directly from Amazon S3 using Amazon Athena and connect to BI tools using the Athena Java Database Connectivity (JDBC) connector.
Answer:

A


Question 15

A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the
company's dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living
area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to
use multi-variable linear regression to predict house sale prices.
Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the
model's complexity?

  • A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
  • B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
  • C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
  • D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
Answer:

D
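Correlating every feature directly with the target is a cheap relevance filter before fitting a linear regression. A dependency-free sketch using invented numbers:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy features vs. sale price: keep high-|r| features, drop the rest.
price    = [100, 150, 200, 250]
lot_size = [10, 15, 20, 25]   # strongly correlated with price: keep
postal   = [3, 9, 1, 7]       # near-zero correlation: drop
```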
