Google Professional Cloud DevOps Engineer Practice Test

Professional Cloud DevOps Engineer

Last exam update: Apr 17, 2024
Page 1 out of 6. Viewing questions 1-15 out of 81

Question 1

Your company experiences bugs, outages, and slowness in its production systems. Developers use the production
environment for new feature development and bug fixes. Configuration and experiments are done in the production
environment, causing outages for users. Testers use the production environment for load testing, which often slows the
production systems. You need to redesign the environment to reduce the number of bugs and outages in production and to
enable testers to load test new features. What should you do?

  • A. Create an automated testing script in production to detect failures as soon as they occur.
  • B. Create a development environment with smaller server capacity and give access only to developers and testers.
  • C. Secure the production environment to ensure that developers can't change it and set up one controlled update per year.
  • D. Create a development environment for writing code and a test environment for configurations, experiments, and load testing.
Answer:

D

Question 2

You support a trading application written in Python and hosted on App Engine flexible environment. You want to customize
the error information being sent to Stackdriver Error Reporting. What should you do?

  • A. Install the Stackdriver Error Reporting library for Python, and then run your code on a Compute Engine VM.
  • B. Install the Stackdriver Error Reporting library for Python, and then run your code on Google Kubernetes Engine.
  • C. Install the Stackdriver Error Reporting library for Python, and then run your code on App Engine flexible environment.
  • D. Use the Stackdriver Error Reporting API to write errors from your application to ReportedErrorEvent, and then generate log entries with properly formatted error messages in Stackdriver Logging.
Answer:

C

Explanation:
References: https://cloud.google.com/error-reporting/docs/setup/app-engine-flexible-environment

Question 3

You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web application. Customers expect the
application to always be available and have fast response times. Customers are currently happy with the application
performance and availability. Based on current measurement, you observe that the 90th percentile of latency is 120ms and
the 95th percentile of latency is 275ms over a 28-day window. What latency SLO would you recommend to the team to
publish?

  • A. 90th percentile – 100ms; 95th percentile – 250ms
  • B. 90th percentile – 120ms; 95th percentile – 275ms
  • C. 90th percentile – 150ms; 95th percentile – 300ms
  • D. 90th percentile – 250ms; 95th percentile – 400ms
Answer:

B

Question 4

You support a high-traffic web application that runs on Google Cloud Platform (GCP). You need to measure application
reliability from a user perspective without making any engineering changes to it.
What should you do? (Choose two.)

  • A. Review current application metrics and add new ones as needed.
  • B. Modify the code to capture additional information for user interaction.
  • C. Analyze the web proxy logs only and capture response time of each request.
  • D. Create new synthetic clients to simulate a user journey using the application.
  • E. Use current and historic Request Logs to trace customer interaction with the application.
Answer:

D E

Question 5

You are managing an application that exposes an HTTP endpoint without using a load balancer. The latency of the HTTP
responses is important for the user experience. You want to understand what HTTP latencies all of your users are
experiencing. You use Stackdriver Monitoring. What should you do?

  • A. In your application, create a metric with a metricKind set to DELTA and a valueType set to DOUBLE. In Stackdriver’s Metrics Explorer, use a Stacked Bar graph to visualize the metric.
  • B. In your application, create a metric with a metricKind set to CUMULATIVE and a valueType set to DOUBLE. In Stackdriver’s Metrics Explorer, use a Line graph to visualize the metric.
  • C. In your application, create a metric with a metricKind set to GAUGE and a valueType set to DISTRIBUTION. In Stackdriver’s Metrics Explorer, use a Heatmap graph to visualize the metric.
  • D. In your application, create a metric with a metricKind set to METRIC_KIND_UNSPECIFIED and a valueType set to INT64. In Stackdriver’s Metrics Explorer, use a Stacked Area graph to visualize the metric.
Answer:

C
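
For context on the DISTRIBUTION value type named in option C: a distribution stores per-bucket counts of observed values rather than a single number, which is the shape of data a heatmap renders. The following is a minimal sketch in plain Python of that bucketing step only. It does not call the Monitoring API, and the bucket bounds, function name, and sample latencies are hypothetical.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries in milliseconds (an assumption, not from the question).
BUCKET_BOUNDS_MS = [50, 100, 200, 400, 800]


def bucket_counts(latencies_ms, bounds=BUCKET_BOUNDS_MS):
    """Count latencies per bucket; bucket i covers [bounds[i-1], bounds[i])."""
    counts = [0] * (len(bounds) + 1)  # first slot is underflow, last is overflow
    for latency in latencies_ms:
        counts[bisect_right(bounds, latency)] += 1
    return counts


# Hypothetical request latencies observed during one sampling interval.
sample = [30, 50, 75, 120, 120, 950]
print(bucket_counts(sample))  # prints [1, 2, 2, 0, 0, 1]
```

A single DOUBLE or INT64 value per interval would collapse these six requests into one number and hide the slow outlier in the last bucket.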

Question 6

You are performing a semi-annual capacity planning exercise for your flagship service. You expect a service user growth
rate of 10% month-over-month over the next six months. Your service is fully containerized and runs on Google Cloud
Platform (GCP), using a Google Kubernetes Engine (GKE) Standard regional cluster on three zones with cluster autoscaler
enabled. You currently consume about 30% of your total deployed CPU capacity, and you require resilience against the
failure of a zone. You want to ensure that your users experience minimal negative impact as a result of this growth or as a
result of zone failure, while avoiding unnecessary costs. How should you prepare to handle the predicted growth?

  • A. Verify the maximum node pool size, enable a horizontal pod autoscaler, and then perform a load test to verify your expected resource needs.
  • B. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will scale automatically, regardless of growth rate.
  • C. Because you are at only 30% utilization, you have significant headroom and you won't need to add any additional capacity for this rate of growth.
  • D. Proactively add 60% more node capacity to account for six months of 10% growth rate, and then perform a load test to make sure you have enough capacity.
Answer:

A
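
The arithmetic behind the scenario can be checked with a short sketch. It uses only the figures stated in the question and assumes, as a simplification, that CPU demand grows in proportion to users and that no nodes are added in the meantime. The variable names are illustrative.

```python
# Figures taken from the question text.
current_utilization = 0.30  # share of deployed CPU capacity in use today
monthly_growth = 0.10       # 10% month-over-month user growth
months = 6
zones = 3

# Growth compounds, so six months of 10% is more than 60%.
growth_factor = (1 + monthly_growth) ** months
projected = current_utilization * growth_factor

# Losing one of three zones leaves two thirds of today's node capacity.
projected_after_zone_loss = projected / ((zones - 1) / zones)

print(f"growth factor: {growth_factor:.2f}")                  # 1.77
print(f"projected utilization: {projected:.0%}")              # 53%
print(f"with one zone down: {projected_after_zone_loss:.0%}")  # 80%
```

Under these assumptions the compounded growth is about 77%, not 60%, and a zone failure at the end of the period would push the remaining nodes to roughly 80% utilization. That is why checking the node pool maximum and load testing matter more than a fixed up-front capacity increase.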

Question 7

You support a web application that is hosted on Compute Engine. The application provides a booking service for thousands
of users. Shortly after the release of a new feature, your monitoring dashboard shows that all users are experiencing latency
at login. You want to mitigate the impact of the incident on the users of your service. What should you do first?

  • A. Roll back the recent release.
  • B. Review the Stackdriver monitoring.
  • C. Upsize the virtual machines running the login services.
  • D. Deploy a new release to see whether it fixes the problem.
Answer:

A

Question 8

You support a large service with a well-defined Service Level Objective (SLO). The development team deploys new releases
of the service multiple times a week. If a major incident causes the service to miss its SLO, you want the development team
to shift its focus from working on features to improving service reliability. What should you do before a major incident occurs?

  • A. Develop an appropriate error budget policy in cooperation with all service stakeholders.
  • B. Negotiate with the product team to always prioritize service reliability over releasing new features.
  • C. Negotiate with the development team to reduce the release frequency to no more than once a week.
  • D. Add a plugin to your Jenkins pipeline that prevents new releases whenever your service is out of SLO.
Answer:

A

Question 9

You support a high-traffic web application and want to ensure that the home page loads in a timely manner. As a first step,
you decide to implement a Service Level Indicator (SLI) to represent home page request latency with an acceptable page
load time set to 100 ms. What is the Google-recommended way of calculating this SLI?

  • A. Bucketize the request latencies into ranges, and then compute the percentile at 100 ms.
  • B. Bucketize the request latencies into ranges, and then compute the median and 90th percentiles.
  • C. Count the number of home page requests that load in under 100 ms, and then divide by the total number of home page requests.
  • D. Count the number of home page requests that load in under 100 ms, and then divide by the total number of all web application requests.
Answer:

C

Explanation:
Reference: https://sre.google/workbook/implementing-slos/
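
The ratio described in option C (good events divided by valid events) can be sketched in a few lines of Python. The function name and the sample latencies are hypothetical. Only the 100 ms threshold comes from the question.

```python
def latency_sli(latencies_ms, threshold_ms=100.0):
    """Return the fraction of requests that completed in under threshold_ms."""
    if not latencies_ms:
        raise ValueError("no requests in the measurement window")
    good = sum(1 for latency in latencies_ms if latency < threshold_ms)
    return good / len(latencies_ms)


# Hypothetical home page request latencies in milliseconds.
home_page_latencies = [42.0, 87.5, 99.9, 100.0, 130.2, 64.3, 250.0, 75.1]
sli = latency_sli(home_page_latencies)
print(f"{sli:.1%} of home page requests loaded in under 100 ms")  # 62.5%
```

The denominator contains only home page requests. Dividing by all web application requests, as in option D, would dilute the indicator with traffic the SLI is not meant to describe.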

Question 10

You encountered a major service outage that affected all users of the service for multiple hours. After several hours of
incident management, the service returned to normal, and user access was restored.
You need to provide an incident summary to relevant stakeholders following the Site Reliability Engineering recommended
practices. What should you do first?

  • A. Call individual stakeholders to explain what happened.
  • B. Develop a post-mortem to be distributed to stakeholders.
  • C. Send the Incident State Document to all the stakeholders.
  • D. Require the engineer responsible to write an apology email to all stakeholders.
Answer:

B

Question 11

Your product is currently deployed in three Google Cloud Platform (GCP) zones with your users divided between the zones.
You can fail over from one zone to another, but it causes a 10-minute service disruption for the affected users. You typically
experience a database failure once per quarter and can detect it within five minutes. You are cataloging the reliability risks of
a new real-time chat feature for your product. You catalog the following information for each risk:
Mean Time to Detect (MTTD) in minutes
Mean Time to Repair (MTTR) in minutes
Mean Time Between Failure (MTBF) in days
User Impact Percentage
The chat feature requires a new database system that takes twice as long to successfully fail over between zones. You want
to account for the risk of the new database failing in one zone. What would be the values for the risk of database failover with
the new system?

  • A. MTTD: 5, MTTR: 10, MTBF: 90, Impact: 33%
  • B. MTTD: 5, MTTR: 20, MTBF: 90, Impact: 33%
  • C. MTTD: 5, MTTR: 10, MTBF: 90, Impact: 50%
  • D. MTTD: 5, MTTR: 20, MTBF: 90, Impact: 50%
Answer:

B
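
Each value follows from the question text: detection takes 5 minutes, failover takes twice the current 10 minutes, one failure per quarter is roughly every 90 days, and one of three zones carries about a third of the users. The sketch below records those values and combines them into expected bad minutes per year. The combining formula is one common convention for risk catalogs and is an assumption here, as are the class and field names.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    mttd_min: float   # Mean Time to Detect, minutes
    mttr_min: float   # Mean Time to Repair, minutes
    mtbf_days: float  # Mean Time Between Failures, days
    impact: float     # fraction of users affected

    def bad_minutes_per_year(self):
        # Assumed convention: outage length x share of users x incidents per year.
        incidents_per_year = 365 / self.mtbf_days
        return (self.mttd_min + self.mttr_min) * self.impact * incidents_per_year


# New database: failover takes 2 x 10 minutes; one of three zones is affected.
new_db = Risk(mttd_min=5, mttr_min=2 * 10, mtbf_days=90, impact=1 / 3)
print(f"{new_db.bad_minutes_per_year():.1f} bad minutes per year")  # 33.8
```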

Question 12

You are responsible for the reliability of a high-volume enterprise application. A large number of users report that an
important subset of the application's functionality, a data-intensive reporting feature, is consistently failing with an HTTP
500 error. When you investigate your application's dashboards, you notice a strong correlation between the failures and a
metric that represents the size of an internal queue used for generating reports. You trace the failures to a reporting backend
that is experiencing high I/O wait times. You quickly fix the issue by resizing the backend's persistent disk (PD). Now you
need to create an availability Service Level Indicator (SLI) for the report generation feature. How would you define it?

  • A. As the I/O wait times aggregated across all report generation backends
  • B. As the proportion of report generation requests that result in a successful response
  • C. As the application’s report generation queue size compared to a known-good threshold
  • D. As the reporting backend PD throughput capacity compared to a known-good threshold
Answer:

B

Question 13

You are running an application on Compute Engine and collecting logs through Stackdriver. You discover that some
personally identifiable information (PII) is leaking into certain log entry fields. All PII entries begin with the text userinfo. You
want to capture these log entries in a secure location for later review and prevent them from leaking to Stackdriver Logging.
What should you do?

  • A. Create a basic log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.
  • B. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, and then copy the entries to a Cloud Storage bucket.
  • C. Create an advanced log filter matching userinfo, configure a log export in the Stackdriver console with Cloud Storage as a sink, and then configure a log exclusion with userinfo as a filter.
  • D. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries containing userinfo, create an advanced log filter matching userinfo, and then configure a log export in the Stackdriver console with Cloud Storage as a sink.
Answer:

B
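
The agent-side filtering that options B and D describe amounts to splitting the log stream before anything is sent to Stackdriver Logging. The following is a minimal Python sketch of that routing logic only. It is not a Fluentd plugin, and the function name, the sample entries, and the choice to match on a substring are assumptions.

```python
def route_log_entries(entries, marker="userinfo"):
    """Split entries into those safe to forward and those held back for review."""
    to_logging, to_secure_store = [], []
    for entry in entries:
        if marker in entry:
            to_secure_store.append(entry)  # never leaves for the logging backend
        else:
            to_logging.append(entry)
    return to_logging, to_secure_store


# Hypothetical log lines collected on the VM.
entries = ["GET /health 200", "userinfo: name=Jane Doe", "POST /orders 201"]
to_logging, to_secure_store = route_log_entries(entries)
print(to_logging)       # ['GET /health 200', 'POST /orders 201']
print(to_secure_store)  # ['userinfo: name=Jane Doe']
```

Because the split happens on the VM, the matching entries never reach the logging backend. An export or exclusion configured in the console acts only on entries that have already been sent.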

Question 14

You are part of an organization that follows SRE practices and principles. You are taking over the management of a new
service from the Development Team, and you conduct a Production Readiness Review (PRR). After the PRR analysis
phase, you determine that the service cannot currently meet its Service Level Objectives (SLOs). You want to ensure that
the service can meet its SLOs in production. What should you do next?

  • A. Adjust the SLO targets to be achievable by the service so you can bring it into production.
  • B. Notify the development team that they will have to provide production support for the service.
  • C. Identify recommended reliability improvements to the service to be completed before handover.
  • D. Bring the service into production with no SLOs and build them when you have collected operational data.
Answer:

C

Question 15

You encounter a large number of outages in the production systems you support. You receive alerts for all the outages that
wake you up at night. The alerts are due to unhealthy systems that are automatically restarted within a minute. You want to
set up a process that would prevent staff burnout while following Site Reliability Engineering practices. What should you do?

  • A. Eliminate unactionable alerts.
  • B. Create an incident report for each of the alerts.
  • C. Distribute the alerts to engineers in different time zones.
  • D. Redefine the related Service Level Objective so that the error budget is not exhausted.
Answer:

A
