Microsoft DP-203 Practice Test

Data Engineering on Microsoft Azure

Note: This exam has case studies

Question 1 Topic 3, Mixed Questions

You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.
What should you use?

  • A. the Activity log blade for the Data Factory resource
  • B. Pipeline runs in the Azure Data Factory user experience
  • C. the Resource health blade for the Data Factory resource
  • D. Azure Data Factory activity runs in Azure Monitor
Answer:

D

Explanation:
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor
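
As an illustration of the explanation above, the following sketch builds a Kusto query string for failed pipeline runs over a chosen lookback window. It assumes the factory's diagnostic settings send logs to a Log Analytics workspace in resource-specific mode (so runs land in the `ADFPipelineRun` table) and that the workspace retention covers the window. The function names, the constant, and the projected columns are illustrative assumptions, not part of the exam material.

```python
ADF_RETENTION_DAYS = 45  # retention of pipeline-run data inside Data Factory, per the explanation


def needs_azure_monitor(lookback_days: int) -> bool:
    """True when the lookback window is longer than Data Factory's own retention."""
    return lookback_days > ADF_RETENTION_DAYS


def failed_runs_query(lookback_days: int) -> str:
    """Return a Kusto query listing failed pipeline runs in the lookback window."""
    if lookback_days <= 0:
        raise ValueError("lookback_days must be positive")
    return (
        "ADFPipelineRun\n"
        f"| where TimeGenerated > ago({lookback_days}d)\n"
        "| where Status == 'Failed'\n"
        "| project TimeGenerated, PipelineName, RunId, Status\n"
        "| order by TimeGenerated desc"
    )


if __name__ == "__main__":
    print(needs_azure_monitor(180))
    print(failed_runs_query(180))
```

The query text would be pasted into the Logs blade of the workspace; the sketch only assembles the string and does not call any Azure API.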

Question 2 Topic 3, Mixed Questions

You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?

  • A. Local tempdb percentage
  • B. Cache used percentage
  • C. Data IO percentage
  • D. CPU percentage
Answer:

B

Explanation:
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache

Question 3 Topic 3, Mixed Questions

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact
table named Table1.
You need to identify the extent of the data skew in Table1.
What should you do in Synapse Studio?

  • A. Connect to the built-in pool and run sys.dm_pdw_nodes_db_partition_stats.
  • B. Connect to Pool1 and run DBCC CHECKALLOC.
  • C. Connect to the built-in pool and run DBCC CHECKALLOC.
  • D. Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_stats.
Answer:

D

Explanation:
Microsoft recommends use of sys.dm_pdw_nodes_db_partition_stats to analyze any skewness in the data.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-partition-stats-transact-sql
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/cheat-sheet
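
The DMV returns one row per partition in each distribution, so the row counts have to be summed per distribution before skew can be judged. The sketch below shows that aggregation on rows that have already been fetched. The `(distribution_id, row_count)` pair layout and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def rows_per_distribution(partition_rows):
    """Sum row_count per distribution_id from (distribution_id, row_count) pairs."""
    totals = defaultdict(int)
    for distribution_id, row_count in partition_rows:
        totals[distribution_id] += row_count
    return dict(totals)


# Hypothetical sample: distribution 1 has two partitions, distribution 3 is empty.
sample = [(1, 500), (1, 700), (2, 100), (3, 0)]
totals = rows_per_distribution(sample)
largest = max(totals, key=totals.get)
smallest = min(totals, key=totals.get)

if __name__ == "__main__":
    print(totals, largest, smallest)
```

A large gap between the largest and smallest totals is the signal that the table is skewed.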

Question 4 Topic 3, Mixed Questions

You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
  • Wrangling data flow
  • Notebook
  • Copy
  • Jar

Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  • A. Azure Synapse Analytics
  • B. Azure HDInsight
  • C. Azure Machine Learning
  • D. Azure Data Factory
  • E. Azure Databricks
Answer:

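D E

Explanation:
Wrangling data flow and Copy are Azure Data Factory activities, so they are debugged in Data Factory. Notebook and Jar are
Azure Databricks activities, so they are debugged in Azure Databricks.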

Question 5 Topic 3, Mixed Questions

HOTSPOT
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in
the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:

Answer:

Explanation:
Box 1: succeed
Box 2: failed
Example:
Suppose a pipeline has three activities, where Activity1 has a success path to Activity2 and a failure path to
Activity3. If Activity1 fails and Activity3 succeeds, the pipeline will fail. The presence of the success path alongside the failure
path changes the outcome reported by the pipeline, even though the activity executions from the pipeline are the same as
in the previous scenario.

Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.
Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/
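
The outcome rule in the example can be modeled in a few lines. The sketch below is a simplified model, not Data Factory code. It assumes that the pipeline result is decided by its leaf activities (those with no downstream activity) and that a skipped leaf is judged by its upstream activities instead. The `Activity` class and all function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    # Each dependency is (upstream name, condition), where condition is
    # "Succeeded", "Failed", or "Completed".
    depends_on: list = field(default_factory=list)


def condition_met(upstream_status: str, condition: str) -> bool:
    if condition == "Completed":
        return upstream_status in ("Succeeded", "Failed")
    return upstream_status == condition


def run(activities, failing):
    """Assign a status to each activity; `activities` must list upstream first."""
    status = {}
    for act in activities:
        if not all(condition_met(status[up], cond) for up, cond in act.depends_on):
            status[act.name] = "Skipped"
        elif act.name in failing:
            status[act.name] = "Failed"
        else:
            status[act.name] = "Succeeded"
    return status


def pipeline_outcome(activities, status):
    by_name = {a.name: a for a in activities}
    upstream_names = {up for a in activities for up, _ in a.depends_on}
    leaves = [a.name for a in activities if a.name not in upstream_names]

    def ok(name):
        # A skipped leaf is judged by the activities it depends on.
        if status[name] == "Skipped":
            return all(ok(up) for up, _ in by_name[name].depends_on)
        return status[name] == "Succeeded"

    return "Succeeded" if all(ok(n) for n in leaves) else "Failed"


# The three-activity pipeline from the example above.
three = [
    Activity("Activity1"),
    Activity("Activity2", [("Activity1", "Succeeded")]),
    Activity("Activity3", [("Activity1", "Failed")]),
]
status = run(three, failing={"Activity1"})
outcome = pipeline_outcome(three, status)

# The same pipeline with only the failure path.
two = [Activity("Activity1"), Activity("Activity3", [("Activity1", "Failed")])]
outcome_two = pipeline_outcome(two, run(two, failing={"Activity1"}))
```

In this model the skipped Activity2 is a leaf, so the failure of Activity1 propagates to the pipeline result; without the success path, Activity3 is the only leaf and its success decides the outcome.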

Question 6 Topic 3, Mixed Questions

You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the
following columns.

You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?

  • A. In the tables use a hash distribution of ArrivalDateTime and ReportDateTime.
  • B. In the tables use a hash distribution of ArrivalAirportID and AirportID.
  • C. In each table, create an IDENTITY column.
  • D. In each table, create a column as a composite of the other two columns in the table.
Answer:

B

Explanation:
Hash-distribution improves query performance on large fact tables.
Incorrect Answers:
A: Do not use a date column for hash distribution. All data for the same date lands in the same distribution. If several users
are all filtering on the same date, then only 1 of the 60 distributions does all the processing work.
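
The effect described for option A can be reproduced with a small simulation. The sketch below uses MD5 modulo 60 as a stand-in hash, because the hash function used by dedicated SQL pools is internal; the row layout and sample values are hypothetical.

```python
import hashlib

DISTRIBUTIONS = 60  # number of distributions in a dedicated SQL pool


def distribution_for(value) -> int:
    """Map a value to a distribution with a stand-in hash (not the real one)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % DISTRIBUTIONS


# Hypothetical rows: 200 flights on the same date, each to a different airport.
rows = [("2021-06-01", airport_id) for airport_id in range(1, 201)]

# Distributions touched when the table is hashed on the date column.
by_date = {distribution_for(date) for date, _ in rows}
# Distributions touched when the table is hashed on the airport ID column.
by_airport = {distribution_for(airport_id) for _, airport_id in rows}

if __name__ == "__main__":
    print(len(by_date), len(by_airport))
```

Every row for the date hashes to a single distribution, while the airport IDs spread the same rows across many distributions, so a query that filters on one date can still use most of the pool.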

Question 7 Topic 3, Mixed Questions

You have an Azure Synapse Analytics dedicated SQL pool.
You run DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales'); and get the results shown in the following table.

Which statement accurately describes the dbo.FactInternetSales table?

  • A. All distributions contain data.
  • B. The table contains less than 10,000 rows.
  • C. The table uses round-robin distribution.
  • D. The table is skewed.
Answer:

D

Explanation:
Data skew means the data is not distributed evenly across the distributions.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute
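
One way to turn "not distributed evenly" into a number is the ratio (max − min) / max over the per-distribution row counts. The sketch below computes that ratio. The 10% threshold and the function names are assumptions chosen for illustration, and the inputs are hypothetical row counts rather than the values from the exhibit.

```python
def skew_ratio(rows_by_distribution):
    """Return (max - min) / max of the row counts, or 0.0 for an empty table."""
    if not rows_by_distribution:
        return 0.0
    largest = max(rows_by_distribution)
    smallest = min(rows_by_distribution)
    if largest == 0:
        return 0.0
    return (largest - smallest) / largest


def is_skewed(rows_by_distribution, threshold=0.10):
    """Flag the table when the skew ratio reaches the (assumed) threshold."""
    return skew_ratio(rows_by_distribution) >= threshold


if __name__ == "__main__":
    print(is_skewed([1000, 1000, 1000]))
    print(is_skewed([1000, 400, 0]))
```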

Question 8 Topic 3, Mixed Questions

You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data
from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen2 using an external table.
Files with an invalid schema cause errors to occur.
You need to monitor for an invalid schema error.
For which error should you monitor?

  • A. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.'
  • B. Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
  • C. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'
  • D. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs] occurred while accessing external file.'
Answer:

B

Explanation:
Error message: Cannot execute the query "Remote Query"
Possible reason:
This error occurs when the files do not all share the same schema. When a PolyBase external table points to a directory, it
recursively reads all the files in that directory. A column or data type mismatch in any of those files can cause this error,
which may be seen in SSMS.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-errors-and-possible-solutions
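
The reject-threshold behavior in the error message can be imitated with a small loader. The sketch below is not PolyBase; it is a simplified model that assumes a row is rejected when its column count or a type conversion does not match the expected schema, and that the load aborts once the number of rejected rows exceeds the threshold. All names and sample data are hypothetical.

```python
import csv
import io


class RejectThresholdReached(Exception):
    """Raised when more rows are rejected than the threshold allows."""


def load_rows(csv_text, column_types, reject_value=0):
    """Parse CSV text against column_types, aborting past reject_value rejects."""
    accepted, rejected, processed = [], 0, 0
    for fields in csv.reader(io.StringIO(csv_text)):
        processed += 1
        try:
            if len(fields) != len(column_types):
                raise ValueError("column count mismatch")
            accepted.append(tuple(cast(f) for cast, f in zip(column_types, fields)))
        except ValueError:
            rejected += 1
            if rejected > reject_value:
                raise RejectThresholdReached(
                    f"{rejected} rows rejected out of total {processed} rows processed"
                )
    return accepted


schema = [int, str, float]
good = "1,LHR,12.5\n2,JFK,3.0\n"
bad = "abc,LHR,12.5\n"  # first column is not an integer

if __name__ == "__main__":
    print(load_rows(good, schema))
    try:
        load_rows(bad, schema)
    except RejectThresholdReached as err:
        print(err)
```

With a threshold of 0, the first mismatched row is enough to abort the load, which matches the "1 rows rejected out of total 1 rows processed" wording in option B.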

Question 9 Topic 3, Mixed Questions

You are designing a highly available Azure Data Lake Storage solution that will include geo-zone-redundant storage (GZRS).
You need to monitor for replication delays that can affect the recovery point objective (RPO).
What should you include in the monitoring solution?

  • A. 5xx: Server Error errors
  • B. Average Success E2E Latency
  • C. Availability
  • D. Last Sync Time
Answer:

D

Explanation:
Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the
secondary region at the time an outage occurs. The Last Sync Time property indicates the last time that data from the
primary region was written successfully to the secondary region. All writes made to the primary region before the last sync
time are available to be read from the secondary location. Writes made to the primary region after the last sync time property
may or may not be available for reads yet.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/last-sync-time-get
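
The relationship between Last Sync Time and the RPO comes down to simple date arithmetic. The sketch below assumes the Last Sync Time value has already been retrieved and parsed into a timezone-aware `datetime`; the function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_sync_time: datetime, now: datetime) -> timedelta:
    """Window of writes that may not yet exist in the secondary region."""
    return now - last_sync_time


def rpo_at_risk(last_sync_time: datetime, now: datetime, rpo: timedelta) -> bool:
    """True when the replication lag is longer than the recovery point objective."""
    return replication_lag(last_sync_time, now) > rpo


# Hypothetical timestamps, both in UTC.
now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
last_sync = datetime(2021, 6, 1, 11, 40, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(replication_lag(last_sync, now))
    print(rpo_at_risk(last_sync, now, rpo=timedelta(minutes=15)))
```

Writes made after the last sync time may be lost in a failover, so an alert on this lag is a direct check on the RPO.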

Question 10 Topic 3, Mixed Questions

You have an Azure Databricks resource.
You need to log actions that relate to changes in compute for the Databricks resource.
Which Databricks services should you log?

  • A. clusters
  • B. workspace
  • C. DBFS
  • D. SSH
  • E. jobs
Answer:

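A

Explanation:
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor
detailed Databricks usage patterns. Each audit event is recorded under the service that produced it. Changes to compute,
such as creating, resizing, or deleting a cluster, are recorded under the clusters service.
There are two types of logs:
  • Workspace-level audit logs with workspace-level events.
  • Account-level audit logs with account-level events.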
Reference:
https://docs.databricks.com/administration-guide/account-settings/audit-logs.html
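
Audit log records carry a `serviceName` and an `actionName` field, so events can be filtered by the service that emitted them. The sketch below does that on records that have already been loaded as dictionaries. The helper name and the sample records are hypothetical and show only the two fields used here.

```python
def actions_for_service(records, service_name):
    """Return the actionName of every record emitted by the given service."""
    return [r["actionName"] for r in records if r.get("serviceName") == service_name]


# Hypothetical, heavily trimmed audit records.
records = [
    {"serviceName": "clusters", "actionName": "create"},
    {"serviceName": "jobs", "actionName": "create"},
    {"serviceName": "clusters", "actionName": "resize"},
    {"serviceName": "clusters", "actionName": "delete"},
]

cluster_actions = actions_for_service(records, "clusters")

if __name__ == "__main__":
    print(cluster_actions)
```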
