microsoft dp-203 practice test

Data Engineering on Microsoft Azure

Note: Test Case questions are at the end of the exam
Last exam update: Apr 17 ,2024
Page 1 out of 15. Viewing questions 1-15 out of 212

Question 1 Topic 3, Mixed Questions

You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.
What should you use?

  • A. the Activity log blade for the Data Factory resource
  • B. Pipeline runs in the Azure Data Factory user experience
  • C. the Resource health blade for the Data Factory resource
  • D. Azure Data Factory activity runs in Azure Monitor
Answer:

D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/monitor-using-azure-monitor

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 2 Topic 3, Mixed Questions

You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?

  • A. Local tempdb percentage
  • B. Cache used percentage
  • C. Data IO percentage
  • D. CPU percentage
Answer:

B

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-
monitor-cache

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 3 Topic 3, Mixed Questions

You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact
table named Table1.
You need to identify the extent of the data skew in Table1.
What should you do in Synapse Studio?

  • A. Connect to the built-in pool and run sys.dm_pdw_nodes_db_partition_stats.
  • B. Connect to Pool1 and run DBCC CHECKALLOC.
  • C. Connect to the built-in pool and run DBCC CHECKALLOC.
  • D. Connect to Pool1 and query sys.dm_pdw_nodes_db_partition_stats.
Answer:

D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Microsoft recommends use of sys.dm_pdw_nodes_db_partition_stats to analyze any skewness in the data.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-partition-stats-
transact-sql https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/cheat-sheet

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 4 Topic 3, Mixed Questions

You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
Wrangling data flow

Notebook

Copy Jar


Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point

  • A. Azure Synapse Analytics
  • B. Azure HDInsight
  • C. Azure Machine Learning
  • D. Azure Data Factory
  • E. Azure Databricks
Answer:

A C

User Votes:
A
50%
B
50%
C
50%
D
50%
E
50%
Discussions
vote your answer:
A
B
C
D
E
0 / 1000

Question 5 Topic 3, Mixed Questions

HOTSPOT
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.

Use the drop-down menus to select the answer choice that completes each statement based on the information presented in
the graphic.
NOTE: Each correct selection is worth one point.
Hot Area:

Answer:


Explanation:
Box 1: succeed
Box 2: failed Example:
Now lets say we have a pipeline with 3 activities, where Activity1 has a success path to Activity2 and a failure path to
Activity3. If Activity1 fails and Activity3 succeeds, the pipeline will fail. The presence of the success path alongside the failure
path changes the outcome reported by the pipeline, even though the activity executions from the pipeline are the same as
the previous scenario.

Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.
Reference:
https://datasavvy.me/2021/02/18/azure-data-factory-activity-failures-and-pipeline-outcomes/

Discussions
0 / 1000

Question 6 Topic 3, Mixed Questions

You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the
following columns.

You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?

  • A. In the tables use a hash distribution of ArrivalDateTime and ReportDateTime.
  • B. In the tables use a hash distribution of ArrivalAirportID and AirportID.
  • C. In each table, create an IDENTITY column.
  • D. In each table, create a column as a composite of the other two columns in the table.
Answer:

B

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Hash-distribution improves query performance on large fact tables.
Incorrect Answers:
A: Do not use a date column for hash distribution. All data for the same date lands in the same distribution. If several users
are all filtering on the same date, then only 1 of the 60 distributions do all the processing work.

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 7 Topic 3, Mixed Questions

You have an Azure Synapse Analytics dedicated SQL pool.
You run PDW_SHOWSPACEUSED('dbo.FactInternetSales'); and get the results shown in the following table.

Which statement accurately describes the dbo.FactInternetSales table?

  • A. All distributions contain data.
  • B. The table contains less than 10,000 rows.
  • C. The table uses round-robin distribution.
  • D. The table is skewed.
Answer:

D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Data skew means the data is not distributed evenly across the distributions.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 8 Topic 3, Mixed Questions

You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data
from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen2 using an external table.
Files with an invalid schema cause errors to occur.
You need to monitor for an invalid schema error.
For which error should you monitor?

  • A. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [com.microsoft.polybase.client.KerberosSecureLogin] occurred while accessing external file.'
  • B. Cannot execute the query "Remote Query" against OLE DB provider "SQLNCLI11" for linked server "(null)". Query aborted- the maximum reject threshold (0 rows) was reached while reading from an external source: 1 rows rejected out of total 1 rows processed.
  • C. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [Unable to instantiate LoginClass] occurred while accessing external file.'
  • D. EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_Connect: Error [No FileSystem for scheme: wasbs] occurred while accessing external file.'
Answer:

B

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Error message: Cannot execute the query "Remote Query"
Possible Reason:
The reason this error happens is because each file has different schema. The PolyBase external table DDL when pointed to
a directory recursively reads all the files in that directory. When a column or data type mismatch happens, this error could be
seen in SSMS.
Reference:
https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-errors-and-possible-solutions

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 9 Topic 3, Mixed Questions

You are designing a highly available Azure Data Lake Storage solution that will include geo-zone-redundant storage (GZRS).
You need to monitor for replication delays that can affect the recovery point objective (RPO).
What should you include in the monitoring solution?

  • A. 5xx: Server Error errors
  • B. Average Success E2E Latency
  • C. availability
  • D. Last Sync Time
Answer:

D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the
secondary region at the time an outage occurs. The Last Sync Time property indicates the last time that data from the
primary region was written successfully to the secondary region. All writes made to the primary region before the last sync
time are available to be read from the secondary location. Writes made to the primary region after the last sync time property
may or may not be available for reads yet.
Reference:
https://docs.microsoft.com/en-us/azure/storage/common/last-sync-time-get

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 10 Topic 3, Mixed Questions

You have an Azure Databricks resource.
You need to log actions that relate to changes in compute for the Databricks resource.
Which Databricks services should you log?

  • A. clusters
  • B. workspace
  • C. DBFS
  • D. SSH
  • E. jobs
Answer:

B

User Votes:
A
50%
B
50%
C
50%
D
50%
E
50%

Explanation:
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor
detailed Databricks usage patterns.
There are two types of logs:
Workspace-level audit logs with workspace-level events. Account-level audit logs with account-level events.


Reference:
https://docs.databricks.com/administration-guide/account-settings/audit-logs.html

Discussions
vote your answer:
A
B
C
D
E
0 / 1000

Question 11 Topic 3, Mixed Questions

You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?

  • A. DWU percentage
  • B. Cache hit percentage
  • C. DWU limit
  • D. Data IO percentage
Answer:

B

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-how-to-monitor-cache

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 12 Topic 3, Mixed Questions

You have a SQL pool in Azure Synapse.
A user reports that queries against the pool take longer than expected to complete. You determine that the issue relates to
queried columnstore segments.
You need to add monitoring to the underlying storage to help diagnose the issue.
Which two metrics should you monitor? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  • A. Snapshot Storage Size
  • B. Cache used percentage
  • C. DWU Limit
  • D. Cache hit percentage
Answer:

B D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
D: Cache hit percentage: (cache hits / cache miss) * 100 where cache hits is the sum of all columnstore segments hits in the
local SSD cache and cache miss is the columnstore segments misses in the local SSD cache summed across all nodes
B: (cache used / cache capacity) * 100 where cache used is the sum of all bytes in the local SSD cache across all nodes and
cache capacity is the sum of the storage capacity of the local SSD cache across all nodes
Incorrect Asnwers:
C: DWU limit: Service level objective of the data warehouse.
Reference: https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-concept-
resource-utilization-query-activity

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 13 Topic 3, Mixed Questions

A company purchases IoT devices to monitor manufacturing machinery. The company uses an Azure IoT Hub to
communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?

  • A. Azure Analysis Services using Azure Portal
  • B. Azure Analysis Services using Azure PowerShell
  • C. Azure Stream Analytics cloud job using Azure Portal
  • D. Azure Data Factory instance using Microsoft Visual Studio
Answer:

C

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
In a real-world scenario, you could have hundreds of these sensors generating events as a stream. Ideally, a gateway device
would run code to push these events to Azure Event Hubs or Azure IoT Hubs. Your Stream Analytics job would ingest these
events from Event Hubs and run real-time analytics queries against the streams.
Create a Stream Analytics job:
In the Azure portal, select + Create a resource from the left navigation menu. Then, select Stream Analytics job from
Analytics.
Reference:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-get-started-with-azure-stream-analytics-to-
process-data-from-iot-devices

Discussions
vote your answer:
A
B
C
D
0 / 1000

Question 14 Topic 3, Mixed Questions

You are designing a star schema for a dataset that contains records of online orders. Each record includes an order date, an
order due date, and an order ship date.
You need to ensure that the design provides the fastest query times of the records when querying for arbitrary date ranges
and aggregating by fiscal calendar attributes.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

  • A. Create a date dimension table that has a DateTime key.
  • B. Use built-in SQL functions to extract date attributes.
  • C. Create a date dimension table that has an integer key in the format of YYYYMMDD.
  • D. In the fact table, use integer columns for the date fields.
  • E. Use DateTime columns for the date fields.
Answer:

B D

User Votes:
A
50%
B
50%
C
50%
D
50%
E
50%
Discussions
vote your answer:
A
B
C
D
E
0 / 1000

Question 15 Topic 3, Mixed Questions

You are designing an inventory updates table in an Azure Synapse Analytics dedicated SQL pool. The table will have a
clustered columnstore index and will include the following columns:

You identify the following usage patterns:
Analysts will most commonly analyze transactions for a warehouse.

Queries will summarize by product category type, date, and/or inventory event type.

You need to recommend a partition strategy for the table to minimize query times.
On which column should you partition the table?

  • A. EventTypeID
  • B. ProductCategoryTypeID
  • C. EventDate
  • D. WarehouseID
Answer:

D

User Votes:
A
50%
B
50%
C
50%
D
50%

Explanation:
The number of records for each warehouse is big enough for a good partitioning.
Note: Table partitions enable you to divide your data into smaller groups of data. In most cases, table partitions are created
on a date column.
When creating partitions on clustered columnstore tables, it is important to consider how many rows belong to each partition.
For optimal compression and performance of clustered columnstore tables, a minimum of 1 million rows per distribution and
partition is needed. Before partitions are created, dedicated SQL pool already divides each table into 60 distributed
databases.

Discussions
vote your answer:
A
B
C
D
0 / 1000
To page 2