You have an Azure data factory.
You need to examine the pipeline failures from the last 180 days.
What should you use?
Data Factory stores pipeline-run data for only 45 days. Use Azure Monitor if you want to keep that data for a longer time.
You manage an enterprise data warehouse in Azure Synapse Analytics.
Users report slow performance when they run commonly used queries. Users do not report performance changes for
infrequently used queries.
You need to monitor resource utilization to determine the source of the performance issues.
Which metric should you monitor?
Monitor and troubleshoot slow query performance by determining whether your workload is optimally leveraging the adaptive
cache for dedicated SQL pools.
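As a rough illustration of how the two adaptive-cache metrics can be read together, here is a minimal Python sketch. The metric names are the ones Azure Monitor exposes for dedicated SQL pools, but the thresholds are assumptions for illustration, not Microsoft's published guidance:

```python
def cache_recommendation(used_pct: float, hit_pct: float) -> str:
    """Rough interpretation of the 'Adaptive cache used percentage' and
    'Adaptive cache hit percentage' metrics for a dedicated SQL pool.
    Thresholds are illustrative, not official guidance values."""
    if used_pct >= 90 and hit_pct < 50:
        # Working set likely exceeds the cache: commonly used queries keep
        # missing the cache, which matches the reported symptom.
        return "scale up the SQL pool"
    return "cache capacity looks sufficient"
```

A high used percentage combined with a low hit percentage suggests the frequently run queries no longer fit the cache, which is consistent with slow performance only on commonly used queries.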
You have an Azure Synapse Analytics dedicated SQL pool named Pool1 and a database named DB1. DB1 contains a fact
table named Table1.
You need to identify the extent of the data skew in Table1.
What should you do in Synapse Studio?
Microsoft recommends using sys.dm_pdw_nodes_db_partition_stats to analyze data skew.
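To make the idea concrete, here is a small Python sketch of how skew could be quantified from the per-distribution row counts that the DMV returns for the table's 60 distributions; the row counts below are invented for illustration:

```python
def skew_percentage(rows_per_distribution: list[int]) -> float:
    """Deviation of the largest distribution from the average, as a
    percentage; 0 means the rows are spread perfectly evenly."""
    avg = sum(rows_per_distribution) / len(rows_per_distribution)
    return (max(rows_per_distribution) - avg) / avg * 100

# Hypothetical counts for a 60-distribution dedicated SQL pool.
even = [1_000] * 60
skewed = [10_000] + [1_000] * 59  # one distribution holds 10x the rows

assert skew_percentage(even) == 0
assert skew_percentage(skewed) > 100
```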
You have several Azure Data Factory pipelines that contain a mix of the following types of activities:
Wrangling data flow
Which two Azure services should you use to debug the activities? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
You have an Azure Data Factory pipeline that has the activities shown in the following exhibit.
Use the drop-down menus to select the answer choice that completes each statement based on the information presented in the graphic.
NOTE: Each correct selection is worth one point.
Box 1: succeed
Box 2: failed
Example:
Now let's say we have a pipeline with three activities, where Activity1 has a success path to Activity2 and a failure path to Activity3. If Activity1 fails and Activity3 succeeds, the pipeline will fail. The presence of the success path alongside the failure path changes the outcome reported by the pipeline, even though the activity executions are the same as in the previous scenario.
Activity1 fails, Activity2 is skipped, and Activity3 succeeds. The pipeline reports failure.
You have two fact tables named Flight and Weather. Queries targeting the tables will be based on the join between the two tables.
You need to recommend a solution that maximizes query performance.
What should you include in the recommendation?
Hash-distribution improves query performance on large fact tables.
Do not use a date column for hash distribution: all data for the same date lands in the same distribution. If several users all filter on the same date, only 1 of the 60 distributions does all the processing work.
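A small Python sketch of why the distribution key matters; the modulo hash is a stand-in for the engine's internal hash function, and the column values are invented:

```python
# Synapse dedicated SQL pools spread a hash-distributed table across a
# fixed 60 distributions.
NUM_DISTRIBUTIONS = 60

def distribution_for(value) -> int:
    # Deterministic stand-in for the engine's hash function; Synapse uses
    # its own internal hash, not Python's.
    return hash(value) % NUM_DISTRIBUTIONS

# Hashing on a high-cardinality join key spreads rows over every
# distribution, so all 60 share the work.
flight_ids = range(10_000)
assert len({distribution_for(fid) for fid in flight_ids}) == NUM_DISTRIBUTIONS

# Hashing on a date column sends every row for one date to a single
# distribution, so a query filtered to that date runs on 1 of 60.
same_date_rows = ["2024-01-15"] * 10_000
assert len({distribution_for(d) for d in same_date_rows}) == 1
```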
You have an Azure Synapse Analytics dedicated SQL pool.
You run DBCC PDW_SHOWSPACEUSED('dbo.FactInternetSales') and get the results shown in the following table.
Which statement accurately describes the dbo.FactInternetSales table?
Data skew means the data is not distributed evenly across the distributions.
You configure monitoring for an Azure Synapse Analytics implementation. The implementation uses PolyBase to load data
from comma-separated value (CSV) files stored in Azure Data Lake Storage Gen2 using an external table.
Files with an invalid schema cause errors to occur.
You need to monitor for an invalid schema error.
For which error should you monitor?
Error message: Cannot execute the query "Remote Query"
This error occurs because the files have different schemas. When the PolyBase external table DDL points to a directory, PolyBase recursively reads all the files in that directory. When a column or data-type mismatch occurs, this error can be seen in SSMS.
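Because the failure surfaces only as a generic remote-query error, a pre-load header check over the directory can localize the offending file. A Python sketch (header comparison only, under the assumption the CSVs have header rows; PolyBase also validates data types):

```python
import csv
import pathlib

def find_schema_mismatches(directory: str) -> dict[str, list[str]]:
    """Compare every CSV file's header row against the first file seen.
    A pre-load check like this can catch a mismatch before PolyBase fails
    with 'Cannot execute the query "Remote Query"'. Illustrative sketch;
    it checks column names only, not data types."""
    expected: list[str] | None = None
    mismatches: dict[str, list[str]] = {}
    for path in sorted(pathlib.Path(directory).rglob("*.csv")):
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        if expected is None:
            expected = header          # first file defines the schema
        elif header != expected:
            mismatches[path.name] = header
    return mismatches
```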
You are designing a highly available Azure Data Lake Storage solution that will include geo-zone-redundant storage (GZRS).
You need to monitor for replication delays that can affect the recovery point objective (RPO).
What should you include in the monitoring solution?
Because geo-replication is asynchronous, it is possible that data written to the primary region has not yet been written to the
secondary region at the time an outage occurs. The Last Sync Time property indicates the last time that data from the
primary region was written successfully to the secondary region. All writes made to the primary region before the last sync
time are available to be read from the secondary location. Writes made to the primary region after the last sync time property
may or may not be available for reads yet.
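A Python sketch of how the Last Sync Time property could feed an RPO alert; the 15-minute threshold reflects Azure's typical geo-replication RPO target, but treat the value as an illustrative assumption:

```python
from datetime import datetime, timedelta, timezone

def estimated_rpo(last_sync_time: datetime, now: datetime) -> timedelta:
    """Writes made after last_sync_time may not have reached the secondary
    region yet, so this gap approximates the data at risk on failover."""
    return now - last_sync_time

def rpo_alert(last_sync_time: datetime, now: datetime,
              threshold: timedelta = timedelta(minutes=15)) -> bool:
    """True when replication lag exceeds the threshold (illustrative;
    Azure Storage's typical geo-replication RPO is under 15 minutes)."""
    return estimated_rpo(last_sync_time, now) > threshold
```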
You have an Azure Databricks resource.
You need to log actions that relate to changes in compute for the Databricks resource.
Which Databricks services should you log?
Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor
detailed Databricks usage patterns.
There are two types of logs:
Workspace-level audit logs, with workspace-level events.
Account-level audit logs, with account-level events.
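Compute changes are recorded in the audit logs under the clusters service. A Python sketch of filtering exported audit-log records for those events; serviceName and actionName are fields in the documented Databricks audit-log schema, but the sample records below are invented:

```python
import json

def compute_change_events(audit_log_lines: list[str]) -> list[dict]:
    """Keep only events from the 'clusters' service, which covers compute
    changes such as creating, editing, resizing, or deleting a cluster."""
    events = [json.loads(line) for line in audit_log_lines]
    return [e for e in events if e.get("serviceName") == "clusters"]

# Invented sample records in the newline-delimited JSON shape audit-log
# exports typically use.
sample = [
    '{"serviceName": "clusters", "actionName": "resizeCluster"}',
    '{"serviceName": "notebook", "actionName": "runCommand"}',
]
assert [e["actionName"] for e in compute_change_events(sample)] == ["resizeCluster"]
```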