Databricks Certified Associate Developer for Apache Spark Practice Test
Last exam update: Sep 07, 2024
Page 1 out of 11. Viewing questions 1-10 out of 102
Question 1
The code block shown below should return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
storesDF.__1__(__2__(__3__).alias("sqftMean"))
A.
1. agg 2. mean 3. col("sqft")
B.
1. withColumn 2. mean 3. col("sqft")
C.
1. agg 2. average 3. col("sqft")
D.
1. mean 2. col 3. "sqft"
E.
1. agg 2. mean 3. "sqft"
Answer:
a
Question 2
The code block shown below should return a new DataFrame from DataFrame storesDF where column modality is the constant string PHYSICAL. Assume DataFrame storesDF is the only defined language variable. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
storesDF.__1__(__2__, __3__(__4__))
A.
1. withColumn 2. "modality" 3. col 4. "PHYSICAL"
B.
1. withColumn 2. "modality" 3. lit 4. PHYSICAL
C.
1. withColumn 2. "modality" 3. lit 4. "PHYSICAL"
D.
1. withColumn 2. "modality" 3. StringType 4. "PHYSICAL"
E.
1. newColumn 2. modality 3. StringType 4. PHYSICAL
Answer:
c
Question 3
Which of the following statements about Spark's stability is incorrect?
A.
Spark is designed to support the loss of any set of worker nodes.
B.
Spark will rerun any failed tasks due to failed worker nodes.
C.
Spark will recompute data cached on failed worker nodes.
D.
Spark will spill data to disk if it does not fit in memory.
E.
Spark will reassign the driver to a worker node if the driver's node fails.
Answer:
e
Question 4
Which of the following operations returns a GroupedData object?
B.
DataFrame.cubed()
C.
DataFrame.group()
D.
DataFrame.groupBy()
E.
DataFrame.grouping_id()
Answer:
d
Question 5
Which of the following code blocks returns a DataFrame containing a column dayOfYear, an integer representation of the day of the year from column openDate from DataFrame storesDF? Note that column openDate is of type integer and represents a date in the UNIX epoch format: the number of seconds since midnight on January 1st, 1970. A sample of storesDF is displayed below:
A.
(storesDF.withColumn("openTimestamp", col("openDate").cast("Timestamp")).withColumn("dayOfYear", dayofyear(col("openTimestamp"))))
B.
storesDF.withColumn("dayOfYear", get dayofyear(col("openDate")))
C.
storesDF.withColumn("dayOfYear", dayofyear(col("openDate")))
D.
(storesDF.withColumn("openDateFormat", col("openDate").cast("Date")).withColumn("dayOfYear", dayofyear(col("openDateFormat"))))
E.
storesDF.withColumn("dayOfYear", substr(col("openDate"), 4, 6))
Answer:
a
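To illustrate why the integer epoch value has to be interpreted as a timestamp before a day of the year can be extracted, here is a minimal sketch in plain Python rather than Spark. The helper name day_of_year and the sample values are hypothetical, and UTC is assumed here, whereas Spark would apply the session time zone.

```python
from datetime import datetime, timezone


def day_of_year(epoch_seconds: int) -> int:
    """Treat an integer as seconds since 1970-01-01 UTC and return the day of the year (1-366)."""
    # Step 1: interpret the raw integer as a timestamp (the analogue of casting to a timestamp).
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Step 2: extract the day of the year from the timestamp (the analogue of dayofyear).
    return ts.timetuple().tm_yday


# Hypothetical openDate values: the epoch itself, 31 days later, and a date in 2020.
open_dates = [0, 86400 * 31, 1_600_000_000]
days = [day_of_year(seconds) for seconds in open_dates]
print(days)
```

The raw integer carries no calendar information on its own, so the extraction step only makes sense after the conversion step.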
Question 6
Which of the following statements about Spark jobs is incorrect?
A.
Jobs are broken down into stages.
B.
There are multiple tasks within a single job when a DataFrame has more than one partition.
C.
Jobs are collections of tasks that are divided up based on when an action is called.
D.
There is no way to monitor the progress of a job.
E.
Jobs are collections of tasks that are divided based on when language variables are defined.
Answer:
d
Question 7
The code block shown below should return a new DataFrame where column division from DataFrame storesDF has been renamed to column state and column managerName from DataFrame storesDF has been renamed to column managerFullName. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
storesDF.__1__(__2__, __3__).__4__(__5__, __6__)
A.
1. withColumnRenamed 2. "state" 3. "division" 4. withColumnRenamed 5. "managerFullName" 6. "managerName"
B.
1. withColumnRenamed 2. division 3. col("state") 4. withColumnRenamed 5. "managerName" 6. col("managerFullName")
C.
1. withColumnRenamed 2. "division" 3. "state" 4. withColumnRenamed 5. "managerName" 6. "managerFullName"
D.
1. withColumn 2. "division" 3. "state" 4. withcolumn 5. "managerName" 6. "managerFullName
E.
1. withColumn 2. "division" 3. "state" 4. withColumn 5. "managerName" 6. "managerFullName"
Answer:
c
Question 8
The code block shown below should efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId.
Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
__1__.join(__2__(__3__), "storeId")
A.
1. employeesDF 2. broadcast 3. storesDF
B.
1. broadcast(employeesDF) 2. broadcast 3. storesDF
C.
1. broadcast 2. employeesDF 3. storesDF
D.
1. storesDF 2. broadcast 3. employeesDF
E.
1. broadcast(storesDF) 2. broadcast 3. employeesDF
Answer:
a
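As a rough illustration of why the smaller DataFrame is the one worth broadcasting, the following plain-Python sketch mimics the idea of a broadcast hash join. It is a conceptual analogue under stated assumptions, not Spark's implementation: the function name broadcast_hash_join, the employeeName field, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def broadcast_hash_join(large_rows, small_rows, key):
    """Inner-join two lists of dicts by building a lookup table from the small side."""
    # Build the lookup once from the small side. In a cluster, this small table
    # is what would be copied to every executor.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Stream over the large side and probe the lookup, so the large side
    # never has to be moved or re-partitioned.
    joined = []
    for row in large_rows:
        for match in lookup.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data: a small stores table and a larger employees table.
stores = [{"storeId": 1, "division": "CA"}, {"storeId": 2, "division": "NY"}]
employees = [
    {"storeId": 1, "employeeName": "Ana"},
    {"storeId": 2, "employeeName": "Bo"},
    {"storeId": 1, "employeeName": "Cy"},
    {"storeId": 3, "employeeName": "Di"},
]

result = broadcast_hash_join(employees, stores, "storeId")
for row in result:
    print(row)
```

The cost of copying grows with the size of the table used to build the lookup, which is why the sketch builds it from the stores side rather than the employees side.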
Question 9
Which of the following code blocks returns a DataFrame sorted alphabetically based on column division?
A.
storesDF.sort("division")
B.
storesDF.orderBy(desc("division"))
C.
storesDF.orderBy(col("division").desc())
D.
storesDF.orderBy("division", ascending = true)
E.
storesDF.sort(desc("division"))
Answer:
a
Question 10
Which of the following describes the difference between cluster and client execution modes?
A.
The cluster execution mode runs the driver on a worker node within a cluster, while the client execution mode runs the driver on the client machine (also known as a gateway machine or edge node).
B.
The cluster execution mode is run on a local cluster, while the client execution mode is run in the cloud.
C.
The cluster execution mode distributes executors across worker nodes in a cluster, while the client execution mode runs a Spark job entirely on one client machine.
D.
The cluster execution mode runs the driver on the cluster machine (also known as a gateway machine or edge node), while the client execution mode runs the driver on a worker node within a cluster.
E.
The cluster execution mode distributes executors across worker nodes in a cluster, while the client execution mode submits a Spark job from a remote machine to be run on a remote, unconfigurable cluster.