Hortonworks HDPCD Practice Test

Hortonworks Data Platform Certified Developer Exam


Question 1

You want to ingest log files into HDFS. Which tool would you use?

  • A. HCatalog
  • B. Flume
  • C. Sqoop
  • D. Ambari
Answer:

B
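
A minimal sketch of a Flume agent configuration for this use case, assuming a hypothetical agent name (a1), log path, and HDFS path - it tails a local log file and delivers the events to HDFS:

    # name the agent's source, sink, and channel
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1

    # source: tail a local log file (path is an assumption)
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /var/log/app/app.log

    # sink: write the events into HDFS (path is an assumption)
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/logs
    a1.sinks.k1.hdfs.fileType = DataStream

    # channel buffering events between source and sink
    a1.channels.c1.type = memory

    # wire the source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

The agent would then be started with something like: flume-ng agent --name a1 --conf-file <the file above>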


Question 2

Which of the following tools was designed to import data from a relational database into HDFS?

  • A. HCatalog
  • B. Sqoop
  • C. Flume
  • D. Ambari
Answer:

B
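
A hedged sketch of a typical Sqoop import from a relational table into HDFS; the JDBC URL, credentials, table name, and target directory below are hypothetical:

    sqoop import \
      --connect jdbc:mysql://dbhost/salesdb \
      --username dbuser -P \
      --table customers \
      --target-dir /user/dbuser/customers

The -P flag prompts for the database password instead of putting it on the command line.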


Question 3

Which HDFS command copies an HDFS file named foo to the local filesystem as localFoo?

  • A. hadoop fs -get foo localFoo
  • B. hadoop -cp foo localFoo
  • C. hadoop fs -ls foo
  • D. hadoop fs -put foo localFoo
Answer:

A
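
To make the direction of each transfer concrete, assuming a file foo already exists in HDFS:

    hadoop fs -get foo localFoo      # copy HDFS file foo to the local filesystem as localFoo
    hadoop fs -put localFoo foo      # the reverse: copy local file localFoo into HDFS as foo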


Question 4

Which HDFS command displays the contents of the file x in the user's HDFS home directory?

  • A. hadoop fs -ls x
  • B. hdfs fs -get x
  • C. hadoop fs -cat x
  • D. hadoop fs -cp x
Answer:

C
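
For comparison, assuming a file named x in the user's HDFS home directory:

    hadoop fs -cat x      # print the contents of x to standard output
    hadoop fs -ls x       # list the file's metadata (size, permissions), not its contents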


Question 5

What is a SequenceFile?

  • A. A SequenceFile contains a binary encoding of an arbitrary number of homogeneous writable objects.
  • B. A SequenceFile contains a binary encoding of an arbitrary number of heterogeneous writable objects.
  • C. A SequenceFile contains a binary encoding of an arbitrary number of WritableComparable objects, in sorted order.
  • D. A SequenceFile contains a binary encoding of an arbitrary number of key-value pairs. Each key must be the same type. Each value must be the same type.

Answer:

D
A SequenceFile is a flat file consisting of binary key/value pairs. There are three SequenceFile formats:

  • Uncompressed key/value records.
  • Record-compressed key/value records - only 'values' are compressed here.
  • Block-compressed key/value records - both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.

Reference:
http://wiki.apache.org/hadoop/SequenceFile
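
A minimal sketch of writing a SequenceFile with the classic Hadoop Java API, assuming IntWritable keys, Text values, and a hypothetical output path; it illustrates that every key shares one type and every value shares one type:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriteDemo {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/demo.seq");  // hypothetical output path

        // one key class and one value class for the whole file
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, path, IntWritable.class, Text.class);
        try {
          for (int i = 0; i < 5; i++) {
            writer.append(new IntWritable(i), new Text("record-" + i));
          }
        } finally {
          writer.close();
        }
      }
    }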


Question 6

Your cluster's HDFS block size is 64 MB. You have a directory containing 100 plain text files, each of which is 100 MB in size. The InputFormat for your job is TextInputFormat. How many Mappers will run?

  • A. 64
  • B. 100
  • C. 200
  • D. 640

Answer:

C
Each file would be split into two as the block size (64 MB) is less than the file size (100 MB), so 200
mappers would be running.
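
Spelling out the arithmetic behind that answer:

    splits per file = ceil(100 MB / 64 MB) = 2
    total map tasks = 100 files x 2 splits per file = 200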

If you're not compressing the files, then Hadoop will process your large files (say 10 GB) with a number of mappers related to the block size of the file.
Say your block size is 64 MB; then you will have ~160 mappers processing this 10 GB file (160 * 64 MB ~= 10 GB). Depending on how CPU-intensive your mapper logic is, this might be an acceptable block size, but if you find that your mappers are executing in sub-minute times, then you might want to increase the work done by each mapper (by increasing the block size to 128, 256, or 512 MB - the actual size depends on how you intend to process the data).

Reference:
http://stackoverflow.com/questions/11014493/hadoop-mapreduce-appropriate-input-files-size
(first answer, second paragraph)


Question 7

You want to run Hadoop jobs on your development workstation for testing before you submit them
to your production cluster. Which mode of operation in Hadoop allows you to most closely simulate a
production cluster while using a single machine?

  • A. Run all the nodes in your production cluster as virtual machines on your development workstation.
  • B. Run the hadoop command with the -jt local and the -fs file:/// options.
  • C. Run the DataNode, TaskTracker, NameNode and JobTracker daemons on a single machine.
  • D. Run simldooop, the Apache open-source software for simulating Hadoop clusters.
Answer:

C
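
Option C is commonly called pseudo-distributed mode. A minimal configuration sketch, assuming Hadoop 1.x property names and localhost ports that may differ in your installation:

    <!-- core-site.xml: point the default filesystem at a local HDFS instance -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:8020</value>
    </property>

    <!-- mapred-site.xml: point job submission at a local JobTracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:8021</value>
    </property>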


Question 8

You want to perform analysis on a large collection of images. You want to store this data in HDFS and
process it with MapReduce but you also want to give your data analysts and data scientists the ability
to process the data directly from HDFS with an interpreted high-level programming language like
Python. Which format should you use to store this data in HDFS?

  • A. SequenceFiles
  • B. Avro
  • C. JSON
  • D. HTML
  • E. XML
  • F. CSV
Answer:

B

Reference: Hadoop binary files processing introduced by image duplicates finder
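
As a hedged illustration of why a binary-aware container format fits, an Avro record schema along these lines can carry raw image bytes plus metadata and can be read from Python, Java, and other languages; the record and field names are hypothetical:

    {
      "type": "record",
      "name": "ImageRecord",
      "fields": [
        {"name": "filename", "type": "string"},
        {"name": "contents", "type": "bytes"}
      ]
    }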


Question 9

When can a reduce class also serve as a combiner without affecting the output of a MapReduce
program?

  • A. When the types of the reduce operation's input key and input value match the types of the reducer's output key and output value, and when the reduce operation is both commutative and associative.
  • B. When the signature of the reduce method matches the signature of the combine method.
  • C. Always. Code can be reused in Java since it is a polymorphic object-oriented programming language.
  • D. Always. The point of a combiner is to serve as a mini-reducer directly after the map phase to increase performance.
  • E. Never. Combiners and reducers must be implemented separately because they serve different purposes.

Answer:

A
You can use your reducer code as a combiner if the operation performed is commutative and
associative.

Reference: 24 Interview Questions & Answers for Hadoop MapReduce developers, What are
combiners? When should I use a combiner in my MapReduce Job?
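
The canonical example is word count: summing partial counts is commutative and associative, and the stock IntSumReducer's input and output types are both (Text, IntWritable), so the same class can be registered as combiner and reducer. A minimal driver sketch under those assumptions (the mapper below is illustrative only):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountDriver {

      // illustrative mapper emitting (word, 1) for every token in the line
      public static class TokenizerMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          StringTokenizer tokens = new StringTokenizer(value.toString());
          while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);

        // Summing is commutative and associative, and IntSumReducer's input
        // and output types match, so it serves as both combiner and reducer.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }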


Question 10

Which best describes what the map method accepts and emits?

  • A. It accepts a single key-value pair as input and emits a single key and a list of corresponding values as output.
  • B. It accepts a single key-value pair as input and can emit only one key-value pair as output.
  • C. It accepts a list of key-value pairs as input and can emit only one key-value pair as output.
  • D. It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero.

Answer:

D
public class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
extends Object
Maps input key/value pairs to a set of intermediate key/value pairs.
Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.

Reference: org.apache.hadoop.mapreduce
Class Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
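
A hedged sketch of a map method that demonstrates this: each call receives exactly one key-value pair and may write zero, one, or many output pairs; the class and field names are illustrative only:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // called once per input pair (byte offset, line of text)
    public class WordEmitMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String line = value.toString().trim();
        if (line.isEmpty()) {
          return;                    // zero output pairs for this input pair
        }
        for (String token : line.split("\\s+")) {
          word.set(token);
          context.write(word, ONE);  // one pair per word: possibly many
        }
      }
    }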
