Top Answers to ETL Interview Questions


ETL stands for extract, transform, and load. These are three database functions combined into a single tool so that you can pull data out of one database and store it in another. This ETL Interview Questions blog has a compiled list of the questions most commonly asked during interviews. Prepare the ETL interview questions listed below and get ready to crack your job interview:

Q1. Compare between ETL and ELT.
Q2. What is an ETL process?
Q3. How many steps are there in an ETL process?
Q4. What are the steps involved in an ETL process?
Q5. Can there be sub-steps for each of the ETL steps?
Q6. What are initial load and full load?
Q7. What is meant by incremental load?
Q8. What is a 3-tier system in ETL?
Q9. What are the three tiers in ETL?
Q10. What are the names of the layers in ETL?

This ETL Interview Questions blog is broadly divided into the categories mentioned below:
1. Basic

2. Intermediate

3. Advanced

Basic Interview Questions

1. Compare between ETL and ELT.

Criteria | ETL | ELT
Working methodology | Data from the source system to the data warehouse | Leverages the target system to transform data
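The difference in working methodology can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database as a stand-in for the data warehouse; the table names, the sample rows, and the functions `run_etl` and `run_elt` are hypothetical and do not come from any particular ETL tool.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, country code).
SOURCE_ROWS = [(1, "10.50", "us"), (2, "7.25", "de"), (3, "3.00", "us")]


def run_etl(rows):
    """ETL: transform in the pipeline, then load the finished rows."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE orders (id INTEGER, amount REAL, country TEXT)")
    # The transform happens here, outside the target system.
    cleaned = [(oid, float(amount), country.upper()) for oid, amount, country in rows]
    warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    return warehouse.execute("SELECT id, amount, country FROM orders ORDER BY id").fetchall()


def run_elt(rows):
    """ELT: load the raw rows first, then let the target transform them in SQL."""
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, country TEXT)")
    warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # The transform happens here, inside the target system.
    warehouse.execute(
        "CREATE TABLE orders AS "
        "SELECT id, CAST(amount AS REAL) AS amount, UPPER(country) AS country "
        "FROM raw_orders"
    )
    return warehouse.execute("SELECT id, amount, country FROM orders ORDER BY id").fetchall()
```

Both functions end with the same cleaned table; what changes is where the transformation work is done.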

2. What is an ETL process?

ETL is the process of Extraction, Transformation, and Loading.

3. How many steps are there in an ETL process?

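There are three steps in an ETL process: extraction, transformation, and loading. First, data is extracted from a source, such as a database server. This data is then transformed by applying business rules, and finally it is loaded into the target.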

4. What are the steps involved in an ETL process?

The steps involved are defining the source and the target, creating the mapping, creating the session, and creating the workflow.

5. Can there be sub-steps for each of the ETL steps?

Each of the steps involved in ETL has several sub-steps. The transform step has the most sub-steps.

6. What are initial load and full load?

In ETL, the initial load is the process of populating all data warehouse tables for the very first time. In a full load, when the data is loaded for the first time, all records are loaded at a stretch, depending on the volume. A full load erases the entire contents of the table and reloads it with fresh data.
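The erase-and-reload behavior of a full load can be sketched as follows. This is a minimal example, assuming SQLite as the target and a hypothetical `customers` table; a real warehouse would typically use its own bulk-load or truncate facilities.

```python
import sqlite3


def full_load(conn, source_rows):
    """Erase the target table, then reload every source record."""
    with conn:  # one transaction, so the table is never left half-loaded
        conn.execute("DELETE FROM customers")
        conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", source_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

full_load(conn, [(1, "Ada"), (2, "Grace")])    # initial load into empty tables
full_load(conn, [(1, "Ada"), (3, "Edsger")])   # later full load replaces everything
rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

After the second call, the table holds only what the source held at that moment; the row with id 2 is gone because it was not reloaded.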

7. What is meant by incremental load?

Incremental load refers to applying only the changes, as and when required, at specific intervals and on a predefined schedule.
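One common way to implement an incremental load is to keep a "watermark", the latest change timestamp already loaded, and apply only rows newer than it. The sketch below assumes that approach; the `customers` table, the `updated_at` column, and the function `incremental_load` are hypothetical names used for illustration.

```python
import sqlite3


def incremental_load(target, source_rows, last_loaded_at):
    """Apply only rows changed after the watermark; return the new watermark."""
    changed = [row for row in source_rows if row[2] > last_loaded_at]
    with target:
        # INSERT OR REPLACE updates existing ids and inserts new ones.
        target.executemany(
            "INSERT OR REPLACE INTO customers (id, name, updated_at) VALUES (?, ?, ?)",
            changed,
        )
    return max((row[2] for row in changed), default=last_loaded_at)


target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)

# First scheduled run: an empty watermark means every row counts as changed.
source = [(1, "Ada", "2024-01-01"), (2, "Grace", "2024-01-02")]
watermark = incremental_load(target, source, "")

# Second scheduled run: only rows 1 and 3 are newer than the watermark.
source = [
    (1, "Ada L.", "2024-01-05"),
    (2, "Grace", "2024-01-02"),
    (3, "Edsger", "2024-01-04"),
]
watermark = incremental_load(target, source, watermark)
result = target.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
```

ISO-formatted date strings are used so that plain string comparison orders them correctly. Unlike a full load, the unchanged row with id 2 is never touched.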

8. What is a 3-tier system in ETL?

The data warehouse is considered to be the 3-tier system in ETL.

9. What are the three tiers in ETL?

The middle tier in ETL provides end users with usable data in a secure way. The other two tiers sit on either side of the middle tier: the end users and the back-end data storage.

10. What are the names of the layers in ETL?

The first layer in ETL is the source layer, and it is the layer where data lands. The second layer is the integration layer where the data is stored after transformation. The third layer is the dimension layer where the actual presentation layer stands.

Intermediate Interview Questions

11. What is meant by snapshots?

Snapshots are read-only copies of the data stored in the master table.

12. What are the characteristics of snapshots?

Snapshots are located on remote nodes and refreshed periodically so that the changes in the master table can be recorded. They are also replicas of tables.

13. What are views?

Views are built using the attributes of one or more tables. A view built on a single table can be updated, but a view built on multiple tables generally cannot.

14. What is meant by a materialized view log?

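A materialized view log is a table associated with the master table of a materialized view. It records the changes made to the master table so that the materialized view can be refreshed incrementally instead of being rebuilt from scratch.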

15. What is a materialized view?

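A materialized view is an aggregate table: a pre-computed table with aggregated or joined data from the fact tables, as well as the dimension tables.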

16. What is the difference between PowerCenter and PowerMart?

PowerCenter processes large volumes of data, whereas PowerMart processes small volumes of data.

17. With which apps can PowerCenter be connected?

PowerCenter can be connected with ERP sources such as SAP, Oracle Apps, PeopleSoft, etc.

18. Which partition is used to improve the performances of ETL transactions?

To improve the performance of ETL transactions, session partitioning is used.

19. Does PowerMart provide connections to ERP sources?

No! PowerMart does not provide connections to any of the ERP sources. It also does not allow session partitioning.

20. What is meant by partitioning in ETL?

Partitioning in ETL refers to the sub-division of the transactions in order to improve their performance.

Advanced Interview Questions

21. What is the benefit of increasing the number of partitions in ETL?

An increase in the number of partitions enables the Informatica server to create multiple connections to a host of sources.

22. What are the types of partitions in ETL?

Types of partitions in ETL are Round-Robin partition and Hash partition.

23. What is Round-Robin partitioning?

In Round-Robin partitioning, the data is evenly distributed by Informatica among all partitions. It is used when the number of rows to process in each of the partitions should be nearly the same.
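The even distribution can be illustrated with a short sketch. This is not how Informatica implements it internally; `round_robin_partition` is a hypothetical helper that simply deals rows out to partitions in turn.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in turn so their sizes differ by at most one."""
    partitions = [[] for _ in range(num_partitions)]
    for index, row in enumerate(rows):
        partitions[index % num_partitions].append(row)
    return partitions


rr_parts = round_robin_partition(list(range(10)), 3)
rr_sizes = [len(part) for part in rr_parts]  # [4, 3, 3]
```

Ten rows spread over three partitions give sizes of 4, 3, and 3, so no partition carries noticeably more work than another.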

24. What is Hash partitioning?

In Hash partitioning, the Informatica server applies a hash function to the partition keys in order to group data among the partitions. It is used to ensure that a group of rows with the same partitioning key is processed in the same partition.
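The grouping effect can be sketched as follows. The hash function here (CRC32 from Python's standard `zlib` module) and the helper `hash_partition` are assumptions chosen for illustration; the function that Informatica actually applies is not specified in this article.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Send each row to the partition selected by hashing its key value."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        bucket = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[bucket].append(row)
    return partitions


orders = [
    {"customer": "acme", "amount": 10},
    {"customer": "globex", "amount": 5},
    {"customer": "acme", "amount": 7},
    {"customer": "initech", "amount": 3},
    {"customer": "globex", "amount": 1},
]
hash_parts = hash_partition(orders, "customer", 3)
```

Every row for a given customer lands in the same partition, which is what makes per-key operations such as aggregation safe to run in parallel. Unlike Round-Robin partitioning, the partition sizes are not guaranteed to be equal.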

25. What is mapping in ETL?

Mapping refers to the flow of data from the source to the destination.

26. What is a session in ETL?

A session is a set of instructions that describe the data movement from the source to the destination.

27. What is meant by Worklet in ETL?

Worklet is a set of tasks in ETL. It can be any set of tasks in the program.

28. What is Workflow in ETL?

A workflow is a set of instructions that tells the Informatica server how to execute the tasks.

29. What is the use of Mapplet in ETL?

A mapplet in ETL is used to create and configure a group of transformations.

30. What is meant by operational data store?

The operational data store (ODS) is the repository that exists between the staging area and the data warehouse. The data stored in ODS has low granularity.

31. How does the operational data store work?

Aggregated data is loaded into the enterprise data warehouse (EDW) after it is populated in the operational data store (ODS). Basically, ODS is a semi-data warehouse (DWH) that allows analysts to analyze the business data. The data persistence period in ODS is usually in the range of 30–45 days and not more.

32. What does the ODS in ETL generate?

ODS in ETL generates primary keys, takes care of errors, and also handles rejects, just like the DWH.

33. When are the tables in ETL analyzed?

The ANALYZE statement allows the validation and computing of statistics for an index, a table, or a cluster.

34. How are the tables analyzed in ETL?

Statistics generated by the ANALYZE statement are used by the cost-based optimizer to calculate the most efficient plan for data retrieval. The ANALYZE statement can also support the validation of object structures, as well as space management, in the system. Operations include COMPUTE, ESTIMATE, and DELETE.

35. How can the mapping be fine-tuned in ETL?

Steps for fine-tuning the mapping include applying the filter condition in the source qualifier instead of using a separate filter transformation, using persistent cache in the lookup transformation, using sorted input with group-by ports in the aggregator transformation, using operators in expressions rather than functions, and increasing the cache size and commit interval.

36. What are the differences between connected and unconnected lookups in ETL?

A connected lookup is used in the mapping and returns multiple values. It can be connected to another transformation and also returns a value. An unconnected lookup is used when the lookup is not available in the main flow, and it returns only a single output. It also cannot be connected to another transformation, but it is reusable.