by William Laurent
A good ETL tool will easily scale up (and down)--from a departmental solution to one suitable for the enterprise--to meet the most demanding requirements of any project, regardless of platform. There are more than 100 tools on the market that purport to have some ETL function, many of which offer complexity in the face of scalability. To load, change, reshape, and migrate data to both star schema and normalized databases, you must rely heavily on both sequential and parallel processing. Although the point-and-click graphical features of many tools seem to promise quick and easy implementation, the fact is that they don't deliver what they promise--you will have to write code to handle the bulk of your processing. But ETL is about more than transformation and scrubbing of data; automation, scheduling, and communications utilities all have to be provided within your ETL environment. Tools that focus primarily on straightforward extraction and vanilla transformation/loading processes should actually be referred to as data migration tools instead of full-function ETL processors. You must learn to identify the outstanding qualifications as well as the caveats of each ETL product you evaluate. Applying the questions and criterion that follow will help your evaluation team¡¯s objective analysis.
-
Database Connectivity: What databases does this product connect to? What non-database data sources (mainframe, XML, flat files, etc.) will this product connect to?
-
Conformity, Integration and Independence: How will the tool behave and operate against existing architectural limitations and requirements? Does it support the hardware platforms and operating systems currently used? Will it integrate with the current network and user security protocols? Is there true platform independence--i.e. the application works the same on all platforms and there will be no nasty surprises when porting compiled code from NT to UNIX--the application will behave exactly the same under two separate platforms and environments?
-
BI Tool Integration and Modeling Tool Connectivity: How does the tool integrate with other tools, such as design tools like ERWin and ER Studio, or reporting tools like Brio, Cognos, etc.?
-
Concurrency, Parallelization Tool Architecture: Does the tool take advantage of the efficiencies of the database and hardware such as parallelism or SMP/MMP? What are the maximum number of simultaneous developers and production processes supported? Are true multithreading or multi processor jobs possible? Is the threading unlimited or set within the product? Are multiple jobs required to achieve high degrees of parallelization or can processing only be done in a serial fashion?
-
Scheduling and Restartability: Can you schedule your loading processes (and restarts) using the product, or do you need to purchase another product to support this functionality--what is built into the tool and what is extra? What features does it have in place for data backup and local/distributed disaster recovery? Can the tool recover from a failed job without manual intervention? How flexible is the tool when executing dependent or sequential job streams? How well does the product respond to inevitable failures such as network problems, server crashes, or running out of disk space?
-
Price and Vendor Comfort: What is the bottom line price (including maintenance) of the product? Are you comfortable with the vendor¡¯s stated price to performance ratio? What do IT advisory services like Gartner, Forrester, Yankee, and META have to say about the value of the product? Remember that real cost represents the purchase price and the hidden support costs associated with each product: It may be necessary to purchase additional hardware, software, or training to realize the full potential of the product in the context of your project.
-
Administration: How well does the tool handle issues of versioning and auditing? What form of documentation does the tool generate? Will this documentation meet internal documentation standards, both departmental and enterprise? Does the tool support performance and throughput benchmarking? What reporting capabilities (per error log, run log, etc.) are provided? How does the documentation stack up?
-
Support: What is the product support and training that you are being offered from the vendor? Are you comfortable with the availability (24/7, consulting, offshore vs. onshore) of the support? Does the maintenance contract seem fair? How often does the vendor upgrade the product? How painful are the upgrade installations? Will the vendor really work with you to help you define and meet your ETL objectives?
-
Versioning and Security:. Can security on database objects, data stores, and files, as well as transformation scripts be controlled through the tool? How are versioning and change management functions handled within the tool? What happens when developers attempt to make changes to the same routine or transformation object at the same time? Can older incarnations of various routines be retained for reference?
-
Proprietary Languages: What programming languages are supported: VB, C, C++, BASIC, COBOL? Are the built-in tool functions, string parsing, and macros robust? Does it have a custom script based modeling language aimed specifically at data warehousing and optimization of data movement and cleansing? How long for developers to get up to speed in learning new languages or methodologies?
The data convergence and warehousing goals of each business will be unique and different depending on their respective needs. Desires will be diverse and numerous; therefore, high-level core needs should be identified and agreed upon up front. Examples:
We must have a tool that runs on {specific operating system}.
We must be able to process {amount} GB of data nightly in a {number} hour batch window.
We must have connectivity to {type of data source}.
We must be able to distribute our data to {number and locations of users}.
We must have a tool that supports concurrent development by multiple sites.
After high-level considerations are enumerated, you can start to examine and list the specific requirements of your project and begin to ask pointed questions to your vendors, ensuring that all evaluations and purchases of ETL products are prudent.
Article content is either copyrighted to the magazine that published it or to
William Laurent.
Copyright © 2025 by Loyer TCG, Inc.