Certainly the most important Business Intelligence (BI) component to Microsoft’s arsenal is SSIS. Its core responsibility is the movement and cleansing of data. Without this cleansing and movement, every other component would not exist or, at a minimum, would report bad data.
Integration Services is a solution that provides enterprise-level data integration and workflow solutions that have as their goal the extraction, transformation, and loading (ETL) of data from various sources to various destinations. SSIS includes a wide range of tools and wizards to assist in the creation of the workflow and data flow activities that you need to manage in these complex data-movement solutions.
Before diving into the detailed components within Integration Services, you should understand some of the more common business scenarios that involve creating SSIS solutions. Some common uses for SSIS include:
- Archival of data (export)
- Loading of new data (import)
- Transferring data from one data source to another
- Data cleansing or transformation of dirty data
- DBA tasks like purging old files or indexing a database
One of the first scenarios is combining data from different sources stored in different storage systems. In this scenario, SSIS is responsible for connecting to each data source, extracting the data, and merging it into a single dataset. For example, in today’s information systems topology, this is becoming increasingly common because businesses archive information that is not needed for regular operations but is invaluable to analyze business trends or meet compliance requirements. You can also find this scenario when different parts of a business use different storage technologies or different schemas to represent the same data. In these cases, SSIS performs the homogenization of the information. SSIS seamlessly handles multiple divergent data sources and the transformations that can alter data types, split or merge columns, and look up descriptive information that becomes a powerful asset for these situations.
Another common scenario is the population and maintenance of data warehouses and data marts. In these business uses, the data volumes tend to be exceptionally large, and the window of time in which to perform the extraction, transformation, and loading of the data tends to be rather short. SSIS includes the capability to bulk-load data directly from flat files in SQL Server and has a destination component that can perform a bulk load into SQL Server. A key feature for large data volume and complex enrichment and transformation situations such as these is restartability. SSIS includes checkpoints to handle rerunning a package from a task or container within the control flow so that you can elegantly handle various types of errors that may occur during these complex data-loading scenarios.
Also important in data warehouse loads is the ability to source a particular destination from many different tables or files. In the database world, this is referred to as denormalization, and SSIS packages can easily merge data into a single dataset and load the destination table in a single process without the need to stage or land the data at each step of the process.
You often require the management or partitioning of history within your data warehouses to review the state of activity at a certain point in time. This history management creates complex updating scenarios, and SSIS handles this with the assistance of the Slowly Changing Dimension Wizard. This wizard dynamically creates and configures a set of data transformation tasks used to manage inserting and updating records, updating related records, and adding new columns to tables to support this history management.
Often, businesses receive data from outside of their systems and need to perform data-quality routines to standardize and clean the data before loading it into their systems. SSIS can handle most data situations from heterogeneous databases or flat files. This is commonly the case when different areas of the business use different standards and formats for the information or when the data is being purchased, such as with address data. Sometimes the data formats are different because the platforms from which they originate differ from the intended destination. In these cases, SSIS includes a rich set of data-transformation tasks to perform a wide range of data-cleaning, converting, and enriching functions. You can replace values or get descriptions from code values by using exact or fuzzy lookups within SSIS. Identifying records that may be duplicates by using SSIS grouping transformations helps to successfully remove them before loading the destination.
The ability to dynamically adjust the data transformations being performed is a common scenario within businesses. Often, data needs to be handled differently based on certain values it may contain or even based upon the summary or count of values in a given set of records. SSIS includes a rich set of transformations that are useful for splitting or merging data based upon data values, applying different aggregations or calculations based on different parts of a dataset, and loading different parts of the data into different locations. SSIS containers specifically support evaluating expressions, enumerating across a set of information, and performing workflow tasks based on results of the data values.
Lastly, you commonly have operational administrative functions that require automation. SSIS includes an entire set of tasks devoted to these administrative functions. You can use tasks specifically designed to copy SQL Server objects or facilitate the bulk loading of data. You also have access in SSIS to a SQL Management Objects (SMO) enumerator to perform looping across your servers to perform administrative operations on each server in your environment. When complete, you can also schedule all your SSIS packages and solutions using SQL Server Agent jobs.
Leave a Reply