EDW
Let’s start by considering a general architectural model for an Enterprise Data Warehouse, or EDW, platform, which companies can adapt for their analytics requirements.
In this architecture, you can have various layers or components, including:
- data sources, such as flat files, databases, and existing operational systems,
- an ETL layer for extracting, transforming, and loading data,
- optional staging and sandbox areas for holding data and developing workflows,
- an enterprise data warehouse repository,
- sometimes, data marts; when multiple data marts are involved, this arrangement is known as a “hub and spoke” architecture, and
- an analytics layer and business intelligence tools.
Data warehouses also enforce security for incoming data and data passing through to further stages and users throughout the network. Interoperability among components is vital.
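To make the flow between these layers concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database stands in for the warehouse repository and a small CSV string stands in for a flat-file source. The table names, view name, and columns (`staging_orders`, `dw_orders`, `mart_sales_by_region`) are hypothetical and chosen only for illustration; a real EDW platform would use dedicated ETL and database products.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the columns are illustrative only.
SOURCE_CSV = """order_id,region,amount
1,east,100.50
2,west,200.00
3,east,49.50
"""


def extract(text):
    """Extract: read raw rows from the flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize region names."""
    return [
        (int(r["order_id"]), r["region"].strip().upper(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Load: land rows in a staging table, then promote them to the warehouse."""
    conn.execute(
        "CREATE TABLE staging_orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE dw_orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", rows)
    conn.execute("INSERT INTO dw_orders SELECT * FROM staging_orders")


def build_data_mart(conn):
    """Data mart: a subject-specific view over the warehouse repository."""
    conn.execute(
        "CREATE VIEW mart_sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM dw_orders GROUP BY region"
    )


def run_pipeline():
    """Run source -> ETL -> staging -> warehouse -> data mart -> analytics query."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    build_data_mart(conn)
    # Analytics layer: a BI tool would issue a query like this one.
    return conn.execute(
        "SELECT region, total FROM mart_sales_by_region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Each function corresponds to one layer in the list above, so in this sketch a layer can be swapped out, for example replacing the CSV source with a database query, without touching the others.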
IBM EDW
Next, let’s check out an IBM-specific reference data warehouse architecture.
Each layer of the architecture performs a specific function:
- The data acquisition layer consists of components to acquire raw data from source systems, such as human resources, finance, and billing departments.
- The data integration layer, essentially a staging area, has components for extracting the data, transforming it, and loading it into the data repository layer. It also houses administration tools and central metadata.
- The data repository layer stores the integrated data, typically employing a relational model.
- The analytics layer often stores data in a cube format to make it easier for users to analyze it.
- And finally, the presentation layer incorporates applications that provide access for different sets of users, such as marketing analysts, users, and agents. Applications consume the data through web pages and portals defined in the reporting tool or through web services.
The IBM reference architecture is supported and extended using several products from the IBM InfoSphere suite.
- IBM InfoSphere DataStage is a scalable ETL platform that delivers near real-time integration of all data types, both on premises and in cloud environments.
- IBM InfoSphere MetaData Workbench provides end-to-end data flow reporting and impact analysis of information assets in an environment that allows organizations to easily share, locate, and retrieve information from these systems. You can use its built-in data flow reporting capabilities to monitor how IBM InfoSphere DataStage moves and transforms your data.
- IBM InfoSphere QualityStage, designed to support your data quality and information governance initiatives, enables you to investigate, cleanse, and manage your data.
- And finally, IBM Cognos Analytics is an advanced business intelligence platform that generates reports, scoreboards, and dashboards, performs exploratory data analysis, and even curates and joins your data using multiple sources.