You have been tasked with the unenviable job of helping your company re-define and re-deploy their overall reporting strategy and architecture. A vision of a consolidated data model that accommodates (and [possibly] ultimately replaces) these disparate data sources, as well as a design that supports incremental implementation (for example: by geography, source system, line of business, or user base) is beginning to materialize. Project scope will include architecture, data model(s), source/target mappings and recommendations for data movement/delivery to the end user via data marts. Ongoing support requirements (staffing, business continuity procedures, change management, etc.) and "best practices" are also considered within the scope of this strategy.
An experienced data warehouse project manager will usually determine if a single common centralized data model and environment (that will act as a foundation to the reporting repository and accommodate future additional data sources) will get the job done. The physical implementation of this model should serve ongoing reporting needs dynamically and incrementally, ensuring an acceptable level of alignment and long-term compliance between centralized and distributed data structures and elements. The goals of the centralized repository will have to be completely understood and agreed upon by all stakeholders and members of the project team from the outset.
Analytical initiatives will be diverse. Some objective goals may be to:
During the Discovery Phase, architects should become familiar with the design concepts, approach, extent, intended audience, redundancies, and gaps of each reporting solution in the enterprise. A clear understanding will develop concerning key business drivers and objectives, and how they support analytical reporting efforts throughout the organization.
What will become clear during the discovery phase is that there are many levels of reporting perspectives and needs within the organization. At a minimum, a generalized (and often used) delineation can be made between operational reporting (done, for example, by local branches to report on their daily transactional activity at a detailed level) and analytical reporting (such as trend analysis or risk/portfolio management reviews performed by management or HQ staff). Each reporting perspective has different informational objectives in terms of scope, volatility, and level of detail: operational reporting requires access to all transaction details, including deals or sales in progress, on at least a daily basis; analytical reporting is typically performed against historical, somewhat summarized data that is updated less often (monthly or weekly). Although the individual solutions for each reporting perspective may vary, many enterprises require a single and comprehensive reporting complex: a main/primary reporting warehouse repository that will aid in the effective, cohesive, and streamlined propagation of information across the enterprise.
The differences between operational and analytical informational needs are significant in three critical ways:
Reporting Perspective | Scope | Volatility | Detail |
Operational ("tactical") | Includes in-process and active transaction data (i.e.: WIP) | High; usually requires daily (even real-time) updates | Lowest level - local system terminology, data values, business rules (i.e.: local product) |
Analytical ("strategic") | Includes historical data | Low; static data updated less often (monthly, weekly) | Cleansed/aggregated to a higher level of detail consistent across the organization |
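The contrast between the two perspectives can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the product codes, the row layout, and the `summarize_for_analysis` helper are assumptions made for illustration, not part of any particular system. It shows transaction-level operational rows (with local codes and work in progress) being rolled up to a monthly, enterprise-level summary.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from local product codes to enterprise-wide categories.
LOCAL_TO_ENTERPRISE_PRODUCT = {
    "NY-LOAN-7": "Commercial Lending",
    "LDN-CL-02": "Commercial Lending",
    "NY-DEP-1": "Deposits",
}

# Hypothetical operational rows: lowest level of detail, local terminology,
# and in-process work included.
operational_rows = [
    {"branch": "NY", "product": "NY-LOAN-7", "booked": date(2024, 1, 5),
     "amount": 100.0, "status": "closed"},
    {"branch": "LDN", "product": "LDN-CL-02", "booked": date(2024, 1, 20),
     "amount": 250.0, "status": "closed"},
    {"branch": "NY", "product": "NY-DEP-1", "booked": date(2024, 1, 22),
     "amount": 40.0, "status": "closed"},
    {"branch": "NY", "product": "NY-LOAN-7", "booked": date(2024, 2, 1),
     "amount": 75.0, "status": "in_progress"},
]


def summarize_for_analysis(rows):
    """Roll closed transactions up to (month, enterprise category) totals."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] != "closed":
            continue  # in-process work stays in the operational view only
        month = row["booked"].strftime("%Y-%m")
        category = LOCAL_TO_ENTERPRISE_PRODUCT[row["product"]]
        totals[(month, category)] += row["amount"]
    return dict(totals)


analytical_summary = summarize_for_analysis(operational_rows)
```

The operational list keeps every row, including the in-progress deal, while the analytical summary holds only cleansed, aggregated totals keyed by terms that are consistent across branches.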
The requirements for distinct multi-level reporting perspectives cannot be satisfied by a single and contradiction-laden "one size fits all" solution. Attempting to address both tactical and strategic reporting requirements in a single reporting data warehouse will compromise and greatly complicate architecture and design, reducing the likelihood of success. A reporting strategy must be developed for analytical and operational perspectives separately--one that is coordinated and complementary to both perspectives, so as to provide the greatest flexibility and functionality. A common and shared meta data schema that allows navigation between analytical and operational reporting areas should form the nucleus of said reporting strategy and associated architectural frameworks.
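One way to picture a shared meta data schema that allows navigation between the two reporting areas is a small registry that links each business element to its analytical column and its operational source fields. The sketch below is only an assumption about how such a registry might look; the class names, system names, and columns are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceField:
    """A field in an operational system (hypothetical naming)."""
    system: str
    table: str
    column: str


@dataclass
class MetadataElement:
    """One business element, defined once and shared by both reporting areas."""
    name: str
    definition: str
    analytical_column: str
    operational_sources: list = field(default_factory=list)


class MetadataRegistry:
    def __init__(self):
        self._by_name = {}

    def register(self, element):
        self._by_name[element.name] = element

    def to_operational(self, name):
        """Navigate from a business element to its operational source fields."""
        return list(self._by_name[name].operational_sources)

    def to_analytical(self, source):
        """Navigate from an operational field to the analytical columns it feeds."""
        return [
            element.analytical_column
            for element in self._by_name.values()
            if source in element.operational_sources
        ]


registry = MetadataRegistry()
ny_amount = SourceField("NY_LOANS", "deal", "amt_usd")
ldn_amount = SourceField("LDN_CREDIT", "facility", "amount")
registry.register(MetadataElement(
    name="deal_amount",
    definition="Booked value of a deal in the reporting currency",
    analytical_column="fact_deal.amount",
    operational_sources=[ny_amount, ldn_amount],
))
```

With this in place, `registry.to_operational("deal_amount")` lists the local fields behind an enterprise measure, and `registry.to_analytical(ny_amount)` answers the reverse question for a branch-level field.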
On most occasions, when undertaking a reporting repository project, you will inherit some sort of database that is referred to as the "data warehouse". Although in its current format this warehouse may not satisfy an acceptable amount of the analytical reporting requirements, it certainly has value as a foundation, and will serve as a jumping-off point for designing and building a full analytical reporting strategy and solution. Close attention must be paid to this existing foundation:
For the architect, the discovery process will always unearth a wide variety of challenges, both potential and decisive, that will continually impact the design and development of the information repository/warehouse. Typical snafus and challenges include:
Often, a good approach is to suggest, agree upon, and formalize the optimal and overall warehouse design while simultaneously implementing an incremental data architecture. This will help solve immediate business problems, with the understanding that the 'best practice architecture' and full model for the reporting repository/data warehouse will be iteratively implemented in the future. With a centralized meta data layer in the main data repository, the initial data mart audiences will have a place of mutual consensus on the definition and structure of current and proposed dimensions and fields. As new data marts are modeled with a dimensional slant, they will be populated from subject areas that have already materialized and been completed in the warehouse. The data marts should retain some degree of isolation from the repository. Modification in the repository should not affect the core values of marts downstream.
Clearly defined data marts will support complex analysis in OLAP tools via pre-defined measures. Measures will be determined by the nature of each dimensional star schema model (its cluster of fact tables and associated dimensions) and/or OLAP cube.
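As a rough illustration of a star schema with pre-defined measures, the following sketch uses Python's standard `sqlite3` module with an in-memory database. The table names, sample rows, the `MEASURES` dictionary, and the `measure_by_category` helper are all assumptions made for this example; a real data mart would have many more dimensions and would normally be queried through an OLAP tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A tiny hypothetical star schema: one fact table and two dimensions.
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_month   (month_key INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    month_key   INTEGER REFERENCES dim_month(month_key),
    amount      REAL,
    units       INTEGER
);
INSERT INTO dim_product VALUES (1, 'Lending'), (2, 'Deposits');
INSERT INTO dim_month   VALUES (202401, '2024-01'), (202402, '2024-02');
INSERT INTO fact_sales  VALUES
    (1, 202401, 100.0, 4),
    (1, 202402, 300.0, 6),
    (2, 202401,  50.0, 5);
""")

# Pre-defined measures: the only aggregations the mart exposes to its users.
MEASURES = {
    "total_amount": "SUM(f.amount)",
    "avg_unit_price": "SUM(f.amount) / SUM(f.units)",
}


def measure_by_category(connection, measure_name):
    """Evaluate one pre-defined measure, sliced by the product dimension."""
    expression = MEASURES[measure_name]  # KeyError for undefined measures
    sql = (
        f"SELECT p.category, {expression} "
        "FROM fact_sales f "
        "JOIN dim_product p ON p.product_key = f.product_key "
        "GROUP BY p.category ORDER BY p.category"
    )
    return connection.execute(sql).fetchall()


totals = measure_by_category(conn, "total_amount")
unit_prices = measure_by_category(conn, "avg_unit_price")
```

Because the measure expressions come only from the fixed `MEASURES` dictionary, every user of the mart computes "total amount" the same way, which is the point of agreeing on measures up front.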
To ensure optimum distribution of data, expectations will have to be clarified and the following questions will need to be answered:
Designing a logical architecture for your data warehouse must be done in parallel with that of the physical architecture. Database administrators and systems administrators must be brought into the loop, so as to expose hardware, network, and resource limitations or issues. In addition to the outline above, many more specific issues related to the physical implementation of the data warehouse will need to be addressed. Some of these logistical and administrative issues may be:
Summary
Your company is most likely "data rich, but knowledge poor". Various operational and reporting systems deployed around the globe house a wealth of information about your business; however, data consistency, semantics, and beyond all need to be consolidated above the local level, providing input and direction to strategic corporate initiatives. It is important that the project team recognize the logical and physical risks associated with all proposed solutions, and that NO single enterprise solution will adequately accommodate operational requirements at both the local/regional and enterprise levels without requiring excessive time and effort for development, deployment, and customization. Success will only be assured by assiduous attention to every level of detail of meta data, data transformation, and overall architecture.
A sound recommendation is to create one or more somewhat normalized Entity Relationship Models that will form the logical basis for the reporting repository. This model will initially be coupled with and (loosely) based on existing ODS and transactional systems and their common data elements. This should serve as a nascent template, offering the organization a starting point in the reporting warehouse design, and defining the most common data elements from the outset. If beneficial, child models should inherit from the main model, as there will be certain shared elements across child models that will be indivisible. Via a robust meta data schema (that underlies your reporting architecture), data elements should be viewable consistently at each level and in relation to each other. This will safeguard a truly global view of the business at all levels. Child models, and thus reporting data elements at company local/branch levels, will share common meta data schema elements with the (Parent) main reporting repository, ensuring data consistency of mappings, values, specification, and semantics across the organization. To help deploy such a scenario cleanly, next-generation XML-enabled repositories may offer value, but only if these repositories are able to evolve beyond a document-centric skeleton.
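The parent/child relationship between models can be sketched in a few lines of Python. This is an illustrative assumption, not a prescribed design: `ElementSpec`, `ReportingModel`, and the element names are hypothetical. The idea shown is that a child model inherits the parent's element definitions, may add local elements, and is rejected if it tries to redefine a shared element differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSpec:
    """Meta data for one data element (hypothetical structure)."""
    name: str
    data_type: str
    definition: str


class ReportingModel:
    def __init__(self, name, elements, parent=None):
        self.name = name
        self.parent = parent
        self._own = {element.name: element for element in elements}
        if parent is not None:
            shared = parent.elements()
            # A child may not change the meaning of an inherited element.
            clashes = [
                n for n, spec in self._own.items()
                if n in shared and shared[n] != spec
            ]
            if clashes:
                raise ValueError(
                    f"{name} redefines shared elements: {sorted(clashes)}"
                )

    def elements(self):
        """Own elements plus everything inherited from the parent chain."""
        inherited = self.parent.elements() if self.parent else {}
        return {**inherited, **self._own}


customer_id = ElementSpec("customer_id", "integer", "Enterprise customer key")
main_model = ReportingModel("main", [customer_id])
branch_model = ReportingModel(
    "ny_branch",
    [ElementSpec("teller_id", "integer", "Local teller identifier")],
    parent=main_model,
)
```

Here `branch_model.elements()` contains both `customer_id` (inherited, with the parent's definition) and the local `teller_id`, while an attempt to declare `customer_id` as text in a child model raises `ValueError`.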