Tuesday 10 January 2017

Enterprise Data Warehousing

Introduction:
A data warehouse is a database designed to enable business intelligence activities. It exists to help users understand and enhance their organization's performance. A data warehouse environment can include an extraction, transportation, transformation, and loading (ETL) solution, statistical analysis, reporting, data mining capabilities, and client analysis tools. It also supports content management systems that manage the process of gathering data, transforming it into useful, actionable information, and delivering it to business users.

A common way of introducing data warehousing is to refer to the characteristics of a data warehouse, as follows:

  • Subject-oriented: A data warehouse is designed to help you analyze data about a particular subject, such as sales.
  • Integrated: Data warehouses must put data from disparate sources into a consistent format.
  • Non-volatile: Once data is entered into the data warehouse, it should not change. This is logical because the purpose of a data warehouse is to enable you to analyze what has occurred.
  • Time-variant: A data warehouse keeps historical data so that users can analyze how things change over time.
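To make the integrated, non-volatile, and time-variant characteristics more concrete, here is a minimal Python sketch. The source systems (`crm_rows`, `billing_rows`), their status codes, and the helper names are all hypothetical and exist only for illustration; a real warehouse would use database tables rather than an in-memory list.

```python
from datetime import date

# Hypothetical source records: two systems encode customer status differently.
crm_rows = [{"customer": "C1", "status": "A"}, {"customer": "C2", "status": "I"}]
billing_rows = [{"cust_id": "C3", "active": True}]

# Integrated: map each source's codes to one warehouse-wide format.
CRM_STATUS = {"A": "active", "I": "inactive"}


def from_crm(row):
    return {"customer_id": row["customer"], "status": CRM_STATUS[row["status"]]}


def from_billing(row):
    status = "active" if row["active"] else "inactive"
    return {"customer_id": row["cust_id"], "status": status}


def load_snapshot(warehouse, rows, snapshot_date):
    """Append rows stamped with a snapshot date; existing rows are never updated."""
    for row in rows:
        warehouse.append({**row, "snapshot_date": snapshot_date})


warehouse = []
day1 = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
load_snapshot(warehouse, day1, date(2017, 1, 9))

# A day later C2 becomes active in the source. The warehouse records a new row
# instead of overwriting the old one, so the change over time stays visible.
load_snapshot(warehouse, [{"customer_id": "C2", "status": "active"}], date(2017, 1, 10))

history_c2 = [
    (r["snapshot_date"].isoformat(), r["status"])
    for r in warehouse
    if r["customer_id"] == "C2"
]
print(history_c2)
```

The history for customer C2 contains both the old and the new status, each with its own snapshot date, which is what lets an analyst ask what had occurred at a given point in time.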
Key characteristics of data warehousing:
  • Data is structured for simplicity of access and high-speed query performance.
  • End users are time-sensitive and desire speed-of-thought response times.
  • Large amounts of historical data are used.
  • Queries often retrieve large amounts of data, perhaps many thousands of rows.
  • Both predefined and ad hoc queries are common.
  • The data load involves multiple sources and transformations.
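As an illustration of data structured for simple access and fast queries, the sketch below builds a tiny star-style layout (one fact table and one dimension table) in an in-memory SQLite database and runs an ad hoc aggregate over a date range. The table names, columns, and sample rows are assumptions made for this example, and SQLite stands in for whatever database the warehouse actually runs on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        sale_date  TEXT,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    -- An index on the date column helps range queries over historical data.
    CREATE INDEX idx_sales_date ON fact_sales (sale_date);
    """
)
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "books"), (2, "games")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("2016-11-03", 1, 20.0),
        ("2016-12-14", 1, 35.0),
        ("2016-12-20", 2, 60.0),
        ("2017-01-05", 2, 15.0),
    ],
)


def sales_by_category(conn, start, end):
    """Total sales per product category between two ISO dates (inclusive)."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        WHERE f.sale_date BETWEEN ? AND ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (start, end),
    ).fetchall()


q4 = sales_by_category(conn, "2016-10-01", "2016-12-31")
print(q4)
```

The same function serves both a predefined report (a fixed quarter) and an ad hoc question (any date range the user picks), because the fact table keeps every historical row.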
Tasks of Data Warehousing:
  • Configuring an Oracle database for use as a data warehouse
  • Designing data warehouses
  • Performing upgrades of the database and data warehousing software to new releases
  • Managing schema objects, such as tables, indexes, and materialized views
  • Managing users and security
  • Developing routines used for the extraction, transformation, and loading (ETL) processes
  • Creating reports based on the data in the data warehouse
  • Backing up the data warehouse and performing recovery when necessary
  • Monitoring the data warehouse's performance and taking preventive or corrective action as required
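One of the tasks above, developing ETL routines, can be sketched in a few lines of Python. This is only a minimal outline under stated assumptions: the CSV layout, the `orders` table, and the `extract`, `transform`, and `load` function names are hypothetical, and SQLite is used in place of a production warehouse database.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real job would read from a file or source database.
SOURCE_CSV = """order_id,country,amount
1, in ,100.50
2,IN,20
3,us,
"""


def extract(text):
    """Read the raw source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize country codes and set aside rows with a missing amount."""
    clean, rejected = [], []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            rejected.append(row)
            continue
        country = row["country"].strip().upper()
        clean.append((int(row["order_id"]), country, float(amount)))
    return clean, rejected


def load(conn, rows):
    """Insert the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean, rejected = transform(extract(SOURCE_CSV))
load(conn, clean)
loaded = conn.execute(
    "SELECT country, SUM(amount) FROM orders GROUP BY country"
).fetchall()
print(loaded, len(rejected))
```

Keeping the rejected rows in a separate list, rather than silently dropping them, makes it possible to report bad source data back to the owners of the source system.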
Challenges of data warehousing:
Software development companies face several challenges in data warehousing, as follows:

Ensuring acceptable data quality:
  • Disparate data sources add to data inconsistency
  • Source systems that are not yet stable
Ensuring acceptable performance:
  • Prioritizing performance
  • Setting realistic goals
Testing data warehouse:
  • Test planning
  • Lack of automated testing
Reconciliation of data in data warehouse:
  • The reconciliation process is complex
User acceptance:
  • Reluctant users
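Reconciliation, one of the challenges listed above, usually starts with simple checks that compare the source with the warehouse. The following sketch compares row counts, totals, and keys for two lists of `(key, amount)` pairs. The function names and the sample data are hypothetical, and real reconciliation normally covers many more rules than these three.

```python
from decimal import Decimal


def summarize(rows):
    """Return (row count, total amount) for a list of (key, amount) pairs."""
    return len(rows), sum((amount for _, amount in rows), Decimal("0"))


def reconcile(source_rows, warehouse_rows):
    """Compare counts, totals, and keys between source and warehouse."""
    src_count, src_total = summarize(source_rows)
    dw_count, dw_total = summarize(warehouse_rows)
    source_keys = {key for key, _ in source_rows}
    warehouse_keys = {key for key, _ in warehouse_rows}
    return {
        "count_diff": src_count - dw_count,
        "total_diff": src_total - dw_total,
        "missing_in_warehouse": sorted(source_keys - warehouse_keys),
    }


# Hypothetical data: one source row never reached the warehouse.
source = [("A1", Decimal("10.00")), ("A2", Decimal("5.25")), ("A3", Decimal("7.00"))]
loaded_rows = [("A1", Decimal("10.00")), ("A2", Decimal("5.25"))]

report = reconcile(source, loaded_rows)
print(report)
```

Using `Decimal` instead of floating-point numbers avoids reporting rounding noise as a false mismatch in the totals. A check like this can also serve as a first automated test for each data load.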
Benefits of data warehousing:
  • Consolidate data from multiple sources into a single database so that a single query engine can be used to present data.
  • Mitigate lock contention caused by database isolation levels in transaction processing systems, which occurs when large, long-running analysis queries are run against transaction processing databases.
  • Maintain data history, even if the source transaction systems do not.
  • Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
  • Improve data quality by providing consistent codes and descriptions and by flagging or even fixing bad data.
  • Present the organization's information consistently.
  • Provide a single common data model for all data of interest regardless of the data's source.
  • Restructure the data so that it makes sense to the business users.
  • Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
  • Add value to operational business applications, notably customer relationship management (CRM) systems.
  • Make decision-support queries easier to write.
Conclusion:
Data warehousing is a collection of methods, techniques, and tools that help knowledge workers (senior managers, directors, managers, and analysts) conduct data analyses that support decision-making and improve information resources. This concept is very useful to software development companies in India.