Data Integration

From Glitchdata

Data integration centres on moving data between systems across an organisation. This is generally performed by ETL (extract, transform, load) tools.


Selection Criteria

Typical selection criteria for data integration (ETL / ELT) tools are:

  1. Architecture
  2. ETL Functionality
  3. Ease-of-Use
  4. Reusability
  5. Debugging
  6. Real-time
  7. Connectivity
  8. General ETL tool characteristics

Checks

  • Data Cleansing
  • Data Formats
  • Errors - e.g. missing records
  • Error Tolerance - at what point does an ETL job abort?
  • Memory Usage
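
The error and error-tolerance checks above can be sketched in a few lines of Python. This is only an illustrative sketch, not the behaviour of any particular ETL tool: the cleansing rule, the names `clean`, `load_with_tolerance` and `missing_ids`, and the 10% default threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field


class ToleranceExceeded(Exception):
    """Raised when the share of rejected records exceeds the threshold."""


@dataclass
class LoadReport:
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def clean(record):
    # Hypothetical cleansing rule: an id is required, amount must be numeric.
    if record.get("id") is None:
        raise ValueError("missing id")
    return {"id": int(record["id"]), "amount": float(record["amount"])}


def load_with_tolerance(records, max_reject_ratio=0.1):
    """Cleanse every record, then bail if too many were rejected."""
    records = list(records)
    report = LoadReport()
    for record in records:
        try:
            report.loaded.append(clean(record))
        except (KeyError, TypeError, ValueError) as exc:
            report.rejected.append((record, str(exc)))
    # Error tolerance: abort the job once the reject ratio passes the limit.
    if records and len(report.rejected) / len(records) > max_reject_ratio:
        raise ToleranceExceeded(
            f"{len(report.rejected)} of {len(records)} records rejected"
        )
    return report


def missing_ids(source_ids, target_ids):
    """Reconciliation check: ids present in the source but not the target."""
    return set(source_ids) - set(target_ids)
```

A job using this pattern would typically log `report.rejected` for later inspection and run `missing_ids` after the load to confirm that no records were silently dropped.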

Challenges

  • Data transfer volumes are growing exponentially.
    • Disparate sources of data are becoming commonplace.
  • ETL processes have to process large amounts of OLTP data.
    • Some ETL processes are smarter and use incremental updating, but generally this is not good enough.
  • BI data structures are varied. ETL requirements differ for data warehouses, data marts, and specific visualisation needs (e.g. analysis, reporting, dashboarding, scorecarding).
  • Transformation needs are getting more complex.
    • Data needs to be aggregated, parsed, computed, and statistically processed.
  • BI is tending towards real time, so ETLs have to refresh data warehouses and data marts more frequently and within a smaller load window.
  • The off-peak ETL window is getting increasingly small.
  • Incremental ETL transfer is becoming commonplace.
  • Real-time ETLs are nice to have.
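
Incremental transfer, mentioned above, is often implemented with a "high-water mark": each run copies only the rows changed since the last run. The following is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its `updated_at` column, and the function names `make_db` and `incremental_load` are assumptions made for illustration, not part of any specific ETL product.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, amount REAL, updated_at INTEGER)"
    )
    return conn


def incremental_load(source, target, watermark):
    """Copy rows changed after `watermark`; return the new watermark."""
    rows = source.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # Upsert: rows already present in the target are overwritten by id.
    target.executemany(
        "INSERT OR REPLACE INTO orders (id, amount, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    # With no changes, the watermark stays where it was.
    return rows[-1][2] if rows else watermark
```

The caller stores the returned watermark and passes it to the next run, so each load touches only new or modified rows instead of the full OLTP table. Note that the strict `>` comparison assumes `updated_at` values are never reused after a run has completed; sources that cannot guarantee this usually need an overlap window or a change-data-capture feed instead.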


Documentation

Links