Data Integration

From Glitchdata

Data integration centers on moving data between systems across an organisation. This is generally performed by ETL (extract, transform, load) tools.


Checks

  • Data Cleansing
  • Data Formats
  • Errors - e.g. missing records
  • Error Tolerance - at what point should an ETL job abort?
  • Memory Usage
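
The error and error-tolerance checks above can be illustrated with a small sketch. It assumes rows arrive as dictionaries and that a row with a missing required field counts as an error; the names `load_with_tolerance`, `validate`, and `max_error_rate` are hypothetical and do not come from any particular ETL tool. The batch is rejected only when the share of bad rows exceeds the configured limit.

```python
from dataclasses import dataclass, field


class ErrorToleranceExceeded(Exception):
    """Raised when the share of rejected rows passes the configured limit."""


@dataclass
class LoadResult:
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def validate(row, required_fields):
    """Return a list of problems for one row; an empty list means the row is clean."""
    return [f"missing {name}" for name in required_fields
            if row.get(name) in (None, "")]


def load_with_tolerance(rows, required_fields, max_error_rate=0.05):
    """Split rows into loaded and rejected, then abort the batch if the
    rejected share is above max_error_rate (an assumed, tunable threshold)."""
    rows = list(rows)
    result = LoadResult()
    for row in rows:
        problems = validate(row, required_fields)
        if problems:
            result.rejected.append((row, problems))
        else:
            result.loaded.append(row)
    error_rate = len(result.rejected) / len(rows) if rows else 0.0
    if error_rate > max_error_rate:
        raise ErrorToleranceExceeded(
            f"{len(result.rejected)} of {len(rows)} rows rejected "
            f"({error_rate:.0%} > {max_error_rate:.0%})"
        )
    return result
```

Keeping the rejected rows together with their reasons, rather than discarding them, makes it possible to report on data quality even when the batch is accepted.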

Challenges

  • Data transfer volumes are growing exponentially.
    • Disparate sources of data are becoming commonplace.
  • ETL processes have to process large amounts of OLTP data.
    • Some ETL processes are smarter and update incrementally, but generally this is not good enough.
  • BI data structures are varied. ETL requirements differ for data warehouses, data marts, and specific visualisation needs (e.g. analysis, reporting, dashboarding, scorecarding).
  • Transformation needs are getting more complex.
    • Data needs to be aggregated, parsed, computed, and statistically processed.
  • BI is tending towards realtime, so ETLs have to refresh data warehouses and data marts more frequently and within a smaller load time window.
  • The off-peak ETL window is getting increasingly small.
  • Incremental ETL transfer is becoming commonplace.
  • Realtime ETLs are nice to have.
  • Data Quality
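
Incremental transfer, mentioned in the list above, is often done by remembering a "watermark" and extracting only rows changed since then. The following is a minimal sketch of that idea, not a description of any specific ETL product. It assumes a hypothetical source table `orders` with an `updated_at` column stored as ISO 8601 text, and uses an in-memory SQLite database as a stand-in for the OLTP source.

```python
import sqlite3


def extract_incremental(conn, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # If nothing changed, keep the old watermark so the next run resumes correctly.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


# Demo source table (hypothetical schema and data).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "2024-01-01T00:00:00"), (2, 20.0, "2024-01-02T00:00:00")],
)

# First run: an empty watermark pulls everything.
first_batch, watermark = extract_incremental(source, "")

# A new row arrives; the second run pulls only that row.
source.execute("INSERT INTO orders VALUES (3, 30.0, '2024-01-03T00:00:00')")
second_batch, watermark = extract_incremental(source, watermark)
```

This approach shrinks the load window because each run touches only changed rows, but it relies on the source reliably maintaining `updated_at`, and it does not detect deleted rows.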

Documentation

Links