Difference between revisions of "AWS Glue"

From Glitchdata
(Created page with "ETL platform that works well with S3, Hive, Spark. Searches S3 for data structures in csv and extracts table, fields into the Data Catalog. * https://aws.amazon.com/glue/...")
 
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
ETL platform that works well with S3, Hive, Spark.
+
ETL platform that works well with S3, Athena, Redshift.
 
Searches S3 for data structures in csv and extracts table, fields into the Data Catalog.
 
  
 +
AWS Glue consists of a few components:
 +
* Data Catalog
 +
** Hive equivalent
 +
* Crawlers
 +
** Data discovery using Spark.
 +
* Jobs
 +
** ETL using PySpark
 +
 +
 +
 +
==Links==
 
* https://aws.amazon.com/glue/
 
  

Latest revision as of 01:36, 9 September 2019

ETL platform that works well with S3, Athena, and Redshift. It searches S3 for data structures in CSV files and extracts tables and fields into the Data Catalog.

AWS Glue consists of a few components:

  • Data Catalog
    • Hive metastore equivalent
  • Crawlers
    • Data discovery using Spark.
  • Jobs
    • ETL using PySpark
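
To make the crawler idea concrete, here is a minimal pure-Python sketch of schema discovery: it reads CSV text, guesses a type for each column, and builds a catalog-style table entry. It does not call AWS Glue and is not how Glue is implemented. The function names `infer_type` and `crawl_csv`, the sample data, and the dictionary layout are hypothetical and chosen only for illustration. It assumes the CSV has a header row.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try integer first, then floating point; fall back to string.
    for type_name, caster in (("bigint", int), ("double", float)):
        try:
            for v in non_empty:
                caster(v)
            return type_name
        except ValueError:
            continue
    return "string"


def crawl_csv(table_name, csv_text):
    """Build a catalog-style table entry from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = [
        {"Name": name, "Type": infer_type([row[i] for row in data if i < len(row)])}
        for i, name in enumerate(header)
    ]
    return {"Name": table_name, "Columns": columns}


# Hypothetical sample standing in for a CSV object found in S3.
sample = "id,price,city\n1,9.5,Perth\n2,12,Sydney\n"
table = crawl_csv("sales", sample)
for column in table["Columns"]:
    print(column["Name"], column["Type"])
```

The resulting table entry plays the role of a Data Catalog record: once the column names and types are stored, a job or query engine can read the data without inspecting the files again.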


Links

  • https://aws.amazon.com/glue/