Parquet compresses data for storage in Hadoop. This is especially useful when you have lots of small files, which consume many disk blocks and reduce the storage efficiency of HDFS.

* Parquet compresses groups of small files to fill a block, while still supporting queries.
* Parquet supports columnar storage of data. In the big-data paradigm, brute-force scanning tools like [[Athena]] can query Parquet data in [[S3]].
  
==Links==
* https://parquet.apache.org/

==Related==
* [[Oozie]]
[[Category: Parquet]]
 
[[Category: Hadoop]]
 

Latest revision as of 18:00, 8 October 2019
