Parquet Input/Output

1. Parquet Input/Output

Mario Ermacora
Posted 12-03-2020 18:10

Hi,

I'm wondering whether anyone has a solution they are willing to share for reading and writing Parquet-formatted files with Analyze.

https://parquet.apache.org/documentation/latest/

There are several Python modules available that could potentially be leveraged:

parquet-python (pure python, reader only) : https://github.com/jcrobak/parquet-python

fastparquet (Python 3 only) : https://github.com/dask/fastparquet

PyArrow (Python 3 only) : https://arrow.apache.org/docs/python/

The priority use case is to publish a partitioned Parquet dataset (which may consist of multiple physical files) to cloud-based storage (S3 or Azure) for efficient ingestion into Hadoop.

I'd also like to know whether Infogix has any roadmap plans to include Input and Output connector nodes for Parquet-formatted datasets.

Thanks in advance,

Mario Ermacora
2. RE: Parquet Input/Output

Employee

Gerard Cafaro
Posted 01-04-2021 15:25

Mario - I have used the Apache Parquet libraries with Analyze in the past, with variations for all the use cases above. Attached is a reader that may work for you. I've included the 5 additional publicly available jar files you will need. Please note that these are not official nodes, just ones I have created and used in the past.

The Python modules look interesting and are probably a much simpler implementation. There are also a few options for reading directly from cloud/HDFS data sources across other file types.

Attached files

parquetJars.zip
Parquet Reader-Writer - 4 Jan 2021.lna

Data360 Analyze
