Ingest CSV / flat files without knowing schema in advance
We’re looking to hear use cases from users who need to load CSV or other flat files (JSON, ORC, Parquet, etc.) without having to create a table with the correct structure first.
Currently in Data Productivity Cloud the process for loading a CSV file is:
- Analyse the file structure outside of the platform, perhaps using Notepad or Excel (a sketch of this step follows the list)
- Create a table with the correct structure using the Create Table component, and run it
- Add an S3 Load / Azure Blob Load / GCS Load component and configure it to load the file(s)
- Run the pipeline
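To make the first step concrete, here is a minimal sketch of the kind of out-of-platform analysis users have to do today: sample a CSV, guess column types, and emit a CREATE TABLE statement. The file name `orders.csv`, the table name, and the `infer_type` heuristic are all hypothetical illustrations, not part of Data Productivity Cloud.

```python
# Illustrative sketch only: manually inferring a CSV schema before loading.
import csv

def infer_type(values):
    """Roughly guess a SQL type from a sample of string values."""
    def all_match(cast):
        non_empty = [v for v in values if v != ""]
        if not non_empty:
            return False
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "FLOAT"
    return "VARCHAR"

with open("orders.csv", newline="") as f:  # hypothetical input file
    reader = csv.reader(f)
    header = next(reader)                  # first row holds column names
    sample = [row for _, row in zip(range(100), reader)]  # first 100 data rows

columns = [(name, infer_type([row[i] for row in sample if i < len(row)]))
           for i, name in enumerate(header)]

ddl = ("CREATE TABLE orders (\n  "
       + ",\n  ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
       + "\n);")
print(ddl)  # paste the result into a Create Table component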
We believe this approach currently has limitations:
- It delays time to data value by requiring the file schema to be understood in advance
- It does not support schema drift, that is, the file schema changing between executions (e.g. new columns being added to a CSV file after initial pipeline development); see the drift-check sketch after this list
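As a thought experiment on the drift point, a schema-inferring loader could compare each incoming file’s header against the columns it recorded on the previous run and react to additions or removals. A minimal sketch, assuming a hypothetical per-pipeline state file `known_columns.json`:

```python
# Illustrative sketch only: detecting schema drift between pipeline runs.
import csv
import json
import os

STATE_FILE = "known_columns.json"  # hypothetical per-pipeline state

with open("orders.csv", newline="") as f:
    current = next(csv.reader(f))  # header row of the incoming file

if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        known = json.load(f)
    added = [c for c in current if c not in known]
    removed = [c for c in known if c not in current]
    if added or removed:
        print(f"Schema drift detected: added={added}, removed={removed}")
        # A drift-aware loader could, e.g., ALTER TABLE ... ADD COLUMN here.

with open(STATE_FILE, "w") as f:
    json.dump(current, f)  # remember this run's columns for next time
```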
If you are interested in this potential feature, please contact us so we can learn more and you can help us shape the product development.