Ingest CSV / flat files without knowing schema in advance
We’re looking to hear use cases from users who need to load CSV or other flat files (JSON, ORC, Parquet, etc.) without having to create a table with the correct structure first.
Currently in Data Productivity Cloud the process for loading a CSV file is:
- Analyse the file structure outside of the platform, perhaps using Notepad or Excel (a sketch of this step follows the list)
- Create a table with the correct structure using the Create Table component, and run it
- Add an S3 Load / Azure Blob Load / GCS Load component and configure it to load the file(s)
- Run the pipeline
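To make the first step concrete, here is a minimal sketch of the kind of out-of-platform analysis users have to do today: sample a CSV, guess column types, and emit a CREATE TABLE statement. The file name `orders.csv`, the table name, and the `infer_type` heuristic are all hypothetical illustrations, not part of Data Productivity Cloud.

```python
# Illustrative sketch only: manually inferring a CSV schema before loading.
import csv

def infer_type(values):
    """Roughly guess a SQL type from a sample of string values."""
    def all_match(cast):
        non_empty = [v for v in values if v != ""]
        if not non_empty:
            return False
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False
    if all_match(int):
        return "INTEGER"
    if all_match(float):
        return "FLOAT"
    return "VARCHAR"

with open("orders.csv", newline="") as f:  # hypothetical input file
    reader = csv.reader(f)
    header = next(reader)                  # first row holds column names
    sample = [row for _, row in zip(range(100), reader)]  # first 100 data rows

columns = [(name, infer_type([row[i] for row in sample if i < len(row)]))
           for i, name in enumerate(header)]

ddl = ("CREATE TABLE orders (\n  "
       + ",\n  ".join(f'"{name}" {sql_type}' for name, sql_type in columns)
       + "\n);")
print(ddl)  # paste the result into a Create Table component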
We believe this approach currently has limitations:
- It delays time to data value by requiring the file schema to be understood in advance
- It does not support schema drift, that is, the file schema changing between executions (e.g. new columns being added to a CSV file after initial pipeline development); see the drift-check sketch after this list
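As a thought experiment on the drift point, a schema-inferring loader could compare each incoming file’s header against the columns it recorded on the previous run and react to additions or removals. A minimal sketch, assuming a hypothetical per-pipeline state file `known_columns.json`:

```python
# Illustrative sketch only: detecting schema drift between pipeline runs.
import csv
import json
import os

STATE_FILE = "known_columns.json"  # hypothetical per-pipeline state

with open("orders.csv", newline="") as f:
    current = next(csv.reader(f))  # header row of the incoming file

if os.path.exists(STATE_FILE):
    with open(STATE_FILE) as f:
        known = json.load(f)
    added = [c for c in current if c not in known]
    removed = [c for c in known if c not in current]
    if added or removed:
        print(f"Schema drift detected: added={added}, removed={removed}")
        # A drift-aware loader could, e.g., ALTER TABLE ... ADD COLUMN here.

with open(STATE_FILE, "w") as f:
    json.dump(current, f)  # remember this run's columns for next time
```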
If you are interested in this potential feature, please contact us so we can learn more and you can help us shape the product development.