How to preprocess invalid CSV in a canonical way #1835
alexeyabel
started this conversation in
Idea
Replies: 1 comment 1 reply
What do you mean by illegal syntax? If it's not a valid CSV file, then you would just treat it as a normal text file. In that case, calling pd.read_csv within your node may not be too bad: it acts as transformation logic (arguably a string2dataframe function) rather than I/O.
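A minimal sketch of that idea, assuming the raw file is loaded as a single string (for example through a text dataset) and that the "illegal syntax" is something as simple as stray trailing commas and blank lines. The function names `repair_raw_csv` and `parse_fixed_csv` and the repair rule itself are hypothetical; the real fix depends on what is actually wrong with the file.

```python
import io

import pandas as pd


def repair_raw_csv(raw_text: str) -> str:
    """Hypothetical repair rule: skip blank lines and strip trailing commas."""
    lines = [
        line.rstrip().rstrip(",")  # drop trailing whitespace, then trailing commas
        for line in raw_text.splitlines()
        if line.strip()  # ignore empty or whitespace-only lines
    ]
    return "\n".join(lines)


def parse_fixed_csv(raw_text: str) -> pd.DataFrame:
    """String-to-DataFrame transformation intended to be used as a node function."""
    fixed_text = repair_raw_csv(raw_text)
    # pandas.read_csv accepts a file-like object, so the repaired string is
    # wrapped in StringIO instead of being written back to disk first.
    return pd.read_csv(io.StringIO(fixed_text))


if __name__ == "__main__":
    broken = "a,b,\n1,2,\n\n3,4,\n"
    print(parse_fixed_csv(broken))
```

Because `parse_fixed_csv` returns a DataFrame rather than a str, its output could then be mapped to a catalog entry of type pandas.CSVDataSet, which expects an object with a to_csv method when saving.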
What is the canonical way of fixing a CSV file with illegal syntax and then continuing to work with it? I can't use type: pandas.CSVDataSet for it in the data catalog, because parsing it would drop some of the illegal data.
So far I am using kedro.extras.datasets.text.TextDataSet and fixing the raw string of the file in a node. But how should I create the next catalog entry? I tried telling the node to output it into a data entry of type: pandas.CSVDataSet, but I get an error that str does not have a to_csv attribute. Should I call pandas.read_csv() manually in my syntax-fixing method? Or how do I add preprocessing steps to fix the faulty CSV?