Pandas write blank values to CSV

I have a DataFrame which looks like the following:

import pandas as pd

import numpy as np

df = pd.DataFrame(data={'data': [4, np.nan]})

df

   data

0   4.0

1   NaN

This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.

Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None}) and also replace({numpy.nan: ''}) but that didn't help.

How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.

Update:
How am I writing this file?

with s3fs.open(s3_path, 'wb') as f:

        s = StringIO()

        df.to_csv(s, index=False, header=True)

        f.write(s.getvalue().encode('utf-8'))

        s = None

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00

how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00

@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02

okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02

@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03

add a comment |

I have a DataFrame which looks like the following:

import pandas as pd

import numpy as np

df = pd.DataFrame(data={'data': [4, np.nan]})

df

   data

0   4.0

1   NaN

This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.

How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.

Update:
How am I writing this file?

with s3fs.open(s3_path, 'wb') as f:

        s = StringIO()

        df.to_csv(s, index=False, header=True)

        f.write(s.getvalue().encode('utf-8'))

        s = None

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00

how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00

@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02

okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02

@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03

add a comment |

I have a DataFrame which looks like the following:

import pandas as pd

import numpy as np

df = pd.DataFrame(data={'data': [4, np.nan]})

df

   data

0   4.0

1   NaN

This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.

How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.

Update:
How am I writing this file?

with s3fs.open(s3_path, 'wb') as f:

        s = StringIO()

        df.to_csv(s, index=False, header=True)

        f.write(s.getvalue().encode('utf-8'))

        s = None

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

I have a DataFrame which looks like the following:

import pandas as pd

import numpy as np

df = pd.DataFrame(data={'data': [4, np.nan]})

df

   data

0   4.0

1   NaN

This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.

How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.

Update:
How am I writing this file?

with s3fs.open(s3_path, 'wb') as f:

        s = StringIO()

        df.to_csv(s, index=False, header=True)

        f.write(s.getvalue().encode('utf-8'))

        s = None

python pandas

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

edited Dec 28 '18 at 18:01

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

asked Dec 28 '18 at 17:57

Little Child

17.3k2396169

i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00

how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00

@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02

okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02

@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03

add a comment |

i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00

how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00

@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02

okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02

@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03

i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00

how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00

@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02

okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02

@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03

add a comment |

3 Answers
3

active

oldest

votes

You could specify the dtype as int:

df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)

df.to_csv('output.csv', index=False)

output.csv

data

4

""

EDIT

If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:

df['data'] = df['data'].astype(object)

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

add a comment |

The function

df.to_csv(s, index=False, header=True)

should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:

>>> df.mask(df.isna(), None)

   data

0     4

1  None

Feel free to replace None above with whatever you like, such as, "".

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

add a comment |

Uncomplicated!

df.data = df.data.fillna(' ')

df.to_csv('anyfilename.csv', sep=',')

Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.

answered Dec 28 '18 at 18:16

Kaivalya Kate

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53962485%2fpandas-write-blank-values-to-csv%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

You could specify the dtype as int:

df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)

df.to_csv('output.csv', index=False)

output.csv

data

4

""

EDIT

If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:

df['data'] = df['data'].astype(object)

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

add a comment |

You could specify the dtype as int:

df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)

df.to_csv('output.csv', index=False)

output.csv

data

4

""

EDIT

If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:

df['data'] = df['data'].astype(object)

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

add a comment |

You could specify the dtype as int:

df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)

df.to_csv('output.csv', index=False)

output.csv

data

4

""

EDIT

If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:

df['data'] = df['data'].astype(object)

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

You could specify the dtype as int:

df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)

df.to_csv('output.csv', index=False)

output.csv

data

4

""

EDIT

If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:

df['data'] = df['data'].astype(object)

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

edited Dec 28 '18 at 18:43

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

answered Dec 28 '18 at 18:07

Edgar R. Mondragón

1,5112719

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

add a comment |

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

– Little Child
Dec 28 '18 at 18:10

I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

– Edgar R. Mondragón
Dec 28 '18 at 18:20

I tried astype(int) but NaNs cannot be converted to int

– Little Child
Dec 28 '18 at 18:22

You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

– Edgar R. Mondragón
Dec 28 '18 at 18:42

add a comment |

The function

df.to_csv(s, index=False, header=True)

should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:

>>> df.mask(df.isna(), None)

   data

0     4

1  None

Feel free to replace None above with whatever you like, such as, "".

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

add a comment |

The function

df.to_csv(s, index=False, header=True)

should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:

>>> df.mask(df.isna(), None)

   data

0     4

1  None

Feel free to replace None above with whatever you like, such as, "".

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

add a comment |

The function

df.to_csv(s, index=False, header=True)

should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:

>>> df.mask(df.isna(), None)

   data

0     4

1  None

Feel free to replace None above with whatever you like, such as, "".

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

The function

df.to_csv(s, index=False, header=True)

should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:

>>> df.mask(df.isna(), None)

   data

0     4

1  None

Feel free to replace None above with whatever you like, such as, "".

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

answered Dec 28 '18 at 18:07

Adrian Tam

1,133112

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

add a comment |

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

This does not help when saving to CSV.

– Little Child
Dec 28 '18 at 18:09

yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

– Adrian Tam
Dec 28 '18 at 18:10

Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

– Little Child
Dec 28 '18 at 18:11

add a comment |

Uncomplicated!

df.data = df.data.fillna(' ')

df.to_csv('anyfilename.csv', sep=',')

answered Dec 28 '18 at 18:16

Kaivalya Kate

add a comment |

Uncomplicated!

df.data = df.data.fillna(' ')

df.to_csv('anyfilename.csv', sep=',')

answered Dec 28 '18 at 18:16

Kaivalya Kate

add a comment |

Uncomplicated!

df.data = df.data.fillna(' ')

df.to_csv('anyfilename.csv', sep=',')

answered Dec 28 '18 at 18:16

Kaivalya Kate

Uncomplicated!

df.data = df.data.fillna(' ')

df.to_csv('anyfilename.csv', sep=',')

answered Dec 28 '18 at 18:16

Kaivalya Kate

answered Dec 28 '18 at 18:16

Kaivalya Kate

answered Dec 28 '18 at 18:16

Kaivalya Kate

answered Dec 28 '18 at 18:16

Kaivalya Kate

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

DaaJlOIt2YpGen,zAVhTlZi

搜尋此網誌

Bdtjtk