Pandas write blank values to CSV
I have a DataFrame which looks like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN
This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.
Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None})
and also replace({numpy.nan: ''})
but that didn't help.
How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.
Update:
How am I writing this file?
with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None
python pandas
add a comment |
I have a DataFrame which looks like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN
This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.
Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None})
and also replace({numpy.nan: ''})
but that didn't help.
How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.
Update:
How am I writing this file?
with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None
python pandas
i did not get any problem in saving the file. I diddf.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
@YOLO When I doto_csv
, I get 4.0 and a blank.
– Little Child
Dec 28 '18 at 18:03
add a comment |
I have a DataFrame which looks like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN
This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.
Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None})
and also replace({numpy.nan: ''})
but that didn't help.
How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.
Update:
How am I writing this file?
with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None
python pandas
I have a DataFrame which looks like the following:
import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN
This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.
Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None})
and also replace({numpy.nan: ''})
but that didn't help.
How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.
Update:
How am I writing this file?
with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None
python pandas
python pandas
edited Dec 28 '18 at 18:01
Little Child
asked Dec 28 '18 at 17:57
Little ChildLittle Child
17.3k2396169
17.3k2396169
i did not get any problem in saving the file. I diddf.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
@YOLO When I doto_csv
, I get 4.0 and a blank.
– Little Child
Dec 28 '18 at 18:03
add a comment |
i did not get any problem in saving the file. I diddf.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
@YOLO When I doto_csv
, I get 4.0 and a blank.
– Little Child
Dec 28 '18 at 18:03
i did not get any problem in saving the file. I did
df.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
i did not get any problem in saving the file. I did
df.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
@YOLO When I do
to_csv
, I get 4.0 and a blank.– Little Child
Dec 28 '18 at 18:03
@YOLO When I do
to_csv
, I get 4.0 and a blank.– Little Child
Dec 28 '18 at 18:03
add a comment |
3 Answers
3
active
oldest
votes
You could specify the dtype
as int
:
df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)
output.csv
data
4
""
EDIT
If you need to specify the data types on a dataframe you already created you can use .astype
on the dataframe.
If your dataframe contains NaN values, you should convert to object
type:
df['data'] = df['data'].astype(object)
This seems like a good direction to go. I elided the fact that the DataFrame is constructed usingDataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?
– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries toDataFrame.from_records
but you can change the dtypes later on the dataframe.
– Edgar R. Mondragón
Dec 28 '18 at 18:20
I triedastype(int)
but NaNs cannot be converted to int
– Little Child
Dec 28 '18 at 18:22
You can then useastype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.
– Edgar R. Mondragón
Dec 28 '18 at 18:42
add a comment |
The function
df.to_csv(s, index=False, header=True)
should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:
>>> df.mask(df.isna(), None)
data
0 4
1 None
Feel free to replace None
above with whatever you like, such as, ""
.
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
add a comment |
Uncomplicated!
df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')
Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53962485%2fpandas-write-blank-values-to-csv%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
You could specify the dtype
as int
:
df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)
output.csv
data
4
""
EDIT
If you need to specify the data types on a dataframe you already created you can use .astype
on the dataframe.
If your dataframe contains NaN values, you should convert to object
type:
df['data'] = df['data'].astype(object)
This seems like a good direction to go. I elided the fact that the DataFrame is constructed usingDataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?
– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries toDataFrame.from_records
but you can change the dtypes later on the dataframe.
– Edgar R. Mondragón
Dec 28 '18 at 18:20
I triedastype(int)
but NaNs cannot be converted to int
– Little Child
Dec 28 '18 at 18:22
You can then useastype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.
– Edgar R. Mondragón
Dec 28 '18 at 18:42
add a comment |
You could specify the dtype
as int
:
df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)
output.csv
data
4
""
EDIT
If you need to specify the data types on a dataframe you already created you can use .astype
on the dataframe.
If your dataframe contains NaN values, you should convert to object
type:
df['data'] = df['data'].astype(object)
This seems like a good direction to go. I elided the fact that the DataFrame is constructed usingDataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?
– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries toDataFrame.from_records
but you can change the dtypes later on the dataframe.
– Edgar R. Mondragón
Dec 28 '18 at 18:20
I triedastype(int)
but NaNs cannot be converted to int
– Little Child
Dec 28 '18 at 18:22
You can then useastype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.
– Edgar R. Mondragón
Dec 28 '18 at 18:42
add a comment |
You could specify the dtype
as int
:
df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)
output.csv
data
4
""
EDIT
If you need to specify the data types on a dataframe you already created you can use .astype
on the dataframe.
If your dataframe contains NaN values, you should convert to object
type:
df['data'] = df['data'].astype(object)
You could specify the dtype
as int
:
df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)
output.csv
data
4
""
EDIT
If you need to specify the data types on a dataframe you already created you can use .astype
on the dataframe.
If your dataframe contains NaN values, you should convert to object
type:
df['data'] = df['data'].astype(object)
edited Dec 28 '18 at 18:43
answered Dec 28 '18 at 18:07
Edgar R. MondragónEdgar R. Mondragón
1,5112719
1,5112719
This seems like a good direction to go. I elided the fact that the DataFrame is constructed usingDataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?
– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries toDataFrame.from_records
but you can change the dtypes later on the dataframe.
– Edgar R. Mondragón
Dec 28 '18 at 18:20
I triedastype(int)
but NaNs cannot be converted to int
– Little Child
Dec 28 '18 at 18:22
You can then useastype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.
– Edgar R. Mondragón
Dec 28 '18 at 18:42
add a comment |
This seems like a good direction to go. I elided the fact that the DataFrame is constructed usingDataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?
– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries toDataFrame.from_records
but you can change the dtypes later on the dataframe.
– Edgar R. Mondragón
Dec 28 '18 at 18:20
I triedastype(int)
but NaNs cannot be converted to int
– Little Child
Dec 28 '18 at 18:22
You can then useastype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.
– Edgar R. Mondragón
Dec 28 '18 at 18:42
This seems like a good direction to go. I elided the fact that the DataFrame is constructed using
DataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?– Little Child
Dec 28 '18 at 18:10
This seems like a good direction to go. I elided the fact that the DataFrame is constructed using
DataFrame.from_records
and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?– Little Child
Dec 28 '18 at 18:10
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to
DataFrame.from_records
but you can change the dtypes later on the dataframe.– Edgar R. Mondragón
Dec 28 '18 at 18:20
I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to
DataFrame.from_records
but you can change the dtypes later on the dataframe.– Edgar R. Mondragón
Dec 28 '18 at 18:20
I tried
astype(int)
but NaNs cannot be converted to int– Little Child
Dec 28 '18 at 18:22
I tried
astype(int)
but NaNs cannot be converted to int– Little Child
Dec 28 '18 at 18:22
You can then use
astype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.– Edgar R. Mondragón
Dec 28 '18 at 18:42
You can then use
astype(object)
. It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.– Edgar R. Mondragón
Dec 28 '18 at 18:42
add a comment |
The function
df.to_csv(s, index=False, header=True)
should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:
>>> df.mask(df.isna(), None)
data
0 4
1 None
Feel free to replace None
above with whatever you like, such as, ""
.
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
add a comment |
The function
df.to_csv(s, index=False, header=True)
should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:
>>> df.mask(df.isna(), None)
data
0 4
1 None
Feel free to replace None
above with whatever you like, such as, ""
.
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
add a comment |
The function
df.to_csv(s, index=False, header=True)
should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:
>>> df.mask(df.isna(), None)
data
0 4
1 None
Feel free to replace None
above with whatever you like, such as, ""
.
The function
df.to_csv(s, index=False, header=True)
should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:
>>> df.mask(df.isna(), None)
data
0 4
1 None
Feel free to replace None
above with whatever you like, such as, ""
.
answered Dec 28 '18 at 18:07
Adrian TamAdrian Tam
1,133112
1,133112
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
add a comment |
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
This does not help when saving to CSV.
– Little Child
Dec 28 '18 at 18:09
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN
– Adrian Tam
Dec 28 '18 at 18:10
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
Well my issue is that I do not want my integer to become a float when I write this dataframe out :)
– Little Child
Dec 28 '18 at 18:11
add a comment |
Uncomplicated!
df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')
Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.
add a comment |
Uncomplicated!
df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')
Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.
add a comment |
Uncomplicated!
df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')
Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.
Uncomplicated!
df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')
Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.
answered Dec 28 '18 at 18:16
Kaivalya KateKaivalya Kate
11
11
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53962485%2fpandas-write-blank-values-to-csv%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
i did not get any problem in saving the file. I did
df.to_csv('mydf.csv', index=False)
– YOLO
Dec 28 '18 at 18:00
how are you writing the file ?
– YOLO
Dec 28 '18 at 18:00
@YOLO Updated the question. I am writing this as a file to S3.
– Little Child
Dec 28 '18 at 18:02
okay, may be you should update the question title, that changes everything. are you using boto3 ?
– YOLO
Dec 28 '18 at 18:02
@YOLO When I do
to_csv
, I get 4.0 and a blank.– Little Child
Dec 28 '18 at 18:03