Pandas write blank values to CSV












1















I have a DataFrame which looks like the following:



import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN


This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.



Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None}) and also replace({numpy.nan: ''}) but that didn't help.



How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.



Update:
How am I writing this file?



with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None









share|improve this question

























  • i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

    – YOLO
    Dec 28 '18 at 18:00











  • how are you writing the file ?

    – YOLO
    Dec 28 '18 at 18:00











  • @YOLO Updated the question. I am writing this as a file to S3.

    – Little Child
    Dec 28 '18 at 18:02













  • okay, may be you should update the question title, that changes everything. are you using boto3 ?

    – YOLO
    Dec 28 '18 at 18:02













  • @YOLO When I do to_csv, I get 4.0 and a blank.

    – Little Child
    Dec 28 '18 at 18:03
















1















I have a DataFrame which looks like the following:



import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN


This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.



Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None}) and also replace({numpy.nan: ''}) but that didn't help.



How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.



Update:
How am I writing this file?



with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None









share|improve this question

























  • i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

    – YOLO
    Dec 28 '18 at 18:00











  • how are you writing the file ?

    – YOLO
    Dec 28 '18 at 18:00











  • @YOLO Updated the question. I am writing this as a file to S3.

    – Little Child
    Dec 28 '18 at 18:02













  • okay, may be you should update the question title, that changes everything. are you using boto3 ?

    – YOLO
    Dec 28 '18 at 18:02













  • @YOLO When I do to_csv, I get 4.0 and a blank.

    – Little Child
    Dec 28 '18 at 18:03














1












1








1








I have a DataFrame which looks like the following:



import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN


This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.



Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None}) and also replace({numpy.nan: ''}) but that didn't help.



How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.



Update:
How am I writing this file?



with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None









share|improve this question
















I have a DataFrame which looks like the following:



import pandas as pd
import numpy as np
df = pd.DataFrame(data={'data': [4, np.nan]})
df
data
0 4.0
1 NaN


This DataFrame is generated after parsing data from various sources and sometimes I do not have values for some integer fields.



Eventually this data is to be stored in a CSV and I'd like to store this as an integer 4 and a blank. I have tried using replace({numpy.nan: None}) and also replace({numpy.nan: ''}) but that didn't help.



How do I ensure I get an integer 4 and a blank when this gets exported to CSV? The end goal is to use this CSV to load it into Redshift/Postgres which has explicitly defined datatypes.



Update:
How am I writing this file?



with s3fs.open(s3_path, 'wb') as f:
s = StringIO()
df.to_csv(s, index=False, header=True)
f.write(s.getvalue().encode('utf-8'))
s = None






python pandas






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 28 '18 at 18:01







Little Child

















asked Dec 28 '18 at 17:57









Little ChildLittle Child

17.3k2396169




17.3k2396169













  • i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

    – YOLO
    Dec 28 '18 at 18:00











  • how are you writing the file ?

    – YOLO
    Dec 28 '18 at 18:00











  • @YOLO Updated the question. I am writing this as a file to S3.

    – Little Child
    Dec 28 '18 at 18:02













  • okay, may be you should update the question title, that changes everything. are you using boto3 ?

    – YOLO
    Dec 28 '18 at 18:02













  • @YOLO When I do to_csv, I get 4.0 and a blank.

    – Little Child
    Dec 28 '18 at 18:03



















  • i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

    – YOLO
    Dec 28 '18 at 18:00











  • how are you writing the file ?

    – YOLO
    Dec 28 '18 at 18:00











  • @YOLO Updated the question. I am writing this as a file to S3.

    – Little Child
    Dec 28 '18 at 18:02













  • okay, may be you should update the question title, that changes everything. are you using boto3 ?

    – YOLO
    Dec 28 '18 at 18:02













  • @YOLO When I do to_csv, I get 4.0 and a blank.

    – Little Child
    Dec 28 '18 at 18:03

















i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00





i did not get any problem in saving the file. I did df.to_csv('mydf.csv', index=False)

– YOLO
Dec 28 '18 at 18:00













how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00





how are you writing the file ?

– YOLO
Dec 28 '18 at 18:00













@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02







@YOLO Updated the question. I am writing this as a file to S3.

– Little Child
Dec 28 '18 at 18:02















okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02







okay, may be you should update the question title, that changes everything. are you using boto3 ?

– YOLO
Dec 28 '18 at 18:02















@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03





@YOLO When I do to_csv, I get 4.0 and a blank.

– Little Child
Dec 28 '18 at 18:03












3 Answers
3






active

oldest

votes


















1














You could specify the dtype as int:



df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
df.to_csv('output.csv', index=False)


output.csv



data
4
""


EDIT



If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
If your dataframe contains NaN values, you should convert to object type:



df['data'] = df['data'].astype(object)





share|improve this answer


























  • This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

    – Little Child
    Dec 28 '18 at 18:10











  • I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:20













  • I tried astype(int) but NaNs cannot be converted to int

    – Little Child
    Dec 28 '18 at 18:22













  • You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:42





















0














The function



df.to_csv(s, index=False, header=True)


should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:



>>> df.mask(df.isna(), None)
data
0 4
1 None


Feel free to replace None above with whatever you like, such as, "".






share|improve this answer
























  • This does not help when saving to CSV.

    – Little Child
    Dec 28 '18 at 18:09











  • yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

    – Adrian Tam
    Dec 28 '18 at 18:10











  • Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

    – Little Child
    Dec 28 '18 at 18:11



















0














Uncomplicated!



df.data = df.data.fillna(' ')
df.to_csv('anyfilename.csv', sep=',')


Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.






share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53962485%2fpandas-write-blank-values-to-csv%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    You could specify the dtype as int:



    df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
    df.to_csv('output.csv', index=False)


    output.csv



    data
    4
    ""


    EDIT



    If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
    If your dataframe contains NaN values, you should convert to object type:



    df['data'] = df['data'].astype(object)





    share|improve this answer


























    • This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

      – Little Child
      Dec 28 '18 at 18:10











    • I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:20













    • I tried astype(int) but NaNs cannot be converted to int

      – Little Child
      Dec 28 '18 at 18:22













    • You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:42


















    1














    You could specify the dtype as int:



    df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
    df.to_csv('output.csv', index=False)


    output.csv



    data
    4
    ""


    EDIT



    If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
    If your dataframe contains NaN values, you should convert to object type:



    df['data'] = df['data'].astype(object)





    share|improve this answer


























    • This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

      – Little Child
      Dec 28 '18 at 18:10











    • I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:20













    • I tried astype(int) but NaNs cannot be converted to int

      – Little Child
      Dec 28 '18 at 18:22













    • You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:42
















    1












    1








    1







    You could specify the dtype as int:



    df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
    df.to_csv('output.csv', index=False)


    output.csv



    data
    4
    ""


    EDIT



    If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
    If your dataframe contains NaN values, you should convert to object type:



    df['data'] = df['data'].astype(object)





    share|improve this answer















    You could specify the dtype as int:



    df = pd.DataFrame(data={'data': [4, np.nan]}, dtype=int)
    df.to_csv('output.csv', index=False)


    output.csv



    data
    4
    ""


    EDIT



    If you need to specify the data types on a dataframe you already created you can use .astype on the dataframe.
    If your dataframe contains NaN values, you should convert to object type:



    df['data'] = df['data'].astype(object)






    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 28 '18 at 18:43

























    answered Dec 28 '18 at 18:07









    Edgar R. MondragónEdgar R. Mondragón

    1,5112719




    1,5112719













    • This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

      – Little Child
      Dec 28 '18 at 18:10











    • I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:20













    • I tried astype(int) but NaNs cannot be converted to int

      – Little Child
      Dec 28 '18 at 18:22













    • You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:42





















    • This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

      – Little Child
      Dec 28 '18 at 18:10











    • I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:20













    • I tried astype(int) but NaNs cannot be converted to int

      – Little Child
      Dec 28 '18 at 18:22













    • You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

      – Edgar R. Mondragón
      Dec 28 '18 at 18:42



















    This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

    – Little Child
    Dec 28 '18 at 18:10





    This seems like a good direction to go. I elided the fact that the DataFrame is constructed using DataFrame.from_records and a list of dictionaries. How do I specify a dtype on a dataframe constructed that way?

    – Little Child
    Dec 28 '18 at 18:10













    I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:20







    I edited my question. AFAIK, you can't specify datatypes when you pass a list of dictionaries to DataFrame.from_records but you can change the dtypes later on the dataframe.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:20















    I tried astype(int) but NaNs cannot be converted to int

    – Little Child
    Dec 28 '18 at 18:22







    I tried astype(int) but NaNs cannot be converted to int

    – Little Child
    Dec 28 '18 at 18:22















    You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:42







    You can then use astype(object). It's not the healthiest thing to do, but it's the only way to have ints and NaNs in the same column.

    – Edgar R. Mondragón
    Dec 28 '18 at 18:42















    0














    The function



    df.to_csv(s, index=False, header=True)


    should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:



    >>> df.mask(df.isna(), None)
    data
    0 4
    1 None


    Feel free to replace None above with whatever you like, such as, "".






    share|improve this answer
























    • This does not help when saving to CSV.

      – Little Child
      Dec 28 '18 at 18:09











    • yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

      – Adrian Tam
      Dec 28 '18 at 18:10











    • Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

      – Little Child
      Dec 28 '18 at 18:11
















    0














    The function



    df.to_csv(s, index=False, header=True)


    should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:



    >>> df.mask(df.isna(), None)
    data
    0 4
    1 None


    Feel free to replace None above with whatever you like, such as, "".






    share|improve this answer
























    • This does not help when saving to CSV.

      – Little Child
      Dec 28 '18 at 18:09











    • yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

      – Adrian Tam
      Dec 28 '18 at 18:10











    • Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

      – Little Child
      Dec 28 '18 at 18:11














    0












    0








    0







    The function



    df.to_csv(s, index=False, header=True)


    should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:



    >>> df.mask(df.isna(), None)
    data
    0 4
    1 None


    Feel free to replace None above with whatever you like, such as, "".






    share|improve this answer













    The function



    df.to_csv(s, index=False, header=True)


    should just give you the 4 and empty row in CSV as you expected. But if you mean to replace the NaN into blank, this is how I can do it:



    >>> df.mask(df.isna(), None)
    data
    0 4
    1 None


    Feel free to replace None above with whatever you like, such as, "".







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Dec 28 '18 at 18:07









    Adrian TamAdrian Tam

    1,133112




    1,133112













    • This does not help when saving to CSV.

      – Little Child
      Dec 28 '18 at 18:09











    • yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

      – Adrian Tam
      Dec 28 '18 at 18:10











    • Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

      – Little Child
      Dec 28 '18 at 18:11



















    • This does not help when saving to CSV.

      – Little Child
      Dec 28 '18 at 18:09











    • yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

      – Adrian Tam
      Dec 28 '18 at 18:10











    • Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

      – Little Child
      Dec 28 '18 at 18:11

















    This does not help when saving to CSV.

    – Little Child
    Dec 28 '18 at 18:09





    This does not help when saving to CSV.

    – Little Child
    Dec 28 '18 at 18:09













    yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

    – Adrian Tam
    Dec 28 '18 at 18:10





    yes, saving to CSV will be the same either way. I can't remember which version of pandas will not write blank for NaN values. After all, the default placeholder for missing values in pandas is NaN

    – Adrian Tam
    Dec 28 '18 at 18:10













    Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

    – Little Child
    Dec 28 '18 at 18:11





    Well my issue is that I do not want my integer to become a float when I write this dataframe out :)

    – Little Child
    Dec 28 '18 at 18:11











    0














    Uncomplicated!



    df.data = df.data.fillna(' ')
    df.to_csv('anyfilename.csv', sep=',')


    Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.






    share|improve this answer




























      0














      Uncomplicated!



      df.data = df.data.fillna(' ')
      df.to_csv('anyfilename.csv', sep=',')


      Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.






      share|improve this answer


























        0












        0








        0







        Uncomplicated!



        df.data = df.data.fillna(' ')
        df.to_csv('anyfilename.csv', sep=',')


        Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.






        share|improve this answer













        Uncomplicated!



        df.data = df.data.fillna(' ')
        df.to_csv('anyfilename.csv', sep=',')


        Basically what fillna does is fills all the NaN values of the 'data' column of the dataframe by any desired value (in our case " " blank). Then we are using the to_csv method to write the entire dataframe to the desired csv file.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 28 '18 at 18:16









        Kaivalya KateKaivalya Kate

        11




        11






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53962485%2fpandas-write-blank-values-to-csv%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Monofisismo

            Angular Downloading a file using contenturl with Basic Authentication

            Olmecas