Getting a key error while converting npz to csv format

I am trying to convert a .npz file to .csv format, but I get the following error:
KeyError: '0 is not a file in the archive'

I had a sparse matrix, which I saved in .npz format. I then loaded the .npz file with np.load() and tried to write the loaded object to CSV with np.savetxt(), which raises the KeyError shown above.

What does this KeyError mean, and how can I solve it?



I tried the following code:

import numpy as np

DF = np.load("DF_tfidf.npz")
np.savetxt("DF.csv", DF)  # this call raises the KeyError









  • np.load gives you a dictionary-like object. The actual arrays are accessed by name, that is, by dictionary key. So it doesn't make sense to simply pass this object to the savetxt function. I suspect you are trying to use these functions without learning what they produce and require.

    – hpaulj
    Jan 2 at 6:32











  • If you have created a scipy sparse matrix and saved it with save_npz, you have added another layer of complexity. While such a file can be read with np.load, you have to understand the save format first. If you use load_npz instead, you get a sparse matrix, just like the one you started with. Saving that to a csv text format is a different topic. The simplest approach would be to convert it to a dense array with toarray() and write that with savetxt. But if the sparse matrix is at all large, you could end up with a MemoryError.

    – hpaulj
    Jan 2 at 6:37













  • What exactly do you expect the csv to look like?

    – hpaulj
    Jan 2 at 12:26











  • Question has nothing to do with machine-learning - kindly do not spam the tag (removed).

    – desertnaut
    Jan 2 at 15:45
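To make the first comment concrete, here is a minimal sketch of what np.load returns for an .npz file. The archive name `example.npz` and the array name `first` are made up for illustration. The point is only that integer indexing fails with the same message as in the question; presumably np.savetxt hits this when it tries to treat the archive as a sequence of rows.

```python
import os
import tempfile

import numpy as np

# Build a small .npz archive in a temporary directory (hypothetical data).
tmp_dir = tempfile.mkdtemp()
npz_path = os.path.join(tmp_dir, "example.npz")
np.savez(npz_path, first=np.arange(6).reshape(2, 3))

archive = np.load(npz_path)  # dictionary-like object mapping names to arrays
names = archive.files        # list of array names stored in the archive

# Access by name works and returns a regular ndarray.
first = archive["first"]

# Access by integer position does not: no member of the archive is named 0.
try:
    archive[0]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

archive.close()
print(names, first.shape, repr(lookup_error))
```

So the fix is to pick an array out of the archive by name before handing it to np.savetxt, or, for a scipy sparse matrix, to load the file with load_npz instead.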
















python-3.x csv numpy scipy sparse-matrix






edited Jan 2 at 17:49 by hpaulj
asked Jan 2 at 4:11 by Hardik Bapna













2 Answers
You cannot convert an NPZ file to a CSV file directly. First, find out which arrays the NPZ file contains:

import numpy as np

np_Array = np.load('DF_tfidf.npz')
print(np_Array.files)  # names of the arrays stored in the archive

For example, if the output of the print above is ['arr_0'], you need to extract that array and then save it as CSV:

arr = np_Array.files[0]  # name of the first array in the archive
np.savetxt("DF.csv", np_Array[arr], delimiter=",")
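Not every array in an archive is something savetxt can write, so a slightly more defensive variant might check the dimensions and dtype first. The following is only a sketch: the helper name `npz_to_csvs` and the demo archive are made up for illustration, and the skipping rules are one possible choice.

```python
import os
import tempfile

import numpy as np


def npz_to_csvs(npz_path, out_dir):
    """Write each 1-D or 2-D numeric array in an .npz archive to its own CSV.

    Returns (written, skipped): name -> csv path, and name -> reason skipped.
    """
    written, skipped = {}, {}
    with np.load(npz_path) as archive:
        for name in archive.files:
            arr = archive[name]
            if arr.dtype.kind not in "iuf":      # only int, uint, float
                skipped[name] = f"non-numeric dtype {arr.dtype}"
            elif arr.ndim not in (1, 2):         # savetxt needs 1-D or 2-D
                skipped[name] = f"{arr.ndim}-D array"
            else:
                path = os.path.join(out_dir, f"{name}.csv")
                np.savetxt(path, arr, delimiter=",")
                written[name] = path
    return written, skipped


# Hypothetical archive with one writable array and two that are skipped.
demo_dir = tempfile.mkdtemp()
demo_npz = os.path.join(demo_dir, "demo.npz")
np.savez(demo_npz, table=np.eye(2), label=np.array(b"csr"),
         cube=np.zeros((2, 2, 2)))
written, skipped = npz_to_csvs(demo_npz, demo_dir)
print(sorted(written), skipped)
```

Note that for an archive written by scipy's save_npz this produces several CSV files that are not meaningful on their own, since they hold the internal pieces of the sparse matrix rather than the matrix itself.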





  • This ignores the shape and dtype of the arrays. The OP doesn't understand the content of the npz well enough to get meaningful csvs.

    – hpaulj
    Jan 2 at 12:25



















This isn't a problem of how to convert npz to csv, but of how to properly load the data from the npz and then save that as csv. In general, an npz is a file archive that contains several arrays. A csv, on the other hand, is a format for saving one 2d array.

You could, in theory, write each file of the npz to its own csv. But if the npz saves some complex object, rather than an arbitrary set of arrays, that's probably not what you want to do. My guess is that you have a scipy.sparse matrix (possibly created in the course of some machine learning project). In that case you should focus on how to write a sparse matrix, or some representation of it, not on converting its npz save.



Let's make a scipy sparse matrix and save it:



In [45]: from scipy import sparse
In [46]: M = sparse.random(4,4,.2,'csr')
In [47]: M
Out[47]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>
In [48]: M.A
Out[48]:
array([[0.30442216, 0.        , 0.        , 0.        ],
       [0.29783572, 0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.83881939, 0.        ],
       [0.        , 0.        , 0.        , 0.        ]])
In [49]: sparse.save_npz('sparse.npz',M)


Now load it:



In [50]: sparse.load_npz('sparse.npz')
Out[50]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in Compressed Sparse Row format>


That's the same thing that we saved.



Now look at it with np.load:



In [51]: data = np.load('sparse.npz')
In [52]: list(data.keys())
Out[52]: ['indices', 'indptr', 'format', 'shape', 'data']
In [53]: data['indices']
Out[53]: array([0, 0, 2], dtype=int32)
In [54]: data['indptr']
Out[54]: array([0, 1, 2, 3, 3], dtype=int32)
In [55]: data['format']
Out[55]: array(b'csr', dtype='|S3')
In [56]: data['shape']
Out[56]: array([4, 4])
In [57]: data['data']
Out[57]: array([0.30442216, 0.29783572, 0.83881939])
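These pieces are just the CSR attributes of the matrix. As a sketch of how they fit together (assuming a csr-format save like the one above; the temporary path and the random test matrix are made up), the matrix can be rebuilt by hand from the np.load result. In practice load_npz does this for you.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical test matrix, saved the same way as in the session above.
M = sparse.random(4, 4, density=0.25, format="csr", random_state=0)
path = os.path.join(tempfile.mkdtemp(), "sparse.npz")
sparse.save_npz(path, M)

with np.load(path) as data:
    fmt = data["format"].item()          # stored as a 0-d bytes array
    if isinstance(fmt, bytes):
        fmt = fmt.decode()
    shape = tuple(int(n) for n in data["shape"])
    # (data, indices, indptr) is the standard CSR constructor triple.
    rebuilt = sparse.csr_matrix(
        (data["data"], data["indices"], data["indptr"]), shape=shape
    )

same_as_original = np.array_equal(rebuilt.toarray(), M.toarray())
same_as_load_npz = np.array_equal(
    rebuilt.toarray(), sparse.load_npz(path).toarray()
)
print(fmt, shape, same_as_original, same_as_load_npz)
```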


I can save the dense equivalent of this sparse matrix to a csv with:



In [60]: np.savetxt('sparse.csv', M.A, fmt='%10f',delimiter=',')
In [61]: cat sparse.csv
0.304422, 0.000000, 0.000000, 0.000000
0.297836, 0.000000, 0.000000, 0.000000
0.000000, 0.000000, 0.838819, 0.000000
0.000000, 0.000000, 0.000000, 0.000000


For a small matrix like this that's no problem. But in machine learning the sparse matrix is often very large, and building the dense array with M.A (shorthand for M.toarray()) can raise a MemoryError.
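One way to sidestep the MemoryError while still producing a dense csv is to densify only a few rows at a time. This is a sketch under that assumption, not something from the original session; the helper name `sparse_to_dense_csv` and its `rows_per_chunk` parameter are made up. The resulting file can still be huge; only the peak memory use is bounded.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def sparse_to_dense_csv(matrix, csv_path, rows_per_chunk=1000, fmt="%.6f"):
    """Write a sparse matrix as a dense CSV, densifying a few rows at a time."""
    csr = matrix.tocsr()  # CSR supports efficient row slicing
    with open(csv_path, "w") as handle:
        for start in range(0, csr.shape[0], rows_per_chunk):
            # Only this block of rows is ever held as a dense array.
            block = csr[start:start + rows_per_chunk].toarray()
            np.savetxt(handle, block, fmt=fmt, delimiter=",")


# Hypothetical small example: 5 rows written in chunks of 2.
M = sparse.random(5, 4, density=0.3, format="csr", random_state=0)
csv_path = os.path.join(tempfile.mkdtemp(), "dense.csv")
sparse_to_dense_csv(M, csv_path, rows_per_chunk=2)

read_back = np.loadtxt(csv_path, delimiter=",")
print(read_back.shape)
```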



I suppose one could try to write a 3-column csv from the row, col and data attributes of a coo format matrix, the same sort of numbers we get with:



In [62]: print(M)
(0, 0) 0.3044221604204369
(1, 0) 0.29783571660339536
(2, 2) 0.8388193913095385





answered Jan 2 at 6:51 by Lakshmi Bhavani - Intel (first answer)













answered Jan 2 at 17:31 by hpaulj (second answer; edited Jan 2 at 17:57)





























