Python: turn a hash into a dataframe




















I have a hash file that looks like this, with one record per line:



Amy:0001:[{'name': 'Amy', 'age': '14', 'grade': '7', 'award': '0'}]
Carl:0024:[{'name': 'Carl', 'age': '12', 'grade': '6', 'award': '2'}, {'name': 'Carl', 'age': '18', 'grade': '12', 'award': '4'}, {'name': 'Carl', 'age': '13', 'grade': '6', 'award': '7'}]


and more ...



I want a dataframe that looks like this:

           name  age  grade  award
Amy:0001   Amy   14   7      0
Carl:0024  Carl  12   6      2
Carl:0024  Carl  18   12     4
Carl:0024  Carl  13   6      7


I tried reading the file and stripping the newline from each line:

lines = [line.rstrip('\n') for line in open("my_file.txt")]















      python pandas dictionary hash














edited Jan 4 at 18:28 by Yuca

asked Jan 4 at 18:27 by Matt-pow
























          2 Answers























































Start with an empty DataFrame:

import pandas as pd

df = pd.DataFrame(columns=['key', 'name', 'age', 'grade', 'award'])

Then read the hash file into the DataFrame line by line:

import json

hash_path = "my_file.txt"  # file name taken from the question

with open(hash_path, 'r') as f:
    for line in f:
        # split on the first two colons only, so colons inside the dicts are left alone
        key = ":".join(line.split(":", 2)[:2])
        rows = line.split(":", 2)[-1]
        # json requires double quotes for strings
        rows = json.loads(rows.replace("'", '"'))
        for row in rows:
            row['key'] = key
            # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
            df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
# set the 'key' column to the index
df.set_index('key', inplace=True)
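
Growing a DataFrame one row at a time copies the data on every step, so for a large file it may be worth collecting the rows in a plain list and building the frame once at the end. Below is a minimal sketch of that variation, not the code from this answer. The function name hash_lines_to_frame and the inline sample list are made up for illustration, and ast.literal_eval is swapped in for the quote replacement so that values containing apostrophes would still parse.

```python
from ast import literal_eval

import pandas as pd


def hash_lines_to_frame(lines):
    """Build a DataFrame from lines shaped like 'name:number:[{...}, ...]'.

    Hypothetical helper, written for this example only.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first two colons only; the remainder is the list of dicts.
        name, number, payload = line.split(":", 2)
        key = f"{name}:{number}"
        # literal_eval parses Python literals, so single-quoted strings work as-is.
        for row in literal_eval(payload):
            records.append({"key": key, **row})
    # Build the frame once instead of concatenating per row.
    return pd.DataFrame(records).set_index("key")


# Sample input in the question's format (shortened to two rows for Carl).
sample = [
    "Amy:0001:[{'name': 'Amy', 'age': '14', 'grade': '7', 'award': '0'}]",
    "Carl:0024:[{'name': 'Carl', 'age': '12', 'grade': '6', 'award': '2'}, "
    "{'name': 'Carl', 'age': '18', 'grade': '12', 'award': '4'}]",
]
df = hash_lines_to_frame(sample)
print(df)
```

To read from the real file, passing the open file object should work the same way, e.g. hash_lines_to_frame(open("my_file.txt")). Note that the values stay strings, as in the file.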














edited Jan 4 at 18:55

answered Jan 4 at 18:49 by chet-the-wizard








































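Here's a solution using ast.literal_eval that doesn't require an explicit line-by-line loop. You should find it considerably more efficient than appending rows one at a time.

import numpy as np
import pandas as pd
from ast import literal_eval
from io import StringIO
from itertools import chain

# the sample data, one record per line
x = (
    "Amy:0001:[{'name': 'Amy', 'age': '14', 'grade': '7', 'award': '0'}]\n"
    "Carl:0024:[{'name': 'Carl', 'age': '12', 'grade': '6', 'award': '2'}, {'name': 'Carl', 'age': '18', 'grade': '12', 'award': '4'}, {'name': 'Carl', 'age': '13', 'grade': '6', 'award': '7'}]"
)

# split each line at the '[' that opens the list of dicts
df = pd.read_csv(StringIO(x), delimiter='[', header=None, names=['id', 'data'])

# drop the trailing ':' from the id, then restore the '[' and parse the list
df['id'] = df['id'].str[:-1]
df['data'] = df['data'].map(lambda x: literal_eval(f'[{x}'))

# number of dicts per line
lens = df['data'].str.len()

# repeat each id once per dict and flatten the dicts into rows
df = (pd.DataFrame({'id': np.repeat(df['id'].values, lens)})
        .join(pd.DataFrame(list(chain.from_iterable(df['data']))))
        .set_index('id'))

print(df)

Output as originally posted (recent pandas versions keep the dicts' key order, so the columns may instead appear as name, age, grade, award):

           age  award  grade  name
id
Amy:0001   14   0      7      Amy
Carl:0024  12   2      6      Carl
Carl:0024  18   4      12     Carl
Carl:0024  13   7      6      Carl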
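
On pandas 0.25 or later, the np.repeat / chain.from_iterable step could probably also be written with Series.explode. The following is only a sketch under that assumption, not part of the original answer. The parsed Series is a hypothetical stand-in for the 'data' column after literal_eval, with the ids already set as its index.

```python
import pandas as pd

# Hypothetical stand-in for the parsed 'data' column, indexed by id.
parsed = pd.Series(
    [
        [{"name": "Amy", "age": "14", "grade": "7", "award": "0"}],
        [
            {"name": "Carl", "age": "12", "grade": "6", "award": "2"},
            {"name": "Carl", "age": "18", "grade": "12", "award": "4"},
        ],
    ],
    index=pd.Index(["Amy:0001", "Carl:0024"], name="id"),
    name="data",
)

# explode() gives each list element its own row and repeats the index label.
exploded = parsed.explode()

# Expand the dicts into columns, keeping the repeated ids as the index.
df = pd.DataFrame(exploded.tolist(), index=exploded.index)
print(df)
```

One caveat: explode turns an empty list into a missing value, so a record with no dicts would need to be dropped before the last step.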














edited Jan 4 at 21:25

answered Jan 4 at 19:07 by jpp





























