How can I extract tuples from a string?












I have the following string:



r"(A1,B1,C1,D1),(A2,B2,C2,D2),..."



and I want to extract a list of tuples



[(A1,B1,C1,D1),(A2,B2,C2,D2),...]



A, B and D are integers, while C is a string enclosed in single quotes. The hard part is that C may contain any character, including escaped single quotes (\'), commas (,), escaped backslashes (\\) and integers. I am trying to solve this problem using regexes, but I can't figure out how to do it.



So far, I've tried to find the end of the string by looking for the first single quote that is preceded by an even number of backslashes (0, 2, 4, ...), but I can't get it to work. Any ideas?



Expected results:





  • r"(21,3,'abc\',57',1993)" --> (21,3,'abc\',57',1993)


  • r"(21,3,'abc\\',1993)" --> (21,3,'abc\\',1993)


  • r"(21,3,'abc\\\\\',57\\\\',1993)" --> (21,3,'abc\\\\\',57\\\\',1993)







python regex






asked Dec 30 '18 at 23:52









Riccardo Bucco

  • You should look into ast.literal_eval.

    – Scott Hunter
    Dec 31 '18 at 0:00











  • Try re.findall(r"""\((\d+),(\d+),('[^'\\]*(?:\\.[^'\\]*)*'),(\d+)\)""", s), see regex101.com/r/3DMXyZ/1 and ideone.com/DlP6we

    – Wiktor Stribiżew
    Dec 31 '18 at 0:09













  • What is the source of these strings?

    – juanpa.arrivillaga
    Dec 31 '18 at 2:24
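
The regex suggested in the comments can be tried out roughly as follows. This is only a sketch: the helper name extract_tuples and the sample input are made up for illustration, and the C field is returned raw (quotes and backslashes included) rather than unescaped.

```python
import re

# Pattern from the comment above. The quoted field is a run of characters that
# are neither a quote nor a backslash, followed by zero or more
# "backslash + any character" pairs, each followed by another such run.
TUPLE_RE = re.compile(r"""\((\d+),(\d+),('[^'\\]*(?:\\.[^'\\]*)*'),(\d+)\)""")


def extract_tuples(s):
    """Hypothetical helper: return a list of (int, int, raw_quoted_str, int)."""
    return [(int(a), int(b), c, int(d)) for a, b, c, d in TUPLE_RE.findall(s)]


# Hypothetical sample: an escaped quote in the first tuple,
# an escaped backslash right before the closing quote in the second.
sample = r"(21,3,'abc\',57',1993),(7,8,'x\\',9)"
result = extract_tuples(sample)
print(result)
```

Because a backslash is always consumed together with the character after it, a quote only ends the field when it is preceded by an even number of backslashes, which is the condition described in the question.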














2 Answers
You can use ast.literal_eval to evaluate a string containing Python literals:



import ast

# Raw string, so the backslash before the inner quote reaches the parser intact.
ip = r"(21,3,'abc\',57',1993)"
op = ast.literal_eval(ip)

print(op)
# output:
# (21, 3, "abc',57", 1993)

# verify that the fields have the expected types
for i in op:
    print("{} is {}".format(i, type(i)))

# output:
# 21 is <class 'int'>
# 3 is <class 'int'>
# abc',57 is <class 'str'>
# 1993 is <class 'int'>






answered Dec 31 '18 at 0:09, edited Dec 31 '18 at 0:12
Sufiyan Ghori
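
The question's input holds several tuples separated by commas, so one possible extension of the ast.literal_eval approach is to wrap the text in square brackets and parse it as a list literal. The sketch below assumes the fields follow Python literal syntax; the helper name parse_tuples and the sample input are invented for illustration. Note that literal_eval interprets the escapes, so \' becomes ' and \\ becomes a single backslash in the result, and integers with leading zeros such as 007 are rejected.

```python
import ast


def parse_tuples(s):
    """Hypothetical helper: parse "(a,b,'c',d),(...)" into a list of tuples."""
    # Wrapping in brackets makes one tuple and several tuples parse the same way:
    # the result is always a list whose items are the individual tuples.
    return ast.literal_eval("[" + s + "]")


# Hypothetical sample with an escaped quote and an escaped backslash.
sample = r"(21,3,'abc\',57',1993),(7,8,'x\\',9)"
rows = parse_tuples(sample)
print(rows)
```

Without the brackets, a single tuple would come back as a flat tuple of four fields while several tuples would come back as a tuple of tuples, so the caller would have to tell the two shapes apart.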

You can use the pattern

(?<=')(?:\\\\|\\'|[^'])+(?=',)|\d+

For the string content (it looks behind and ahead for 's), it repeats a group composed of either:

  • \\\\ - two backslashes (that is, an escaped backslash, which represents a single literal backslash)
  • \\' - an escaped ' (that is, a single literal ')
  • [^'] - anything but a quote character

Or it matches \d+, the integers.

https://regex101.com/r/5beqXJ/1







answered Dec 31 '18 at 0:08
CertainPerformance
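
As a rough illustration of how the lookaround pattern above behaves, the sketch below applies it to the text of a single tuple with re.findall. The helper name split_fields is hypothetical, and the string field is returned without its surrounding quotes but with its backslashes untouched. Since the pattern has no notion of tuple boundaries, this is only tried on one tuple at a time; an empty string field or a field starting with a comma would likely need extra handling.

```python
import re

# Pattern from the answer above, written as a raw string for Python's re module.
FIELD_RE = re.compile(r"(?<=')(?:\\\\|\\'|[^'])+(?=',)|\d+")


def split_fields(tuple_text):
    """Hypothetical helper: return the raw fields of one "(A,B,'C',D)" tuple."""
    return FIELD_RE.findall(tuple_text)


fields = split_fields(r"(21,3,'abc\',57',1993)")
a, b, c, d = fields
# Convert the integer fields; C stays as the raw text between the quotes.
record = (int(a), int(b), c, int(d))
print(record)
```

The digits inside the quoted field are not reported separately because the first alternative consumes the whole field before \d+ gets a chance to match there.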
