only retain lines with first instance of pattern, for multiple patterns

I have a file with many lines and several columns. In one of those columns, I would like to keep only the line containing the first occurrence of each string/pattern, for every string/pattern that repeats in that column.

For example:

cat exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
192 3_22 A A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
214 10_35 A G . PASS
220 10_41 C T . PASS
etc.

I would like to remove every line whose ID (in the ID column) starts with the same prefix, up to the "_" character, as the ID of an earlier line.

Expected result after running the script:

cat post.exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS

I am not sure how to approach this, because I want to remove the lines containing the second and later occurrences of any prefix (up to the _ character) in the ID column, not just those of one particular pattern. Is this even possible?

Thanks -
LP

bash awk sed

asked Jan 3 at 21:46

LP_640

  • You will get a much more friendly reception and much better help here if you show what code you have tried so far and describe what problems you were having with it. Without code, your question looks like a request for free consulting and many people don't like that.

    – John1024
    Jan 3 at 21:56

5 Answers

awk '!a[$2]++' FS='[ _]*' exp.txt
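
The one-liner above relies on a common awk idiom. Here is a commented sketch of the same idea, wrapped in a hypothetical shell function named first_per_prefix. It uses [ _]+ rather than [ _]* as the field separator, so it does not depend on how a particular awk treats a separator that can match the empty string; like the original, it assumes "_" never occurs in the first column.

```shell
#!/bin/sh
# Hypothetical helper: keep only the first line for each ID prefix
# (the text in column 2 before the first "_"). Reads the files given
# as arguments, or stdin if there are none.
first_per_prefix() {
    # -F '[ _]+' splits on runs of spaces and/or underscores, so
    # "182 3_12 G A . PASS" becomes 182, 3, 12, G, ... and $2 is the prefix.
    # This assumes "_" never appears in column 1.
    #
    # seen[$2]++ returns the previous count (0 the first time), so
    # !seen[$2]++ is true only for the first line with a given $2.
    # A pattern with no action prints the whole line.
    awk -F '[ _]+' '!seen[$2]++' "$@"
}

# Example using the data from the question
printf '%s\n' \
    'POS ID REF ALT QUAL FILTER' \
    '182 3_12 G A . PASS' \
    '192 3_22 A A . PASS' \
    '199 4_22 G A . PASS' \
    '201 10_22 A A . PASS' \
    '214 10_35 A G . PASS' \
    '220 10_41 C T . PASS' | first_per_prefix
```

With the question's file this would be run as first_per_prefix exp.txt, keeping the header plus the 182, 199 and 201 lines.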

answered Jan 3 at 22:44

William Pursell

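Use an associative array to hold the keys that have already been seen:

awk '
{
    # Split the ID column ($2) on "_"; a[1] is the text before the first "_"
    if (split($2, a, /_/) > 0) {
        key = a[1]
        # Print the line only the first time this key is seen
        if (!value[key]) {
            value[key] = 1
            print $0
        }
    }
}' exp.txt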
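
The associative-array approach above can also be written with the key column and the delimiter passed in as parameters instead of being hard-coded. The sketch below is one way to do that, assuming whitespace-separated columns; the function name first_by_key and its COL and SEP arguments are hypothetical. It uses index() and substr() so the separator is matched literally rather than as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: keep the first line for each key, where the key is the
# part of column COL before the first occurrence of the literal string SEP.
# Usage: first_by_key COL SEP [file ...]   (reads stdin if no file is given)
first_by_key() {
    col=$1
    sep=$2
    shift 2
    awk -v col="$col" -v sep="$sep" '
    {
        key = $col
        # index() is a literal (non-regex) search; if SEP does not occur,
        # the whole column value is used as the key.
        i = index(key, sep)
        if (i > 0)
            key = substr(key, 1, i - 1)
        # Print the line only the first time its key is seen.
        if (!(key in seen)) {
            seen[key] = 1
            print
        }
    }' "$@"
}
```

For the question's data this would be invoked as first_by_key 2 _ exp.txt.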

answered Jan 3 at 22:15

wef

awk

$ cat exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
192 3_22 A A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
214 10_35 A G . PASS
220 10_41 C T . PASS

$ awk ' { split($2,t,"_"); if( ! a[t[1]] ) { print ; a[t[1]]++ } }' exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS

answered Jan 3 at 22:34

stack0114106

If _ does not occur in the first field, William Pursell's answer is the best; if it does, apply the same concept after splitting the second field. Note that if there is no _ in the field, the whole value is used.

$ awk '{split($2,p,"_")} !a[p[1]]++' file
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
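
Both caveats in this answer can be checked directly. The sketch below uses a hypothetical input line whose first column contains "_" (the question's data does not) and two hypothetical helper functions: prefix_via_fs splits every field on spaces and "_", as the FS-based one-liner does (written here with [ _]+), while prefix_via_split splits only the ID column, as in this answer.

```shell
#!/bin/sh
# Hypothetical sample line: unlike the question's data, column 1 contains "_".
line='chr_1 3_12 G A . PASS'

# Splitting every field on spaces AND "_" shifts the columns:
# the fields become chr, 1, 3, 12, ... so $2 is "1", not the ID prefix.
prefix_via_fs() { awk -F '[ _]+' '{ print $2 }'; }

# Splitting only the ID column on "_" is unaffected by column 1.
prefix_via_split() { awk '{ split($2, p, "_"); print p[1] }'; }

echo "$line" | prefix_via_fs                          # prints 1
echo "$line" | prefix_via_split                       # prints 3
echo 'POS ID REF ALT QUAL FILTER' | prefix_via_split  # prints ID (no "_", whole value)
```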

answered Jan 3 at 22:49

karakfa

  • Can be shortened further: awk ' split($2,p,"_") && ! a[p[1]]++ ' exp.txt

    – stack0114106
    Jan 3 at 23:11

  • My solution is the code golf winner, but I'm not sure it's "best". A bit too obscure, really.

    – William Pursell
    Jan 4 at 4:06

Perl

$ perl -lane ' $F[1]=~/(.+)_/; print unless $kv{$1}++ ' exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS

answered Jan 3 at 23:05

stack0114106