How to replace letters in lines in fasta file using bash loops?












I want to change every n in the sequences into -, but I don't know how to keep my bash script from changing the n characters that show up in sequence names. I'm not experienced enough with sed or regex to make my script process only the lines that do not start with >, which marks a header line.



Example file:



>Name_with_nnn
nnnatgcnnnatttg
>Name2_with_nnn
atgggnnnnGGtnnn


At the same time I want to convert all lowercase letters into uppercase, but only in the sequence lines. I don't even know how to begin using sed; I find it really tricky to understand.



Expected output:



>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---


So after I created my sequence files I tried to continue my script with:



while IFS= read -r line
do
    if [[ $line == ">"* ]]
    then
        echo "Ignoring header line: $line"
    else
        echo "Converting to uppercase and then N-to-gaps"
        # sed or tr?? do I call $line or do I call $OUTFILE? so confused..
    fi
done









      bash sed














edited Jan 3 at 18:49
Benjamin W.

asked Jan 3 at 18:27
DNAngel
























          4 Answers


















You can do this with a single sed command (GNU sed is needed, because \U is a GNU extension):

sed -i "/^>/! {s/n/-/g; s/\(.*\)/\U\1/g}" text.txt

The -i option edits text.txt in place, so afterwards the file contains:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---














edited Jan 3 at 18:59

answered Jan 3 at 18:38
Cedric Zoppolo













• @DNAngel Be aware that I updated the script, as the conversion to uppercase was missing.
  – Cedric Zoppolo, Jan 3 at 18:55

• @CedricZoppolo Not only the uppercase conversion: you also updated the match on lines starting with >, which might be worth mentioning.
  – Tiw, Jan 3 at 19:05

• @Tiw is correct. I also updated the command so that only lines not starting with > are converted. I was missing the ^ character, which anchors the match to lines that start with > rather than lines that contain > anywhere.
  – Cedric Zoppolo, Jan 3 at 19:12




































You may use this simple GNU sed command:

sed '/^>/!{s/n/-/g; s/.*/\U&/;}' file

Output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---
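
Because \U is specific to GNU sed, a portable variant may be useful on systems with a different sed (BSD or macOS, for example). The following is a minimal sketch, not part of the answer above: it swaps in the POSIX y command, which transliterates characters one for one, so a single command can turn n into - and uppercase the remaining letters. The function name fasta_gaps_posix is hypothetical.

```shell
# Hypothetical wrapper: reads FASTA text on stdin, writes the result to stdout.
fasta_gaps_posix() {
    # On every line that does not start with ">", y maps each character of
    # the first list to the character at the same position in the second
    # list: n becomes -, every other lowercase letter becomes uppercase.
    sed '/^>/!y/nabcdefghijklmopqrstuvwxyz/-ABCDEFGHIJKLMOPQRSTUVWXYZ/'
}

# Usage with the first record of the example file:
printf '%s\n' '>Name_with_nnn' 'nnnatgcnnnatttg' | fasta_gaps_posix
```

Both lists must have the same length, which is why n is listed first and left out of the alphabet that follows it.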
















answered Jan 3 at 18:35
anubhava























In pure Bash, likely quite slow for larger inputs:

while IFS= read -r line; do
    case $line in
        '>'*)
            printf '%s\n' "$line"
            ;;
        *)
            line=${line//n/-}
            printf '%s\n' "${line^^}"
            ;;
    esac
done < infile

This uses a case statement with pattern matching to test whether a line starts with >; the sequence lines are modified with parameter expansions. The ${parameter^^} expansion requires Bash 4.0 or newer.
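
Where only an older Bash is available (macOS, for instance, ships Bash 3.2 as /bin/bash), the ${parameter^^} expansion cannot be used. The sketch below is one possible fallback, assuming tr is acceptable in place of the parameter expansions; it also touches on the "sed or tr" question in the original post. The function name mask_fasta is hypothetical, and the function reads from stdin rather than from a fixed file.

```shell
# Hypothetical helper: reads FASTA text on stdin, writes the result to stdout.
mask_fasta() {
    # The "|| [ -n "$line" ]" part also handles a last line without a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '>'*)
                # Header line: print unchanged.
                printf '%s\n' "$line"
                ;;
            *)
                # Sequence line: n becomes -, then lowercase becomes uppercase.
                printf '%s\n' "$line" | tr 'n' '-' | tr '[:lower:]' '[:upper:]'
                ;;
        esac
    done
}

# Usage, assuming the input is in a file called infile:
# mask_fasta < infile > outfile
```

This starts two tr processes per sequence line, so it is likely even slower than the parameter-expansion loop on large inputs; it trades speed for compatibility.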

















answered Jan 3 at 18:58
Benjamin W.























How about awk?

awk '/^[^>]/{gsub("n","-");print toupper($0);next;}1' data

Output:

>Name_with_nnn
---ATGC---ATTTG
>Name2_with_nnn
ATGGG----GGT---

However, sed can do it too (GNU sed):

sed -E '/^[^>]/{s/n/-/g;s/(.*)/\U\1/g;}' data

It's the same as:

sed -E '/^>/!{s/n/-/g;s/(.*)/\U\1/g;}' data

If you want to change the file in place, you can add the -i switch to sed.
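
Since the question mentions several sequence files, the awk approach can be wrapped in a loop. This is only a sketch under assumptions that are not in the original post: the inputs are taken to be named *.fasta in the current directory, the results are written to a directory called cleaned, and clean_fasta is a hypothetical helper name.

```shell
# Hypothetical helper: print the cleaned version of one FASTA file to stdout.
clean_fasta() {
    awk '/^>/ { print; next }   # header lines pass through unchanged
         { gsub(/n/, "-"); print toupper($0) }' "$1"
}

# Assumed layout: inputs are ./*.fasta, outputs go to ./cleaned/.
for f in ./*.fasta; do
    [ -e "$f" ] || continue     # the glob matched nothing: skip
    mkdir -p cleaned
    clean_fasta "$f" > "cleaned/${f##*/}"
done
```

Writing to a separate directory leaves the input files untouched, which may be a safer choice than in-place editing while the command is still being tested.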















edited Jan 3 at 18:52

answered Jan 3 at 18:29
Tiw





























