Only retain lines with the first instance of a pattern, for multiple patterns
I have a file with many lines and several columns. I would like to keep only the lines that contain the first occurrence of a pattern/string in one column, and to do this for every string/pattern that repeats in that column.
e.g.
cat exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
192 3_22 A A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
214 10_35 A G . PASS
220 10_41 C T . PASS
etc.
I would like to remove every line whose ID (in the ID column) has the same prefix, up to the "_" character, as an earlier line.
e.g. (after script run)
cat post.exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
I am not sure how to approach this, because I want to remove the lines containing the second and later occurrences of any prefix (up to the _ character) in the ID column, not just one particular pattern. Is this even possible?
Thanks -
LP
bash awk sed
You will get a much more friendly reception and much better help here if you show what code you have tried so far and describe what problems you were having with it. Without code, your question looks like a request for free consulting and many people don't like that.
– John1024
Jan 3 at 21:56
asked Jan 3 at 21:46
LP_640
5 Answers
awk '!a[$2]++' FS='[ _]*' exp.txt
answered Jan 3 at 22:44
William Pursell
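To make the FS trick in this answer less obscure, here is a small sketch that prints the second field awk sees when spaces and underscores both act as field separators, using rows copied from the question's exp.txt. The function name show_keys is hypothetical, and the separator is written as -F '[ _]+' (one or more spaces or underscores) for readability.

```shell
#!/bin/sh
# Hypothetical helper: print the 2nd field awk sees when spaces AND "_"
# both separate fields, so "3_12" splits into "3" and "12".
show_keys() {
  awk -F '[ _]+' '{ print $2 }'
}

# Rows copied from the question's exp.txt; prints ID, 3, 3, 10
printf '%s\n' 'POS ID REF ALT QUAL FILTER' '182 3_12 G A . PASS' \
  '192 3_22 A A . PASS' '201 10_22 A A . PASS' | show_keys
```

Because $2 is then just the ID prefix, !a[$2]++ is true only the first time each prefix appears. This holds only while the POS column itself contains no "_", a caveat raised in a later answer.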
Use an associative array to hold keys that have already been seen:
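awk '
{
    # key = part of the ID column ($2) before the first "_"
    if (split($2, a, /_/) > 0)
    {
        key = a[1]
        # print the line only the first time this key is seen
        if (!value[key])
        {
            value[key] = 1
            print $0
        }
    }
}' exp.txt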
answered Jan 3 at 22:15
wef
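The same associative-array idea can be wrapped so that the column and the separator are parameters rather than hard-coded. This is only a sketch: the function name first_by_prefix and the awk variables col and sep are hypothetical. Note that sep is passed to split(), so a value longer than one character is treated as a regular expression.

```shell
#!/bin/sh
# Hypothetical helper: keep the first line for each distinct prefix of a column.
#   $1 = column number (e.g. 2 for ID), $2 = separator (e.g. "_"), $3 = input file
first_by_prefix() {
  awk -v col="$1" -v sep="$2" '
  {
    split($col, parts, sep)   # parts[1] = text before the first separator
    key = parts[1]
    if (!(key in seen)) {     # first time this prefix appears
      seen[key] = 1
      print
    }
  }' "$3"
}

# Usage (question's file): first_by_prefix 2 _ exp.txt
```

The header line survives because its ID value "ID" contains no "_" and is therefore used whole as its own key.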
awk
$ cat exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
192 3_22 A A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
214 10_35 A G . PASS
220 10_41 C T . PASS
$ awk ' { split($2,t,"_"); if( ! a[t[1]] ) { print ; a[t[1]]++ } }' exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
answered Jan 3 at 22:34
stack0114106
If "_" is not used in the first field, William Pursell's answer is the best. If it is, apply the same concept after splitting the second field. Note that if there is no "_" in the field, the whole value will be used.
$ awk '{split($2,p,"_")} !a[p[1]]++' file
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
answered Jan 3 at 22:49
karakfa
can be shortened further
awk ' split($2,p,"_") && ! a[p[1]]++ ' exp.txt
– stack0114106
Jan 3 at 23:11
My solution is the code golf winner, but I'm not sure it's "best". A bit too obscure, really.
– William Pursell
Jan 4 at 4:06
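The caveats raised in this answer and its comments can be checked side by side: an "_" inside the first field, an ID with no "_", and the shortened split(...) && form. The rows in sample are made up purely to trigger those edge cases, the function names are hypothetical, and the FS-based variant is written with -F '[ _]+'.

```shell
#!/bin/sh
# Hypothetical sample: row 2 has "_" in POS, there is one blank line,
# and the last row has an ID with no "_".
sample='POS ID REF ALT QUAL FILTER
18_2 3_12 G A . PASS
192 3_22 A A . PASS

199 7 G A . PASS'

# FS approach: the "_" in POS shifts the fields, so $2 of row 2 is "2", not "3"
by_fs()    { printf '%s\n' "$sample" | awk -F '[ _]+' '!a[$2]++'; }

# split() approach: key is the part of column 2 before the first "_";
# an ID with no "_" is used whole, and the first blank line is kept
by_split() { printf '%s\n' "$sample" | awk '{split($2,p,"_")} !a[p[1]]++'; }

# shortened form: split() returns 0 for an empty $2, so blank lines are dropped
by_short() { printf '%s\n' "$sample" | awk 'split($2,p,"_") && !a[p[1]]++'; }
```

With this input, by_fs keeps the "192 3_22" row because the "_" in "18_2" shifted its fields, while by_split and by_short both drop it. The only difference between by_split and by_short is the blank line.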
Perl
$ perl -lane ' ($k) = $F[1] =~ /^([^_]*)/; print unless $kv{$k}++ ' exp.txt
POS ID REF ALT QUAL FILTER
182 3_12 G A . PASS
199 4_22 G A . PASS
201 10_22 A A . PASS
answered Jan 3 at 23:05
stack0114106