Duplicated data count as per baseline date
I’d like to create a query that checks if there is any duplicated ID that has been used as per date.
To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.
CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);
The baseline date corresponds to DATE(createDate)
whereas ID to be checked for duplication is personallD
.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.
SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())
Out of today’s data, in addition, the number of duplicated data can be checked as follows.
SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T
In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID
during the previous days.
[Sample Data]
pKey personalID createDate value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50
[Desired result]
date totalCount duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3
mysql sql
add a comment |
I’d like to create a query that checks if there is any duplicated ID that has been used as per date.
To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.
CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);
The baseline date corresponds to DATE(createDate)
whereas ID to be checked for duplication is personallD
.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.
SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())
Out of today’s data, in addition, the number of duplicated data can be checked as follows.
SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T
In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID
during the previous days.
[Sample Data]
pKey personalID createDate value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50
[Desired result]
date totalCount duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3
mysql sql
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42
add a comment |
I’d like to create a query that checks if there is any duplicated ID that has been used as per date.
To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.
CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);
The baseline date corresponds to DATE(createDate)
whereas ID to be checked for duplication is personallD
.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.
SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())
Out of today’s data, in addition, the number of duplicated data can be checked as follows.
SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T
In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID
during the previous days.
[Sample Data]
pKey personalID createDate value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50
[Desired result]
date totalCount duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3
mysql sql
I’d like to create a query that checks if there is any duplicated ID that has been used as per date.
To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.
CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);
The baseline date corresponds to DATE(createDate)
whereas ID to be checked for duplication is personallD
.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.
SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())
Out of today’s data, in addition, the number of duplicated data can be checked as follows.
SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T
In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID
during the previous days.
[Sample Data]
pKey personalID createDate value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50
[Desired result]
date totalCount duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3
mysql sql
mysql sql
edited Dec 31 '18 at 1:50
Keith Park
asked Dec 31 '18 at 1:41
Keith ParkKeith Park
2521314
2521314
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42
add a comment |
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42
add a comment |
2 Answers
2
active
oldest
votes
If you want the ids that have multiple rows in the past 60 days:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;
If you want to also insist that the personid
appears on the most recent date:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();
EDIT:
This seems to be what you want, assuming no duplicates on a given day
select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
add a comment |
You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.
Create table Testtbl (pkey int, personalID int, createddate date, value int);
insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)
Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .
select t.createddate, count(Distinct t.pkey) TotalCount,
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate
Output:
createddate TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982852%2fduplicated-data-count-as-per-baseline-date%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
If you want the ids that have multiple rows in the past 60 days:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;
If you want to also insist that the personid
appears on the most recent date:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();
EDIT:
This seems to be what you want, assuming no duplicates on a given day
select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
add a comment |
If you want the ids that have multiple rows in the past 60 days:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;
If you want to also insist that the personid
appears on the most recent date:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();
EDIT:
This seems to be what you want, assuming no duplicates on a given day
select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
add a comment |
If you want the ids that have multiple rows in the past 60 days:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;
If you want to also insist that the personid
appears on the most recent date:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();
EDIT:
This seems to be what you want, assuming no duplicates on a given day
select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;
If you want the ids that have multiple rows in the past 60 days:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;
If you want to also insist that the personid
appears on the most recent date:
select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();
EDIT:
This seems to be what you want, assuming no duplicates on a given day
select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;
edited Dec 31 '18 at 2:03
answered Dec 31 '18 at 1:44
Gordon LinoffGordon Linoff
770k35304404
770k35304404
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
add a comment |
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.
– Keith Park
Dec 31 '18 at 1:54
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
Works pretty fine. But, considering performance, will it be okay?
– Keith Park
Dec 31 '18 at 5:52
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
@KeithPark . . . An index on (personId, trdate)` would help performance.
– Gordon Linoff
Dec 31 '18 at 12:43
add a comment |
You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.
Create table Testtbl (pkey int, personalID int, createddate date, value int);
insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)
Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .
select t.createddate, count(Distinct t.pkey) TotalCount,
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate
Output:
createddate TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
add a comment |
You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.
Create table Testtbl (pkey int, personalID int, createddate date, value int);
insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)
Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .
select t.createddate, count(Distinct t.pkey) TotalCount,
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate
Output:
createddate TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
add a comment |
You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.
Create table Testtbl (pkey int, personalID int, createddate date, value int);
insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)
Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .
select t.createddate, count(Distinct t.pkey) TotalCount,
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate
Output:
createddate TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5
You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.
Create table Testtbl (pkey int, personalID int, createddate date, value int);
insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)
Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .
select t.createddate, count(Distinct t.pkey) TotalCount,
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate
Output:
createddate TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5
edited Dec 31 '18 at 5:33
answered Dec 31 '18 at 3:13
AviAvi
735212
735212
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
add a comment |
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
Your query doesn't consider duplication on the same day.
– Keith Park
Dec 31 '18 at 5:10
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6
– Avi
Dec 31 '18 at 5:34
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982852%2fduplicated-data-count-as-per-baseline-date%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.
– Gordon Linoff
Dec 31 '18 at 1:42