Duplicated data count as per baseline date












0















I’d like to create a query that checks if there is any duplicated ID that has been used as per date.



To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.



CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);


The baseline date corresponds to DATE(createDate) whereas ID to be checked for duplication is personallD.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.



SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())


Out of today’s data, in addition, the number of duplicated data can be checked as follows.



SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T


In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.



[Sample Data]



pKey    personalID  createDate  value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50


[Desired result]



date        totalCount  duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3









share|improve this question

























  • Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

    – Gordon Linoff
    Dec 31 '18 at 1:42
















0















I’d like to create a query that checks if there is any duplicated ID that has been used as per date.



To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.



CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);


The baseline date corresponds to DATE(createDate) whereas ID to be checked for duplication is personallD.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.



SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())


Out of today’s data, in addition, the number of duplicated data can be checked as follows.



SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T


In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.



[Sample Data]



pKey    personalID  createDate  value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50


[Desired result]



date        totalCount  duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3









share|improve this question

























  • Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

    – Gordon Linoff
    Dec 31 '18 at 1:42














0












0








0








I’d like to create a query that checks if there is any duplicated ID that has been used as per date.



To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.



CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);


The baseline date corresponds to DATE(createDate) whereas ID to be checked for duplication is personallD.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.



SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())


Out of today’s data, in addition, the number of duplicated data can be checked as follows.



SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T


In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.



[Sample Data]



pKey    personalID  createDate  value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50


[Desired result]



date        totalCount  duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3









share|improve this question
















I’d like to create a query that checks if there is any duplicated ID that has been used as per date.



To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.



CREATE TABLE SampleTable (
pKey INT PRIMARY KEY,
personalID INT NOT NULL,
createDate DATETIME NOT NULL,
value INT NULL
);


The baseline date corresponds to DATE(createDate) whereas ID to be checked for duplication is personallD.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.



SELECT
COUNT(*)
FROM SampleTable
WHERE
DATE(createDate) = DATE(NOW())


Out of today’s data, in addition, the number of duplicated data can be checked as follows.



SELECT
COUNT(*)
FROM (
SELECT
personalID,
COUNT(*)
FROM SampleTable
WHERE
DATEDIFF(NOW(), trDate) <= 60
GROUP BY personalID HAVING COUNT(*) > 1
) AS T


In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.



[Sample Data]



pKey    personalID  createDate  value
1 1 2018-01-01 100
2 2 2018-01-01 300
3 3 2018-01-01 500
7 1 2018-01-02 100
8 2 2018-01-02 200
9 3 2018-01-02 200
10 4 2018-01-02 100
11 5 2018-01-02 100
12 3 2018-01-03 200
13 4 2018-01-03 100
14 5 2018-01-03 100
15 6 2018-01-03 50


[Desired result]



date        totalCount  duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 4 3






mysql sql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 31 '18 at 1:50







Keith Park

















asked Dec 31 '18 at 1:41









Keith ParkKeith Park

2521314




2521314













  • Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

    – Gordon Linoff
    Dec 31 '18 at 1:42



















  • Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

    – Gordon Linoff
    Dec 31 '18 at 1:42

















Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42





Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42












2 Answers
2






active

oldest

votes


















1














If you want the ids that have multiple rows in the past 60 days:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;


If you want to also insist that the personid appears on the most recent date:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();


EDIT:



This seems to be what you want, assuming no duplicates on a given day



select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;





share|improve this answer


























  • Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

    – Keith Park
    Dec 31 '18 at 1:54











  • Works pretty fine. But, considering performance, will it be okay?

    – Keith Park
    Dec 31 '18 at 5:52











  • @KeithPark . . . An index on (personId, trdate)` would help performance.

    – Gordon Linoff
    Dec 31 '18 at 12:43



















1














You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.



Create table Testtbl (pkey int, personalID int, createddate date, value int); 

insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)


Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .



  select  t.createddate, count(Distinct t.pkey) TotalCount, 
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate


Output:



createddate  TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5





share|improve this answer


























  • Your query doesn't consider duplication on the same day.

    – Keith Park
    Dec 31 '18 at 5:10











  • @KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

    – Avi
    Dec 31 '18 at 5:34











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982852%2fduplicated-data-count-as-per-baseline-date%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














If you want the ids that have multiple rows in the past 60 days:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;


If you want to also insist that the personid appears on the most recent date:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();


EDIT:



This seems to be what you want, assuming no duplicates on a given day



select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;





share|improve this answer


























  • Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

    – Keith Park
    Dec 31 '18 at 1:54











  • Works pretty fine. But, considering performance, will it be okay?

    – Keith Park
    Dec 31 '18 at 5:52











  • @KeithPark . . . An index on (personId, trdate)` would help performance.

    – Gordon Linoff
    Dec 31 '18 at 12:43
















1














If you want the ids that have multiple rows in the past 60 days:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;


If you want to also insist that the personid appears on the most recent date:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();


EDIT:



This seems to be what you want, assuming no duplicates on a given day



select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;





share|improve this answer


























  • Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

    – Keith Park
    Dec 31 '18 at 1:54











  • Works pretty fine. But, considering performance, will it be okay?

    – Keith Park
    Dec 31 '18 at 5:52











  • @KeithPark . . . An index on (personId, trdate)` would help performance.

    – Gordon Linoff
    Dec 31 '18 at 12:43














1












1








1







If you want the ids that have multiple rows in the past 60 days:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;


If you want to also insist that the personid appears on the most recent date:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();


EDIT:



This seems to be what you want, assuming no duplicates on a given day



select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;





share|improve this answer















If you want the ids that have multiple rows in the past 60 days:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2;


If you want to also insist that the personid appears on the most recent date:



select personid
from sampledata
where trdate >= curdate() - interval 60 day
group by personid
having count(*) >= 2 and date(max(trdate)) = curdate();


EDIT:



This seems to be what you want, assuming no duplicates on a given day



select trdate, count(*) as num_persons,
sum(num_dups > 0) as num_dups
from (select sd.*,
(select count(*)
from sampledata sd2
where sd2.personid = sd.personid and
sd2.trdate < sd.trdate and
sd2.trdate >= sd.trdate - interval 60 day
) as num_dups
from sampledata
) sd
group by trdate;






share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 31 '18 at 2:03

























answered Dec 31 '18 at 1:44









Gordon LinoffGordon Linoff

770k35304404




770k35304404













  • Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

    – Keith Park
    Dec 31 '18 at 1:54











  • Works pretty fine. But, considering performance, will it be okay?

    – Keith Park
    Dec 31 '18 at 5:52











  • @KeithPark . . . An index on (personId, trdate)` would help performance.

    – Gordon Linoff
    Dec 31 '18 at 12:43



















  • Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

    – Keith Park
    Dec 31 '18 at 1:54











  • Works pretty fine. But, considering performance, will it be okay?

    – Keith Park
    Dec 31 '18 at 5:52











  • @KeithPark . . . An index on (personId, trdate)` would help performance.

    – Gordon Linoff
    Dec 31 '18 at 12:43

















Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54





Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54













Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52





Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52













@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43





@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43













1














You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.



Create table Testtbl (pkey int, personalID int, createddate date, value int); 

insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)


Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .



  select  t.createddate, count(Distinct t.pkey) TotalCount, 
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate


Output:



createddate  TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5





share|improve this answer


























  • Your query doesn't consider duplication on the same day.

    – Keith Park
    Dec 31 '18 at 5:10











  • @KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

    – Avi
    Dec 31 '18 at 5:34
















1














You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.



Create table Testtbl (pkey int, personalID int, createddate date, value int); 

insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)


Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .



  select  t.createddate, count(Distinct t.pkey) TotalCount, 
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate


Output:



createddate  TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5





share|improve this answer


























  • Your query doesn't consider duplication on the same day.

    – Keith Park
    Dec 31 '18 at 5:10











  • @KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

    – Avi
    Dec 31 '18 at 5:34














1












1








1







You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.



Create table Testtbl (pkey int, personalID int, createddate date, value int); 

insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)


Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .



  select  t.createddate, count(Distinct t.pkey) TotalCount, 
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate


Output:



createddate  TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5





share|improve this answer















You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.



Create table Testtbl (pkey int, personalID int, createddate date, value int); 

insert into Testtbl values
(1 , 1, '2018-01-01' , 100) ,
(2 , 2, '2018-01-01' , 300) ,
(3 , 3, '2018-01-01' , 500) ,
(4 , 1, '2018-01-02' , 100) ,
(5 , 2, '2018-01-02' , 200) ,
(6 , 3, '2018-01-02' , 200) ,
(7 , 4, '2018-01-02' , 100) ,
(8 , 5, '2018-01-02' , 100) ,
(9 , 3, '2018-01-03' , 200) ,
(14 , 3, '2018-01-03' , 500) ,
(10 , 4, '2018-01-03' , 100) ,
(11 , 5, '2018-01-03' , 100) ,
(12 , 6, '2018-01-03' , 50),
(13 , 6, '2018-01-03' , 100)


Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .



  select  t.createddate, count(Distinct t.pkey) TotalCount, 
case when t.Createddate > t1.createddate
then Count(distinct t1.PersonalID) + case when t.Createddate =
t1.createddate and
t.personalID = t1.personalID and t.pkey != t1.pkey then Count(distinct
t1.PersonalID)
else 0 end else 0
end Duplicated from Testtbl t
left join Testtbl t1 on t.personalID = t1.personalID
and t.Createddate >= t1.Createddate and t.pkey != t1.pkey
and DATEDIFF(t1.Createddate, t.Createddate) <= 60
Group by t.createddate


Output:



createddate  TotalCount Duplicated
2018-01-01 3 0
2018-01-02 5 3
2018-01-03 6 5






share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 31 '18 at 5:33

























answered Dec 31 '18 at 3:13









AviAvi

735212




735212













  • Your query doesn't consider duplication on the same day.

    – Keith Park
    Dec 31 '18 at 5:10











  • @KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

    – Avi
    Dec 31 '18 at 5:34



















  • Your query doesn't consider duplication on the same day.

    – Keith Park
    Dec 31 '18 at 5:10











  • @KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

    – Avi
    Dec 31 '18 at 5:34

















Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10





Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10













@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34





@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34


















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982852%2fduplicated-data-count-as-per-baseline-date%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Monofisismo

Angular Downloading a file using contenturl with Basic Authentication

Olmecas