Duplicated data count as per baseline date

I’d like to create a query that checks if there is any duplicated ID that has been used as per date.

To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.

CREATE TABLE SampleTable (

   pKey INT PRIMARY KEY,

   personalID INT NOT NULL,

   createDate DATETIME NOT NULL,

   value INT NULL

);

The baseline date corresponds to DATE(createDate) whereas ID to be checked for duplication is personallD.
Information needed herein can boil down to the number of today’s data and that of duplicated data.
The number of today’s data can be easily queried as follows.

SELECT

   COUNT(*)

FROM SampleTable

WHERE

   DATE(createDate) = DATE(NOW())

Out of today’s data, in addition, the number of duplicated data can be checked as follows.

SELECT

   COUNT(*)

FROM (

   SELECT

      personalID,

      COUNT(*)

   FROM SampleTable

   WHERE

      DATEDIFF(NOW(), trDate) <= 60

   GROUP BY personalID HAVING COUNT(*) > 1

) AS T

In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.

[Sample Data]

pKey    personalID  createDate  value

1       1           2018-01-01  100

2       2           2018-01-01  300

3       3           2018-01-01  500

7       1           2018-01-02  100

8       2           2018-01-02  200

9       3           2018-01-02  200

10      4           2018-01-02  100

11      5           2018-01-02  100

12      3           2018-01-03  200

13      4           2018-01-03  100

14      5           2018-01-03  100

15      6           2018-01-03  50

[Desired result]

date        totalCount  duplicated

2018-01-01  3           0

2018-01-02  5           3

2018-01-03  4           3

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42

add a comment |

I’d like to create a query that checks if there is any duplicated ID that has been used as per date.

To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.

CREATE TABLE SampleTable (

   pKey INT PRIMARY KEY,

   personalID INT NOT NULL,

   createDate DATETIME NOT NULL,

   value INT NULL

);

SELECT

   COUNT(*)

FROM SampleTable

WHERE

   DATE(createDate) = DATE(NOW())

Out of today’s data, in addition, the number of duplicated data can be checked as follows.

SELECT

   COUNT(*)

FROM (

   SELECT

      personalID,

      COUNT(*)

   FROM SampleTable

   WHERE

      DATEDIFF(NOW(), trDate) <= 60

   GROUP BY personalID HAVING COUNT(*) > 1

) AS T

In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.

[Sample Data]

pKey    personalID  createDate  value

1       1           2018-01-01  100

2       2           2018-01-01  300

3       3           2018-01-01  500

7       1           2018-01-02  100

8       2           2018-01-02  200

9       3           2018-01-02  200

10      4           2018-01-02  100

11      5           2018-01-02  100

12      3           2018-01-03  200

13      4           2018-01-03  100

14      5           2018-01-03  100

15      6           2018-01-03  50

[Desired result]

date        totalCount  duplicated

2018-01-01  3           0

2018-01-02  5           3

2018-01-03  4           3

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42

add a comment |

I’d like to create a query that checks if there is any duplicated ID that has been used as per date.

To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.

CREATE TABLE SampleTable (

   pKey INT PRIMARY KEY,

   personalID INT NOT NULL,

   createDate DATETIME NOT NULL,

   value INT NULL

);

SELECT

   COUNT(*)

FROM SampleTable

WHERE

   DATE(createDate) = DATE(NOW())

Out of today’s data, in addition, the number of duplicated data can be checked as follows.

SELECT

   COUNT(*)

FROM (

   SELECT

      personalID,

      COUNT(*)

   FROM SampleTable

   WHERE

      DATEDIFF(NOW(), trDate) <= 60

   GROUP BY personalID HAVING COUNT(*) > 1

) AS T

In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.

[Sample Data]

pKey    personalID  createDate  value

1       1           2018-01-01  100

2       2           2018-01-01  300

3       3           2018-01-01  500

7       1           2018-01-02  100

8       2           2018-01-02  200

9       3           2018-01-02  200

10      4           2018-01-02  100

11      5           2018-01-02  100

12      3           2018-01-03  200

13      4           2018-01-03  100

14      5           2018-01-03  100

15      6           2018-01-03  50

[Desired result]

date        totalCount  duplicated

2018-01-01  3           0

2018-01-02  5           3

2018-01-03  4           3

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

I’d like to create a query that checks if there is any duplicated ID that has been used as per date.

To this end, 60 days prior to, not the subsequent to, the baseline date, should be reviewed for any duplication.
A table example is as below.

CREATE TABLE SampleTable (

   pKey INT PRIMARY KEY,

   personalID INT NOT NULL,

   createDate DATETIME NOT NULL,

   value INT NULL

);

SELECT

   COUNT(*)

FROM SampleTable

WHERE

   DATE(createDate) = DATE(NOW())

Out of today’s data, in addition, the number of duplicated data can be checked as follows.

SELECT

   COUNT(*)

FROM (

   SELECT

      personalID,

      COUNT(*)

   FROM SampleTable

   WHERE

      DATEDIFF(NOW(), trDate) <= 60

   GROUP BY personalID HAVING COUNT(*) > 1

) AS T

In conclusion, all I need to do is get the number of total data by date and the number of data that have the same personalID during the previous days.

[Sample Data]

pKey    personalID  createDate  value

1       1           2018-01-01  100

2       2           2018-01-01  300

3       3           2018-01-01  500

7       1           2018-01-02  100

8       2           2018-01-02  200

9       3           2018-01-02  200

10      4           2018-01-02  100

11      5           2018-01-02  100

12      3           2018-01-03  200

13      4           2018-01-03  100

14      5           2018-01-03  100

15      6           2018-01-03  50

[Desired result]

date        totalCount  duplicated

2018-01-01  3           0

2018-01-02  5           3

2018-01-03  4           3

mysql sql

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

edited Dec 31 '18 at 1:50

asked Dec 31 '18 at 1:41

Keith Park

2521314

asked Dec 31 '18 at 1:41

Keith Park

2521314

asked Dec 31 '18 at 1:41

Keith Park

2521314

Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42

add a comment |

Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42

Sample data and desired results would really help. I can't tell if you want ids with data on different days or ids that appear multiple times on the same day or ids that have data today and data in the past.

– Gordon Linoff
Dec 31 '18 at 1:42

add a comment |

2 Answers
2

active

oldest

votes

If you want the ids that have multiple rows in the past 60 days:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2;

If you want to also insist that the personid appears on the most recent date:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2 and date(max(trdate)) = curdate();

EDIT:

This seems to be what you want, assuming no duplicates on a given day

select trdate, count(*) as num_persons,

       sum(num_dups > 0) as num_dups

from (select sd.*,

             (select count(*)

              from sampledata sd2

              where sd2.personid = sd.personid and

                    sd2.trdate < sd.trdate and

                    sd2.trdate >= sd.trdate - interval 60 day

             ) as num_dups

      from sampledata

     ) sd

group by trdate;

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

add a comment |

You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.

Create table Testtbl (pkey int, personalID int, createddate date, value int); 



insert into Testtbl values 

(1  ,     1,           '2018-01-01' , 100) , 

(2 ,     2,           '2018-01-01' , 300) ,

(3  ,     3,           '2018-01-01' , 500) ,

(4  ,     1,           '2018-01-02' , 100) ,

(5  ,     2,           '2018-01-02' , 200) ,

(6  ,     3,           '2018-01-02' , 200) ,

(7  ,     4,           '2018-01-02' , 100) ,

(8  ,     5,           '2018-01-02' , 100) ,

(9  ,     3,           '2018-01-03' , 200) ,

(14  ,     3,           '2018-01-03' , 500) ,

(10  ,     4,           '2018-01-03' , 100) ,

(11  ,     5,           '2018-01-03' , 100) ,

(12  ,     6,           '2018-01-03' , 50),

(13  ,     6,           '2018-01-03' , 100)

Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .

  select  t.createddate, count(Distinct t.pkey) TotalCount, 

  case when t.Createddate > t1.createddate  

  then Count(distinct t1.PersonalID) + case when t.Createddate = 

  t1.createddate and 

  t.personalID = t1.personalID  and t.pkey != t1.pkey then Count(distinct 

  t1.PersonalID) 

  else 0 end else 0     

  end   Duplicated   from Testtbl t 

  left join Testtbl t1 on t.personalID = t1.personalID 

                    and t.Createddate >= t1.Createddate and t.pkey != t1.pkey 

                    and DATEDIFF(t1.Createddate, t.Createddate) <= 60 

 Group by t.createddate

Output:

createddate  TotalCount Duplicated

2018-01-01     3         0

2018-01-02     5         3

2018-01-03     6         5

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53982852%2fduplicated-data-count-as-per-baseline-date%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

If you want the ids that have multiple rows in the past 60 days:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2;

If you want to also insist that the personid appears on the most recent date:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2 and date(max(trdate)) = curdate();

EDIT:

This seems to be what you want, assuming no duplicates on a given day

select trdate, count(*) as num_persons,

       sum(num_dups > 0) as num_dups

from (select sd.*,

             (select count(*)

              from sampledata sd2

              where sd2.personid = sd.personid and

                    sd2.trdate < sd.trdate and

                    sd2.trdate >= sd.trdate - interval 60 day

             ) as num_dups

      from sampledata

     ) sd

group by trdate;

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

add a comment |

If you want the ids that have multiple rows in the past 60 days:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2;

If you want to also insist that the personid appears on the most recent date:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2 and date(max(trdate)) = curdate();

EDIT:

This seems to be what you want, assuming no duplicates on a given day

select trdate, count(*) as num_persons,

       sum(num_dups > 0) as num_dups

from (select sd.*,

             (select count(*)

              from sampledata sd2

              where sd2.personid = sd.personid and

                    sd2.trdate < sd.trdate and

                    sd2.trdate >= sd.trdate - interval 60 day

             ) as num_dups

      from sampledata

     ) sd

group by trdate;

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

add a comment |

If you want the ids that have multiple rows in the past 60 days:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2;

If you want to also insist that the personid appears on the most recent date:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2 and date(max(trdate)) = curdate();

EDIT:

This seems to be what you want, assuming no duplicates on a given day

select trdate, count(*) as num_persons,

       sum(num_dups > 0) as num_dups

from (select sd.*,

             (select count(*)

              from sampledata sd2

              where sd2.personid = sd.personid and

                    sd2.trdate < sd.trdate and

                    sd2.trdate >= sd.trdate - interval 60 day

             ) as num_dups

      from sampledata

     ) sd

group by trdate;

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

If you want the ids that have multiple rows in the past 60 days:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2;

If you want to also insist that the personid appears on the most recent date:

select personid

from sampledata

where trdate >= curdate() - interval 60 day

group by personid

having count(*) >= 2 and date(max(trdate)) = curdate();

EDIT:

This seems to be what you want, assuming no duplicates on a given day

select trdate, count(*) as num_persons,

       sum(num_dups > 0) as num_dups

from (select sd.*,

             (select count(*)

              from sampledata sd2

              where sd2.personid = sd.personid and

                    sd2.trdate < sd.trdate and

                    sd2.trdate >= sd.trdate - interval 60 day

             ) as num_dups

      from sampledata

     ) sd

group by trdate;

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

edited Dec 31 '18 at 2:03

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

answered Dec 31 '18 at 1:44

Gordon Linoff

770k35304404

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

add a comment |

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

Thanks Gordon, unfortunately, that's not I want. I updated sample data and desired result to clarify.

– Keith Park
Dec 31 '18 at 1:54

Works pretty fine. But, considering performance, will it be okay?

– Keith Park
Dec 31 '18 at 5:52

@KeithPark . . . An index on (personId, trdate)` would help performance.

– Gordon Linoff
Dec 31 '18 at 12:43

add a comment |

You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.

Create table Testtbl (pkey int, personalID int, createddate date, value int); 



insert into Testtbl values 

(1  ,     1,           '2018-01-01' , 100) , 

(2 ,     2,           '2018-01-01' , 300) ,

(3  ,     3,           '2018-01-01' , 500) ,

(4  ,     1,           '2018-01-02' , 100) ,

(5  ,     2,           '2018-01-02' , 200) ,

(6  ,     3,           '2018-01-02' , 200) ,

(7  ,     4,           '2018-01-02' , 100) ,

(8  ,     5,           '2018-01-02' , 100) ,

(9  ,     3,           '2018-01-03' , 200) ,

(14  ,     3,           '2018-01-03' , 500) ,

(10  ,     4,           '2018-01-03' , 100) ,

(11  ,     5,           '2018-01-03' , 100) ,

(12  ,     6,           '2018-01-03' , 50),

(13  ,     6,           '2018-01-03' , 100)

Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .

  select  t.createddate, count(Distinct t.pkey) TotalCount, 

  case when t.Createddate > t1.createddate  

  then Count(distinct t1.PersonalID) + case when t.Createddate = 

  t1.createddate and 

  t.personalID = t1.personalID  and t.pkey != t1.pkey then Count(distinct 

  t1.PersonalID) 

  else 0 end else 0     

  end   Duplicated   from Testtbl t 

  left join Testtbl t1 on t.personalID = t1.personalID 

                    and t.Createddate >= t1.Createddate and t.pkey != t1.pkey 

                    and DATEDIFF(t1.Createddate, t.Createddate) <= 60 

 Group by t.createddate

Output:

createddate  TotalCount Duplicated

2018-01-01     3         0

2018-01-02     5         3

2018-01-03     6         5

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

add a comment |

You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.

Create table Testtbl (pkey int, personalID int, createddate date, value int); 



insert into Testtbl values 

(1  ,     1,           '2018-01-01' , 100) , 

(2 ,     2,           '2018-01-01' , 300) ,

(3  ,     3,           '2018-01-01' , 500) ,

(4  ,     1,           '2018-01-02' , 100) ,

(5  ,     2,           '2018-01-02' , 200) ,

(6  ,     3,           '2018-01-02' , 200) ,

(7  ,     4,           '2018-01-02' , 100) ,

(8  ,     5,           '2018-01-02' , 100) ,

(9  ,     3,           '2018-01-03' , 200) ,

(14  ,     3,           '2018-01-03' , 500) ,

(10  ,     4,           '2018-01-03' , 100) ,

(11  ,     5,           '2018-01-03' , 100) ,

(12  ,     6,           '2018-01-03' , 50),

(13  ,     6,           '2018-01-03' , 100)

Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .

  select  t.createddate, count(Distinct t.pkey) TotalCount, 

  case when t.Createddate > t1.createddate  

  then Count(distinct t1.PersonalID) + case when t.Createddate = 

  t1.createddate and 

  t.personalID = t1.personalID  and t.pkey != t1.pkey then Count(distinct 

  t1.PersonalID) 

  else 0 end else 0     

  end   Duplicated   from Testtbl t 

  left join Testtbl t1 on t.personalID = t1.personalID 

                    and t.Createddate >= t1.Createddate and t.pkey != t1.pkey 

                    and DATEDIFF(t1.Createddate, t.Createddate) <= 60 

 Group by t.createddate

Output:

createddate  TotalCount Duplicated

2018-01-01     3         0

2018-01-02     5         3

2018-01-03     6         5

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

add a comment |

You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.

Create table Testtbl (pkey int, personalID int, createddate date, value int); 



insert into Testtbl values 

(1  ,     1,           '2018-01-01' , 100) , 

(2 ,     2,           '2018-01-01' , 300) ,

(3  ,     3,           '2018-01-01' , 500) ,

(4  ,     1,           '2018-01-02' , 100) ,

(5  ,     2,           '2018-01-02' , 200) ,

(6  ,     3,           '2018-01-02' , 200) ,

(7  ,     4,           '2018-01-02' , 100) ,

(8  ,     5,           '2018-01-02' , 100) ,

(9  ,     3,           '2018-01-03' , 200) ,

(14  ,     3,           '2018-01-03' , 500) ,

(10  ,     4,           '2018-01-03' , 100) ,

(11  ,     5,           '2018-01-03' , 100) ,

(12  ,     6,           '2018-01-03' , 50),

(13  ,     6,           '2018-01-03' , 100)

Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .

  select  t.createddate, count(Distinct t.pkey) TotalCount, 

  case when t.Createddate > t1.createddate  

  then Count(distinct t1.PersonalID) + case when t.Createddate = 

  t1.createddate and 

  t.personalID = t1.personalID  and t.pkey != t1.pkey then Count(distinct 

  t1.PersonalID) 

  else 0 end else 0     

  end   Duplicated   from Testtbl t 

  left join Testtbl t1 on t.personalID = t1.personalID 

                    and t.Createddate >= t1.Createddate and t.pkey != t1.pkey 

                    and DATEDIFF(t1.Createddate, t.Createddate) <= 60 

 Group by t.createddate

Output:

createddate  TotalCount Duplicated

2018-01-01     3         0

2018-01-02     5         3

2018-01-03     6         5

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

You can use the self join approach as well to find out this type of data. If you need to find out what ID repeated by comparing with previous dates , you can also use this approach.

Create table Testtbl (pkey int, personalID int, createddate date, value int); 



insert into Testtbl values 

(1  ,     1,           '2018-01-01' , 100) , 

(2 ,     2,           '2018-01-01' , 300) ,

(3  ,     3,           '2018-01-01' , 500) ,

(4  ,     1,           '2018-01-02' , 100) ,

(5  ,     2,           '2018-01-02' , 200) ,

(6  ,     3,           '2018-01-02' , 200) ,

(7  ,     4,           '2018-01-02' , 100) ,

(8  ,     5,           '2018-01-02' , 100) ,

(9  ,     3,           '2018-01-03' , 200) ,

(14  ,     3,           '2018-01-03' , 500) ,

(10  ,     4,           '2018-01-03' , 100) ,

(11  ,     5,           '2018-01-03' , 100) ,

(12  ,     6,           '2018-01-03' , 50),

(13  ,     6,           '2018-01-03' , 100)

Query: Left join will help to find out duplicated data without loss of total count. Distinct is to make sure that the same ID is not counted twice .

  select  t.createddate, count(Distinct t.pkey) TotalCount, 

  case when t.Createddate > t1.createddate  

  then Count(distinct t1.PersonalID) + case when t.Createddate = 

  t1.createddate and 

  t.personalID = t1.personalID  and t.pkey != t1.pkey then Count(distinct 

  t1.PersonalID) 

  else 0 end else 0     

  end   Duplicated   from Testtbl t 

  left join Testtbl t1 on t.personalID = t1.personalID 

                    and t.Createddate >= t1.Createddate and t.pkey != t1.pkey 

                    and DATEDIFF(t1.Createddate, t.Createddate) <= 60 

 Group by t.createddate

Output:

createddate  TotalCount Duplicated

2018-01-01     3         0

2018-01-02     5         3

2018-01-03     6         5

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

edited Dec 31 '18 at 5:33

answered Dec 31 '18 at 3:13

Avi

735212

answered Dec 31 '18 at 3:13

Avi

735212

answered Dec 31 '18 at 3:13

Avi

735212

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

add a comment |

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

Your query doesn't consider duplication on the same day.

– Keith Park
Dec 31 '18 at 5:10

@KeithPark I thought it was comparison against previous days,I did not realize that. I added to consider same day duplication as well. If you see I added 6 and 3 on the same day, now the output will be 5 for duplicated (comparing all the unique IDs for previous days and within the same day as well and total count will be 6

– Avi
Dec 31 '18 at 5:34

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtjtk