How to self join on date calculation and other fields in Redshift












1















Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).



For example, if we had



date      |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...


Then I would want result to be



date      |num| date      |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|


What I've got so far



select

a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1

,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2

from

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a

full outer join

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b

on

b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name

;


The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...



Any help would be much appreciated










share|improve this question























  • What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

    – LukStorms
    Jan 1 at 21:17











  • I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

    – AK91
    Jan 2 at 9:18






  • 1





    With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

    – LukStorms
    Jan 2 at 9:30






  • 1





    Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

    – LukStorms
    Jan 2 at 9:41






  • 1





    The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

    – LukStorms
    Jan 4 at 15:13


















1















Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).



For example, if we had



date      |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...


Then I would want result to be



date      |num| date      |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|


What I've got so far



select

a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1

,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2

from

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a

full outer join

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b

on

b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name

;


The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...



Any help would be much appreciated










share|improve this question























  • What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

    – LukStorms
    Jan 1 at 21:17











  • I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

    – AK91
    Jan 2 at 9:18






  • 1





    With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

    – LukStorms
    Jan 2 at 9:30






  • 1





    Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

    – LukStorms
    Jan 2 at 9:41






  • 1





    The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

    – LukStorms
    Jan 4 at 15:13
















1












1








1








Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).



For example, if we had



date      |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...


Then I would want result to be



date      |num| date      |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|


What I've got so far



select

a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1

,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2

from

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a

full outer join

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b

on

b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name

;


The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...



Any help would be much appreciated










share|improve this question














Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).



For example, if we had



date      |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...


Then I would want result to be



date      |num| date      |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|


What I've got so far



select

a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1

,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2

from

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a

full outer join

(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b

on

b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name

;


The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...



Any help would be much appreciated







sql amazon-redshift






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Jan 1 at 19:16









AK91AK91

757




757













  • What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

    – LukStorms
    Jan 1 at 21:17











  • I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

    – AK91
    Jan 2 at 9:18






  • 1





    With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

    – LukStorms
    Jan 2 at 9:30






  • 1





    Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

    – LukStorms
    Jan 2 at 9:41






  • 1





    The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

    – LukStorms
    Jan 4 at 15:13





















  • What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

    – LukStorms
    Jan 1 at 21:17











  • I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

    – AK91
    Jan 2 at 9:18






  • 1





    With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

    – LukStorms
    Jan 2 at 9:30






  • 1





    Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

    – LukStorms
    Jan 2 at 9:41






  • 1





    The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

    – LukStorms
    Jan 4 at 15:13



















What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

– LukStorms
Jan 1 at 21:17





What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?

– LukStorms
Jan 1 at 21:17













I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

– AK91
Jan 2 at 9:18





I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).

– AK91
Jan 2 at 9:18




1




1





With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

– LukStorms
Jan 2 at 9:30





With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.

– LukStorms
Jan 2 at 9:30




1




1





Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

– LukStorms
Jan 2 at 9:41





Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?

– LukStorms
Jan 2 at 9:41




1




1





The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

– LukStorms
Jan 4 at 15:13







The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.

– LukStorms
Jan 4 at 15:13














0






active

oldest

votes











Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53998245%2fhow-to-self-join-on-date-calculation-and-other-fields-in-redshift%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53998245%2fhow-to-self-join-on-date-calculation-and-other-fields-in-redshift%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Monofisismo

Angular Downloading a file using contenturl with Basic Authentication

Olmecas