How to self join on date calculation and other fields in Redshift
Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).
For example, if we had
date |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...
Then I would want result to be
date |num| date |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|
What I've got so far
select
a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1
,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2
from
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a
full outer join
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b
on
b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name
;
The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...
Any help would be much appreciated
sql amazon-redshift
|
show 3 more comments
Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).
For example, if we had
date |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...
Then I would want result to be
date |num| date |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|
What I've got so far
select
a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1
,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2
from
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a
full outer join
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b
on
b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name
;
The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...
Any help would be much appreciated
sql amazon-redshift
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
1
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
1
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
1
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13
|
show 3 more comments
Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).
For example, if we had
date |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...
Then I would want result to be
date |num| date |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|
What I've got so far
select
a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1
,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2
from
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a
full outer join
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b
on
b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name
;
The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...
Any help would be much appreciated
sql amazon-redshift
Got a query that gets bookings data grouped by date, operator, source, route and I want to self join it so that I have the same data against it but joined on the previous year's date (add 1 day to align the weekday).
For example, if we had
date |num|
----------|---|
2017-01-01|300|
2017-01-02|301|
2017-01-03|302|
2017-01-04|303|
2017-01-05|304|
2017-01-06|305|
2017-01-07|306|
2017-01-08|307|
2017-01-09|308|
2017-01-10|309|
...
2018-01-01|501|
2018-01-02|502|
2018-01-03|503|
2018-01-04|504|
2018-01-05|505|
2018-01-06|506|
2018-01-07|507|
2018-01-08|508|
2018-01-09|509|
2018-01-10|510|
...
Then I would want result to be
date |num| date |num|
----------|---| ----------|---|
2017-01-01|300| null |null|
2017-01-02|301| null |null|
2017-01-03|302| null |null|
2017-01-04|303| null |null|
2017-01-05|304| null |null|
2017-01-06|305| null |null|
2017-01-07|306| null |null|
2017-01-08|307| null |null|
2017-01-09|308| null |null|
2017-01-10|309| null |null|
...
2018-01-01|501| 2017-01-02|301|
2018-01-02|502| 2017-01-03|302|
2018-01-03|503| 2017-01-04|303|
2018-01-04|504| 2017-01-05|304|
2018-01-05|505| 2017-01-06|305|
2018-01-06|506| 2017-01-07|306|
2018-01-07|507| 2017-01-08|307|
2018-01-08|508| 2017-01-09|308|
2018-01-09|509| 2017-01-10|309|
2018-01-10|510| 2017-01-11|...|
What I've got so far
select
a.date as date1
,a.source as source1
,a.operator as operator1
,a.route as route1
,a.sum as bookings1
,b.date as date2
,b.source as source2
,b.operator as operator2
,b.route as route2
,b.sum as bookings2
from
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where transaction_date >= dateadd(year, -2, current_date)
group by 1,2,3,4) a
full outer join
(select
date
,source
,operator
,route
,sum(record_count)
from booking
where date between dateadd(year, -3, current_date) and dateadd(year, -1, current_date)
group by 1,2,3,4) b
on
b.date = dateadd(day, 1, dateadd(year, -1, a.date))
and a.source = b.source
and a.outbound_operator_name = b.outbound_operator_name
and a.outbound_route_name = b.outbound_route_name
;
The problem with the above query is that it show's the correct 2018 results but not the correct 2017 results. I presume there's something wrong with either the join type or something within the on clause e.g. the date calculation attempt...
Any help would be much appreciated
sql amazon-redshift
sql amazon-redshift
asked Jan 1 at 19:16
AK91AK91
757
757
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
1
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
1
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
1
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13
|
show 3 more comments
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
1
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
1
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
1
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
1
1
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
1
1
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
1
1
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13
|
show 3 more comments
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53998245%2fhow-to-self-join-on-date-calculation-and-other-fields-in-redshift%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53998245%2fhow-to-self-join-on-date-calculation-and-other-fields-in-redshift%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What exactly is wrong with the results? And why would the sub-query "a" have a criteria on "transaction_date" but "b" on "date"?
– LukStorms
Jan 1 at 21:17
I aggregate the results in Excel or Tableau for each day and 2018 show numbers show correctly but 2017 always is less than what it should be. The reason for the criteria is so that I can match the same day against previous year (with an added extra day so that we're matching on the same weekday e.g. Monday vs Monday).
– AK91
Jan 2 at 9:18
1
With a FULL JOIN it would show all of "a" and all of "b". Linking them where there's a match. So if you say there are "missing", then they are already not in the resultsets of "a" or "b". Maybe you think they are missing because you select based on the current date, and not on the current month? And also, normally it's better to join on an id instead of a name, because names could have different spelling or typos.
– LukStorms
Jan 2 at 9:30
1
Interesting, if the intention is to join on the same weekday, then with that day+1 that would only work with comparing specific years. To make that work for any current date, this might become more complicated. Like joining on the same week number & day of the week?
– LukStorms
Jan 2 at 9:41
1
The problem with questions regarding Amazon-redshift is that for the people that want to answer it then there doesn't seem to be a free online tester like rextester.com of db-fiddle for it. But HERE is a db-fiddle for a Postgres 8.4 (which should be close enough I guess). I think that if you look at the date criteria, and the results then with a full join all are there.
– LukStorms
Jan 4 at 15:13