Table in Athena not updating with VPC logs
I have a VPC flow log whose destination is S3, with the bucket vpc_logs. Under this bucket, the flow log records are uploaded under AWSLogs/accountId/vpcflowlogs/region/year/month/day/log_file_name.log.gz.
I want to be able to view the records with Athena. The table definition suggested by AWS is:
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
version int,
account string,
interfaceid string,
sourceaddress string,
destinationaddress string,
sourceport int,
destinationport int,
protocol int,
numpackets int,
numbytes bigint,
starttime int,
endtime int,
action string,
logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/'
TBLPROPERTIES ("skip.header.line.count"="1");
Then, when I try to view my records, I get 0 results. I believe this is because the data is located under the nested date folders. Is this the right query to run?
amazon-athena vpc
asked Jan 3 at 13:37 by meital
1 Answer
The reason your table is not returning any records is the S3 directory structure you are using. Since your table is partitioned, Athena can only detect the partitions automatically if the S3 directory structure uses key=value pairs, like this:
s3://AWSLogs/.../.../.../year=2018/month=01/day=01/
If you have this key=value structure in S3, then after the table is created the partitions can be added automatically with:
MSCK REPAIR TABLE vpc_flow_logs
If you don't have control over the directory structure, refer to the article "Partition Your Data in Athena for Improved Query Performance and Reduced Costs", which explains how to add partitions when you don't have a Hive-style partitioned directory structure.
A simpler but slightly more expensive alternative is to run an AWS Glue crawler, which will create the table and partitions automatically even with the directory structure below. All you need to do is pass s3://AWSLogs/accountId/vpcflowlogs/region/ to the crawler as input.
s3://.../.../.../.../year/month/day/
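Since VPC Flow Logs writes the plain year/month/day layout (no key=value pairs), MSCK REPAIR TABLE will find nothing, and each partition has to be registered explicitly. A hedged sketch of adding one day's partition manually — the bucket, account id, and region below are placeholders, substitute your own:

```sql
-- Register one day of logs as a partition of the table from the question.
-- The S3 path is a placeholder; repeat (or script) one statement per day.
ALTER TABLE vpc_flow_logs ADD IF NOT EXISTS
  PARTITION (dt = '2018-01-01')
  LOCATION 's3://vpc_logs/AWSLogs/123456789012/vpcflowlogs/us-east-1/2018/01/01/';
```

After the partition is added, a query restricted to it should return rows, e.g. SELECT * FROM vpc_flow_logs WHERE dt = '2018-01-01' LIMIT 10;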
If I want to avoid searching by a specific date, do I need to remove the partition? I want to be able to get the full data.
– meital
Jan 6 at 14:50
Yes, that should work. But if your data is huge, you might end up scanning a large amount of data even when you only want to search for a specific date.
– bdcloud
Jan 6 at 14:52
@meital Can you mark this as answered if it helped solve your problem?
– bdcloud
Jan 14 at 0:50
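Dropping the partitioning, as discussed in the comments above, would mean recreating the table without the PARTITIONED BY clause; Athena then reads every object under the LOCATION prefix recursively, including the nested date folders. A sketch, using the same placeholder location as the question:

```sql
-- Same columns as the question's DDL, but unpartitioned:
-- Athena scans all objects under LOCATION recursively on every query.
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs_all (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/'
TBLPROPERTIES ("skip.header.line.count"="1");
```

The trade-off is exactly the one bdcloud notes: without partitions, every query scans the full data set, which raises cost as the logs grow.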
answered Jan 4 at 8:56 by bdcloud