Table in Athena not updating with VPC logs
I have a VPC flow log whose destination is S3, with the bucket vpc_logs. Under this bucket, the flow log records are uploaded under AWSLogs/accountId/vpcflowlogs/region/year/month/day/log_file_name.log.gz.
I want to be able to view the records with Athena. The table definition suggested by AWS is:
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
version int,
account string,
interfaceid string,
sourceaddress string,
destinationaddress string,
sourceport int,
destinationport int,
protocol int,
numpackets int,
numbytes bigint,
starttime int,
endtime int,
action string,
logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/'
TBLPROPERTIES ("skip.header.line.count"="1");
Then, when I try to view my records, I get 0 results. I believe this is because the data is located under the nested date folders. Is this the right query to run?
amazon-athena vpc
asked Jan 3 at 13:37 by meital
1 Answer
The reason your table is not returning any records is the S3 directory structure you are using. Since your table is partitioned, Athena can only detect the partitions automatically if the S3 directory structure uses key=value pairs, like this:
s3://AWSLogs/.../.../.../year=2018/month=01/day=01/
If you have this key=value structure in S3, then after the table is created the partitions can be added automatically with:
MSCK REPAIR TABLE vpc_flow_logs
If you don't have control over the directory structure, refer to the article "Partition Your Data in Athena for Improved Query Performance and Reduced Costs", which explains how to add partitions when you don't have a Hive-style partitioned directory structure.
A simpler but slightly more expensive alternative is to run an AWS Glue crawler, which will create the table and partitions automatically even with the directory structure below. All you need to do is pass s3://AWSLogs/accountId/vpcflowlogs/region/ to the crawler as input.
s3://.../.../.../.../year/month/day/
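Since VPC Flow Logs writes the plain year/month/day layout (no key=value pairs), MSCK REPAIR TABLE will find nothing, and each partition has to be registered explicitly. A hedged sketch of adding one day's partition manually — the bucket, account id, and region below are placeholders, substitute your own:

```sql
-- Register one day of logs as a partition of the table from the question.
-- The S3 path is a placeholder; repeat (or script) one statement per day.
ALTER TABLE vpc_flow_logs ADD IF NOT EXISTS
  PARTITION (dt = '2018-01-01')
  LOCATION 's3://vpc_logs/AWSLogs/123456789012/vpcflowlogs/us-east-1/2018/01/01/';
```

After the partition is added, a query restricted to it should return rows, e.g. SELECT * FROM vpc_flow_logs WHERE dt = '2018-01-01' LIMIT 10;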
If I want to avoid searching by a specific date, do I need to remove the partition? I want to be able to get the full data.
– meital
Jan 6 at 14:50
Yes, that should work. But if your data is huge, you might end up scanning a large amount of data even when you only want to search for a specific date.
– bdcloud
Jan 6 at 14:52
@meital Can you mark this as answered if it helped solve your problem?
– bdcloud
Jan 14 at 0:50
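Dropping the partitioning, as discussed in the comments above, would mean recreating the table without the PARTITIONED BY clause; Athena then reads every object under the LOCATION prefix recursively, including the nested date folders. A sketch, using the same placeholder location as the question:

```sql
-- Same columns as the question's DDL, but unpartitioned:
-- Athena scans all objects under LOCATION recursively on every query.
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs_all (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/'
TBLPROPERTIES ("skip.header.line.count"="1");
```

The trade-off is exactly the one bdcloud notes: without partitions, every query scans the full data set, which raises cost as the logs grow.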
answered Jan 4 at 8:56 by bdcloud