Table in Athena not updating with VPC logs



























I have a VPC flow log whose destination is S3, with the S3 bucket vpc_logs. Under this bucket, the flow log records are uploaded under AWSLogs/accountId/vpcflowlogs/region/year/month/day/log_file_name.log.gz.
I want to be able to view the records with Athena, as AWS suggests, but the suggested query is:



CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
version int,
account string,
interfaceid string,
sourceaddress string,
destinationaddress string,
sourceport int,
destinationport int,
protocol int,
numpackets int,
numbytes bigint,
starttime int,
endtime int,
action string,
logstatus string
)
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/'
TBLPROPERTIES ("skip.header.line.count"="1");


Then when I try to view my records, I get 0 records. I believe this is because the data is located under the nested folders. Is this the right query to run?






























































































      amazon-athena vpc
















      asked Jan 3 at 13:37









      meital

      113




























          1 Answer






































          The reason your table is not returning any records is the S3 directory structure you are using. Because your table is partitioned, if you want Athena to detect the partitions automatically, the S3 directory structure needs to look like this:



          s3://AWSLogs/.../.../.../year=2018/month=01/day=01/


          If you have this key=value structure in S3, then after the table is created the partitions can be added automatically with the following query:



          MSCK REPAIR TABLE vpc_flow_logs;


          If you don't have control over the directory structure, refer to the AWS blog post "Partition Your Data in Athena for Improved Query Performance and Reduced Costs", which explains how to add partitions when you don't have a Hive-style partitioned directory structure.
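For a non-Hive layout, each day's partition can also be registered by hand with an ALTER TABLE statement. A minimal sketch, using the placeholder bucket, account ID, and region names from the question:

```sql
ALTER TABLE vpc_flow_logs
ADD IF NOT EXISTS PARTITION (dt = '2019-01-03')
LOCATION 's3://bucket_name/AWSLogs/account_id/vpcflowlogs/region/2019/01/03/';
```

Once the partition is registered, queries that filter on dt = '2019-01-03' read only that day's objects.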



          A simpler but slightly more expensive alternative is to run an AWS Glue crawler, which will automatically create the table and its partitions even with the directory structure below. All you need to do is pass s3://AWSLogs/accountId/vpcflowlogs/region/ to the crawler as input.



          s3://.../.../.../.../year/month/day/
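For that nested year/month/day layout, the day-level ADD PARTITION statements can also be generated with a short script and then run in Athena one by one. A minimal sketch, assuming the placeholder bucket/account/region names from the question:

```python
from datetime import date, timedelta

def partition_ddl(start, end, bucket="bucket_name",
                  account="account_id", region="region"):
    """Yield one ALTER TABLE ... ADD PARTITION statement per day in [start, end]."""
    day = start
    while day <= end:
        yield (
            "ALTER TABLE vpc_flow_logs "
            f"ADD IF NOT EXISTS PARTITION (dt = '{day:%Y-%m-%d}') "
            f"LOCATION 's3://{bucket}/AWSLogs/{account}/vpcflowlogs/{region}/"
            f"{day:%Y/%m/%d}/'"
        )
        day += timedelta(days=1)

# Print the DDL for the first three days of January 2019.
for stmt in partition_ddl(date(2019, 1, 1), date(2019, 1, 3)):
    print(stmt)
```

Each emitted statement points the dt partition value at the matching year/month/day prefix, so no object reorganization is needed.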





























          • If I want to avoid searching by a specific date, do I need to remove the partition? I want to be able to get the full data.

            – meital
            Jan 6 at 14:50











          • Yes, that should work. But if your data is huge, you might end up scanning a huge amount of data even when you search for a specific date.

            – bdcloud
            Jan 6 at 14:52











          • @meital Can you mark this as answered if this solution helped solve your problem?

            – bdcloud
            Jan 14 at 0:50


































          answered Jan 4 at 8:56









          bdcloud




















