Data Loading csv file data into HBase, using Pig
I am trying to load the one CSV file to HBase table. I can successfully dump the data from CSV, but while importing into a table I am getting an error.
But, when loading other data, i can load any issues.
raw_data = LOAD
'hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:chararray, hiredate:chararray, sal:chararray, comm:chararray, deptno:int);
dump raw_data;
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'emp_info:ename
emp_info:job
emp_info:mgr
emp_info:hiredate
emp_info:sal
emp_info:comm
emp_info:deptno'
);
Output:
(7369,SMITH,CLERK,7902,13-06-93,800,0,20)
(7499,ALLEN,SALESMAN,7698,15-08-98,1600,300,30)
(7521,WARD,SALESMAN,7698,26-03-96,1250,500,30)
(7566,JONES,MANAGER,7839,31-10-95,2975,0,20)
(7698,BLAKE,MANAGER,7839,11-06-92,2850,0,30)
(7782,CLARK,MANAGER,7839,14-05-93,2450,0,10)
(7788,SCOTT,ANALYST,7566,05-03-96,3000,0,20)
(7839,KING,PRESIDENT,0,09-06-90,5000,0,10)
(7844,TURNER,SALESMAN,7698,04-06-95,1500,0,30)
(7876,ADAMS,CLERK,7788,04-06-99,1100,0,20)
(7900,JAMES,CLERK,7698,23-06-00,950,0,30)
(7934,MILLER,CLERK,7782,21-01-00,1300,0,10)
(7902,FORD,ANALYST,7566,05-12-97,3000,0,20)
(7654,MARTIN,SALESMAN,7698,05-12-98,1250,1400,30)
Then
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'emp_info:ename
>> emp_info:job
>> emp_info:mgr
>> emp_info:hiredate
>> emp_info:sal
>> emp_info:comm
>> emp_info:deptno'
>> );
2018-12-31 13:45:55,581 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-12-31 13:45:55,594 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-12-31 13:45:55,594 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/temp-725993693
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2018-12-31 13:45:55,645 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-12-31 13:45:55,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-12-31 13:45:55,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: htrace-core-3.1.0-incubating.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: metrics-core-2.2.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: protobuf-java-2.5.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: joda-time-2.8.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: zookeeper-3.4.6.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-protocol-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-common-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: netty-all-4.0.23.Final.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: guava-11.0.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-server-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.15.0.2.3.2.0-2950-core-h2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-client-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.merge.percent to 0.66 from MR setting mapreduce.reduce.shuffle.merge.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.fetch.buffer.percent to 0.7 from MR setting mapreduce.reduce.shuffle.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.mb to 64 from MR setting mapreduce.task.io.sort.mb
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.memory.limit.percent to 0.25 from MR setting mapreduce.reduce.shuffle.memory.limit.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.factor to 100 from MR setting mapreduce.task.io.sort.factor
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.connect.timeout to 180000 from MR setting mapreduce.reduce.shuffle.connect.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.internal.sorter.class to org.apache.hadoop.util.QuickSort from MR setting map.sort.class
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.merge.progress.records to 10000 from MR setting mapreduce.task.merge.progress.records
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress to false from MR setting mapreduce.map.output.compress
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.sort.spill.percent to 0.7 from MR setting mapreduce.map.sort.spill.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.ssl.enable to false from MR setting mapreduce.shuffle.ssl.enabled
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead to true from MR setting mapreduce.ifile.readahead
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.parallel.copies to 30 from MR setting mapreduce.reduce.shuffle.parallelcopies
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead.bytes to 4194304 from MR setting mapreduce.ifile.readahead.bytes
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.task.input.post-merge.buffer.percent to 0.0 from MR setting mapreduce.reduce.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.read.timeout to 180000 from MR setting mapreduce.reduce.shuffle.read.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress.codec to org.apache.hadoop.io.compress.DefaultCodec from MR setting mapreduce.map.output.compress.codec
2018-12-31 13:45:55,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - For vertex - scope-673: parallelism=1, memory=250, java opts=-Xmx256m
2018-12-31 13:45:55,777 [PigTezLauncher-0] INFO org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2018-12-31 13:45:55,778 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ]
2018-12-31 13:45:55,836 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Session mode. Starting session.
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz
2018-12-31 13:45:55,900 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/temp-725993693/.tez/application_1546253526030_0012 doesn't exist and is created
2018-12-31 13:45:55,919 [PigTezLauncher-0] INFO org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1546253526030_0012
2018-12-31 13:45:56,163 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1546253526030_0012
2018-12-31 13:45:56,165 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1546253526030_0012/
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,487 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66. Application id: application_1546253526030_0012
2018-12-31 13:46:00,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1546253526030_0012
2018-12-31 13:46:01,587 [Timer-33] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2018-12-31 13:46:15,741 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-673, vertexId=vertex_1546253526030_0012_1_00, diagnostics=[Task failed, taskId=task_1546253526030_0012_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1546253526030_0012_1_00 [scope-673] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0, counters=Counters: 4
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
AM_CPU_MILLISECONDS=1310
AM_GC_TIME_MILLIS=17
2018-12-31 13:46:15,757 [PigTezLauncher-0] WARN org.apache.pig.tools.pigstats.JobStats - unable to find an output size reader
2018-12-31 13:46:15,761 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.1.2.3.2.0-2950
PigVersion: 0.15.0.2.3.2.0-2950
TezVersion: 0.7.0.2.3.2.0-2950
UserId: root
FileName:
StartedAt: 2018-12-31 13:45:55
FinishedAt: 2018-12-31 13:46:15
Features: UNKNOWN
Failed!
DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66:
ApplicationId: job_1546253526030_0012
TotalLaunchedTasks: 4
FileBytesRead: 0
FileBytesWritten: 0
HdfsBytesRead: 0
HdfsBytesWritten: 0
Input(s):
Failed to read data from "hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt"
Output(s):
Failed to produce result in "hbase://emp_tab"
grunt>
hive apache-pig loading
add a comment |
I am trying to load the one CSV file to HBase table. I can successfully dump the data from CSV, but while importing into a table I am getting an error.
But, when loading other data, i can load any issues.
raw_data = LOAD
'hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:chararray, hiredate:chararray, sal:chararray, comm:chararray, deptno:int);
dump raw_data;
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'emp_info:ename
emp_info:job
emp_info:mgr
emp_info:hiredate
emp_info:sal
emp_info:comm
emp_info:deptno'
);
Output:
(7369,SMITH,CLERK,7902,13-06-93,800,0,20)
(7499,ALLEN,SALESMAN,7698,15-08-98,1600,300,30)
(7521,WARD,SALESMAN,7698,26-03-96,1250,500,30)
(7566,JONES,MANAGER,7839,31-10-95,2975,0,20)
(7698,BLAKE,MANAGER,7839,11-06-92,2850,0,30)
(7782,CLARK,MANAGER,7839,14-05-93,2450,0,10)
(7788,SCOTT,ANALYST,7566,05-03-96,3000,0,20)
(7839,KING,PRESIDENT,0,09-06-90,5000,0,10)
(7844,TURNER,SALESMAN,7698,04-06-95,1500,0,30)
(7876,ADAMS,CLERK,7788,04-06-99,1100,0,20)
(7900,JAMES,CLERK,7698,23-06-00,950,0,30)
(7934,MILLER,CLERK,7782,21-01-00,1300,0,10)
(7902,FORD,ANALYST,7566,05-12-97,3000,0,20)
(7654,MARTIN,SALESMAN,7698,05-12-98,1250,1400,30)
Then
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'emp_info:ename
>> emp_info:job
>> emp_info:mgr
>> emp_info:hiredate
>> emp_info:sal
>> emp_info:comm
>> emp_info:deptno'
>> );
2018-12-31 13:45:55,581 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-12-31 13:45:55,594 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-12-31 13:45:55,594 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/temp-725993693
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2018-12-31 13:45:55,645 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-12-31 13:45:55,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-12-31 13:45:55,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: htrace-core-3.1.0-incubating.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: metrics-core-2.2.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: protobuf-java-2.5.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: joda-time-2.8.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: zookeeper-3.4.6.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-protocol-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-common-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: netty-all-4.0.23.Final.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: guava-11.0.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-server-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.15.0.2.3.2.0-2950-core-h2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-client-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.merge.percent to 0.66 from MR setting mapreduce.reduce.shuffle.merge.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.fetch.buffer.percent to 0.7 from MR setting mapreduce.reduce.shuffle.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.mb to 64 from MR setting mapreduce.task.io.sort.mb
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.memory.limit.percent to 0.25 from MR setting mapreduce.reduce.shuffle.memory.limit.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.factor to 100 from MR setting mapreduce.task.io.sort.factor
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.connect.timeout to 180000 from MR setting mapreduce.reduce.shuffle.connect.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.internal.sorter.class to org.apache.hadoop.util.QuickSort from MR setting map.sort.class
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.merge.progress.records to 10000 from MR setting mapreduce.task.merge.progress.records
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress to false from MR setting mapreduce.map.output.compress
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.sort.spill.percent to 0.7 from MR setting mapreduce.map.sort.spill.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.ssl.enable to false from MR setting mapreduce.shuffle.ssl.enabled
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead to true from MR setting mapreduce.ifile.readahead
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.parallel.copies to 30 from MR setting mapreduce.reduce.shuffle.parallelcopies
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead.bytes to 4194304 from MR setting mapreduce.ifile.readahead.bytes
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.task.input.post-merge.buffer.percent to 0.0 from MR setting mapreduce.reduce.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.read.timeout to 180000 from MR setting mapreduce.reduce.shuffle.read.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress.codec to org.apache.hadoop.io.compress.DefaultCodec from MR setting mapreduce.map.output.compress.codec
2018-12-31 13:45:55,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - For vertex - scope-673: parallelism=1, memory=250, java opts=-Xmx256m
2018-12-31 13:45:55,777 [PigTezLauncher-0] INFO org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2018-12-31 13:45:55,778 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ]
2018-12-31 13:45:55,836 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Session mode. Starting session.
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz
2018-12-31 13:45:55,900 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/temp-725993693/.tez/application_1546253526030_0012 doesn't exist and is created
2018-12-31 13:45:55,919 [PigTezLauncher-0] INFO org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1546253526030_0012
2018-12-31 13:45:56,163 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1546253526030_0012
2018-12-31 13:45:56,165 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1546253526030_0012/
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,487 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66. Application id: application_1546253526030_0012
2018-12-31 13:46:00,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1546253526030_0012
2018-12-31 13:46:01,587 [Timer-33] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2018-12-31 13:46:15,741 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-673, vertexId=vertex_1546253526030_0012_1_00, diagnostics=[Task failed, taskId=task_1546253526030_0012_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1546253526030_0012_1_00 [scope-673] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0, counters=Counters: 4
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
AM_CPU_MILLISECONDS=1310
AM_GC_TIME_MILLIS=17
2018-12-31 13:46:15,757 [PigTezLauncher-0] WARN org.apache.pig.tools.pigstats.JobStats - unable to find an output size reader
2018-12-31 13:46:15,761 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.1.2.3.2.0-2950
PigVersion: 0.15.0.2.3.2.0-2950
TezVersion: 0.7.0.2.3.2.0-2950
UserId: root
FileName:
StartedAt: 2018-12-31 13:45:55
FinishedAt: 2018-12-31 13:46:15
Features: UNKNOWN
Failed!
DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66:
ApplicationId: job_1546253526030_0012
TotalLaunchedTasks: 4
FileBytesRead: 0
FileBytesWritten: 0
HdfsBytesRead: 0
HdfsBytesWritten: 0
Input(s):
Failed to read data from "hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt"
Output(s):
Failed to produce result in "hbase://emp_tab"
grunt>
hive apache-pig loading
Pretty sure you need commas between your column names in yourstore
command.
– Andrew
Dec 31 '18 at 15:16
add a comment |
I am trying to load the one CSV file to HBase table. I can successfully dump the data from CSV, but while importing into a table I am getting an error.
But, when loading other data, i can load any issues.
raw_data = LOAD
'hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:chararray, hiredate:chararray, sal:chararray, comm:chararray, deptno:int);
dump raw_data;
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'emp_info:ename
emp_info:job
emp_info:mgr
emp_info:hiredate
emp_info:sal
emp_info:comm
emp_info:deptno'
);
Output:
(7369,SMITH,CLERK,7902,13-06-93,800,0,20)
(7499,ALLEN,SALESMAN,7698,15-08-98,1600,300,30)
(7521,WARD,SALESMAN,7698,26-03-96,1250,500,30)
(7566,JONES,MANAGER,7839,31-10-95,2975,0,20)
(7698,BLAKE,MANAGER,7839,11-06-92,2850,0,30)
(7782,CLARK,MANAGER,7839,14-05-93,2450,0,10)
(7788,SCOTT,ANALYST,7566,05-03-96,3000,0,20)
(7839,KING,PRESIDENT,0,09-06-90,5000,0,10)
(7844,TURNER,SALESMAN,7698,04-06-95,1500,0,30)
(7876,ADAMS,CLERK,7788,04-06-99,1100,0,20)
(7900,JAMES,CLERK,7698,23-06-00,950,0,30)
(7934,MILLER,CLERK,7782,21-01-00,1300,0,10)
(7902,FORD,ANALYST,7566,05-12-97,3000,0,20)
(7654,MARTIN,SALESMAN,7698,05-12-98,1250,1400,30)
Then
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'emp_info:ename
>> emp_info:job
>> emp_info:mgr
>> emp_info:hiredate
>> emp_info:sal
>> emp_info:comm
>> emp_info:deptno'
>> );
2018-12-31 13:45:55,581 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-12-31 13:45:55,594 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-12-31 13:45:55,594 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/temp-725993693
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2018-12-31 13:45:55,645 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-12-31 13:45:55,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-12-31 13:45:55,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: htrace-core-3.1.0-incubating.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: metrics-core-2.2.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: protobuf-java-2.5.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: joda-time-2.8.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: zookeeper-3.4.6.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-protocol-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-common-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: netty-all-4.0.23.Final.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: guava-11.0.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-server-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.15.0.2.3.2.0-2950-core-h2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-client-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.merge.percent to 0.66 from MR setting mapreduce.reduce.shuffle.merge.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.fetch.buffer.percent to 0.7 from MR setting mapreduce.reduce.shuffle.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.mb to 64 from MR setting mapreduce.task.io.sort.mb
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.memory.limit.percent to 0.25 from MR setting mapreduce.reduce.shuffle.memory.limit.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.factor to 100 from MR setting mapreduce.task.io.sort.factor
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.connect.timeout to 180000 from MR setting mapreduce.reduce.shuffle.connect.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.internal.sorter.class to org.apache.hadoop.util.QuickSort from MR setting map.sort.class
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.merge.progress.records to 10000 from MR setting mapreduce.task.merge.progress.records
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress to false from MR setting mapreduce.map.output.compress
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.sort.spill.percent to 0.7 from MR setting mapreduce.map.sort.spill.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.ssl.enable to false from MR setting mapreduce.shuffle.ssl.enabled
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead to true from MR setting mapreduce.ifile.readahead
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.parallel.copies to 30 from MR setting mapreduce.reduce.shuffle.parallelcopies
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead.bytes to 4194304 from MR setting mapreduce.ifile.readahead.bytes
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.task.input.post-merge.buffer.percent to 0.0 from MR setting mapreduce.reduce.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.read.timeout to 180000 from MR setting mapreduce.reduce.shuffle.read.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress.codec to org.apache.hadoop.io.compress.DefaultCodec from MR setting mapreduce.map.output.compress.codec
2018-12-31 13:45:55,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - For vertex - scope-673: parallelism=1, memory=250, java opts=-Xmx256m
2018-12-31 13:45:55,777 [PigTezLauncher-0] INFO org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2018-12-31 13:45:55,778 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ]
2018-12-31 13:45:55,836 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Session mode. Starting session.
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz
2018-12-31 13:45:55,900 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/temp-725993693/.tez/application_1546253526030_0012 doesn't exist and is created
2018-12-31 13:45:55,919 [PigTezLauncher-0] INFO org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1546253526030_0012
2018-12-31 13:45:56,163 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1546253526030_0012
2018-12-31 13:45:56,165 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1546253526030_0012/
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,487 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66. Application id: application_1546253526030_0012
2018-12-31 13:46:00,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1546253526030_0012
2018-12-31 13:46:01,587 [Timer-33] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2018-12-31 13:46:15,741 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-673, vertexId=vertex_1546253526030_0012_1_00, diagnostics=[Task failed, taskId=task_1546253526030_0012_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1546253526030_0012_1_00 [scope-673] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0, counters=Counters: 4
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
AM_CPU_MILLISECONDS=1310
AM_GC_TIME_MILLIS=17
2018-12-31 13:46:15,757 [PigTezLauncher-0] WARN org.apache.pig.tools.pigstats.JobStats - unable to find an output size reader
2018-12-31 13:46:15,761 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.1.2.3.2.0-2950
PigVersion: 0.15.0.2.3.2.0-2950
TezVersion: 0.7.0.2.3.2.0-2950
UserId: root
FileName:
StartedAt: 2018-12-31 13:45:55
FinishedAt: 2018-12-31 13:46:15
Features: UNKNOWN
Failed!
DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66:
ApplicationId: job_1546253526030_0012
TotalLaunchedTasks: 4
FileBytesRead: 0
FileBytesWritten: 0
HdfsBytesRead: 0
HdfsBytesWritten: 0
Input(s):
Failed to read data from "hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt"
Output(s):
Failed to produce result in "hbase://emp_tab"
grunt>
hive apache-pig loading
I am trying to load the one CSV file to HBase table. I can successfully dump the data from CSV, but while importing into a table I am getting an error.
But, when loading other data, i can load any issues.
raw_data = LOAD
'hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt' USING PigStorage(',') AS (empno:int, ename:chararray, job:chararray, mgr:chararray, hiredate:chararray, sal:chararray, comm:chararray, deptno:int);
dump raw_data;
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
'emp_info:ename
emp_info:job
emp_info:mgr
emp_info:hiredate
emp_info:sal
emp_info:comm
emp_info:deptno'
);
Output:
(7369,SMITH,CLERK,7902,13-06-93,800,0,20)
(7499,ALLEN,SALESMAN,7698,15-08-98,1600,300,30)
(7521,WARD,SALESMAN,7698,26-03-96,1250,500,30)
(7566,JONES,MANAGER,7839,31-10-95,2975,0,20)
(7698,BLAKE,MANAGER,7839,11-06-92,2850,0,30)
(7782,CLARK,MANAGER,7839,14-05-93,2450,0,10)
(7788,SCOTT,ANALYST,7566,05-03-96,3000,0,20)
(7839,KING,PRESIDENT,0,09-06-90,5000,0,10)
(7844,TURNER,SALESMAN,7698,04-06-95,1500,0,30)
(7876,ADAMS,CLERK,7788,04-06-99,1100,0,20)
(7900,JAMES,CLERK,7698,23-06-00,950,0,30)
(7934,MILLER,CLERK,7782,21-01-00,1300,0,10)
(7902,FORD,ANALYST,7566,05-12-97,3000,0,20)
(7654,MARTIN,SALESMAN,7698,05-12-98,1250,1400,30)
Then
STORE raw_data INTO 'hbase://emp_tab' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>> 'emp_info:ename
>> emp_info:job
>> emp_info:mgr
>> emp_info:hiredate
>> emp_info:sal
>> emp_info:comm
>> emp_info:deptno'
>> );
2018-12-31 13:45:55,581 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-12-31 13:45:55,594 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-12-31 13:45:55,594 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Tez staging directory is /tmp/temp-725993693
2018-12-31 13:45:55,628 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.plan.TezCompiler - File concatenation threshold: 100 optimistic? false
2018-12-31 13:45:55,645 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2018-12-31 13:45:55,645 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2018-12-31 13:45:55,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: htrace-core-3.1.0-incubating.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: metrics-core-2.2.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: protobuf-java-2.5.0.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: joda-time-2.8.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: zookeeper-3.4.6.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: antlr-runtime-3.4.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-hadoop-compat-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-protocol-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-common-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: netty-all-4.0.23.Final.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: guava-11.0.2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-server-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: pig-0.15.0.2.3.2.0-2950-core-h2.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: hbase-client-1.1.2.2.3.2.0-2950.jar
2018-12-31 13:45:55,671 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - Local resource: automaton-1.11-8.jar
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.merge.percent to 0.66 from MR setting mapreduce.reduce.shuffle.merge.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.fetch.buffer.percent to 0.7 from MR setting mapreduce.reduce.shuffle.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.mb to 64 from MR setting mapreduce.task.io.sort.mb
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.memory.limit.percent to 0.25 from MR setting mapreduce.reduce.shuffle.memory.limit.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.io.sort.factor to 100 from MR setting mapreduce.task.io.sort.factor
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.connect.timeout to 180000 from MR setting mapreduce.reduce.shuffle.connect.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.internal.sorter.class to org.apache.hadoop.util.QuickSort from MR setting map.sort.class
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.merge.progress.records to 10000 from MR setting mapreduce.task.merge.progress.records
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress to false from MR setting mapreduce.map.output.compress
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.sort.spill.percent to 0.7 from MR setting mapreduce.map.sort.spill.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.ssl.enable to false from MR setting mapreduce.shuffle.ssl.enabled
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead to true from MR setting mapreduce.ifile.readahead
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.parallel.copies to 30 from MR setting mapreduce.reduce.shuffle.parallelcopies
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.ifile.readahead.bytes to 4194304 from MR setting mapreduce.ifile.readahead.bytes
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.task.input.post-merge.buffer.percent to 0.0 from MR setting mapreduce.reduce.input.buffer.percent
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.shuffle.read.timeout to 180000 from MR setting mapreduce.reduce.shuffle.read.timeout
2018-12-31 13:45:55,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.util.MRToTezHelper - Setting tez.runtime.compress.codec to org.apache.hadoop.io.compress.DefaultCodec from MR setting mapreduce.map.output.compress.codec
2018-12-31 13:45:55,706 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJobCompiler - For vertex - scope-673: parallelism=1, memory=250, java opts=-Xmx256m
2018-12-31 13:45:55,777 [PigTezLauncher-0] INFO org.apache.pig.tools.pigstats.tez.TezScriptState - Pig script settings are added to the job
2018-12-31 13:45:55,778 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.7.0.2.3.2.0-2950, revision=4900a9cea70487666ace4c9e490d4d8fc1fee96f, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150930-1859 ]
2018-12-31 13:45:55,836 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:45:55,837 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Using org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager to manage Timeline ACLs
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Session mode. Starting session.
2018-12-31 13:45:55,878 [PigTezLauncher-0] INFO org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: /hdp/apps/2.3.2.0-2950/tez/tez.tar.gz
2018-12-31 13:45:55,900 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Tez system stage directory hdfs://sandbox.hortonworks.com:8020/tmp/temp-725993693/.tez/application_1546253526030_0012 doesn't exist and is created
2018-12-31 13:45:55,919 [PigTezLauncher-0] INFO org.apache.tez.dag.history.ats.acls.ATSHistoryACLPolicyManager - Created Timeline Domain for History ACLs, domainId=Tez_ATS_application_1546253526030_0012
2018-12-31 13:45:56,163 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1546253526030_0012
2018-12-31 13:45:56,165 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - The url to track the Tez Session: http://sandbox.hortonworks.com:8088/proxy/application_1546253526030_0012/
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitting DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,233 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitting dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,487 [PigTezLauncher-0] INFO org.apache.tez.client.TezClient - Submitted dag to TezSession, sessionName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig, applicationId=application_1546253526030_0012, dagName=PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/192.168.61.158:8050
2018-12-31 13:46:00,586 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - Submitted DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66. Application id: application_1546253526030_0012
2018-12-31 13:46:00,751 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - HadoopJobId: job_1546253526030_0012
2018-12-31 13:46:01,587 [Timer-33] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=RUNNING, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0 Killed: 0, diagnostics=, counters=null
2018-12-31 13:46:15,741 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, vertexName=scope-673, vertexId=vertex_1546253526030_0012_1_00, diagnostics=[Task failed, taskId=task_1546253526030_0012_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 2 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:1005)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:136)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:95)
at org.apache.tez.mapreduce.output.MROutput$1.write(MROutput.java:503)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:125)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:319)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:196)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1546253526030_0012_1_00 [scope-673] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0, counters=Counters: 4
org.apache.tez.common.counters.DAGCounter
NUM_FAILED_TASKS=4
TOTAL_LAUNCHED_TASKS=4
AM_CPU_MILLISECONDS=1310
AM_GC_TIME_MILLIS=17
2018-12-31 13:46:15,757 [PigTezLauncher-0] WARN org.apache.pig.tools.pigstats.JobStats - unable to find an output size reader
2018-12-31 13:46:15,761 [main] INFO org.apache.pig.tools.pigstats.tez.TezPigScriptStats - Script Statistics:
HadoopVersion: 2.7.1.2.3.2.0-2950
PigVersion: 0.15.0.2.3.2.0-2950
TezVersion: 0.7.0.2.3.2.0-2950
UserId: root
FileName:
StartedAt: 2018-12-31 13:45:55
FinishedAt: 2018-12-31 13:46:15
Features: UNKNOWN
Failed!
DAG PigLatin:/home/hdfs/aravind/hbase/emp_data.pig-0_scope-66:
ApplicationId: job_1546253526030_0012
TotalLaunchedTasks: 4
FileBytesRead: 0
FileBytesWritten: 0
HdfsBytesRead: 0
HdfsBytesWritten: 0
Input(s):
Failed to read data from "hdfs://sandbox.hortonworks.com:8020/user/aravind/hbase/emp_data/emp_data.txt"
Output(s):
Failed to produce result in "hbase://emp_tab"
grunt>
hive apache-pig loading
hive apache-pig loading
edited Dec 31 '18 at 17:05
Robert
2,15962536
2,15962536
asked Dec 31 '18 at 14:00
Aravind PeddolaAravind Peddola
11
11
Pretty sure you need commas between your column names in yourstore
command.
– Andrew
Dec 31 '18 at 15:16
add a comment |
Pretty sure you need commas between your column names in yourstore
command.
– Andrew
Dec 31 '18 at 15:16
Pretty sure you need commas between your column names in your
store
command.– Andrew
Dec 31 '18 at 15:16
Pretty sure you need commas between your column names in your
store
command.– Andrew
Dec 31 '18 at 15:16
add a comment |
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988300%2fdata-loading-csv-file-data-into-hbase-using-pig%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53988300%2fdata-loading-csv-file-data-into-hbase-using-pig%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Pretty sure you need commas between your column names in your
store
command.– Andrew
Dec 31 '18 at 15:16