The Dataflow job appears to be stuck

The question has now changed a bit. The main problem is that my code needs the Oracle client libraries, but I can't seem to use the setup file to run custom commands that install the Oracle client on the workers.



from __future__ import absolute_import
from __future__ import print_function

import subprocess
from distutils.command.build import build as _build

import setuptools


# This class handles the pip install mechanism.
class build(_build):  # pylint: disable=invalid-name
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


CUSTOM_COMMANDS = [
    ['sudo', 'apt-get', 'update'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'unzip'],
    ['wget', 'https://storage.googleapis.com/facbeambucketv1/files/instantclient-basic-linux.x64-18.3.0.0.0dbru.zip'],
    ['sudo', 'unzip', '-o', 'instantclient-basic-linux.x64-18.3.0.0.0dbru.zip', '-d', 'orclbm'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'libaio1'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'tree'],
]


class CustomCommands(setuptools.Command):
    """A setuptools Command class able to run arbitrary commands."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run requires
        # some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


# Configure the required packages and scripts to install.
# Note that the Python Dataflow containers come with numpy already installed
# so this dependency will not trigger anything to be installed unless a version
# restriction is specified.
REQUIRED_PACKAGES = [
    'numpy',
    'apache_beam',
    'apache_beam[gcp]',
    'cx_Oracle',
    'datetime',
    'google-cloud-bigquery',
]


setuptools.setup(
    name='orclbm',
    version='0.0.1',
    description='Oraclebm workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    include_package_data=True,
    cmdclass={
        # Command class instantiated and run during pip install scenarios.
        'build': build,
        'CustomCommands': CustomCommands,
    },
)


However, the custom commands are not running. The required packages do get installed, though. I am not sure what the problem is.
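
For reference, Dataflow only ships setup.py to the workers when the pipeline is launched with the --setup_file option. A minimal launch sketch along those lines; the project, region, and bucket names below are hypothetical placeholders, not values from this question:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Minimal launch sketch -- the project, region, and bucket are
# hypothetical placeholders.
options = PipelineOptions(
    runner='DataflowRunner',
    project='my-gcp-project',            # hypothetical project id
    region='us-central1',                # hypothetical region
    temp_location='gs://my-bucket/tmp',  # hypothetical staging bucket
    setup_file='./setup.py',             # without this, CustomCommands never runs on workers
)

with beam.Pipeline(options=options) as p:
    _ = p | 'Create' >> beam.Create(['ping']) | 'Noop' >> beam.Map(lambda x: x)

When --setup_file is passed, the output of the custom commands typically lands in the worker-startup logs in Stackdriver rather than in the main job log, so it can look as though the commands never ran.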

python google-cloud-dataflow apache-beam

asked Dec 28 '18 at 7:51 by Abhishek Ray (edited yesterday)

  • There is not enough info in your description to find the root cause for this. Your job could have been stuck for a multitude of reasons. For example, it could be due to a strain on your resources in general or on one worker in particular. Seeing as you didn't encounter the problem running locally, though, this could happen only if you changed the amount and/or key distribution of your input data. Could this be the case?
    – Lefteris S
    Dec 28 '18 at 13:44

  • Even more: it would be helpful if you edited your question to add any additional warning/error messages (don't rely just on the ones output in job details; go to Stackdriver Logging and filter by severity>=WARNING) as well as your execution graph. In particular, check for stuck steps (wall time is useful for this, as is an indication that the count of output elements does not follow that of input elements).
    – Lefteris S
    Dec 28 '18 at 13:51
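
A rough sketch of pulling those entries programmatically with the google-cloud-logging client, for anyone who prefers that to the console; the project id and job id below are hypothetical placeholders:

# Rough sketch: list Dataflow log entries at severity WARNING and above.
from google.cloud import logging

client = logging.Client(project='my-gcp-project')  # hypothetical project id
log_filter = (
    'resource.type="dataflow_step" '
    'AND resource.labels.job_id="<your-job-id>" '  # hypothetical job id
    'AND severity>=WARNING'
)
for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.severity, entry.payload)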

  • @LefterisS Actually, the input data didn't change at all, and this is the only error message I got. I'm not sure about wall time. The process remains stuck for 1 hr 1 min, which according to some posts I read has something to do with networking: the Compute Engine instances cannot communicate.
    – Abhishek Ray
    Dec 31 '18 at 17:34

  • It could be. Have you set up a separate network where you run the job?
    – Lefteris S
    Dec 31 '18 at 18:26

  • I am starting the code locally with the runner argument set to DataflowRunner. I think it then uses the setup file to create workers, but yes, the code initiates from my local machine. Is that a problem?
    – Abhishek Ray
    Jan 1 at 7:39
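
If the job does run on a custom VPC, the worker options need to point at it explicitly, and one commonly reported cause of jobs hanging around the one-hour mark is a custom network without a firewall rule that lets the worker VMs reach each other. A sketch of the relevant options; the network and subnetwork names are hypothetical placeholders:

# Sketch: pointing Dataflow workers at a custom VPC.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    runner='DataflowRunner',
    network='my-custom-network',                             # hypothetical VPC name
    subnetwork='regions/us-central1/subnetworks/my-subnet',  # hypothetical subnetwork
)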