The Dataflow job appears to be stuck
The question has now changed a bit. The main problem is that my code needs the Oracle client libraries, but I can't seem to use the setup file to run custom commands that set up the Oracle client on the workers:
from __future__ import absolute_import
from __future__ import print_function

import subprocess
from distutils.command.build import build as _build

import setuptools


# This class handles the pip install mechanism.
class build(_build):  # pylint: disable=invalid-name
    sub_commands = _build.sub_commands + [('CustomCommands', None)]


CUSTOM_COMMANDS = [
    ['sudo', 'apt-get', 'update'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'unzip'],
    ['wget', 'https://storage.googleapis.com/facbeambucketv1/files/instantclient-basic-linux.x64-18.3.0.0.0dbru.zip'],
    ['sudo', 'unzip', '-o', 'instantclient-basic-linux.x64-18.3.0.0.0dbru.zip', '-d', 'orclbm'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'libaio1'],
    ['sudo', 'apt-get', '--assume-yes', 'install', 'tree'],
]


class CustomCommands(setuptools.Command):
    """A setuptools Command class able to run arbitrary commands."""

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def RunCustomCommand(self, command_list):
        print('Running command: %s' % command_list)
        p = subprocess.Popen(
            command_list,
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT)
        # Can use communicate(input='y\n'.encode()) if the command run
        # requires some confirmation.
        stdout_data, _ = p.communicate()
        print('Command output: %s' % stdout_data)
        if p.returncode != 0:
            raise RuntimeError(
                'Command %s failed: exit code: %s' % (command_list, p.returncode))

    def run(self):
        for command in CUSTOM_COMMANDS:
            self.RunCustomCommand(command)


# Configure the required packages and scripts to install.
# Note that the Python Dataflow containers come with numpy already installed,
# so this dependency will not trigger anything to be installed unless a
# version restriction is specified.
REQUIRED_PACKAGES = [
    'numpy',
    'apache_beam',
    'apache_beam[gcp]',
    'cx_Oracle',
    'datetime',
    'google-cloud-bigquery',
]

setuptools.setup(
    name='orclbm',
    version='0.0.1',
    description='Oraclebm workflow package.',
    install_requires=REQUIRED_PACKAGES,
    packages=setuptools.find_packages(),
    include_package_data=True,
    cmdclass={
        # Command class instantiated and run during pip install scenarios.
        'build': build,
        'CustomCommands': CustomCommands,
    })
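Even if these commands run, I expect cx_Oracle will also need the dynamic loader to be able to find the extracted client libraries; below is a sketch of extra commands I considered appending, where the install path is only a guess (the unzip above extracts relative to whatever directory pip runs in):

# Hypothetical additions to CUSTOM_COMMANDS so that the dynamic loader can
# find libclntsh.so; the instantclient path below is a guessed placeholder.
EXTRA_COMMANDS = [
    ['sudo', 'sh', '-c',
     'echo /path/to/orclbm/instantclient_18_3 > /etc/ld.so.conf.d/oracle.conf'],
    ['sudo', 'ldconfig'],
]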
However, the custom commands are not running; the required packages do get installed, though. I am not sure what the problem is.
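For reference, my understanding is that the setup file only reaches the workers when the job is launched with the --setup_file pipeline option; this is roughly how I start the job ('my-project' and 'my-bucket' below are placeholders):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Rough launch sketch: without --setup_file the workers never receive
# setup.py, so the custom commands would have no chance to run.
options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project',
    '--temp_location=gs://my-bucket/temp',
    '--staging_location=gs://my-bucket/staging',
    '--setup_file=./setup.py',
])

with beam.Pipeline(options=options) as pipeline:
    _ = pipeline | beam.Create(['ping']) | beam.Map(print)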
python google-cloud-dataflow apache-beam
asked Dec 28 '18 at 7:51 by Abhishek Ray (edited yesterday)
There is not enough info in your description to find the root cause of this. Your job could have been stuck for a multitude of reasons; for example, a strain on your resources in general or on one worker in particular. Since you didn't encounter the problem running locally, though, this could only happen if you changed the amount and/or key distribution of your input data. Could this be the case?
– Lefteris S
Dec 28 '18 at 13:44
Even more: it would be helpful if you edited your question to add any additional warning/error messages (don't rely just on the ones shown in the job details; go to Stackdriver Logging and filter by severity>=WARNING) as well as your execution graph. In particular, check for stuck steps (wall time is useful for this, as is an indication that the number of output elements does not follow that of input elements).
– Lefteris S
Dec 28 '18 at 13:51
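(As an illustration of that filter, the same query can also be run programmatically; a sketch using the google-cloud-logging client, where 'my-project' and JOB_ID are placeholders:)

from google.cloud import logging as gcp_logging

# Pull warning-and-above log entries for one Dataflow job; the project id
# and job id below are placeholders.
client = gcp_logging.Client(project='my-project')
log_filter = ('resource.type="dataflow_step" '
              'AND resource.labels.job_id="JOB_ID" '
              'AND severity>=WARNING')
for entry in client.list_entries(filter_=log_filter):
    print(entry.timestamp, entry.payload)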
@LefterisS actually the input data didn't change at all, and this is the only error message I got. Not sure about wall time. The process remains stuck for 1 hr 1 min, which according to some post I read means there is a networking problem and the Compute Engine workers cannot communicate.
– Abhishek Ray
Dec 31 '18 at 17:34
It could be. Have you set up a separate network where you run the job?
– Lefteris S
Dec 31 '18 at 18:26
I am starting the code locally with the runner argument set to DataflowRunner. I think it then uses the setup file to create the workers, but yes, the code initiates from my local machine. Is that a problem?
– Abhishek Ray
Jan 1 at 7:39
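(To check the other half of the problem, whether the Oracle client files ever land on the workers, one option is a throwaway debug step whose output appears in the worker logs; a sketch, with the search depth and wiring as assumptions:)

import subprocess

import apache_beam as beam


class FindOracleClient(beam.DoFn):
    """Debug-only DoFn: lists any libclntsh files found on the worker."""

    def process(self, element):
        # Search near the filesystem root; permission errors go to stderr
        # and are ignored.
        p = subprocess.Popen(
            ['find', '/', '-maxdepth', '6', '-name', 'libclntsh*'],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, _ = p.communicate()
        yield out.decode()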