PyTorch backprop is slower compared to TensorFlow?
I’ve implemented a simple DDQN network in PyTorch and TensorFlow. The network is quite shallow.
While the forward pass is much faster in PyTorch than in TF, the back-propagation step is much slower than in TF. Both backprop steps were run on the CPU.
Any ideas on how to improve it?



The network part is:



import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):  # class name and nn.Module base assumed; only the methods were posted
    def __init__(self, hidden_size_IP=100, hidden_size_rest=100, alpha=0.01,
                 state_size=27, action_size=8, learning_rate=1e-6):
        super().__init__()

        # build hidden layers
        self.l1 = nn.Sequential(nn.Linear(in_features=500, out_features=400),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l2 = nn.Sequential(nn.Linear(in_features=400, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l3 = nn.Sequential(nn.Linear(in_features=200, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        # build output layer
        self.Qval = nn.Linear(in_features=200, out_features=24)

    def forward(self, observation):
        # accept numpy input and convert it to a float32 tensor
        if isinstance(observation, np.ndarray):
            observation = torch.from_numpy(observation).float()
        out1 = self.l1(observation)
        out2 = self.l2(out1)
        out3 = self.l3(out2)
        qval = self.Qval(out3)
        return qval
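
For reference, instantiating the module and running one forward pass looks like this (a minimal sketch; QNet is the class name assumed above):

q_net = QNet()
obs = np.random.rand(64, 500)  # batch of 64 observations, 500 features each
qvals = q_net(obs)             # forward() converts the numpy array to a float tensor
print(qvals.shape)             # torch.Size([64, 24]) -- one Q-value per action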


and the backprop code can be, for example:



# imports assumed by this snippet
from torch import optim
from torch.nn.functional import mse_loss

self.optimizer = optim.Adam(self.q_net.parameters(), lr=1e-4)
self.optimizer.zero_grad()

state_batch = torch.rand([64, 500])
act_batch = np.random.randint(0, 24, [64, 1])            # was np.randi, which does not exist
act_batch_torch = torch.as_tensor(act_batch)             # gather requires an int64 index tensor
label_batch = torch.rand([64, 1])                        # one target per gathered Q-value (was [64, 500])
Q = self.q_net(state_batch).gather(1, act_batch_torch)   # q_net is an instance of the network above
loss = mse_loss(input=Q, target=label_batch.detach())
loss.backward()

self.optimizer.step()


Note that since inference is much faster on the CPU, I’m also doing backprop on the CPU. I have tried moving the network to the GPU and running backprop there, but that turned out to be slower.
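
Not from the original post, but a useful sanity check: time the two phases separately after a warm-up, and try limiting intra-op threads, since for layers this small the CPU thread-pool overhead can outweigh the parallel speedup. A minimal sketch (time_phases is a hypothetical helper):

import time

torch.set_num_threads(1)  # worth trying; PyTorch and TF use different CPU threading defaults

def time_phases(q_net, n_iter=1000):
    x = torch.rand(64, 500)
    target = torch.rand(64, 24)

    # warm-up so one-off allocations don't pollute the measurement
    for _ in range(10):
        torch.nn.functional.mse_loss(q_net(x), target).backward()
        q_net.zero_grad()

    fwd = bwd = 0.0
    for _ in range(n_iter):
        t0 = time.perf_counter()
        out = q_net(x)
        t1 = time.perf_counter()
        loss = torch.nn.functional.mse_loss(out, target)
        loss.backward()
        q_net.zero_grad()
        t2 = time.perf_counter()
        fwd += t1 - t0
        bwd += t2 - t1  # includes loss computation and zero_grad, not just backward
    print(f"forward: {fwd / n_iter * 1e6:.1f} us/iter, backward+loss: {bwd / n_iter * 1e6:.1f} us/iter")

time_phases(QNet())

Setting OMP_NUM_THREADS=1 in the environment typically has a similar effect to torch.set_num_threads(1).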



Any ideas why PyTorch is slower? How can I improve the speed for this type of shallow network?










python tensorflow pycharm pytorch

asked Jan 1 at 12:36 by Eli, edited Jan 2 at 18:57

  • Is this the right code? Because you are doing optimizer.zero_grad right after loss.backward. Shouldn't you be doing this before computing the backprop?

    – kvish
    Jan 2 at 10:49











  • Yes, you are right. However, the timing problem persists...

    – Eli
    Jan 2 at 18:56











  • Unfortunately I have limited experience with both PyTorch and reinforcement learning. Maybe putting the entire model into one Sequential network can help speed up backprop (see the sketch below)? Did you construct it the same way in TensorFlow?

    – kvish
    Jan 3 at 11:42
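
For context, the single-nn.Sequential construction the comment refers to would look roughly like this (a sketch; it builds the same graph as the nested version above, so any speedup would come from reduced Python call overhead, which is typically small):

q_net = nn.Sequential(
    nn.Linear(500, 400), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(400, 200), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(200, 200), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(200, 24),
)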