PyTorch backprop is slower compared to TensorFlow?
I’ve implemented a simple DDQN network in PyTorch and TensorFlow. The network is quite shallow.
While the forward pass is much faster in PyTorch than in TF, the back-propagation step is much slower than in TF. Both backprop steps were run on the CPU.
Any ideas on how to improve it?



The network part is:



import numpy as np
import torch
import torch.nn as nn


class QNet(nn.Module):  # class name and nn.Module base assumed; only the methods were posted
    def __init__(self, hidden_size_IP=100, hidden_size_rest=100, alpha=0.01,
                 state_size=27, action_size=8, learning_rate=1e-6):
        super().__init__()

        # build hidden layers
        self.l1 = nn.Sequential(nn.Linear(in_features=500, out_features=400),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l2 = nn.Sequential(nn.Linear(in_features=400, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l3 = nn.Sequential(nn.Linear(in_features=200, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        # build output layer
        self.Qval = nn.Linear(in_features=200, out_features=24)

    def forward(self, observation):
        # accept numpy input and convert it to a float32 tensor
        if isinstance(observation, np.ndarray):
            observation = torch.from_numpy(observation).float()
        out1 = self.l1(observation)
        out2 = self.l2(out1)
        out3 = self.l3(out2)
        qval = self.Qval(out3)
        return qval
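
For reference, instantiating the module and running one forward pass looks like this (a minimal sketch; QNet is the class name assumed above):

q_net = QNet()
obs = np.random.rand(64, 500)  # batch of 64 observations, 500 features each
qvals = q_net(obs)             # forward() converts the numpy array to a float tensor
print(qvals.shape)             # torch.Size([64, 24]) -- one Q-value per action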


and the backprop code can be, for example:



# imports assumed by this snippet
from torch import optim
from torch.nn.functional import mse_loss

self.optimizer = optim.Adam(self.q_net.parameters(), lr=1e-4)
self.optimizer.zero_grad()

state_batch = torch.rand([64, 500])
act_batch = np.random.randint(0, 24, [64, 1])            # was np.randi, which does not exist
act_batch_torch = torch.as_tensor(act_batch)             # gather requires an int64 index tensor
label_batch = torch.rand([64, 1])                        # one target per gathered Q-value (was [64, 500])
Q = self.q_net(state_batch).gather(1, act_batch_torch)   # q_net is an instance of the network above
loss = mse_loss(input=Q, target=label_batch.detach())
loss.backward()

self.optimizer.step()


Note that since inference is much faster on the CPU, I’m also doing backprop on the CPU. I have tried moving the network to the GPU and running backprop there, but that turned out to be slower.
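
Not from the original post, but a useful sanity check: time the two phases separately after a warm-up, and try limiting intra-op threads, since for layers this small the CPU thread-pool overhead can outweigh the parallel speedup. A minimal sketch (time_phases is a hypothetical helper):

import time

torch.set_num_threads(1)  # worth trying; PyTorch and TF use different CPU threading defaults

def time_phases(q_net, n_iter=1000):
    x = torch.rand(64, 500)
    target = torch.rand(64, 24)

    # warm-up so one-off allocations don't pollute the measurement
    for _ in range(10):
        torch.nn.functional.mse_loss(q_net(x), target).backward()
        q_net.zero_grad()

    fwd = bwd = 0.0
    for _ in range(n_iter):
        t0 = time.perf_counter()
        out = q_net(x)
        t1 = time.perf_counter()
        loss = torch.nn.functional.mse_loss(out, target)
        loss.backward()
        q_net.zero_grad()
        t2 = time.perf_counter()
        fwd += t1 - t0
        bwd += t2 - t1  # includes loss computation and zero_grad, not just backward
    print(f"forward: {fwd / n_iter * 1e6:.1f} us/iter, backward+loss: {bwd / n_iter * 1e6:.1f} us/iter")

time_phases(QNet())

Setting OMP_NUM_THREADS=1 in the environment typically has a similar effect to torch.set_num_threads(1).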



Any ideas why PyTorch is slower? How can I improve the speed for this type of shallow network?










python tensorflow pycharm pytorch

asked Jan 1 at 12:36 by Eli, edited Jan 2 at 18:57

  • Is this the right code? Because you are doing optimizer.zero_grad right after loss.backward. Shouldn't you be doing this before computing the backprop?

    – kvish
    Jan 2 at 10:49











  • Yes, you are right. However, the timing problem persists...

    – Eli
    Jan 2 at 18:56











  • Unfortunately I have limited experience with both PyTorch and reinforcement learning. Maybe putting the entire model into one Sequential network can help speed up backprop (see the sketch below)? Did you construct it the same way in TensorFlow?

    – kvish
    Jan 3 at 11:42
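
For context, the single-nn.Sequential construction the comment refers to would look roughly like this (a sketch; it builds the same graph as the nested version above, so any speedup would come from reduced Python call overhead, which is typically small):

q_net = nn.Sequential(
    nn.Linear(500, 400), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(400, 200), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(200, 200), nn.LeakyReLU(negative_slope=0.01),
    nn.Linear(200, 24),
)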