PyTorch backprop is slower compared to TensorFlow?
I've implemented a simple DDQN network in PyTorch and TensorFlow. The network is quite shallow.
While the forward pass is much faster in PyTorch than in TF, the back-propagation step is much slower. Both backprop steps were run on the CPU.
Any ideas on how to improve it?
The network part is:
import numpy as np
import torch
import torch.nn as nn


class QNetwork(nn.Module):  # class name assumed; the original snippet omits the class declaration
    def __init__(self, hidden_size_IP=100, hidden_size_rest=100, alpha=0.01,
                 state_size=27, action_size=8, learning_rate=1e-6):
        super().__init__()
        # build hidden layers (the hidden_size_* / state_size / action_size
        # arguments are unused; the layer sizes are hard-coded below)
        self.l1 = nn.Sequential(nn.Linear(in_features=500, out_features=400),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l2 = nn.Sequential(nn.Linear(in_features=400, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        self.l3 = nn.Sequential(nn.Linear(in_features=200, out_features=200),
                                nn.LeakyReLU(negative_slope=alpha))
        # build output layer
        self.Qval = nn.Linear(in_features=200, out_features=24)

    def forward(self, observation):
        # accept numpy arrays as well as tensors
        if isinstance(observation, np.ndarray):
            observation = torch.from_numpy(observation).float()
        out1 = self.l1(observation)
        out2 = self.l2(out1)
        out3 = self.l3(out2)
        qval = self.Qval(out3)
        return qval
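The q_net used below is just an instance of this class; for context, constructing it and running a forward pass looks roughly like this (a sketch; the QNetwork class name is assumed, as noted above):

self.q_net = QNetwork(alpha=0.01)
# forward accepts either a torch.Tensor or a numpy array of shape (batch, 500)
qvals = self.q_net(np.random.rand(64, 500))   # converted to float32 inside forward; output shape (64, 24)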
and the backprop code can be, for example:
import numpy as np
import torch
import torch.optim as optim
from torch.nn.functional import mse_loss

self.optimizer = optim.Adam(self.q_net.parameters(), lr=1e-4)
self.optimizer.zero_grad()
state_batch = torch.rand([64, 500])
act_batch = np.random.randint(0, 24, [64, 1])                     # random action indices
act_batch_torch = torch.as_tensor(act_batch, dtype=torch.int64)   # gather expects int64 indices
label_batch = torch.rand([64, 1])                                 # one target per sampled action, matching Q's shape
Q = self.q_net(state_batch).gather(1, act_batch_torch)            # q_net is an instance of the network above
loss = mse_loss(input=Q, target=label_batch.detach())
loss.backward()
self.optimizer.step()
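To compare just the backward step, its timing can be isolated roughly like this (a minimal sketch using time.perf_counter; q_net and the batches are the same as above):

import time

Q = self.q_net(state_batch).gather(1, act_batch_torch)
loss = mse_loss(input=Q, target=label_batch.detach())

start = time.perf_counter()
loss.backward()                           # time only the backward pass
elapsed = time.perf_counter() - start
print(f"backward took {elapsed:.6f} s")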
Note that since inference is much faster on the CPU, I'm also doing backprop on the CPU. I have tried transferring the network to the GPU and running backprop there as well, but it turned out to be slower.
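That GPU attempt amounts to moving the model and the batches to CUDA before the same step, roughly like this (a sketch, assuming a CUDA device is available):

device = torch.device("cuda")
self.q_net.to(device)                      # moves the parameters in place
state_gpu = state_batch.to(device)
act_gpu = act_batch_torch.to(device)
label_gpu = label_batch.to(device)

Q = self.q_net(state_gpu).gather(1, act_gpu)
loss = mse_loss(input=Q, target=label_gpu.detach())
loss.backward()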
Any ideas why PyTorch is slower? How can I improve the speed for this type of shallow network?
python tensorflow pycharm pytorch
Is this the right code? Because you are doing optimizer.zero_grad right after loss.backward. Shouldn't you be doing this before computing the backprop?
– kvish
Jan 2 at 10:49
Yes, you are right. However, the timing problem persists...
– Eli
Jan 2 at 18:56
Unfortunately, I have limited experience with both PyTorch and reinforcement learning. Maybe putting the entire model into one Sequential network can help speed up backprop? Did you construct it the same way in TensorFlow?
– kvish
Jan 3 at 11:42
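For reference, kvish's single-Sequential suggestion would look roughly like this (a sketch; q_net_seq is a placeholder name, and the layer sizes mirror the network above):

import torch.nn as nn

alpha = 0.01
q_net_seq = nn.Sequential(
    nn.Linear(500, 400), nn.LeakyReLU(negative_slope=alpha),
    nn.Linear(400, 200), nn.LeakyReLU(negative_slope=alpha),
    nn.Linear(200, 200), nn.LeakyReLU(negative_slope=alpha),
    nn.Linear(200, 24),
)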