How to implement Continuous Control of a quadruped robot with Deep Reinforcement Learning in PyBullet and...

I have designed this robot in URDF format and built its environment in PyBullet. Each leg joint has a minimum and maximum range of movement.
Which reinforcement learning algorithm would be best for learning a walking policy in a simple environment, where a positive reward is given for walking in the positive X-axis direction?
The expected output of the policy is an array in the range (-1, 1), one value per joint. The inputs to the policy are the position of each joint, the center of mass of the body, the height difference between the floor and the body (to detect whether it has fallen), and the movement along the X-axis. A minimal Gym-style sketch of such an environment follows the joint-limit list below.
Joint limits:
- left_front_joint => lower="-0.4" upper="2.5" id=0
- left_front_leg_joint => lower="-0.6" upper="0.7" id=2
- right_front_joint => lower="-2.5" upper="0.4" id=3
- right_front_leg_joint => lower="-0.6" upper="0.7" id=5
- left_back_joint => lower="-2.5" upper="0.4" id=6
- left_back_leg_joint => lower="-0.6" upper="0.7" id=8
- right_back_joint => lower="-0.4" upper="2.5" id=9
- right_back_leg_joint => lower="-0.6" upper="0.7" id=11
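To make the setup concrete, here is a minimal Gym-style environment sketch built from the joint ids and limits above and the observation/action description earlier, using the classic Gym API. The class name SpiderEnv, the reward shaping, and the fall threshold are illustrative assumptions, not code from the repository:

import gym
import numpy as np
import pybullet as p
import pybullet_data

class SpiderEnv(gym.Env):
    # Joint ids and limits copied from the list above.
    JOINT_IDS = [0, 2, 3, 5, 6, 8, 9, 11]
    LOWER = np.array([-0.4, -0.6, -2.5, -0.6, -2.5, -0.6, -0.4, -0.6])
    UPPER = np.array([ 2.5,  0.7,  0.4,  0.7,  0.4,  0.7,  2.5,  0.7])

    def __init__(self):
        # One action in (-1, 1) per joint; observation = 8 joint positions
        # + base position (x, y, z) + height above the plane + x displacement.
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(13,), dtype=np.float32)
        p.connect(p.DIRECT)  # headless is typical for training
        p.setAdditionalSearchPath(pybullet_data.getDataPath())

    def reset(self):
        p.resetSimulation()
        p.setGravity(0, 0, -10)
        p.loadURDF("plane.urdf")
        self.robot = p.loadURDF("src/spider.xml", [0, 0, 0.05])
        self.prev_x = 0.0
        return self._observe()

    def step(self, action):
        # Map each action from (-1, 1) into that joint's (lower, upper) range.
        target = self.LOWER + (np.asarray(action) + 1.0) * 0.5 * (self.UPPER - self.LOWER)
        for jid, pos in zip(self.JOINT_IDS, target):
            p.setJointMotorControl2(self.robot, jid, p.POSITION_CONTROL,
                                    targetPosition=pos, force=1.5)
        p.stepSimulation()
        obs = self._observe()
        (x, _, z), _ = p.getBasePositionAndOrientation(self.robot)
        reward = x - self.prev_x          # positive reward for +X progress
        self.prev_x = x
        done = z < 0.02                   # "fallen" threshold is a guess
        return obs, reward, done, {}

    def _observe(self):
        joints = [p.getJointState(self.robot, j)[0] for j in self.JOINT_IDS]
        (x, y, z), _ = p.getBasePositionAndOrientation(self.robot)
        # The plane sits at z=0, so the base height doubles as the floor gap.
        return np.array(joints + [x, y, z, z, x - self.prev_x], dtype=np.float32)

Scaling each action from (-1, 1) into the joint's own (lower, upper) range keeps the policy output normalized while still respecting the URDF limits.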
The code below is just a test of the environment, with a manually hard-coded set of movements, to see how the robot might walk later. The simulation is set to real time, but I assume it needs to advance frame by frame (stepped simulation) during policy training.
A video of it can be seen at:
https://youtu.be/j9sysG-EIkQ
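On the real-time point: for training, real-time mode is usually switched off so the physics only advances when stepped explicitly, which makes rollouts deterministic and faster than wall-clock. A minimal sketch (1/240 s is PyBullet's default fixed step; the loop body is illustrative):

import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                    # headless is typical for training
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -10)
p.loadURDF("plane.urdf")
p.setRealTimeSimulation(0)             # turn off real-time mode
p.setTimeStep(1.0 / 240.0)             # fixed physics time step
for _ in range(240):                   # advance exactly one simulated second
    # the policy's joint commands would be applied here, then:
    p.stepSimulation()
p.disconnect()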
import pybullet as p
import time
import pybullet_data


def moveLeg(robot=None, id=0, position=0, force=1.5):
    """Drive a single joint to a target position using position control."""
    if robot is None:
        return
    p.setJointMotorControl2(
        robot,
        id,
        p.POSITION_CONTROL,
        targetPosition=position,
        force=force,
        # maxVelocity=5
    )


# Camera parameters (only used if getCameraImage is re-enabled below)
pixelWidth = 1000
pixelHeight = 1000
camTargetPos = [0, 0, 0]
camDistance = 0.5
pitch = -10.0
roll = 0
upAxisIndex = 2
yaw = 0

physicsClient = p.connect(p.GUI)  # or p.DIRECT for the non-graphical version
p.setAdditionalSearchPath(pybullet_data.getDataPath())  # optional
p.setGravity(0, 0, -10)
viewMatrix = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)

planeId = p.loadURDF("plane.urdf")
cubeStartPos = [0, 0, 0.05]
cubeStartOrientation = p.getQuaternionFromEuler([0, 0, 0])
# boxId = p.loadURDF("r2d2.urdf", cubeStartPos, cubeStartOrientation)
boxId = p.loadURDF("src/spider.xml", cubeStartPos, cubeStartOrientation)
# boxId = p.loadURDF("spider_simple.urdf", cubeStartPos, cubeStartOrientation)

toggle = 1
p.setRealTimeSimulation(1)

for i in range(10000):
    # p.stepSimulation()
    moveLeg(robot=boxId, id=0,  position=toggle * -2)  # LEFT_FRONT
    moveLeg(robot=boxId, id=2,  position=toggle * -2)  # LEFT_FRONT
    moveLeg(robot=boxId, id=3,  position=toggle * -2)  # RIGHT_FRONT
    moveLeg(robot=boxId, id=5,  position=toggle * 2)   # RIGHT_FRONT
    moveLeg(robot=boxId, id=6,  position=toggle * 2)   # LEFT_BACK
    moveLeg(robot=boxId, id=8,  position=toggle * -2)  # LEFT_BACK
    moveLeg(robot=boxId, id=9,  position=toggle * 2)   # RIGHT_BACK
    moveLeg(robot=boxId, id=11, position=toggle * 2)   # RIGHT_BACK
    # time.sleep(1. / 140.)
    # time.sleep(0.01)
    time.sleep(1)
    toggle = toggle * -1
    # viewMatrix = p.computeViewMatrixFromYawPitchRoll(camTargetPos, camDistance, yaw, pitch, roll, upAxisIndex)
    # projectionMatrix = [1.0825318098068237, 0.0, 0.0, 0.0, 0.0, 1.732050895690918, 0.0, 0.0, 0.0, 0.0, -1.0002000331878662, -1.0, 0.0, 0.0, -0.020002000033855438, 0.0]
    # img_arr = p.getCameraImage(pixelWidth, pixelHeight, viewMatrix=viewMatrix, projectionMatrix=projectionMatrix, shadow=1, lightDirection=[1, 1, 1])

cubePos, cubeOrn = p.getBasePositionAndOrientation(boxId)
print(cubePos, cubeOrn)
p.disconnect()
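As a sanity check, the joint limits listed earlier can also be read directly from the loaded URDF rather than hard-coded; a small sketch, assuming it runs inside the script above before p.disconnect():

# Read each joint's name and limits from the loaded model so the policy's
# action scaling stays in sync with the URDF.
for j in range(p.getNumJoints(boxId)):
    info = p.getJointInfo(boxId, j)
    name, lower, upper = info[1].decode(), info[8], info[9]
    print(j, name, lower, upper)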
The complete code can be seen here:
https://github.com/rubencg195/WalkingSpider_OpenAI_PyBullet_ROS
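For the algorithm question itself: PPO, SAC, TD3, and DDPG all handle continuous Box action spaces like this one, with PPO and SAC being common first choices for locomotion. A hypothetical training sketch with Stable-Baselines3 (PyTorch), reusing the SpiderEnv wrapper sketched earlier (not code from the linked repository):

from stable_baselines3 import PPO

env = SpiderEnv()                        # the Gym-style wrapper sketched above
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)   # budget is a guess; tune as needed
model.save("spider_ppo")

# Roll out the learned policy:
obs = env.reset()
for _ in range(1000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, _ = env.step(action)
    if done:
        obs = env.reset()

The one-million-timestep budget is a rough guess; quadruped gaits in PyBullet often need several million steps and some extra reward shaping (for example, penalizing energy use or body tilt) before they look natural.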
Tags: tensorflow, deep-learning, pytorch, reinforcement-learning, openai-gym
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49
add a comment |
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49
add a comment |
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53948176%2fhow-to-implement-a-continuous-control-of-a-quadruped-robot-with-deep-reinforceme%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53948176%2fhow-to-implement-a-continuous-control-of-a-quadruped-robot-with-deep-reinforceme%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
CcsXAlNXw,SsHFwsqr PxuOdf5wR8FR UOnFC9tPhG2HBXU9Ox,jaz1teO7 H3mtgNg kvca4YezIfcPwg,zqgERZhLX SoGa,pSP
Please check "Which site?" for general issues.
– Prune
Dec 27 '18 at 17:47
For this site, we generally look for a demonstrable problem with code or coding tools, narrowed down by your prior research and experimentation or debugging. What reinforcement algorithms have you tried, and how did they work out for you?
– Prune
Dec 27 '18 at 17:49