Glusterfs Replace Brick in Dispersed

I have a dispersed GlusterFS volume made up of three bricks on three servers. Recently one of the servers suffered a hard drive failure and dropped out of the cluster. I am trying to replace its brick, but I can't get it to work.

First, here is the version info:

$ glusterfsd --version
glusterfs 3.13.2
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.


It is running on Ubuntu 18.04.

Here is the current volume info:

Volume Name: vol01
Type: Disperse
Volume ID: 061cac4d-1165-4afe-87e0-27b213ea19dc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srv02:/srv/glusterfs/vol01/brick <-- This is the brick that died
Brick2: srv03:/srv/glusterfs/vol01/brick
Brick3: srv04:/srv/glusterfs/vol01/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
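For reference, the `Number of Bricks: 1 x (2 + 1) = 3` line describes one disperse subvolume with 2 data bricks and 1 redundancy brick, so the volume keeps serving data with one brick missing but cannot lose another until the replacement has healed. The sketch below pulls those numbers out of that line. `disperse_geometry` is a hypothetical helper (not a gluster command), and it assumes the `S x (D + R) = T` layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse the "Number of Bricks" line that
# `gluster volume info` prints for a disperse volume.
# Assumes the "S x (D + R) = T" layout; returns non-zero if it is absent.
disperse_geometry() {
  printf '%s\n' "$1" | awk '
    match($0, /[0-9]+ x \([0-9]+ \+ [0-9]+\) = [0-9]+/) {
      s = substr($0, RSTART, RLENGTH)
      gsub(/[^0-9]+/, " ", s)          # keep only the four numbers
      split(s, n, " ")                 # n[1]=S n[2]=D n[3]=R n[4]=T
      printf "subvolumes=%s data=%s redundancy=%s total=%s\n", n[1], n[2], n[3], n[4]
      printf "each subvolume tolerates the loss of %s brick(s)\n", n[3]
      found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

Usage: `disperse_geometry 'Number of Bricks: 1 x (2 + 1) = 3'` reports one subvolume with redundancy 1.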


I want to replace the srv02 brick with a brick on srv05 using the following command:

gluster volume replace-brick vol01 srv02:/srv/glusterfs/vol01/brick srv05:/srv/glusterfs/vol01/brick commit force


However, when I run this command (as root), I get this error:

volume replace-brick: failed: Pre Validation failed on srv05. brick: srv02:/srv/glusterfs/vol01/brick does not exist in volume: vol01


As far as I can tell this should work, since srv05 is connected:

# gluster peer status
Number of Peers: 3

Hostname: srv04
Uuid: 5bbd6c69-e0a7-491c-b605-d70cb83ebc72
State: Peer in Cluster (Connected)

Hostname: srv02
Uuid: e4e856ba-61df-45eb-83bb-e2d2e799fc8d
State: Peer Rejected (Disconnected)

Hostname: srv05
Uuid: e7d098c1-7bbd-44e1-931f-034da645c6c6
State: Peer in Cluster (Connected)
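In the `gluster peer status` output above, only srv02 is outside the normal `Peer in Cluster (Connected)` state. A small sketch that flags such peers from that output: `unhealthy_peers` is a hypothetical helper, and it assumes the `Hostname:` / `Uuid:` / `State:` layout shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `gluster peer status` output on stdin and
# print every peer whose state is not "Peer in Cluster (Connected)".
unhealthy_peers() {
  awk '
    /^Hostname:/ { host = $2 }                 # remember the current peer
    /^State:/ {
      state = $0
      sub(/^State: */, "", state)              # keep only the state text
      if (state != "Peer in Cluster (Connected)") print host ": " state
    }
  '
}
```

Usage: `gluster peer status | unhealthy_peers`; with the output above it should report only srv02.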


As you can see, srv05 is connected and in the cluster, while srv02 is rejected and disconnected.

All the bricks are the same size, on XFS partitions. The brick on srv05 is empty.

What am I doing wrong? I would prefer not to have to dump the whole file system and rebuild it if possible.

EDIT 2019-01-01:
I followed this tutorial to replace the dead server (srv02) and its brick with a new one: https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/

The new server and its brick are recognized by the cluster:

# gluster volume status
Status of volume: vol01
Gluster process                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick srv02:/srv/glusterfs/vol01/brick  N/A       N/A        N       N/A
Brick srv03:/srv/glusterfs/vol01/brick  49152     0          Y       21984
Brick srv04:/srv/glusterfs/vol01/brick  49152     0          Y       16681
Self-heal Daemon on localhost           N/A       N/A        Y       2582
Self-heal Daemon on srv04               N/A       N/A        Y       16703
Self-heal Daemon on srv03               N/A       N/A        Y       22006


However, the brick on the replacement srv02 is not coming online.

After much searching, I found this in the brick log on the new srv02:

[2019-01-01 05:50:05.727791] E [MSGID: 138001] [index.c:2349:init] 0-vol01-index: Failed to find parent dir (/srv/glusterfs/vol01/brick/.glusterfs) of index basepath /srv/glusterfs/vol01/brick/.glusterfs/indices. [No such file or directory]


I am not at all sure how to fix this, since it is a blank brick that I want to bring online and heal.
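A common step when rebuilding a brick on a replacement server is to restore the `trusted.glusterfs.volume-id` extended attribute on the empty brick root, since a brick generally refuses to start without it. Its value is the volume's UUID (the `Volume ID` shown by `gluster volume info`) written as hex without dashes. The sketch below only performs that conversion; `uuid_to_xattr_hex` is a hypothetical name used for illustration, not part of the gluster tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a gluster Volume ID (UUID) into the
# 0x-prefixed hex string that setfattr accepts as a binary value.
uuid_to_xattr_hex() {
  # Accept only the canonical 8-4-4-4-12 hex form.
  if ! printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
    echo "not a UUID: $1" >&2
    return 1
  fi
  # Remove the dashes and add the 0x prefix so setfattr reads it as hex.
  printf '0x%s\n' "$(printf '%s' "$1" | sed 's/-//g')"
}
```

Hedged usage, as root on the new srv02: `setfattr -n trusted.glusterfs.volume-id -v "$(uuid_to_xattr_hex 061cac4d-1165-4afe-87e0-27b213ea19dc)" /srv/glusterfs/vol01/brick`, then check the result with `getfattr -n trusted.glusterfs.volume-id -e hex /srv/glusterfs/vol01/brick`.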

  • Okay, so I followed this tutorial: support.rackspace.com/how-to/… Now I have the server re-mapped into the cluster, but the brick on srv02 won't start and glusterfsd is not running on that server. Where do I check the logs for this?

    – Zexelon
    Jan 1 at 1:57
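On the logs question in this comment: by default each brick process logs under `/var/log/glusterfs/bricks/` on its own server, in a file named after the brick path with the leading `/` dropped and the remaining `/` replaced by `-`. A minimal sketch of that mapping, assuming the default log directory; `brick_log_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the default brick log file for a brick path.
# Optional second argument overrides the assumed log dir /var/log/glusterfs.
brick_log_path() {
  local logdir="${2:-/var/log/glusterfs}"
  local name
  # Drop a leading and a trailing "/", then turn the remaining "/" into "-".
  name=$(printf '%s\n' "$1" | sed -e 's|^/||' -e 's|/$||' -e 's|/|-|g')
  printf '%s/bricks/%s.log\n' "$logdir" "$name"
}
```

Usage on srv02: `tail -n 50 "$(brick_log_path /srv/glusterfs/vol01/brick)"`.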

distributed-computing glusterfs

edited Jan 1 at 15:48 by Zexelon
asked Jan 1 at 1:20 by Zexelon

1 Answer

In the end, I got the brick to come online by running the following in the brick directory (/srv/glusterfs/vol01/brick):

# mkdir .glusterfs
# chmod 600 .glusterfs
# cd .glusterfs
# mkdir indices
# chmod 600 indices
# systemctl restart glusterd


The brick came online, and I started the heal with:

# gluster volume heal vol01 full


So far it seems to be functioning just fine.
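While the full heal runs, `gluster volume heal vol01 info` lists the entries still pending on each brick. A sketch that totals them, assuming each brick section contains a `Number of entries: N` line (a brick that is down may show a non-numeric value, which is skipped). `pending_heal_entries` is a hypothetical helper, and the exact output format may differ between gluster versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the "Number of entries: N" lines from
# `gluster volume heal <vol> info` output read on stdin.
pending_heal_entries() {
  awk -F': *' '
    /^Number of entries:/ && $2 ~ /^[0-9]+/ { total += $2 }  # numeric counts only
    END { print total + 0 }                                   # print 0 if none found
  '
}
```

Usage: `gluster volume heal vol01 info | pending_heal_entries`, repeated until it prints 0.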

answered Jan 1 at 15:43 by Zexelon