  1. #1

    Heartbeat without serial

    Hi everyone,

    I installed DRBD 0.7, compiled it and configured it, and so far so good. I'm using Fedora 6 and have never configured Heartbeat, but every tutorial I found online describes a setup that uses the serial ports. How can I configure Heartbeat to use an eth interface instead? What is the problem with using the same eth interface that DRBD is already using? Could someone send me their Heartbeat configuration files so that I only have to make the necessary changes?

    Thanks in advance.
    Cheers, and see you later.
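
For reference, a minimal sketch of what an Ethernet-based ha.cf could look like, written as a small Python generator. It only emits directives that appear in the ha.cf files posted later in this thread (keepalive, deadtime, udpport, bcast, auto_failback, node). The function name `render_ha_cf` and its default values are assumptions made for illustration, and the node names are expected to match the output of `uname -n` on each machine.

```python
def render_ha_cf(interface, nodes, keepalive=2, deadtime=30, udpport=694):
    """Return ha.cf text that sends heartbeats as UDP broadcasts on `interface`.

    Hypothetical helper: the directive names come from the thread's ha.cf,
    the defaults are only placeholders.
    """
    lines = [
        f"keepalive {keepalive}",  # seconds between heartbeats
        f"deadtime {deadtime}",    # seconds of silence before the peer is declared dead
        f"udpport {udpport}",      # UDP port used for the heartbeat packets
        f"bcast {interface}",      # broadcast over Ethernet, no serial line involved
        "auto_failback off",
    ]
    lines += [f"node {name}" for name in nodes]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ha_cf("eth1", ["Servidor1", "Servidor2"]), end="")
```

The point of the sketch is that a single `bcast <interface>` line is what selects an Ethernet interface for the heartbeat.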

  2. #2

    Hey denysiacanga,

    How did you set up this DRBD? First of all, what layout are you using?

    Anyway, here is the link for running HA over the eth interfaces:

    Heartbeat.

    If you can, also post what this server is for, so we can give you better advice. That would make things easier.

  3. #3

    Tying the anvil around my neck... about to jump...

    Thanks for the attention.

    Here's the situation: I want to build an HA smb server.
    I have two identical machines, each with two gigabit network cards, eth1 and eth2. I have already configured DRBD, and as of yesterday Heartbeat as well. The problems I have been running into:
    1-) When Servidor1 goes down, Servidor2 becomes primary, but it does not mount /dev/drbd0 on /mnt/tudo, and so it does not share it through smb either (obviously, since it is not mounted).

    After a whole bunch of changes, what did I achieve?
    2-) When Servidor1 goes down, Servidor2 no longer becomes primary in /proc/drbd (result: it got worse).

    My drbd.conf is below:

    skip {
    As you can see, you can also comment chunks of text
    with a 'skip[optional nonsense]{ skipped text }' section.
    This comes in handy, if you just want to comment out
    some 'resource <some name> {...}' section:
    just precede it with 'skip'.

    The basic format of option assignment is
    <option name><linear whitespace><value>;

    It should be obvious from the examples below,
    but if you really care to know the details:

    <option name> :=
    valid options in the respective scope
    <value> := <num>|<string>|<choice>|...
    depending on the set of allowed values
    for the respective option.
    <num> := [0-9]+, sometimes with an optional suffix of K,M,G
    <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
    <name> := [/_.A-Za-z0-9-]+
    }

    #
    # At most ONE global section is allowed.
    # It must precede any resource section.
    #
    # global {
    # use this if you want to define more resources later
    # without reloading the module.
    # by default we load the module with exactly as many devices
    # as configured mentioned in this file.
    #
    # minor-count 5;

    # The user dialog counts and displays the seconds it waited so
    # far. You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
    # dialog-refresh 5; # 5 seconds

    # You might disable one of drbdadm's sanity check.
    # disable-ip-verification;
    # }

    #
    # this need not be r#, you may use phony resource names,
    # like "resource web" or "resource mail", too
    #

    resource r0 {

    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected. This is so when the cluster manager starts later,
    # it does not see a resource with internal split-brain.
    # In case you want to limit the wait time, do it here.
    # Default is 0, which means unlimited. Unit is seconds.
    #
    # wfc-timeout 0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 120; # 2 minutes.
    }

    disk {
    # if the lower level device reports io-error you have the choice of
    # "pass_on" -> Report the io-error to the upper layers.
    # Primary -> report it to the mounted file system.
    # Secondary -> ignore it.
    # "panic" -> The node leaves the cluster by doing a kernel panic.
    # "detach" -> The node drops its backing storage device, and
    # continues in disk less mode.
    #
    on-io-error detach;

    # In case you only want to use a fraction of the available space
    # you might use the "size" option here.
    #
    # size 10G;
    }

    net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    # sndbuf-size 512k;

    # timeout 60; # 6 seconds (unit = 0.1 seconds)
    # connect-int 10; # 10 seconds (unit = 1 second)
    # ping-int 10; # 10 seconds (unit = 1 second)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kByte).
    # For high performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
    # max-buffers 2048;

    # When the number of outstanding requests on a standby (secondary)
    # node exceeds unplug-watermark, we start to kick the backing device
    # to start its request processing. This is an advanced tuning
    # parameter to get more performance out of capable storage controllers.
    # Some controllers like to be kicked often, other controllers
    # deliver better performance when they are kicked less frequently.
    # Set it to the value of max-buffers to get the least possible
    # number of run_task_queue_disk() / q->unplug_fn(q) calls.
    #
    # unplug-watermark 128;


    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
    # max-epoch-size 2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
    # ko-count 4;

    # if the connection to the peer is lost you have the choice of
    # "reconnect" -> Try to reconnect (AKA WFConnection state)
    # "stand_alone" -> Do not reconnect (AKA StandAlone state)
    # "freeze_io" -> Try to reconnect but freeze all IO until
    # the connection is established again.
    # on-disconnect reconnect;

    }

    syncer {
    # Limit the bandwidth used by the resynchronisation process.
    # default unit is kByte/sec; optional suffixes K,M are allowed.
    #
    # Even though this is a network setting, the units are based
    # on _byte_ (octet for our french friends) not bit.
    # We are storage guys.
    #
    # Note that on 100Mbit ethernet, you cannot expect more than
    # 12.5 MByte total transfer rate.
    # Consider using GigaBit Ethernet.
    #
    rate 100M;

    # All devices in one group are resynchronized parallel.
    # Resynchronisation of groups is serialized in ascending order.
    # Put DRBD resources which are on different physical disks in one group.
    # Put DRBD resources on one physical disk in different groups.
    #
    group 1;

    # Configures the size of the active set. Each extent is 4M,
    # 257 Extents ~> 1GB active set size. In case your syncer
    # runs @ 10MB/sec, all resync after a primary's crash will last
    # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
    # BTW, the hash algorithm works best if the number of al-extents
    # is prime. (To test the worst case performance use a power of 2)
    al-extents 257;
    }

    on Servidor1 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.0.0.1:7788;
    meta-disk internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
    }

    on Servidor2 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.0.0.2:7788;
    meta-disk internal;
    }
    }
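
The arithmetic in the syncer comments above can be checked with a short sketch. It takes the 4 MB extent size and the 10 MB/sec example from the config comments, and converts `rate 100M` from bytes to bits as the comments insist. The function names are made up for this illustration.

```python
EXTENT_MB = 4  # each activity-log extent covers 4 MB, per the config comment


def active_set_mb(al_extents):
    """Size of the active set in MB for a given al-extents value."""
    return al_extents * EXTENT_MB


def resync_seconds(al_extents, rate_mb_per_s):
    """Time to resync the active set after a primary crash, in seconds."""
    return active_set_mb(al_extents) / rate_mb_per_s


def mbyte_to_mbit(rate_mb_per_s):
    """Convert a syncer rate in MByte/s to Mbit/s on the wire."""
    return rate_mb_per_s * 8


if __name__ == "__main__":
    print(active_set_mb(257))       # 257 extents * 4 MB
    print(resync_seconds(257, 10))  # active set at the 10 MB/sec from the comment
    print(mbyte_to_mbit(100))       # `rate 100M` expressed in Mbit/s
```

With `al-extents 257` this gives a 1028 MB active set and about 102.8 seconds at 10 MB/sec, which matches the "~ 102 seconds" in the comment. `rate 100M` works out to 800 Mbit/s, a large share of a gigabit link.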



    My ha.cf:
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 2
    deadtime 30
    warntime 10
    initdead 120
    udpport 694
    bcast eth1 # Linux
    auto_failback on
    node Servidor1
    node Servidor2
    apiauth ipfail gid=haclient uid=hacluster
    # respawn hacluster /usr/lib/heartbeat/ipfail
    auto_failback off

    My haresources:
    Servidor1 10.0.0.1 smb drbddisk

    Many thanks for now. See you later.

  4. #4

    So tell me: on server 2, with Heartbeat disabled, do Samba and DRBD come up normally on their own, that is, the functions the master has for serving your network?

  5. #5

    I got it!!!!!

    Thanks, LinuxKid...

    I spent the whole morning racking my brain and it finally worked.

    1- Error: ha.cf was pointing at eth1 when it should have pointed at eth2.
    2- Error: The smb service in init.d was not being started... cp smb /etc/ha.d/resource.d and... eureka, it loaded!!!!
    3- Error: The Filesystem resource was missing from haresources.
    4- Error: I had not disabled SELinux on Servidor2... Samba was not accessible.
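
Error 3 is easier to see when the two haresources lines from this thread are compared side by side. Heartbeat starts the resources on a haresources line from left to right and stops them from right to left, with `::` separating a resource script from its arguments. The parser below is only a sketch written for this comparison, not Heartbeat's own code, and the function names are hypothetical.

```python
def parse_haresources_line(line):
    """Split one haresources line into (preferred_node, [(resource, args), ...])."""
    node, *entries = line.split()
    resources = []
    for entry in entries:
        name, *args = entry.split("::")  # e.g. Filesystem::/dev/drbd0::/mnt/tudo::ext3
        resources.append((name, args))
    return node, resources


def start_order(line):
    """Resources in the order they are started on takeover (left to right)."""
    return [name for name, _ in parse_haresources_line(line)[1]]


def stop_order(line):
    """Resources in the order they are stopped on release (right to left)."""
    return list(reversed(start_order(line)))


# The two lines posted in this thread.
BROKEN = "Servidor1 10.0.0.1 smb drbddisk"
FIXED = "Servidor1 192.168.1.70 drbddisk Filesystem::/dev/drbd0::/mnt/tudo::ext3 smb"

if __name__ == "__main__":
    print(start_order(BROKEN))
    print(start_order(FIXED))
```

In the first line smb comes before drbddisk and nothing mounts /dev/drbd0, so Samba has no filesystem to share. In the working line the device is made primary, then mounted on /mnt/tudo, and only then is smb started.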

    I'll post how the files look now that they work...

    drbd.conf:
    skip {
    As you can see, you can also comment chunks of text
    with a 'skip[optional nonsense]{ skipped text }' section.
    This comes in handy, if you just want to comment out
    some 'resource <some name> {...}' section:
    just precede it with 'skip'.

    The basic format of option assignment is
    <option name><linear whitespace><value>;

    It should be obvious from the examples below,
    but if you really care to know the details:

    <option name> :=
    valid options in the respective scope
    <value> := <num>|<string>|<choice>|...
    depending on the set of allowed values
    for the respective option.
    <num> := [0-9]+, sometimes with an optional suffix of K,M,G
    <string> := (<name>|\"([^\"\\\n]*|\\.)*\")+
    <name> := [/_.A-Za-z0-9-]+
    }

    #
    # At most ONE global section is allowed.
    # It must precede any resource section.
    #
    # global {
    # use this if you want to define more resources later
    # without reloading the module.
    # by default we load the module with exactly as many devices
    # as configured mentioned in this file.
    #
    # minor-count 5;

    # The user dialog counts and displays the seconds it waited so
    # far. You might want to disable this if you have the console
    # of your server connected to a serial terminal server with
    # limited logging capacity.
    # The Dialog will print the count each 'dialog-refresh' seconds,
    # set it to 0 to disable redrawing completely. [ default = 1 ]
    #
    # dialog-refresh 5; # 5 seconds

    # You might disable one of drbdadm's sanity check.
    # disable-ip-verification;
    # }

    #
    # this need not be r#, you may use phony resource names,
    # like "resource web" or "resource mail", too
    #

    resource r0 {

    # transfer protocol to use.
    # C: write IO is reported as completed, if we know it has
    # reached _both_ local and remote DISK.
    # * for critical transactional data.
    # * for most cases.
    # B: write IO is reported as completed, if it has reached
    # local DISK and remote buffer cache.
    # A: write IO is reported as completed, if it has reached
    # local DISK and local tcp send buffer. (see also sndbuf-size)
    # * for high latency networks
    #
    #**********
    # uhm, benchmarks have shown that C is actually better than B.
    # this note shall disappear, when we are convinced that B is
    # the right choice "for most cases".
    # Until then, always use C unless you have a reason not to.
    # --lge
    #**********
    #
    protocol C;

    # what should be done in case the cluster starts up in
    # degraded mode, but knows it has inconsistent data.
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected. This is so when the cluster manager starts later,
    # it does not see a resource with internal split-brain.
    # In case you want to limit the wait time, do it here.
    # Default is 0, which means unlimited. Unit is seconds.
    #
    # wfc-timeout 0;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 120; # 2 minutes.
    }

    disk {
    # if the lower level device reports io-error you have the choice of
    # "pass_on" -> Report the io-error to the upper layers.
    # Primary -> report it to the mounted file system.
    # Secondary -> ignore it.
    # "panic" -> The node leaves the cluster by doing a kernel panic.
    # "detach" -> The node drops its backing storage device, and
    # continues in disk less mode.
    #
    on-io-error panic;

    # In case you only want to use a fraction of the available space
    # you might use the "size" option here.
    #
    # size 10G;
    }

    net {
    # this is the size of the tcp socket send buffer
    # increase it _carefully_ if you want to use protocol A over a
    # high latency network with reasonable write throughput.
    # defaults to 2*65535; you might try even 1M, but if your kernel or
    # network driver chokes on that, you have been warned.
    # sndbuf-size 512k;

    # timeout 60; # 6 seconds (unit = 0.1 seconds)
    # connect-int 10; # 10 seconds (unit = 1 second)
    # ping-int 10; # 10 seconds (unit = 1 second)

    # Maximal number of requests (4K) to be allocated by DRBD.
    # The minimum is hardcoded to 32 (=128 kByte).
    # For high performance installations it might help if you
    # increase that number. These buffers are used to hold
    # datablocks while they are written to disk.
    #
    # max-buffers 2048;

    # When the number of outstanding requests on a standby (secondary)
    # node exceeds unplug-watermark, we start to kick the backing device
    # to start its request processing. This is an advanced tuning
    # parameter to get more performance out of capable storage controllers.
    # Some controllers like to be kicked often, other controllers
    # deliver better performance when they are kicked less frequently.
    # Set it to the value of max-buffers to get the least possible
    # number of run_task_queue_disk() / q->unplug_fn(q) calls.
    #
    # unplug-watermark 128;


    # The highest number of data blocks between two write barriers.
    # If you set this < 10 you might decrease your performance.
    # max-epoch-size 2048;

    # if some block send times out this many times, the peer is
    # considered dead, even if it still answers ping requests.
    # ko-count 4;

    # if the connection to the peer is lost you have the choice of
    # "reconnect" -> Try to reconnect (AKA WFConnection state)
    # "stand_alone" -> Do not reconnect (AKA StandAlone state)
    # "freeze_io" -> Try to reconnect but freeze all IO until
    # the connection is established again.
    on-disconnect stand_alone;

    }

    syncer {
    # Limit the bandwidth used by the resynchronisation process.
    # default unit is kByte/sec; optional suffixes K,M are allowed.
    #
    # Even though this is a network setting, the units are based
    # on _byte_ (octet for our french friends) not bit.
    # We are storage guys.
    #
    # Note that on 100Mbit ethernet, you cannot expect more than
    # 12.5 MByte total transfer rate.
    # Consider using GigaBit Ethernet.
    #
    rate 100M;

    # All devices in one group are resynchronized parallel.
    # Resynchronisation of groups is serialized in ascending order.
    # Put DRBD resources which are on different physical disks in one group.
    # Put DRBD resources on one physical disk in different groups.
    #
    group 1;

    # Configures the size of the active set. Each extent is 4M,
    # 257 Extents ~> 1GB active set size. In case your syncer
    # runs @ 10MB/sec, all resync after a primary's crash will last
    # 1GB / ( 10MB/sec ) ~ 102 seconds ~ One Minute and 42 Seconds.
    # BTW, the hash algorithm works best if the number of al-extents
    # is prime. (To test the worst case performance use a power of 2)
    al-extents 257;
    }

    on Servidor1 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.0.0.1:7788;
    meta-disk internal;

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
    }

    on Servidor2 {
    device /dev/drbd0;
    disk /dev/sda3;
    address 10.0.0.2:7788;
    meta-disk internal;
    }
    }

    ha.cf:
    debugfile /var/log/ha-debug
    logfile /var/log/ha-log
    logfacility local0
    keepalive 1
    deadtime 10
    warntime 5
    initdead 30
    udpport 694
    bcast eth2 # Linux
    auto_failback on
    node Servidor1
    node Servidor2
    apiauth ipfail gid=haclient uid=hacluster
    # respawn hacluster /usr/lib/heartbeat/ipfail
    auto_failback off

    haresources
    Servidor1 192.168.1.70 drbddisk Filesystem::/dev/drbd0::/mnt/tudo::ext3 smb
    Last edited by denysiacanga; 11-04-2007 at 12:30. Reason: change to drbd.conf

  6. #6

    Nice one, denys. It stays here for anyone who runs into the same problem; they just need to search.

  7. #7

    Another small problem with the eths

    Hi everyone,

    I'm using eth2 both for Heartbeat and for serving data to the Windows workstations through Samba (see the configuration files posted earlier).

    The system is working perfectly (well, almost). All the services are loaded and Servidor2 carries on...

    The problem is this: if I happen to unplug the network cable from Servidor2 (with both servers fully up), Heartbeat cannot tell that the fault is on Servidor2. It shuts down Servidor1 (which is working correctly) and brings up Servidor2 (which can no longer reach the network), so both servers stop working: Servidor1 because Heartbeat shut it down, and Servidor2 because it cannot communicate with the network (after all, eth2 has stopped working).

    Does anyone have any ideas?
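
One direction that may be worth looking at is Heartbeat's ipfail, which the ha.cf above references but leaves commented out. It is usually described as comparing how many `ping` nodes (for example the LAN gateway) each server can still reach. The sketch below models only that comparison and is not Heartbeat's actual code. The function name is hypothetical, and it assumes the two servers can still exchange heartbeats over a second path such as eth1, which the posted setup does not have since eth2 carries both the heartbeat and the Samba traffic.

```python
def pick_active(current, reachable):
    """Return the node that should hold the resources.

    current   -- node holding the resources right now
    reachable -- {node: number of ping targets that node can reach}
    """
    best = max(reachable.values())
    if reachable[current] == best:
        return current  # current holder is at least as well connected: stay put
    # Otherwise hand over to the first node with the best connectivity.
    return next(node for node, count in reachable.items() if count == best)


if __name__ == "__main__":
    # Cable pulled on Servidor2: it reaches no ping target, Servidor1 still does.
    print(pick_active("Servidor1", {"Servidor1": 1, "Servidor2": 0}))
    # Cable pulled on Servidor1 instead: resources should move to Servidor2.
    print(pick_active("Servidor1", {"Servidor1": 0, "Servidor2": 1}))
```

Under these assumptions the server whose cable was pulled is the one that loses the comparison, so Servidor1 would keep serving in the scenario described above.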