Hello everyone.
I'm facing a strange problem here. DRBD + heartbeat normally works fine: when the primary goes down, the secondary takes over without trouble. But after a few days, without the primary server actually going down, heartbeat simply stops working and a takeover happens. I then looked at the log on the primary server, and it shows this:
Jul 3 05:31:58 cpd020 kernel: martian source 192.168.1.27 from 192.168.1.27, on dev eth1
Jul 3 05:31:58 cpd020 kernel: ll header: ff:ff:ff:ff:ff:ff:00:08:54:1a:ef:3c:08:06
Jul 3 05:31:58 cpd020 heartbeat[20857]: WARN: node cpd020.agrovale: is dead
Jul 3 05:31:58 cpd020 heartbeat[20857]: ERROR: No local heartbeat. Forcing shutdown.
Jul 3 05:31:58 cpd020 heartbeat[20857]: WARN: Late heartbeat: Node cpd021.agrovale: interval 4650 ms
Jul 3 05:31:58 cpd020 heartbeat[20857]: info: hb_signal_giveup_resources(): current status: active
Jul 3 05:31:58 cpd020 heartbeat[20857]: info: Heartbeat shutdown in progress. (20857)
Jul 3 05:31:58 cpd020 heartbeat[20857]: WARN: node cpd020.agrovale: is dead
Jul 3 05:31:58 cpd020 heartbeat[20857]: ERROR: No local heartbeat. Forcing shutdown.
Jul 3 05:31:58 cpd020 heartbeat[24948]: info: Giving up all HA resources.
Jul 3 05:32:00 cpd020 kernel: martian source 192.168.1.27 from 192.168.1.27, on dev eth1
Jul 3 05:31:59 cpd020 heartbeat: info: Releasing resource group: cpd020.agrovale 192.168.1.27 datadisk smb postgresql
Jul 3 05:32:03 cpd020 heartbeat[20853]: info: heartbeat: version 1.0.3
Jul 3 05:32:01 cpd020 heartbeat[20857]: WARN: node cpd020.agrovale: is dead
Jul 3 05:32:03 cpd020 heartbeat[20857]: ERROR: No local heartbeat. Forcing shutdown.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: node cpd020.agrovale: is dead
Jul 3 05:32:03 cpd020 heartbeat[20857]: ERROR: No local heartbeat. Forcing shutdown.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: node cpd021.agrovale: is dead
Jul 3 05:32:03 cpd020 kernel: ll header: ff:ff:ff:ff:ff:ff:00:08:54:1a:ef:3c:08:06
Jul 3 05:32:03 cpd020 kernel: martian source 192.168.1.27 from 192.168.1.27, on dev eth1
Jul 3 05:32:03 cpd020 kernel: ll header: ff:ff:ff:ff:ff:ff:00:08:54:1a:ef:3c:08:06
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: No STONITH device configured.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: Shared disks are not protected.
Jul 3 05:32:03 cpd020 heartbeat[20857]: info: Resource takeover cancelled - shutdown in progress.
Jul 3 05:32:03 cpd020 heartbeat[20857]: info: Link cpd021.agrovale:eth0 dead.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: Cluster node cpd021.agrovale returning after partition.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: Deadtime value may be too small.
Jul 3 05:32:03 cpd020 heartbeat[20857]: info: See documentation for information on tuning deadtime.
Jul 3 05:32:03 cpd020 heartbeat: info: Running /etc/init.d/postgresql stop
Jul 3 05:32:03 cpd020 heartbeat[20857]: info: Link cpd021.agrovale:eth0 up.
Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: Late heartbeat: Node cpd021.agrovale: interval 5160 ms
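Since the log itself warns that the deadtime value may be too small, it may help to list the "Late heartbeat" intervals so they can be compared against the deadtime configured in ha.cf. Below is a minimal Python sketch for that. The function name `late_heartbeats` and the regular expression are my own and are not part of heartbeat; they only assume the message format shown in the log above.

```python
import re

# Matches "Late heartbeat" warnings in the format seen in the log above.
# This pattern is an assumption based on those lines, not a heartbeat API.
LATE_RE = re.compile(r"Late heartbeat: Node (\S+): interval (\d+) ms")


def late_heartbeats(lines):
    """Return (node, interval_ms) pairs for every 'Late heartbeat' warning."""
    found = []
    for line in lines:
        match = LATE_RE.search(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


if __name__ == "__main__":
    # Two lines copied from the log above, used as sample input.
    sample = [
        "Jul 3 05:31:58 cpd020 heartbeat[20857]: WARN: Late heartbeat: "
        "Node cpd021.agrovale: interval 4650 ms",
        "Jul 3 05:32:03 cpd020 heartbeat[20857]: WARN: Late heartbeat: "
        "Node cpd021.agrovale: interval 5160 ms",
    ]
    for node, interval_ms in late_heartbeats(sample):
        print(node, interval_ms)
```

The same function could be fed the lines of the real log file (for example from `open(path)`) to see how often, and by how much, the heartbeats arrive late.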
Does anyone know what might be happening?
Thanks in advance.