comp.info.servers.unix: Web servers for UNIX platforms.
#1
Hey everyone. We have been running Apache 2.2 as a reverse proxy cache server for some time, and in general it performs great. We have one nagging problem, however.

Currently we have a cluster of 70+ back-end web servers that are BalancerMembers. If we run apachectl stop on a back-end web server, we are fine. If, however, the network dies because of a server crash, ARP issues... whatever, the front-end cache hangs until it comes back. This causes requests to all of the other back-end web servers to hang as well... and we get a snowball effect that requires a total restart of both the cache and the back-end webs to clear things up. Obviously this is not ideal.

This all seems like a TCP timeout issue of some kind. I was perplexed to discover that doing a netstat -anp|grep [backend that died ip] only showed 8 connections in the SYN_SENT state. I also notice that the Apache balancer-manager scoreboard marks the web server as ERR (appropriately)... yet we are still hanging on something.

We have eliminated the cluster fs as the culprit, as network issues on a non-back-end web server do not cause a problem.

We would be perfectly happy if a web server not responding for 5 seconds meant it was marked as being in error and the cache didn't try to connect to it for quite some time. So far we have specified each balancer member to have:

retry=120 max=40

We do this because we have some web pages that can legitimately take up to 60 seconds to finish rendering. Is there some kind of timeout we can configure at the OS or Apache level that would prevent waiting on a host that has gone completely dark for 5 seconds?

Thanks in advance for any advice people have.
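For readers trying to picture the setup, the configuration described above might look roughly like the sketch below. Only retry=120 and max=40 come from the post; the balancer name (webcluster), the host names, and the ProxyPass path are made up for illustration. The commented-out line shows where mod_proxy's connectiontimeout worker parameter would go; whether the Apache 2.2 build in question supports it is an assumption that should be checked against that version's mod_proxy documentation.

```apache
# Sketch only: the balancer name and host names are hypothetical.
# retry=120 and max=40 are the values quoted in the post.
<Proxy balancer://webcluster>
    BalancerMember http://web01.example.internal:80 retry=120 max=40
    BalancerMember http://web02.example.internal:80 retry=120 max=40
    # ... one BalancerMember line per back-end ...

    # Assumption: if this mod_proxy build supports connectiontimeout,
    # a member could give up on the TCP connect after 5 seconds:
    # BalancerMember http://web03.example.internal:80 retry=120 max=40 connectiontimeout=5
</Proxy>

# Hypothetical mapping of the whole site onto the balancer.
ProxyPass / balancer://webcluster/
```

Note that retry=120 only controls how long a member already marked as being in error is left alone; it does not by itself shorten how long a connect attempt to a dark host is allowed to hang.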
#2
On Aug 5, 8:55 pm, IRey <vargam...@gmail.com> wrote:
> Hey everyone. We have been running Apache 2.2 as a reverse proxy cache
> server for some time, and in general it performs great. We have one
> nagging problem however.
>
> Currently we have a cluster of 70+ back-end web servers that are
> BalancerMember's. If we apachectl stop on a backend web, we are fine.
> If however the network dies because of a server crash, arp issues...
> whatever, the front-end cache hangs until it comes back. This causes
> all of the other back-end web servers web requests to also hang... and
> we get a snowball effect that requires a total restart of both the
> cache and the back-end webs to clear things up. Obviously this is not
> ideal.
>
> This all seems like a TCP timeout issue of some kind. I was perplexed
> to discover that doing a netstat -anp|grep [backend that died ip]
> Only showed 8 connections in the SYN_SENT state. I also notice that
> the Apache balancer-manager scoreboard marks the web server as ERR
> (appropriately)... yet we are still hanging on something.
>
> We have eliminated the cluster fs as the culprit, as network issues on
> a non-back-end web server do not cause a problem.
>
> We would be perfectly happy if a web server not responding for 5
> seconds meant it was marked as an error cluster and it didn't try to
> connect for quite some time. So far we have specified each balancer
> member to have:
> retry=120 max=40
>
> We do this because we have some web pages that can legitimately take
> up to 60 seconds to finish rendering. Is there some kind of timeout we
> can configure at the OS or apache level that would prevent waiting on
> a host that has gone completely dark for 5 seconds?
>
> Thanks in advance for any advice people have.

From what you've said, this only appears to be an issue when you get a major loss of connectivity, and then the problem is recovery time. If this is impinging on your traffic, then really you should be looking at why you are having major network faults so frequently.

While it wouldn't solve the root cause, you might consider using bonded Ethernet connections (or netRAIN, or whatever your OS provides) at either end, with different cables/hubs/switches in between.

AFAIK there's not much that can be changed in the config for the handshake, although setting a lower Timeout (a core config option) may allow the system to recover a bit faster from transactions that were already initiated. Decreasing KeepAliveTimeout and MaxKeepAliveRequests at the web server end may improve recovery, but you could lose performance as a result.

C.
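As a rough illustration of the directives mentioned above, a fragment along these lines could be tried. Timeout, KeepAliveTimeout and MaxKeepAliveRequests are standard Apache core directives, but the values shown are placeholders rather than recommendations, and the split between front-end and back-end files is an assumption about how the servers are laid out.

```apache
# Front-end (proxy) side.
# Illustrative value: kept above 60 seconds because the original post
# says some pages legitimately take up to 60 seconds to render.
Timeout 90

# Back-end web server side.
# Illustrative values: shorter-lived persistent connections are released
# sooner, at the cost of more connection setups.
KeepAlive On
KeepAliveTimeout 2
MaxKeepAliveRequests 50
```

Lowering these only affects connections that are already established; it would not be expected to shorten the wait on connection attempts stuck in SYN_SENT.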