comp.mail.imap: Discussion of IMAP-based mail systems.
#1
We're running UW IMAP on some old AIX 4.3 boxes. Every now and then, when a server becomes extremely overloaded (2-3 minutes just to read (cat) a 30M MBOX from disk, yeck!), we get a bunch of IMAP processes stuck forever.

dbx attaching shows them stuck in env_unix.c dotlock_lock() at line 1065, doing the read() from the mlock helper program, waiting for the helper to say whether it got the lock (+ or -). lsof shows the communication pipes to be empty; there's nothing pending to be read. The mlock process isn't around. The IMAP server process ends up waiting forever.

Shouldn't there be a select (or alarm timeout) before the read() to protect against cases where the mlock helper doesn't complete successfully? The timeout should match what mlock is using...

Has anyone seen anything like this before?

Thanks for any info.

-nik

Nik Conwell
Boston University
nik@bu.edu
#2
Nik Conwell writes:
> Shouldn't there be a select (or alarm timeout) before the read() to
> protect against cases where the mlock helper doesn't complete
> successfully?

No. If the other end of the pipe is closed, read() will immediately return with an end-of-file indication. Your mlock process is probably still there; you just have to look harder.
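The end-of-file behavior described here has one precondition worth spelling out: read() reports EOF only once every descriptor for the write end has been closed, including any copy the reading process itself still holds. The following is a minimal sketch of that POSIX behavior, not UW IMAP code; `probe_empty_pipe` and `check_pipe_eof_semantics` are hypothetical names, and the read end is put in non-blocking mode purely so that the "would hang" case returns instead of blocking. Whether this applies to the hang reported in this thread is not established here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum pipe_state { PIPE_EOF, PIPE_WOULD_BLOCK, PIPE_ERROR };

/* Hypothetical helper: report what read() sees on an empty pipe.
 * If keep_writer_open is nonzero, this process keeps its own copy of
 * the write end open during the read. */
static enum pipe_state probe_empty_pipe(int keep_writer_open)
{
    int fds[2];
    int flags;
    char byte;
    ssize_t n;
    enum pipe_state state;

    if (pipe(fds) != 0)
        return PIPE_ERROR;

    /* Non-blocking read end: a read that would block fails with EAGAIN. */
    flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return PIPE_ERROR;
    }

    if (!keep_writer_open)
        close(fds[1]);          /* no writers remain anywhere */

    n = read(fds[0], &byte, 1);
    if (n == 0)
        state = PIPE_EOF;       /* all write ends closed */
    else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        state = PIPE_WOULD_BLOCK; /* a blocking read would wait here */
    else
        state = PIPE_ERROR;

    close(fds[0]);
    if (keep_writer_open)
        close(fds[1]);
    return state;
}

/* Runs both cases; aborts via assert() if the platform disagrees. */
static void check_pipe_eof_semantics(void)
{
    assert(probe_empty_pipe(0) == PIPE_EOF);
    assert(probe_empty_pipe(1) == PIPE_WOULD_BLOCK);
}
```

Calling `check_pipe_eof_semantics()` from a test program exercises both cases. With a blocking descriptor, the second case is the one that would wait indefinitely.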
#3
I would have expected, if the mlock process died, that the imapd process would have gotten an error from the read() on the pipe. But then again, you said that you're using AIX 4.3. It is no small joy to me that our last AIX systems have just been retired... :-)

Does the attached timeout code help?

1130,1131c1130,1136
<       else if (j > 0) {	/* reap child; grandchild now owned by init */
< 	grim_pid_reap (j,NIL);
---
>       else if (j > 0) {	/* parent process */
> 	fd_set rfd;
> 	struct timeval tmo;
> 	FD_ZERO (&rfd);
> 	FD_SET (pi[0],&rfd);
> 	tmo.tv_sec = locktimeout * 60;
> 	grim_pid_reap (j,NIL);/* reap child; grandchild now owned by init */
1133c1138,1139
< 	if ((read (pi[0],tmp,1) == 1) && (tmp[0] == '+')) {
---
> 	if (select (pi[0]+1,&rfd,0,0,&tmo) &&
> 	    (read (pi[0],tmp,1) == 1) && (tmp[0] == '+')) {

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
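The select-before-read idea in the patch above can also be sketched as a standalone helper. This is an illustration under stated assumptions, not the UW IMAP implementation: `read_status_byte` is a hypothetical name, and the timeout is passed in seconds by the caller. It differs from the patch in two small ways: it sets `tv_usec` explicitly, and it treats a select() error (-1, which is nonzero and therefore true in a C condition) as a failure rather than going on to call read().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec for one status byte on fd.
 * Returns 1 and stores the byte in *out on success, 0 on timeout or
 * end-of-file, and -1 on error. */
static int read_status_byte(int fd, long timeout_sec, char *out)
{
    fd_set rfd;
    struct timeval tmo;
    int ready;
    ssize_t n;

    assert(out != NULL && timeout_sec >= 0);
    assert(fd >= 0 && fd < FD_SETSIZE);

    do {
        /* Rebuild the set and timeout each pass; select() may modify
         * both. Note that a retry after EINTR restarts the full wait. */
        FD_ZERO(&rfd);
        FD_SET(fd, &rfd);
        tmo.tv_sec = timeout_sec;
        tmo.tv_usec = 0;
        ready = select(fd + 1, &rfd, NULL, NULL, &tmo);
    } while (ready == -1 && errno == EINTR);

    if (ready <= 0)
        return ready;           /* 0 = timed out, -1 = select() failed */

    /* Readable means data or EOF is available, so read() returns
     * without waiting. */
    n = read(fd, out, 1);
    if (n == 1)
        return 1;
    return (n == 0) ? 0 : -1;
}
```

A caller in the position of the parent process would then test for `read_status_byte(pi[0], timeout, tmp) == 1 && tmp[0] == '+'` and treat every other outcome as a failed lock attempt.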
#4
Thanks. I'll give it a spin. I can't recreate the problem at will, so it might take a while for it to hit again.

I didn't see any old mlock processes out there, and lsof didn't show any other process having the FIFO matching the IMAP server process.

Thanks again for your help, Mark. What OS did you settle on after AIX retired? I'm about to prototype a replacement Intel Linux box.

-nik
#5
On Wed, 12 Apr 2006, Nik Conwell wrote:
> Thanks again for your help, Mark. What OS did you settle on after AIX
> retired? I'm about to prototype a replacement Intel Linux box.

We made the safe choice: RHE Linux on Intel hardware. Personally, I prefer BSD, but given how fragmented the tiny BSD community remains it is no wonder that the organization went with Linux as a more mainstream solution.

Linux is a good choice, but I wish that they had greater quality control and compatibility regression testing. A couple of years ago, they broke the flock() API, and did not respond in a good way when that breakage was discovered. I think that the problem is fixed now, but it continues to affect implementors today since the broken kernel was widely distributed and still runs in many places. imap-2004d and later have the necessary workaround. See https://bugzilla.redhat.com/bugzilla....cgi?id=123415 for details.

As infuriating as that flock() bug was, it pales compared to the issues in SVR4 (such as Solaris, AIX, HP-UX, etc.). I'm always discovering some new and non-obvious quirk in SVR4 that requires reprogramming an application to work around the problem. So it does not surprise me to learn that a program can be blocked in a read() on a closed pipe on SVR4 instead of getting an error.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote.