| comp.mail.imap Discussion of IMAP-based mail systems. |
#1 |
|
Messages: n/a
Host: |
We're running UW 2004g on a Centos 2.6.9-34.0.2.ELsmp system, about
2500 users on the system. We use restrictive mailspools, thus mlock for
locking.

Occasionally (every couple of days) we see an imap that's stuck in locking.
The grandparent is in env_unix.c dotlock_lock, after EACCES. The grandparent
has forked the parent, and the parent has forked the child, which execed
mlock. See (1) for the processes.

The grandparent is stuck in a mutex. gdb unfortunately doesn't have anything
interesting as to where it is at. See (2) for the lack of details.

The parent process is Z, wchan of exit, which implies it exited and somebody
needs to wait() on it.

The mlock process is running and in a read on the communication pipe. I'm
interpreting that as meaning it got the lock, told the grandparent OK (+) and
is waiting for the grandparent to do the work and then it can relinquish the
lock.

Anybody ever seen anything like this? I'm inclined to think it's a kernel bug
but I wanted to throw it against the imap newsgroup to see if anything stuck.

Nik Conwell
Boston University
nik@bu.edu

(1) The processes look like this:

    [grandparent]
    4 S foobar 29199 23953 0 77 0 - 1307 322564 10:55 ? 00:00:00 /usr/sbin/imapd
    [parent]
    1 Z foobar 29202 29199 0 77 0 - 0 exit 10:55 ? 00:00:00 [imapd] <defunct>
    [child]
    0 S foobar 29203 1 0 79 0 - 370 pipe_w 10:55 ? 00:00:00 /usr/sbin/mlock 4 /mailspool/25/foobar

(2) gdb of grandparent process.

    gdb /usr/sbin/imapd 29199
    Attaching to program: /usr/sbin/imapd, process 29199
    [...]
    Reading symbols from /usr/lib/libc-client.so.2004g...Reading symbols from /usr/lib/debug/usr/lib/libc-client.so.2004g.debug...done.
    done.
    Loaded symbols for /usr/lib/libc-client.so.2004g
    [...]
    0x001117a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    (gdb) where
    #0 0x001117a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
    #1 0x0050469e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
    #2 0x00496aef in _L_mutex_lock_10230 () from /lib/tls/libc.so.6
    #3 0x00000000 in ?? () |
|
|
|
#2 |
|
On Thu, 24 Aug 2006, Nik Conwell wrote:
> The parent process is Z, wchan of exit, which implies it exited and
> somebody needs to wait() on it.

That's strange, because the grandparent supposedly already reaped it (the
call to grim_pid_reap()). In fact, it reaps it before reading the data from
the pipe to the child.

Offhand, it looks like this is what is happening:

The grandparent is waiting for the parent to terminate, and won't read from
the child's pipe until the wait happens. For some reason, rather than the
reap happening, it's stuck in some internal C library mutex.

The parent is terminated and waiting to be reaped.

The child is waiting for the grandparent to read from the pipe.

If you can figure out what that mutex is, and why the grandparent is stuck in
it instead of reaping the parent, you'll have the key to the entire puzzle.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote. |
|
|
|
#3 |
|
Mark Crispin wrote:
> The grandparent is waiting for the parent to terminate, and won't read
> from the child's pipe until the wait happens. For some reason, rather
> than the reap happening, it's stuck in some internal C library mutex.

Thanks for taking a look. I'll throw some debug syslogs in there to figure
out where it's getting stuck. It's annoying gdb doesn't show where it's at.
IIRC strace showed it in mutex or futex or something.

(I've been hanging around for the past couple of days waiting for another one
to happen.) |
|
|
|
#4 |
|
On Thu, 24 Aug 2006, Nik Conwell wrote:
> Thanks for taking a look. I'll throw some debug syslogs in there to
> figure out where it's getting stuck. It's annoying gdb doesn't show
> where it's at. IIRC strace showed it in mutex or futex or something.
> (I've been hanging around for the past couple of days waiting for
> another one to happen.)

The code is in dotlock_lock() in env_unix.c. It's basically doing:

    ...create pipes...
    if (!(pid = fork ())) {        /* create child */
      if (!fork ()) {              /* in child, create grandchild */
        ...stuff to run mlock...   /* in grandchild */
      }
      _exit (1);                   /* child exits immediately */
    }
    else if (pid > 0) {            /* in parent, was child created? */
      waitpid (pid,0,0);           /* reap the child */
      ...read pipe stuff from grandchild...
    }

The purpose of having a child create a grandchild, rather than just running
mlock in the child, is zombie avoidance. The direct child is immediately
reaped, and the grandchild consequently gets inherited by init which reaps
any zombies that it finds that it owns.

The grandchild has the other end of the pipe, and I/O to it is under a
select() timeout in both cases. So, sooner or later, either the pipe data is
sent or eventually both sides give up. Either way, init reaps the grandchild.

Anyway, that's how it's supposed to work on paper. The fact that the child
became a zombie indicates that the reap never was done.

I just realized that there is another possibility. We already discussed if
the waitpid() somehow is hanging in that mutex. The other possibility is if
the fork() to create a child returned -1 to the parent, but actually did
create the child (and hence the grandchild). In that case, the parent would
treat it as a lock failure and block again (which may be the mutex that you
are seeing).

If this is happening, I'd assert that it's either a kernel or documentation
bug. The man page for fork() says "On failure, a -1 will be returned in the
parent's context, no child process will be created..." There's no mention of
an error return from fork() that creates a child.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum. |
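For readers who want to try the double-fork pattern described above outside of c-client, here is a minimal, self-contained sketch. It is not the dotlock_lock() code: the function name double_fork_read() is hypothetical, the grandchild simply writes a "+" to the pipe instead of exec'ing mlock, and there is no select() timeout. It only illustrates the fork/fork/_exit/waitpid/read sequence.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of c-client): start a detached grandchild
 * that writes one byte to a pipe, reap the intermediate child, then read
 * the byte.  Returns the byte read, or -1 on any failure. */
int double_fork_read(void)
{
    int fds[2];
    pid_t pid;
    char c = 0;
    ssize_t n;

    if (pipe(fds) < 0) return -1;

    pid = fork();
    if (pid < 0) {                      /* fork failed: no child to reap */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* in child */
        pid_t gpid = fork();
        if (gpid == 0) {                /* in grandchild: stands in for mlock */
            close(fds[0]);
            if (write(fds[1], "+", 1) != 1) _exit(1);
            _exit(0);
        }
        _exit(gpid < 0 ? 1 : 0);        /* child exits immediately */
    }

    /* in parent */
    close(fds[1]);                      /* keep only the read side */
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;                               /* reap the child, retrying on EINTR */
    n = read(fds[0], &c, 1);            /* grandchild's answer, or EOF */
    close(fds[0]);
    return n == 1 ? (unsigned char) c : -1;
}
```

Calling double_fork_read() should return '+' when both forks succeed. Because the intermediate child is reaped right away, the grandchild is re-parented and no zombie is left behind for the caller to deal with.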
|
|
|
#5 |
|
Hi. Sorry for the delay for a response - other priorities.
Bypassing the mutex, I now have a stack trace. Looks like we're getting KOD
while we were in something in libc holding a mutex. The following example is
a malloc lock (main_arena):

    (gdb) where
    #0 0x004a7fde in free () from /lib/tls/libc.so.6
    #1 0x004c1f55 in tzset_internal () from /lib/tls/libc.so.6
    #2 0x004c29ae in tzset () from /lib/tls/libc.so.6
    #3 0x004c754e in strftime_l () from /lib/tls/libc.so.6
    #4 0x0050992b in vsyslog () from /lib/tls/libc.so.6
    #5 0x00509e9f in syslog () from /lib/tls/libc.so.6
    #6 0x0804b56c in kodint () at imapd.c:1654
    #7 <signal handler called>
    #8 0x004a83f3 in _int_malloc () from /lib/tls/libc.so.6
    #9 0x004aa0b1 in malloc () from /lib/tls/libc.so.6
    #10 0x004a0013 in open_memstream () from /lib/tls/libc.so.6
    #11 0x005098ae in vsyslog () from /lib/tls/libc.so.6
    #12 0x00509e9f in syslog () from /lib/tls/libc.so.6
    #13 0x080525ae in main (argc=5, argv=0xbffffe14) at imapd.c:1363
    #14 0x0045ae23 in __libc_start_main () from /lib/tls/libc.so.6
    #15 0x0804aa01 in _start ()

The imapd.c:1363 is some extra syslog stuff I've added. The logging I added
is some timing info on the executed command:

    syslog(LOG_INFO,"elapsed: %d.%06d; %s",seconds,microseconds,cmd);

I don't think it changes the spirit of the problem, but it will increase the
probability.

Here's an example when we were holding the tzset_lock mutex:

    #0 0x004c2a35 in __tz_convert () from /lib/tls/libc.so.6
    #1 0x004c0c5d in localtime_r () from /lib/tls/libc.so.6
    #2 0x005098fc in vsyslog () from /lib/tls/libc.so.6
    #3 0x00509e9f in syslog () from /lib/tls/libc.so.6
    #4 0x0804b56c in kodint () at imapd.c:1654
    #5 <signal handler called>
    #6 0x004c3a44 in __tzfile_compute () from /lib/tls/libc.so.6
    #7 0x004c2b37 in __tz_convert () from /lib/tls/libc.so.6
    #8 0x004c0ca0 in localtime () from /lib/tls/libc.so.6
    #9 0x0014a901 in mail_parse_date (elt=0x809b9c8, s=0xbfffd5f8 "") at mail.c:2948
    #10 0x00182225 in unix_parse (stream=0x807c938, lock=0xbfffe3b0, op=1) at unix.c:1343
    #11 0x001839d0 in unix_open (stream=0x807c938) at unix.c:504
    #12 0x00154eb4 in mail_open (stream=0x807c938, name=0x8068cf0 "INBOX", options=0) at mail.c:1223
    #13 0x080531a0 in main (argc=5, argv=0xbffffe14) at imapd.c:938
    #14 0x0045ae23 in __libc_start_main () from /lib/tls/libc.so.6
    #15 0x0804aa01 in _start ()

We also have a lot of stupid clients (webmail, outlook express, etc.) that
insist on making multiple connections to the same mailbox. We're pretty tied
to mbox for now. |
|
|
|
#6 |
|
On Fri, 15 Sep 2006, Nik Conwell wrote:
> Bypassing the mutex, I now have a stack trace. Looks like we're getting
> KOD while we were in something in libc holding a mutex.

If that's the cause of the problem, then the patch below should remedy the
issue. Basically, it instructs imapd not to respond to KOD events that occur
while the mlock interchange is in progress.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

*** env_unix.c.old 2006-08-31 13:37:32.000000000 -0700
--- env_unix.c 2006-09-15 10:00:55.000000000 -0700
***************
*** 1116,1121 ****
--- 1116,1122 ----
  if (fd >= 0) switch (errno) {
  case EACCES: /* protection failure? */
+ MM_CRITICAL (NIL); /* go critical */
  /* make command pipes */
  if (!closedBox && !stat (LOCKPGM,&sb) && (pipe (pi) >= 0)) {
  if (pipe (po) >= 0) {
***************
*** 1152,1157 ****
--- 1153,1159 ----
  base->pipei = pi[0]; base->pipeo = po[1];
  /* close child's side of the pipes */
  close (pi[1]); close (po[0]);
+ MM_NOCRITICAL (NIL);/* no longer critical */
  return LONGT;
  }
  }
***************
*** 1159,1164 ****
--- 1161,1167 ----
  }
  close (pi[0]); close (pi[1]);
  }
+ MM_NOCRITICAL (NIL); /* no longer critical */
  /* find directory/file delimiter */
  if (s = strrchr (base->lock,'/')) {
  *s = '\0'; /* tie off at directory */ |
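The general idea behind the patch (do not act on a KOD signal while a critical exchange is in progress) can be sketched in isolation as follows. This is an illustration under assumptions, not how MM_CRITICAL/MM_NOCRITICAL or imapd are actually implemented: every name here (enter_critical(), leave_critical(), act_on_kod(), and so on) is hypothetical, and act_on_kod() only sets a flag where a real server would clean up and _exit().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* All names below are hypothetical, for illustration only. */
static volatile sig_atomic_t critical_depth = 0; /* >0 while in a critical section */
static volatile sig_atomic_t kod_deferred = 0;   /* KOD arrived while critical */
static volatile sig_atomic_t kod_acted = 0;      /* stand-in for "server shut down" */

/* Stand-in for the real reaction; only touches a sig_atomic_t flag. */
static void act_on_kod(void) { kod_acted = 1; }

static void kod_handler(int sig)
{
    (void) sig;
    if (critical_depth > 0) kod_deferred = 1;    /* remember it for later */
    else act_on_kod();                           /* safe to act right away */
}

int install_deferring_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kod_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

void enter_critical(void) { critical_depth++; }

void leave_critical(void)
{
    /* On leaving the outermost critical section, act on a deferred KOD. */
    if (--critical_depth == 0 && kod_deferred) {
        kod_deferred = 0;
        act_on_kod();
    }
}

int kod_was_handled(void) { return kod_acted; }
```

With this sketch, a SIGUSR2 raised between enter_critical() and leave_critical() leaves kod_was_handled() at 0 until leave_critical() runs. Note that it only defers the signal around code the application marks as critical; it does nothing for a signal that lands inside an unmarked libc call.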
|
|
|
#7 |
|
That addresses my initial problem, but from the two stack traces I got
above, the process receiving the KOD isn't in dotlock_lock(). One was doing a syslog() and one was parsing dates from the mailbox. Based on glibc's locking, it would seem that calling any glibc function from within a signal handler could involve deadlock. |
|
|
|
#8 |
|
On Fri, 15 Sep 2006, Nik Conwell wrote:
> That addresses my initial problem, but from the two stack traces I got
> above, the process receiving the KOD isn't in dotlock_lock(). One was
> doing a syslog() and one was parsing dates from the mailbox.
> Based on glibc's locking, it would seem that calling any glibc function
> from within a signal handler could involve deadlock.

That's bad news. That essentially means that a signal handler that responds
to any critical condition (autologout timer, KOD, hangup, termination) is
precluded from doing much of anything, even if it has no intention of
resuming from the signal.

I guess that the authors of glibc had a good reason for not making glibc be
reentrant, but in general it's not a good thing to do.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum. |
|
|
|
#9 |
|
Nik Conwell writes:
> That addresses my initial problem, but from the two stack traces I got
> above, the process receiving the KOD isn't in dotlock_lock(). One was
> doing a syslog() and one was parsing dates from the mailbox.
>
> Based on glibc's locking, it would seem that calling any glibc function
> from within a signal handler could involve deadlock.

Correct. The only functions you can invoke from a signal handler are kernel
syscalls (man section 2). Unless explicitly stated otherwise, none of libc
functions, from man section 3, can be safely called from a signal handler.
Even malloc. |
|
|
|
#10 |
|
Mark Crispin writes:
> On Fri, 15 Sep 2006, Nik Conwell wrote:
>> That addresses my initial problem, but from the two stack traces I got
>> above, the process receiving the KOD isn't in dotlock_lock(). One was
>> doing a syslog() and one was parsing dates from the mailbox.
>> Based on glibc's locking, it would seem that calling any glibc function
>> from within a signal handler could involve deadlock.
>
> That's bad news. That essentially means that a signal handler that
> responds to any critical condition (autologout timer, KOD, hangup,
> termination) is precluded from doing much of anything, even if it has no
> intention of resuming from the signal.
>
> I guess that the authors of glibc had a good reason for not making glibc
> be reentrant, but in general it's not a good thing to do.

Standard C library was not reentrant long before glibc came on the scene. I
recall reading explicit warnings in AT&T SVR3 man pages that libc functions
are not reentrant. |
|
|
|
#11 |
|
On Fri, 15 Sep 2006, Sam wrote:
> I recall reading explicit warnings in AT&T SVR3 man pages that libc functions
> are not reentrant.

The old AT&T documentation that I have simply warned about the signal handler
trying to dismiss back to a libc function if the signal handler also called a
libc function. It said nothing about a signal handler that has no intention
of returning and instead will exit. That's the case here.

The apparent purpose of the mutex is to protect the original libc call from
ill effects on it caused by the reentered call. I assume that the choice for
a mutex which waits (which would deadlock) as opposed to one that caused an
abort() call was intentional.

Since the signal handler has no intention of returning, is there a way to
disable the mutex? That is, the signal handler wants to be treated more like
a setjmp()/longjmp(). Otherwise, that precludes the signal handler from even
logging that it was called.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum. |
|
|
|
#12 |
|
Mark Crispin writes:
> On Fri, 15 Sep 2006, Sam wrote:
>> I recall reading explicit warnings in AT&T SVR3 man pages that libc functions
>> are not reentrant.
>
> The old AT&T documentation that I have simply warned about the signal
> handler trying to dismiss back to a libc function if the signal handler
> also called a libc function. It said nothing about a signal handler that
> has no intention of returning and instead will exit. That's the case
> here.

exit() itself is a libc function that tries to flush any open files before
terminating the process. _exit(), I think, is a syscall that's safe to use in
a signal handler.

> The apparent purpose of the mutex is to protect the original libc call
> from ill effects on it caused by the reentered call. I assume that the
> choice for a mutex which waits (which would deadlock) as opposed to one
> that caused an abort() call was intentional.
>
> Since the signal handler has no intention of returning, is there a way to
> disable the mutex? That is, the signal handler wants to be treated more
> like a setjmp()/longjmp(). Otherwise, that precludes the signal handler
> from even logging that it was called.

There is no mutex. The internal data structures in libc simply are not
re-enterable. In a pre-threaded world, most malloc implementations, for
example, maintained somewhat involved strategies for recycling memory blocks,
often managing multiple lists of memory blocks of different sizes, trying to
optimize for O(n) performance. If you are interrupted in the middle of
reshuffling internal memory pool lists, you just can't reenter malloc() even
if you have no intention of returning from the signal handler, since the
internal memory lists and pointers are likely to be in an inconsistent state.

In a modern, threaded world, most internal structures are protected by
mutexes so that they're thread safe. But it's not just a single mutex
protecting the C library. That would be performance suicide, if only one
thread could be running inside the C library, locking out all other threads
even if they need to do something completely unrelated. Most implementations
use granular mutexes that are simply not exposed to the user app. glibc --
and that I know -- uses some gnu ld tricks to impose the overhead of mutexes
only when the app is multithreaded and links against libpthread.

Furthermore, many glibc functions use thread-local storage for temporary
scratch space; as such glibc is thread-safe, but not reentrant-safe. |
|
|
|
#13 |
|
On Fri, 15 Sep 2006, Sam wrote:
>> It said nothing about a signal handler that
>> has no intention of returning and instead will exit. That's the case here.
> exit() itself is a libc function that tries to flush any open files, before
> terminating the process. _exit(), I think, is a syscall that's safe to use
> in a signal handler.

Of course I know that; my use of the verb "exit" instead of the function name
"exit()" was intentional. Specifically: the signal handler has no intention
of returning and instead will exit with _exit().

> In a pre-threaded world, most malloc implementations, for
> example, maintained somewhat involved strategies for recycling memory blocks;
> often managing multiple lists of memory blocks of different sizes, trying to
> optimize for O(n) performance. If you are interrupted in the middle of
> reshuffling internal memory pool lists, you just can't reenter malloc() even
> if you have no intention of returning from the signal handler, since the
> internal memory lists and pointers are likely to be in an inconsistent state.

I know all this, too. It makes sense that the heap may be in an inconsistent
state during manipulation and thus the heap can't be reentered. That is why I
block signals during my calls to heap routines such as malloc().

> glibc --
> and that I know -- use some gnu ld tricks to impose the overhead of mutexes
> only when the app is multithreaded and links against libpthread.

That doesn't explain why the original poster encountered the mutex. The
server is not multithreaded and doesn't link with any thread library. My
expectation would be that the signal handler was therefore at liberty to call
libc functions, as long as it was reasonably careful.

libc, too, ought to be a bit more thoughtful, especially when non-threaded.
It should not be that much of an effort for libc to make stdout calls and
syslog() reentrant in a signal handler, even if only in a special
non-buffered form. These are operations that a signal handler is likely to
want to do, as in recording why it's about to commit suicide rather than just
vanishing without a trace. Without that available, the only recourse is to
have an external monitoring process that records the event instead of the
process doing it itself.

It also isn't as if this class of issue is new. The general problem with
software interrupts has been known at least since the 1960s, and was solved
quite well on ITS with PCLSR. Other systems didn't go that far, but still
allowed "dangerous" calls to be made as long as the application was willing
to abandon returning from the interrupt.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum. |
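The "block signals during my calls to heap routines" approach mentioned above might look roughly like the sketch below. The function name malloc_signals_blocked() and the particular set of signals are assumptions for illustration; this is not the c-client code. Note also that a wrapper like this only covers the application's own heap calls. It cannot cover a malloc() made internally by a library routine such as syslog(), which is the situation in the first stack trace earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

/* Hypothetical wrapper: hold back the signals whose handlers might touch
 * the heap while malloc() runs, then restore the previous signal mask.
 * Any blocked signal that arrived in between is delivered on restore. */
void *malloc_signals_blocked(size_t n)
{
    sigset_t block, old;
    void *p;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);     /* the KOD signal named in this thread */
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGTERM);
    sigaddset(&block, SIGALRM);

    if (sigprocmask(SIG_BLOCK, &block, &old) < 0) return NULL;
    p = malloc(n);
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending signals fire here */
    return p;
}
```

The caller uses it exactly like malloc() and releases the result with free(), which would need the same treatment to be protected.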
|
|
|
#14 |
|
Mark Crispin writes:
> server is not multithreaded and doesn't link with any thread library. My
> expectation would be that the signal handler was therefore at liberty to
> call libc functions, as long as it was reasonably careful.

The posted backtrace shows that strftime() was getting invoked in the signal
handler indirectly through syslog(). That's definitely not reentrant.
strftime() itself calls tzset(), which calls free(), as the backtrace shows.
We're definitely well into non-reenterable territory.

I can understand the assumption that syslog() would be a syscall. But, it's
not. Can't do that in a signal handler. You probably have some memory
corruption happening here, so you cannot fully trust the backtrace that shows
mutex functions on the stack.

The Linux signal man page actually enumerates the functions that may be
safely invoked from a signal handler, and refers to POSIX as the reference
for the safe function list. So it looks like there's even a 2003 POSIX
standard of functions that are guaranteed to be reenterable. Anything not on
that list is off the table. |
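As a rough illustration of staying within the POSIX safe-function list, a handler that wants to leave a trace and then terminate can restrict itself to write() and _exit(), both of which are on that list. The sketch below is an assumption-laden example, not imapd's kodint(): the message text, the exit status 42, and the function names are all hypothetical, and the message goes to stderr rather than to syslog.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message; a constant buffer avoids any formatting calls. */
static const char kod_msg[] = "lost mailbox lock, exiting\n";

static void kod_handler(int sig)
{
    (void) sig;
    /* write() and _exit() are async-signal-safe; syslog(), printf() and
     * exit() are not, so none of them appear here. */
    if (write(STDERR_FILENO, kod_msg, sizeof kod_msg - 1) < 0) {
        /* nothing safe to do about a failed write */
    }
    _exit(42);
}

int install_kod_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kod_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Demonstration: a child installs the handler and signals itself; the
 * parent returns the child's exit status, or -1 on failure. */
int demo_exit_status(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0) return -1;
    if (pid == 0) {
        if (install_kod_handler() < 0) _exit(1);
        raise(SIGUSR2);
        _exit(2);                   /* reached only if the handler did not run */
    }
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Here demo_exit_status() should return 42, the status chosen by the handler. The trade-off is the one discussed in this thread: the message cannot be routed through syslog() from inside the handler, so some other party (a monitoring process, or whatever collects stderr) has to record it.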
|
|
|
#15 |
|
On Fri, 15 Sep 2006, Sam wrote:
> I can understand the assumption that syslog() would be a syscall. But, it's
> not. Can't do that in a signal handler.

Of course syslog() is not a system call.

However, someone should take the effort to make syslog() work in a signal
handler (ditto for stdout operations) anyway, at least in a signal handler
that is not going to return.

A Google search shows that other developers have made the same complaint;
what has worked in the past doesn't work in glibc. POSIX aside, a lot of
things *did* work in SVR4 and BSD and now have suddenly broken on Linux.

Typical example: A daemon gets a SIGHUP. It wants to syslog() that fact, and
then exit; as opposed to vanishing without a trace. It even knows that
whatever it was doing, it wasn't critical. But, from what you say, it can't
even do a longjmp() to get out of whatever it might have been doing in the
bowels of the C library, or at least enough to do a syslog() and exit.

It isn't as if solving this in glibc is technically impossible. It isn't. At
most it is moderately challenging due to the complexity added by threading.
Perhaps it may be alright to solve it only for non-threaded applications,
since a threaded application is (1) likely to have the necessary control
infrastructure to provide an alternative and (2) is not likely to respond to
a signal by writing a log message and exiting.

One way would be to have a mutex when non-threaded, but one which fails
instead of blocking. If the mutex fails, then escape to a special mode that
uses its own context and structures and dispenses with such luxuries as
tzset(). This would be done only in certain well-defined cases (the ones
which application developers have been complaining about!).

A more Linuxish way would be to add new calls (such as syslog_r()) for
reentrant versions, and thus forcing developers to have additional
conditionals to use those calls instead of the normal ones. NOT my preferred
solution, but perhaps more palatable to the glibc guys.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote. |
|
|
|
#16 |
|
Mark Crispin wrote:
> One way would be to have a mutex when non-threaded, but one which fails
> instead of blocking. If the mutex fails, then escape to a special mode
> that uses its own context and structures and dispenses with such luxuries
> as tzset(). This would be done only in certain well-defined cases (the
> ones which application developers have been complaining about!).

glibc could possibly use other types of mutexes - recursive (blocks other
threads but lets the current thread succeed) or error checking (return
EDEADLK instead of blocking), but as far as I can tell (not far) it's coded
to block. Who knows what would break if I changed that...

Back to the IMAP server, I was thinking about having the USR2 handler just
set a global KOD variable and then resume, and then have slurp() check for
the global KOD. I think I'd have to also do siginterrupt(SIGUSR2) so that the
fgets() will abort with EAGAIN. Ideally I'd do that for all signals, but in
practice it's just been SIGUSR2.

-nik |
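A minimal sketch of the "handler only sets a flag" idea follows. It is not Nik's patch and not imapd's slurp(): the names kod_pending, install_kod_flag_handler() and read_or_kod() are hypothetical, and it uses read() rather than fgets() to keep the example small. Two details worth checking against your own system's documentation: installing the handler with sigaction() and without SA_RESTART has the same effect as siginterrupt(SIGUSR2, 1), and an interrupted blocking read normally fails with errno set to EINTR rather than EAGAIN.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag; sig_atomic_t is the one type a handler may safely set. */
static volatile sig_atomic_t kod_pending = 0;

static void kod_flag_handler(int sig)
{
    (void) sig;
    kod_pending = 1;                /* no libc calls at all in the handler */
}

int install_kod_flag_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = kod_flag_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* no SA_RESTART: blocking reads return EINTR */
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Returns the number of bytes read, 0 on EOF, -1 on a real error, or
 * -2 if a KOD was requested before or while waiting for input. */
ssize_t read_or_kod(int fd, char *buf, size_t len)
{
    for (;;) {
        ssize_t n;

        if (kod_pending) return -2; /* checked outside the handler */
        n = read(fd, buf, len);
        if (n >= 0) return n;
        if (errno != EINTR) return -1;
        /* interrupted by a signal: loop and look at the flag again */
    }
}
```

The caller does its logging and cleanup in normal context when read_or_kod() returns -2, so syslog() is never entered from a handler. One limitation of this simple form: a signal that arrives after the flag check but before read() starts blocking is not noticed until the read returns; closing that window needs something like pselect() with the signal blocked outside the wait.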
|
|
|
#17 |
|
On Mon, 18 Sep 2006, Nik Conwell wrote:
> glibc could possibly use other types of mutexes - recursive (blocks
> other threads but lets the current thread succeed) or error checking
> (return EDEADLK instead of blocking), but as far as I can tell (not
> far) it's coded to block. Who knows what would break if I changed
> that...

I don't think that there is much hope of winning by changing the mutexes. It
does need to work that way for threading.

In 2006a I changed things to make sure that it never does a syslog or stdout
I/O if the signal will return.

> Back to the IMAP server, I was thinking about having the USR2 handler
> just set a global KOD variable and then resume, and then have slurp()
> check for the global KOD. I think I'd have to also do
> siginterrupt(SIGUSR2) so that the fgets() will abort with EAGAIN.
> Ideally I'd do that for all signals, but in practice it's just been
> SIGUSR2.

Let me know how it goes.

One potential problem is that this may cause KOD not to respond rapidly
enough if it is doing something potentially time-consuming (such as a large
search) but not "critical" (thus can be aborted).

The other solution is just to stop using traditional UNIX mailbox format,
since it's the only one that needs/uses KOD.

-- Mark --

http://panda.com/mrc
Democracy is two wolves and a sheep deciding what to eat for lunch.
Liberty is a well-armed sheep contesting the vote. |
|
|
|
#18 |
|
Mark Crispin wrote:
> On Mon, 18 Sep 2006, Nik Conwell wrote:
> > Back to the IMAP server, I was thinking about having the USR2 handler
> > just set a global KOD variable and then resume, and then have slurp()
> > check for the global KOD. I think I'd have to also do
> > siginterrupt(SIGUSR2) so that the fgets() will abort with EAGAIN.
> > Ideally I'd do that for all signals, but in practice it's just been
> > SIGUSR2.
>
> Let me know how it goes.

So far so good. Works in a simple engineered test but I'll have to see how it
shakes out in production for a couple of days. I can't test SSL but we're not
using that on the Linux servers yet.

Want me to e-mail you the patch? (will come from nik@bu.edu)

> One potential problem is that this may cause KOD not to respond rapidly
> enough if it is doing something potentially time-consuming (such as a
> large search) but not "critical" (thus can be aborted).
>
> The other solution is just to stop using traditional UNIX mailbox format,
> since it's the only one that needs/uses KOD.

Unfortunately not possible as we still have legacy stuff that expects mbox. |
|