SA Bugzilla – Bug 5166
mass-check: Too many open files
Last modified: 2006-11-28 11:47:14 UTC
Ok, this has been causing me trouble for weeks and I think I've narrowed it down. The problem is that my mass-checks complain, fairly loudly, about too many open files. I get lots of lines like this:

  bayes: cannot write to /home/duncf/svn/spamassassin-nightly/masses/spamassassin/bayes_journal, bayes db update ignored: Too many open files
  util: secure_tmpfile failed to create file '/tmp/.spamassassin5912T01JRetmp': Too many open files

etc. The mass-check still completes, but only a small number of messages actually get checked.

By searching through my nightly log uploads in spamassassin.zones.apache.org:/home/automc/corpus/html, I've found that this started occurring between revisions r440298 and r440658. Looking through these revisions, r440575 looks suspicious:

  "use temp files for mime parts that we're unlikely to use during processing, and fall-back to memory-only if it's not going to work out. also, use a scalar to hold the decoded information."

I believe this causes us to open new filehandles to store this data, and those filehandles never get closed, so mass-check eventually runs out of file descriptors, hence the messages. I'm not nearly familiar enough with this code, so Theo, can you take a look? I'll see what I can do and attach a patch if I'm successful.
I think it turned out to be simpler than I thought. I'll let this run tonight and make sure it's working right. Theo, if you could take a look at my commit just to make sure I'm not way off the mark, that'd be great.

  duncf@gold:~/svn/spamassassin$ svn commit
  Sending        lib/Mail/SpamAssassin/Message.pm
  Transmitting file data .
  Committed revision 471136.
Nope. still broken :-(
Hrm. When the Message::Node objects go away, the file references should go away, thereby closing the files. There shouldn't need to be an explicit 'close'. For a temporary work-around, you can use --restart.
Have you tried using "lsof -p PID" to find out which fds are open, halfway through the mass-check? (Easy option: fix that "cannot write" message to call "system lsof -p $$" ;)
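A rough sketch of that idea, for anyone who wants to try it: the helper below lists the descriptors held by the current process. It assumes a Linux-style /proc filesystem and falls back to lsof otherwise. The names open_fds and dump_open_fds are made up for illustration and are not part of SpamAssassin.

```perl
use strict;
use warnings;

# Hypothetical helper: map each open fd number of this process to its target.
# Assumes Linux, where /proc/PID/fd holds one symlink per open descriptor.
sub open_fds {
    my $dir = "/proc/$$/fd";
    opendir(my $dh, $dir) or return;
    my %fds;
    for my $fd (grep { /^\d+$/ } readdir($dh)) {
        # The directory handle used for this listing shows up here too.
        my $target = readlink("$dir/$fd");
        $fds{$fd} = defined $target ? $target : '?';
    }
    closedir($dh);
    return \%fds;
}

# Hypothetical helper: print the open descriptors to STDERR,
# falling back to lsof when /proc is not available.
sub dump_open_fds {
    my $fds = open_fds();
    if ($fds) {
        warn "fd $_ -> $fds->{$_}\n" for sort { $a <=> $b } keys %$fds;
    }
    else {
        system('lsof', '-p', $$);
    }
}
```

Calling dump_open_fds() from the place that emits the "cannot write" message should show what is piling up. On Linux, an unlinked temp file appears with " (deleted)" appended to its path.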
There's a whole bunch of temp files as created by Mail::SpamAssassin::Util::secure_tmpfile(). The files are deleted, so there's (AFAIK) no way of reading their contents. Since I'm the only one that seems to experience this, here's my perl info:

duncf@gold:~$ perl -V
Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
  Platform:
    osname=linux, osvers=2.6.15-1-686, archname=i486-linux-gnu-thread-multi
    uname='linux ulises 2.6.15-1-686 #2 mon mar 6 15:27:08 utc 2006 i686 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion='', gccversion='4.1.2 20060729 (prerelease) (Debian 4.1.1-10)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=/lib/libc-2.3.6.so, so=so, useshrplib=true, libperl=libperl.so.5.8.8
    gnulibc_version='2.3.6'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
    cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib'

Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP THREADS_HAVE_PIDS USE_ITHREADS USE_LARGE_FILES USE_PERLIO USE_REENTRANT_API
  Built under linux
  Compiled at Aug 6 2006 15:35:16
  @INC:
    /etc/perl
    /usr/local/lib/perl/5.8.8
    /usr/local/share/perl/5.8.8
    /usr/lib/perl5
    /usr/share/perl5
    /usr/lib/perl/5.8
    /usr/share/perl/5.8
    /usr/local/lib/site_perl
    /usr/local/lib/perl/5.8.7
    /usr/local/share/perl/5.8.7
    /usr/local/lib/perl/5.8.4
    /usr/local/share/perl/5.8.4
    /usr/local/lib/perl/5.8.0
    /usr/local/share/perl/5.8.0
    .
I tried debugging this more to no avail. My best guess is the reference isn't getting deleted somewhere OR my version of perl has a bug and that's causing it to not delete the reference.
Duncan -- can you post a simple test case to reproduce this? (Obviously my mass-check is not producing this error, so it's not a widespread bug.) FWIW, you can debug perl garbage collection using something like:

  sub DESTROY { warn "JMD destroyed $_[0]"; }
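To make the DESTROY trick concrete, here is a small standalone sketch. Demo::Node is a hypothetical stand-in, not the real Message::Node code. It shows that an object holding a temp filehandle is destroyed (and its handle closed) as soon as the last reference goes away, whereas a circular reference keeps the object and its open file alive until the interpreter exits.

```perl
use strict;
use warnings;

package Demo::Node;    # hypothetical stand-in, not Mail::SpamAssassin::Message::Node

our $destroyed = 0;    # counts DESTROY calls so the effect is easy to observe

sub new {
    my ($class) = @_;
    # Anonymous temp file; perl closes it once nothing references the handle.
    open(my $fh, '+>', undef) or die "cannot create temp file: $!";
    return bless { fh => $fh }, $class;
}

sub DESTROY {
    my ($self) = @_;
    $destroyed++;
    warn "destroyed $self\n";
}

package main;

{
    my $node = Demo::Node->new;
}    # last reference gone: DESTROY runs here and the handle is closed

{
    my $leaky = Demo::Node->new;
    $leaky->{self} = $leaky;    # circular reference: refcount never reaches zero
}    # no DESTROY here; the object and its open file linger until exit

warn "DESTROY calls so far: $Demo::Node::destroyed\n";
```

If the second pattern is what is happening somewhere in the message tree, the DESTROY warnings would only show up in a burst at the very end of the run rather than after each message.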
In theory this is fixed by the explicit finish() destruction that was put back into trunk.
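For readers wondering why an explicit finish() helps, here is a minimal sketch of the general pattern. Demo::Message and its fields are hypothetical and this is not the actual trunk change. An explicit teardown method closes the filehandle and drops the object's references immediately, so the descriptor is released even when something still holds a reference to the object.

```perl
use strict;
use warnings;

package Demo::Message;    # hypothetical; not the real Mail::SpamAssassin::Message

sub new {
    my ($class) = @_;
    open(my $fh, '+>', undef) or die "cannot create temp file: $!";
    my $self = bless { fh => $fh }, $class;
    $self->{parent} = $self;    # simulate a lingering circular reference
    return $self;
}

# Explicit teardown: close the handle and drop all references now,
# rather than waiting for the reference count to reach zero.
sub finish {
    my ($self) = @_;
    close($self->{fh}) if $self->{fh};
    %$self = ();
}

package main;

my $msg = Demo::Message->new;
my $fh  = $msg->{fh};                # keep a copy of the handle to inspect it later
my $fd_before = fileno($fh);         # defined while the file is open

$msg->finish;

my $fd_after = fileno($fh);          # undef once the handle has been closed
warn "handle is ", (defined $fd_after ? "still open" : "closed"), " after finish()\n";
```

The trade-off is that callers have to remember to call finish() on every message, but it removes the dependency on every reference being cleaned up correctly.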
Yep, this is fixed.