SA Bugzilla – Full Text Bug Listing
Summary: | Wide character in print at /usr/bin/sa-compile line 433 | ||
---|---|---|---|
Product: | Spamassassin | Reporter: | Fabian Dellwing <f.dellwing> |
Component: | sa-compile | Assignee: | SpamAssassin Developer Mailing List <dev> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | apache-bugzilla, apache, billcole, dmigowski, duncan, eqx, jidanni, sa-bugzilla, toddr |
Priority: | P2 | ||
Version: | 3.4.2 | ||
Target Milestone: | 4.0.0 | ||
Hardware: | PC | ||
OS: | Linux | ||
Whiteboard: | |||
Bug Depends on: | 7656 | ||
Bug Blocks: | |||
Attachments: | sa-compile full log |
Description
Fabian Dellwing
2018-10-22 07:58:10 UTC
I get the same after adding the heinlein channel.

(In reply to eqx from comment #1)
> I get the same after adding the heinlein channel.
I am using the same SA rules from channel "spamassassin.heinlein-support.de", so this might be a similarity here.

What version of Perl are you using?

(In reply to Bill Cole from comment #3)
> What version of Perl are you using?
Perl 5.22.1 on Ubuntu 16.04.5

> [14:50 root@mail ~] > perl -V
> Summary of my perl5 (revision 5 version 18 subversion 2) configuration:
>
> Platform:
> osname=linux, osvers=4.4.0-127-generic, archname=i686-linux-gnu-thread-multi-64int
> uname='linux lgw01-amd64-009 4.4.0-127-generic #153-ubuntu smp sat may 19 10:58:46 utc 2018 i686 i686 i686 gnulinux '
> config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN -D_FORTIFY_SOURCE=2 -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -Dldflags= -Wl,-Bsymbolic-functions -Wl,-z,relro -Dlddlflags=-shared -Wl,-Bsymbolic-functions -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=i686-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.18 -Darchlib=/usr/lib/perl/5.18 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.18.2 -Dsitearch=/usr/local/lib/perl/5.18.2 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib -Dlibperl=libperl.so.5.18.2 -des'
> hint=recommended, useposix=true, d_sigaction=define
> useithreads=define, usemultiplicity=define
> useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
> use64bitint=define, use64bitall=undef, uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
> optimize='-O2 -g',
> cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fstack-protector -fno-strict-aliasing -pipe -I/usr/local/include'
> ccversion='', gccversion='4.8.4', gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
> ivtype='long long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
> alignbytes=4, prototype=define
> Linker and Libraries:
> ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
> libpth=/usr/local/lib /lib/i386-linux-gnu /lib/../lib /usr/lib/i386-linux-gnu /usr/lib/../lib /lib /usr/lib
> libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
> perllibs=-ldl -lm -lpthread -lc -lcrypt
> libc=, so=so, useshrplib=true, libperl=libperl.so.5.18.2
> gnulibc_version='2.19'
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
> cccdlflags='-fPIC', lddlflags='-shared -L/usr/local/lib -fstack-protector'
>
>
> Characteristics of this binary (from libperl):
> Compile-time options: HAS_TIMES MULTIPLICITY PERLIO_LAYERS
> PERL_DONT_CREATE_GVSV
> PERL_HASH_FUNC_ONE_AT_A_TIME_HARD
> PERL_IMPLICIT_CONTEXT PERL_MALLOC_WRAP
> PERL_PRESERVE_IVUV PERL_SAWAMPERSAND USE_64_BIT_INT
> USE_ITHREADS USE_LARGE_FILES USE_LOCALE
> USE_LOCALE_COLLATE USE_LOCALE_CTYPE
> USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
> USE_REENTRANT_API
> Locally applied patches:
> DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
> DEBPKG:debian/db_file_ver - http://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
> DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
> DEBPKG:debian/enc2xs_inc - http://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
> DEBPKG:debian/errno_ver - http://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
> DEBPKG:debian/libperl_embed_doc - http://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
> DEBPKG:fixes/respect_umask - Respect umask during installation
> DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
> DEBPKG:debian/extutils_set_libperl_path - EU:MM: Set location of libperl.a to /usr/lib
> DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
> DEBPKG:debian/prefix_changes - Fiddle with *PREFIX and variables written to the makefile
> DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
> DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
> DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
> DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
> DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
> DEBPKG:debian/module_build_man_extensions - http://bugs.debian.org/479460 Adjust Module::Build manual page extensions for the Debian Perl policy
> DEBPKG:debian/prune_libs - http://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
> DEBPKG:fixes/net_smtp_docs - [rt.cpan.org #36038] http://bugs.debian.org/100195 Document the Net::SMTP 'Port' option
> DEBPKG:debian/perlivp - http://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
> DEBPKG:debian/cpanplus_definstalldirs - http://bugs.debian.org/533707 Configure CPANPLUS to use the site directories by default.
> DEBPKG:debian/cpanplus_config_path - Save local versions of CPANPLUS::Config::System into /etc/perl.
> DEBPKG:debian/deprecate-with-apt - http://bugs.debian.org/702096 Point users to Debian packages of deprecated core modules
> DEBPKG:debian/squelch-locale-warnings - http://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
> DEBPKG:debian/skip-upstream-git-tests - Skip tests specific to the upstream Git repository
> DEBPKG:debian/patchlevel - http://bugs.debian.org/567489 List packaged patches for 5.18.2-2ubuntu1.6 in patchlevel.h
> DEBPKG:debian/skip-kfreebsd-crash - http://bugs.debian.org/628493 [perl #96272] Skip a crashing test case in t/op/threads.t on GNU/kFreeBSD
> DEBPKG:fixes/document_makemaker_ccflags - http://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
> DEBPKG:debian/find_html2text - http://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
> DEBPKG:debian/hurd_test_skip_stack - http://bugs.debian.org/650175 Disable failing GNU/Hurd tests dist/threads/t/stack.t
> DEBPKG:fixes/manpage_name_Test-Harness - http://bugs.debian.org/650451 [rt.cpan.org #73399] cpan/Test-Harness: add NAME headings in modules with POD
> DEBPKG:debian/makemaker-pasthru - http://bugs.debian.org/660195 [rt.cpan.org #28632] Make EU::MM pass LD through to recursive Makefile.PL invocations
> DEBPKG:debian/perl5db-x-terminal-emulator.patch - http://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
> DEBPKG:debian/cpan-missing-site-dirs - http://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
> DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] http://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
> DEBPKG:fixes/net_ftp_failed_command - [rt.cpan.org #37700] http://bugs.debian.org/491062 Net::FTP: cope gracefully with a failed command
> DEBPKG:fixes/perlbug-patchlist - [3541c11] http://bugs.debian.org/710842 [perl #118433] Make perlbug look up the list of local patches at run time
> DEBPKG:fixes/module_metadata_security_doc - [68cdd4b] CVE-2013-1437 documentation fix
> DEBPKG:fixes/module_metadata_taint_fix - [bff978f] http://bugs.debian.org/722210 [rt.cpan.org #88576] untaint version, if needed, in Module::Metadata
> DEBPKG:fixes/IPC-SysV-spelling - http://bugs.debian.org/730558 [rt.cpan.org #86736] Fix spelling of IPC_CREAT in IPC-SysV documentation
> DEBPKG:fixes/fix-undef-source -
> DEBPKG:fixes/CVE-2013-7422.patch - [PATCH] [perl #119505] Segfault from bad backreference
> DEBPKG:fixes/CVE-2014-4330.patch - [PATCH] don't recurse infinitely in Data::Dumper
> DEBPKG:fixes/CVE-2016-2381.patch - [PATCH 1/2] remove duplicate environment variables from environ
> DEBPKG:fixes/CVE-2017-12837.patch - [PATCH] regcomp [perl #131582]
> DEBPKG:fixes/CVE-2017-12883.patch - [PATCH] PATCH: [perl #131598]
> DEBPKG:fixes/CVE-2015-8853-1.patch - [PATCH] PATCH [perl #123562] Regexp-matching "hangs"
> DEBPKG:fixes/CVE-2015-8853-2.patch - [PATCH] regexec.c: Use Perl_croak_nocontext()
> DEBPKG:fixes/CVE-2016-6185.patch - [PATCH] =?utf8?q?Don=E2=80=99t=20let=20XSLoader=20load=20relative?= =?utf8?q?=20paths?=
> DEBPKG:fixes/CVE-2017-6512-pre.patch - [PATCH] Correct the order of tests of chmod(). (#294)
> DEBPKG:fixes/CVE-2017-6512.patch - http://bugs.debian.org/863870 [rt.cpan.org #121951] Prevent directory chmod race attack.
> DEBPKG:fixes/CVE-2018-6913.patch - (perl #131844) fix various space calculation issues in pp_pack.c
> DEBPKG:fixes/CVE-2018-12015.patch - [PATCH] [PATCH] Remove existing files before overwriting them
> Built under linux
> Compiled at Jun 13 2018 12:40:40
> @INC:
> /etc/perl
> /usr/local/lib/perl/5.18.2
> /usr/local/share/perl/5.18.2
> /usr/lib/perl5
> /usr/share/perl5
> /usr/lib/perl/5.18
> /usr/share/perl/5.18
> /usr/local/lib/site_perl
> .
There are some UTF-8 rules, for example (I've used "cat -v" to print them):

body HS_BODY_899 /The seller hasnM-CM-"M-bM-^BM-,M-bM-^DM-"t provided any postage details yet/
body HS_BODY_1575 /diesem Grund folgende Zahlung zu stornieren. Um den dafM-CM-<r nM-CM-6tigen/

Basically, the wide-character print error comes from writing "scanner1.re", which ends up containing:

char *Mail_SpamAssassin_CompiledRegexps_body_0_scan1(unsigned char **p){
  unsigned char *q = 1 + *p;
/*!re2c
  "diesem grund folgende zahlung zu stornieren" {RET("HS_BODY_1575,[l=1]");}
  "the seller hasnâ" {RET("HS_BODY_899,[l=1]");}
  [\000-\377] { return NULL; }
*/

Not sure if we should just print with binmode utf8 or similar, so that the UTF-8 characters end up in scanner1.re, or perhaps convert them first to hex escapes such as \xAB. I guess this depends on what re2c expects.

I'm not sure what state UTF-8 rules/checks are in anyway. If there isn't one already, we should have some docs or a bug describing all the steps from reading a .cf file with UTF-8 rules to how the rule is stored and matched against the decoded body (which is, or is not, UTF-8?), and also how sa-compile fits into all of this.

Noticed something that made me think of this bug:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=5691#c16

"SA rules files are encoded in ISO-8859-1, not UTF-8. You have to either encode Japanese characters in pattern tests using \x sequences or develop a new feature adding support for UTF-8 config files to SA."

I don't know if this is (still) true or false, but perhaps we should clarify this somewhere and optionally reject any non-ASCII configuration lines. No time to investigate right now.

Thanks Henrik. I notified the maintainers of spamassassin.heinlein-support.de and pointed them here.

Same problem here under Gentoo:

Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.
spamassassin 3.4.2 and perl 5.26.2

I am also using spamassassin.heinlein-support.de

Any news on this topic?

(In reply to Robert R. Richter from comment #9)
> same problem here under Gentoo:
>
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 9716.
> Wide character in print at /usr/bin/sa-compile line 433, <$fh> line 10641.
>
> spamassassin 3.4.2 and perl 5.26.2
>
> I am also using spamassassin.heinlein-support.de
>
> Any news on this topic?

Not really. It's a low priority because it seems to be purely cosmetic and to occur only with a third-party ruleset.

1. A simple, direct POSSIBLE fix with UNKNOWN side effects may be to add this UNTESTED line after line 22:
   use open OUT => ':utf8';
2. A better fix will be to not use STDOUT for building the .re files.

Either change is unfit for the 3.4.3 release, which will be the terminal release for the 3.4 branch. The untested one-line possible fix may not work and may not quiet the warning while possibly breaking the rules involved. The refactoring of .re generation is simply too big to put in the final cleanup of the 3.4 branch.

I am no expert, so is it safe to just ignore these "Wide character in print at..." warnings/errors? Or are there any other side effects, so that I should remove this ruleset?

FYI: I still have one 3.4.1 installation left and there are no such warnings using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.

Well, if Heinlein is reading this: do not use UTF-8 in rule files. That's the simplest fix.

Write rules in pure latin1:

/füübar/

Or better yet, with UTF-8 byte alternatives:

$ perl -MEncode -e 'print unpack("H*", encode("UTF-8", "ü"))'
c3bc

/f(?:ü|\xc3\xbc)(?:ü|\xc3\xbc)bar/

Most portable:

$ perl -e 'print unpack("H*", "ü")'
fc

/f(?:\xfc|\xc3\xbc)(?:\xfc|\xc3\xbc)bar/

Some related thread:
http://spamassassin.1065346.n5.nabble.com/UTF8-character-in-doesn-t-match-td154199.html

(In reply to Robert R. Richter from comment #11)
> I am no expert, so is it safe to just ignore these "Wide character in print
> at..." warnings/errors? Or are there any other side effects so that I should
> remove this ruleset?

"Safe" is an imprecise concept, but I think ignoring those messages is safe for my understanding of safety. My understanding is that all of the rules are still being converted into compilable C and that only the specific rules that contain UTF-8 characters are being mangled in the process, making them generally non-matchable. See Henrik's comments above (comment #6 and comment #12).

> FYI: I still have one 3.4.1 installation left and there are no such warnings
> using this ruleset on 3.4.1. Seems to be an issue only on 3.4.2.

That's probably because 3.4.1 was liberally sprinkled with "use bytes;" pragmas, which effectively made Perl treat "wide" characters as a sequence of unrelated bytes rather than as characters. That wasn't a maintainable strategy given the modern reality of how Perl handles Unicode. If you want to understand the details, "perldoc bytes" is a place to start, and it references additional documentation that may be helpful.

Because this could be seen as a problem with a third-party rule distribution that is distributing rules in a bad format, I am tempted to just close this as "INVALID" (i.e. not OUR problem), but I do think we need to nail down the code truth in documentation and probably rework sa-compile for 4.0 to create re2c input files in a more tightly specified way.

*** Bug 5607 has been marked as a duplicate of this bug. ***

I would wish for a better error message, one which says WHICH channel it was parsing. I also have heinlein, but also schaal-it.net, and cannot say for sure without more testing which of them delivers wrong characters now.

Closing this as 3.4 will not receive any more fixes, and I'm considering sa-compile deprecated for 4.0.0 (at least the project should vote on it officially).

Thanks Henrik.
Just to confirm, you are saying the issue no longer exists in sa-compile v4+, so we can stop tracking at this point? If it still exists, we may want to open a bug on v4 for tracking until deprecation of sa-compile has been confirmed, or simply define/document the re2c input requirements more strictly.

There is no issue if one doesn't put raw UTF-8 in cf files; some guidelines about that have been put into the documentation. And as said, sa-compile will probably be gone in 4.0 (per Bug 7962).

The warning seems to have originated in fixup_re, which created UTF-8 encoded strings; it should be silenced now. Judging from the sa-compile temp files, nothing changed, so nothing should break (assuming the UTF-8 stuff works properly in the first place; there aren't any unit tests for it).

Sending trunk/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Transmitting file data .done
Committing transaction...
Committed revision 1898776.

Now UTF-8 rules might actually work:

Sending spamassassin-3.4/lib/Mail/SpamAssassin/Plugin/BodyRuleBaseExtractor.pm
Sending spamassassin-3.4/sa-compile.raw
Sending trunk/sa-compile.raw
Sending trunk/t/sa_compile.t
Transmitting file data ....done
Committing transaction...
Committed revision 1898791.
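
For anyone who wants to act on the guidance in this thread (no raw UTF-8 in cf files, and the byte-alternative patterns from comment #12), here is a minimal Python sketch. It is not part of SpamAssassin or sa-compile. The function names `byte_alternatives` and `scan_rules_dir` are hypothetical, and skipping lines that start with "#" is an assumption about what counts as harmless. The scanner reports the file and line of every raw non-ASCII byte sequence, which may help in telling which channel ships the offending rule.

```python
from pathlib import Path


def byte_alternatives(literal: str) -> str:
    """Escape non-ASCII characters in a rule literal as \\xHH byte sequences.

    Characters that exist in Latin-1 get both encodings, e.g. "ü" becomes
    (?:\\xfc|\\xc3\\xbc); any other character gets its UTF-8 bytes only.
    ASCII characters are copied unchanged (regex metacharacters are NOT escaped).
    """
    parts = []
    for ch in literal:
        if ord(ch) < 0x80:
            parts.append(ch)
            continue
        utf8 = "".join(f"\\x{b:02x}" for b in ch.encode("utf-8"))
        if ord(ch) <= 0xFF:
            # Latin-1 byte first, then the UTF-8 byte sequence.
            parts.append(f"(?:\\x{ord(ch):02x}|{utf8})")
        else:
            parts.append(f"(?:{utf8})")
    return "".join(parts)


def scan_rules_dir(rules_dir):
    """Return (path, line_number, raw_line) for every non-comment line in
    *.cf files under rules_dir that contains a byte >= 0x80."""
    hits = []
    for cf_file in sorted(Path(rules_dir).rglob("*.cf")):
        with open(cf_file, "rb") as fh:
            for lineno, raw in enumerate(fh, start=1):
                if raw.lstrip().startswith(b"#"):
                    continue  # assumption: non-ASCII in comments is harmless
                if any(byte >= 0x80 for byte in raw):
                    hits.append((str(cf_file), lineno, raw.rstrip(b"\r\n")))
    return hits


# Reproduces the "most portable" pattern body from comment #12.
print(byte_alternatives("füübar"))
```

Calling `scan_rules_dir()` on the directory where the channel rules are stored (the location depends on the installation) returns one tuple per offending line. The printed pattern is `f(?:\xfc|\xc3\xbc)(?:\xfc|\xc3\xbc)bar`, which matches the text whether the message body arrives as Latin-1 or as UTF-8 bytes.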