Bug 7960 - PDFInfo misses valid metadata
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: SVN Trunk (Latest Devel Version)
Hardware: All All
Importance: P2 normal
Target Milestone: Undefined
Assignee: SpamAssassin Developer Mailing List
Reported: 2022-03-03 04:21 UTC by Bill Cole
Modified: 2022-03-03 08:41 UTC



Description Bill Cole 2022-03-03 04:21:55 UTC

    
Comment 1 Bill Cole 2022-03-03 04:31:43 UTC
As reported on the Users' mailing list by Ricky Boone on 2022-03-02, the PDFInfo plugin fails to parse out much of the metadata from a sizable fraction of today's PDFs. 

I've fixed this in r1898546 by removing the optimization, no longer valid, that skipped any line in the PDF containing high-bit-set characters.
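
To illustrate why that skip loses metadata, here is a minimal Python sketch. It is not the plugin's Perl code: `scan_metadata`, `FIELD_RE`, and the sample bytes are hypothetical stand-ins. It assumes one plausible cause, namely that PDF text strings may be stored as UTF-16BE with a leading 0xFE 0xFF byte order mark, so a metadata line can legitimately contain high-bit bytes.

```python
import re

# Hypothetical, simplified matcher for a few Info dictionary entries.
# It only handles literal strings in parentheses on a single line.
FIELD_RE = re.compile(rb"/(Title|Author|Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)")


def scan_metadata(pdf_bytes, skip_high_bit_lines):
    """Scan a PDF line by line and collect raw metadata values.

    skip_high_bit_lines=True mimics the old optimization: any line that
    contains a byte >= 0x80 is ignored entirely.
    """
    found = {}
    for line in pdf_bytes.splitlines():
        if skip_high_bit_lines and any(b >= 0x80 for b in line):
            continue  # the whole line is dropped, metadata included
        for name, value in FIELD_RE.findall(line):
            found[name.decode("ascii")] = value
    return found


# Made-up sample: the Title is UTF-16BE, so it starts with 0xFE 0xFF.
sample = (
    b"%PDF-1.7\n"
    b"1 0 obj\n"
    b"<< /Producer (ExampleWriter 1.0)\n"
    b"/Title (\xfe\xff\x00R\x00e\x00p\x00o\x00r\x00t) >>\n"
    b"endobj\n"
)

with_skip = scan_metadata(sample, skip_high_bit_lines=True)
without_skip = scan_metadata(sample, skip_high_bit_lines=False)
```

With the skip enabled only the ASCII `Producer` entry is found; with it disabled the `Title` entry is found as well, still in its raw UTF-16BE form.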
Comment 2 Henrik Krohns 2022-03-03 08:41:29 UTC
Committed some more cleanups. The PDF "parsing" is unbelievably naive, so I made the UTF-16 decoding equally naive. I guess it does the job for now ¯\_(ツ)_/¯

Sending        trunk/lib/Mail/SpamAssassin/Plugin/PDFInfo.pm
Transmitting file data .done
Committing transaction...
Committed revision 1898557.
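
For readers wondering what a deliberately naive UTF-16 decode of a metadata value can look like, here is a rough Python sketch. It is not the code committed in r1898557: `decode_pdf_text` is a hypothetical helper, and the Latin-1 fallback is only an approximation of PDFDocEncoding. Escape sequences inside the string are not handled.

```python
def decode_pdf_text(raw: bytes) -> str:
    """Naively decode a raw PDF text string.

    Assumption: a value starting with the 0xFE 0xFF byte order mark is
    UTF-16BE; anything else is treated as Latin-1, which only
    approximates PDFDocEncoding.
    """
    if raw.startswith(b"\xfe\xff"):
        # Drop the BOM and decode the rest; malformed input is replaced
        # rather than raising an error.
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")


title = decode_pdf_text(b"\xfe\xff\x00R\x00e\x00p\x00o\x00r\x00t")
producer = decode_pdf_text(b"ExampleWriter 1.0")
```

Replacing malformed input instead of raising keeps a scanner like this from dying on a broken PDF, at the cost of occasionally garbled metadata values.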