Bug 8027 - extracttext plugin fails when executable installed in a path containing a space
Status: RESOLVED FIXED
Alias: None
Product: Spamassassin
Classification: Unclassified
Component: Plugins
Version: 4.0.0
Hardware: All All
Importance: P2 minor
Target Milestone: 4.0.0
Assignee: SpamAssassin Developer Mailing List
Reported: 2022-08-14 10:39 UTC by Sidney Markowitz
Modified: 2022-10-09 03:12 UTC

Attachment Type Modified Status Actions Submitter/CLA Status
Skip tests in extracttext.t if executable is in a path with a space patch None Sidney Markowitz [HasCLA]

Description Sidney Markowitz 2022-08-14 10:39:15 UTC
The default installation location for at least one of the available Windows installers for Tesseract is a subdirectory of C:\Program Files. The entire command line is a single config entry that is parsed by splitting on spaces. That breaks if the path contains an embedded space, and there is no provision for quoting fields in the value. This could be fixed either by making the executable name a separate config entry from the command-line arguments, or by making the parsing code more complex so that it handles quotes.

Until this is fixed, a viable workaround is to install tesseract and pdftotext only in directory paths with no spaces.
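
To illustrate the failure mode and the two fixes suggested above, here is a minimal Python sketch. It is not SpamAssassin's code (the plugin is written in Perl), and the command string, the path, and the function names are all hypothetical. It only shows why splitting a single config value on spaces breaks, and how quote-aware parsing or a separate executable entry would avoid that.

```python
import shlex

# Hypothetical config value: executable path followed by its arguments.
COMMAND = "C:/Program Files/ocr/ocr.exe --quiet input.png"


def split_on_space(value):
    """Naive parsing: treat every space as a field separator."""
    return value.split(" ")


def split_quoted(value):
    """Quote-aware parsing: a quoted field may contain spaces."""
    return shlex.split(value)


def separate_entries(executable, arguments):
    """Executable kept as its own entry; only the arguments are split."""
    return [executable] + arguments.split(" ")


# The path is cut at the embedded space, so the "executable" is wrong.
naive = split_on_space(COMMAND)

# Option 1: the user quotes the path and the parser honours the quotes.
quoted = split_quoted('"C:/Program Files/ocr/ocr.exe" --quiet input.png')

# Option 2: the executable is configured separately from its arguments.
separate = separate_entries("C:/Program Files/ocr/ocr.exe", "--quiet input.png")
```

With the naive split, the first field is only "C:/Program", which is not a runnable path. Both alternatives keep the full path as a single field.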
Comment 1 Sidney Markowitz 2022-10-08 16:33:57 UTC
This bug is showing up in tests run on GitHub Actions Windows runners since we added "cat" to the test, as the Windows runners apparently have a "cat" program in PATH in a directory under C:\Program Files.
Comment 2 Sidney Markowitz 2022-10-09 03:08:20 UTC
Created attachment 5837
Skip tests in extracttext.t if executable is in a path with a space

The underlying cause is that sub helper_app_pipe_open in Utils.pm fails when the path of the helper app contains a space. This sub is currently used in the DCC, Pyzor, and ExtractText plugins.

The requirement that there can't be spaces in the paths for dcc, pyzor, and any application used in the configuration for extracttext is good enough for the 4.0.0 release. However, the Windows runner in GitHub Actions has a cat.exe in a subdirectory of C:\Program Files that is in PATH, which causes test failures. To avoid them, this patch changes only the t/extracttext.t test file and skips the tests if the executable that is found has a space in its path.
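
The skip condition described above can be sketched as follows. This is a hedged Python illustration and not the committed patch, which is in the Perl test file t/extracttext.t. The function names and the injectable `which` parameter are assumptions made for the example.

```python
import shutil


def path_has_space(executable_path):
    """True if a resolved executable path contains a space."""
    return executable_path is not None and " " in executable_path


def should_skip(program, which=shutil.which):
    """Decide whether tests that rely on `program` should be skipped.

    `which` looks the program up in PATH; it defaults to shutil.which and
    is a parameter only so the lookup can be replaced in the example.
    """
    return path_has_space(which(program))


if should_skip("cat"):
    print("skipping: cat was found in a path containing a space")
```

This sketch returns False when the program is not found at all; a real test would need to handle that case separately.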

As this patch is only for the test, it can be committed for the 4.0.0 release without an RTC vote.

I'll open a new enhancement issue, to be addressed after 4.0.0, for supporting executables with a space in the path in sub helper_app_pipe_open.
Comment 3 Sidney Markowitz 2022-10-09 03:12:22 UTC
trunk % svn ci -m "bug 8027 - skip extracttext tests if executable found in path with space to avoid test failure" t/extracttext.t
Sending        t/extracttext.t
Transmitting file data .done
Committing transaction...
Committed revision 1904466.