TIKA-1130 demonstrates that a newline character is incorrectly inserted between runs within an SDT. I will submit a patch and test shortly. This is a cleanup of: https://issues.apache.org/bugzilla/show_bug.cgi?id=54849
Created attachment 30482 [PATCH] This issue appears to be limited to contiguous runs within a container that isn't a paragraph (in TIKA-1130, the runs are in a cell). I added test cases to lock in the newline/tab behavior for contiguous runs within cells and within paragraphs going forward.
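For readers following along, here is a minimal sketch of the symptom. It is not the attached patch and does not use the POI API. It assumes each run can be modeled as a plain string, and the class and method names (`RunJoinSketch`, `joinWithNewlines`, `joinContiguous`) are hypothetical. The "Big" / "Company" split is also an assumption, chosen to match the "BigCompany" test mentioned in this thread.

```java
import java.util.List;

public class RunJoinSketch {

    // Models the reported behavior: a newline is appended after every run,
    // so text split across contiguous runs comes out broken apart.
    static String joinWithNewlines(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        for (String text : runTexts) {
            sb.append(text).append('\n');
        }
        return sb.toString();
    }

    // Models the expected behavior: contiguous runs are concatenated with
    // no separator, because a run boundary is not a line boundary.
    static String joinContiguous(List<String> runTexts) {
        StringBuilder sb = new StringBuilder();
        for (String text : runTexts) {
            sb.append(text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input: one word split across two runs in a cell.
        List<String> runs = List.of("Big", "Company");

        System.out.println(joinWithNewlines(runs).contains("BigCompany"));
        System.out.println(joinContiguous(runs).contains("BigCompany"));
    }
}
```

A test that only looks for "Company" would pass against both variants, which is presumably why checking for the full "BigCompany" string is the stronger assertion.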
Thanks Tim, patch applied in r1496458, and changelog updated for it in r1496461.
Thank you! I'll update the patch for TIKA-1130 this evening to include Ray's original test for "BigCompany" instead of just "Company".