Created attachment 25685: test case. Hi, I tried to extract text from the attached PPT, but I get '75 years' instead of '≥75 years'. BR, Piotr Lipski
This could well be a case of Microsoft making up their own code points for symbol characters. Could you please confirm which character number is used in the file (use org.apache.poi.poifs.dev.POIFSViewer or similar to track it down), and then confirm which Unicode code point your character should actually be?
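A quick way to spot the offending character in the extracted text is to print every UTF-16 code unit outside printable ASCII in escaped form. The sketch below is a hypothetical helper (the class and method names are made up for illustration). It uses only the JDK, and the hard-coded sample string stands in for whatever the text extractor returns, so it does not show any POI API.

```java
// Hypothetical diagnostic helper: escapes every UTF-16 code unit outside
// printable ASCII so invisible private-use characters stand out.
public class CodepointDump {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 || c > 0x7E) {
                // Backslash, 'u', then four lowercase hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample input standing in for text returned by the extractor.
        String extracted = "\uf0b375 years";
        System.out.println(escape(extracted));
    }
}
```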
It should be \u2265 (≥), but instead I get \uf0b3.
I've added the method StringUtil.mapMsCodepointString(), which converts the symbol characters to their Unicode equivalents. To keep the strings in sync with the binary representation, I've decided not to apply it by default in TextBox.getText() and related methods. Applied with r1648415.
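For illustration only, here is a minimal sketch of the kind of translation described above. It assumes that Symbol-font characters are stored as U+F000 plus the font's byte value, so U+F0B3 is Symbol byte 0xB3, which is '≥' (U+2265). The class name, the method name and the deliberately partial lookup table are hypothetical. This is not the POI implementation; in POI itself, StringUtil.mapMsCodepointString() is the method to call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, partial stand-in for the idea behind mapping MS symbol code points:
// characters in U+F020..U+F0FF are treated as Symbol-font bytes (code point minus 0xF000)
// and translated to Unicode when the byte is in the table below.
public class SymbolFontMapper {

    private static final Map<Integer, Character> SYMBOL_TO_UNICODE = new HashMap<>();

    static {
        // Small excerpt of the Adobe Symbol encoding; a real mapping covers the whole range.
        SYMBOL_TO_UNICODE.put(0x61, '\u03B1'); // alpha
        SYMBOL_TO_UNICODE.put(0x62, '\u03B2'); // beta
        SYMBOL_TO_UNICODE.put(0x6D, '\u03BC'); // mu
        SYMBOL_TO_UNICODE.put(0xA3, '\u2264'); // less-than or equal
        SYMBOL_TO_UNICODE.put(0xB0, '\u00B0'); // degree
        SYMBOL_TO_UNICODE.put(0xB1, '\u00B1'); // plus-minus
        SYMBOL_TO_UNICODE.put(0xB3, '\u2265'); // greater-than or equal
        SYMBOL_TO_UNICODE.put(0xB4, '\u00D7'); // multiplication sign
        SYMBOL_TO_UNICODE.put(0xB9, '\u2260'); // not equal
    }

    static String mapSymbolChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0xF020 && c <= 0xF0FF) {
                // Look up the Symbol byte; leave the character unchanged if it is not in the table.
                Character mapped = SYMBOL_TO_UNICODE.get(c - 0xF000);
                sb.append(mapped != null ? mapped.charValue() : c);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "\uF0B375 years";         // as stored in the file
        String display = mapSymbolChars(raw);  // mapped only for display
        System.out.println(display);
    }
}
```

Keeping the raw string and mapping it only when it is shown to a user mirrors the decision not to apply the conversion inside TextBox.getText().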