Bug 68234 - getBytesInCodePage(String,int)@CodePageUtil can fail to set encoding.
Status: RESOLVED WONTFIX
Alias: None
Product: POI
Classification: Unclassified
Component: HPSF
Version: 5.3.x-dev
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: POI Developers List
Reported: 2023-11-27 02:34 UTC by zhonghao
Modified: 2024-02-25 20:17 UTC



Description zhonghao 2023-11-27 02:34:13 UTC
The code is as follows:

public static byte[] getBytesInCodePage(final String string, final int codepage)
        throws UnsupportedEncodingException {
    String encoding = codepageToEncoding(codepage);
    return string.getBytes(encoding);
}

When codepageToEncoding throws an exception, encoding is never set.
NPOI fixed the corresponding bug in this commit:
https://github.com/nissl-lab/npoi/commit/9ee6fa7de8361239dc962ccf9d5c99e65587b234

The buggy code is identical, but the fixed code handles exceptions:

public static byte[] GetBytesInCodePage(String string1, int codepage)
{
    String cp = CodepageToEncoding(codepage);
    Encoding encoding;
    try
    {
        encoding = Encoding.GetEncoding(cp);
    }
    catch (Exception)
    {
        encoding = Encoding.ASCII;
    }
    return encoding.GetBytes(string1);
    //return string1.GetBytes(encoding);
}
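
For comparison, a Java rendering of the NPOI-style fallback quoted above might look roughly like the sketch below. This is only an illustration under stated assumptions, not POI code: the class LenientEncodingSketch and the helper getBytesLenient are hypothetical names, and the helper takes an encoding name directly instead of calling codepageToEncoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; neither this class nor this method exists in POI.
final class LenientEncodingSketch {

    /**
     * Encodes text with the named charset, falling back to US-ASCII
     * when the name is illegal or the charset is not available.
     */
    static byte[] getBytesLenient(String text, String encodingName) {
        Charset charset;
        try {
            charset = Charset.forName(encodingName);
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException,
            // which both extend IllegalArgumentException, as well as a null name.
            charset = StandardCharsets.US_ASCII;
        }
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // Unknown charset name: the call succeeds, using US-ASCII instead.
        byte[] bytes = getBytesLenient("abc", "no-such-charset");
        System.out.println(bytes.length);
    }
}
```

Note that the fallback makes the call succeed even though the requested encoding was never applied, so the caller cannot tell that a different encoding was used.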
Comment 1 PJ Fanning 2023-11-27 10:24:25 UTC
I don't think we should ignore encoding issues. My inclination is not to make any changes. Please provide a real-world example of where we should handle invalid encodings.
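
One way to see the risk of ignoring encoding issues is the minimal Java sketch below. It assumes nothing about POI internals and uses only java.nio.charset; the class AsciiFallbackDemo and its two methods are hypothetical names chosen for illustration. It contrasts a silent US-ASCII encode, which replaces unmappable characters, with a strict encode that reports them.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of POI.
final class AsciiFallbackDemo {

    /** Lenient path: String.getBytes substitutes '?' for unmappable characters. */
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Strict path: throws when a character cannot be represented in US-ASCII. */
    static byte[] encodeAsciiStrict(String text) throws CharacterCodingException {
        ByteBuffer buffer = StandardCharsets.US_ASCII.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) {
        // The accented character is lost on the lenient path.
        System.out.println(roundTripAscii("caf\u00e9"));
        try {
            encodeAsciiStrict("caf\u00e9");
        } catch (CharacterCodingException e) {
            System.out.println("strict encode rejected the input");
        }
    }
}
```

With the lenient path the accented character comes back as '?', and nothing signals that data was lost; the strict path surfaces the problem to the caller instead.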
Comment 2 Dominik Stadler 2024-02-25 20:17:13 UTC
As mentioned by PJ, this is likely not something that we want to change even if NPOI decided to handle it this way.