Issue 127661

Summary: SHA1/MD5 in sal/rtl/source/digest.c give invalid results for some inputs (ODF interop issues)
Product: General Reporter: JimF <jfoug>
Component: securityAssignee: AOO security list <security>
Status: CONFIRMED --- QA Contact:
Severity: Major    
Priority: P2 CC: damjan, jfoug, mseidel, oooforum, petko
Version: 3.3.0 or older (OOo)Flags: mseidel: 4.2.0_release_blocker?
Target Milestone: ---   
Hardware: All   
OS: All   
Issue Type: DEFECT Latest Confirmation in: 4.2.0-dev
Developer Difficulty: ---

Description JimF 2018-01-15 22:17:52 UTC
There are 2 bugs in sal/rtl/source/digest.c  The bug will give incorrect results for SHA1 and MD5 when the last limb of the hashed data is between 52 and 55 bytes long.  So using that code to hash any data which is   52 <= length(data)%64 <= 55 will give bogus hash results.

The bug is seen on lines 710 and line 1160.  The bug is in the endMD5 and endSHA1 functions.  The bug has been reported to Libre Office, (same source base), and is being actively handled there.  This is the bug report made on Libre office, which lists findings to date, along with the code changes and plans for that product.  The bug there stems from the source being forked from Apache at some time, and the bug already being in that code base.

https://bugs.documentfoundation.org/show_bug.cgi?id=114939

The real problem is that the SHA1 is used in the encryption validation of ODF encrypted data files.  Since this broken hash has been used to create these encrypted files, then the broken code must be preserved to be used as a fall-back when opening existing ODF files.    If the code for the SHA1 in this spot is simply 'fixed', then there will be many documents rendered non usable. Also, if this bug is left alone with no change, then documents created by Libre office (or others) which use correct SHA1 will not be usable.  The method of using both hash algorithms (the correct one, and the buggy one), and using the first that 'works', is probably the only way to proceed, without impacting existing customer data files.
Comment 1 oooforum (fr) 2019-07-22 13:03:00 UTC
>The bug has been reported to Libre Office, (same source base), and is being
>actively handled there.
Well, maybe you could ask to the developer to publish his patch under AL v.2. We'll happy to commit it into AOO trunk.
Comment 2 JimF 2019-07-24 19:02:22 UTC
I am not part of either the AOO group (in any way) or in the Libra office group.  I simply found this bug while building other tools to process the ODF data files, doing data recovery.   There were files created by both suites which were NOT correctly finding known passwords using FIPS compliant SHA1 algorithm.  I simply investigated the raw source, found the problem, was fully able to recreate the problem, AND to recreate the broken values found in the file (thus matching the hash), by adding an extra 64 bytes of NULL data to the data being hashed.

I simply brought the bug forward to both Apache and to Libre.  So far, Libra has taken steps to correct their SHA1 and MD5. I believe they may have removed the MD5 code, since it was deprecated and unused.  For the SHA1, they corrected the hash, BUT kept the older buggy code.  Upon import of ODF file, if the correct hash did not correctly work for the user password, the buggy code was used.  If that buggy code produced a correct match, then the ODF file was allowed to be properly opened, BUT when re-saved, the ODF file was done so using proper SHA1, so that the non-compliant saved file was not carried forward.

NOTE, Libre actually HAD test vectors which they had commented out (for longer hashes), due to those test vectors failing.  However, once the bug was fixed as I recommended, those test vectors worked properly. The test vectors were FIPS compliant, so with the buggy code, those test vectors were failing.  Now they can be included due to proper functionality and the hashes produced all now match.

How the Apache group chooses to deal with this bug is beyond the scope of my involvement.  It certainly is not the worst issue ever.  A ODF file saved by AOO which triggers the bug will be validated and opened by AOO with the buggy SHA1.  But it will not be usable in any other suite.  I am simply the messenger, finding and documenting the bug, and trying my best to document how to do the 'raw' fix, and also posting warnings about fixing this and then having users already saved files (in the non-compliant format) NOT being able to be opened properly using only the new correct and compliant code.

Jim.
Comment 3 oooforum (fr) 2019-07-25 07:28:06 UTC
(In reply to JimF from comment #2)
> I am simply the messenger, finding and documenting the bug
Well, AOO project is only composed of volunteers
And a messenger can involved and become one of them.
Comment 4 Peter 2019-07-27 08:21:59 UTC
I have asked on LO dev List and received no answer since. https://lists.freedesktop.org/archives/libreoffice/2019-July/083191.html

The discussion on OpenOffice dev List surfaced a nice alternative by using the code from APR. This is maybe more effort but is in general a more interesting direction to follow.

https://apr.apache.org/
Comment 5 damjan 2023-06-03 10:18:32 UTC
Confirming, this can be reproduced by fiddling with the unit test in sal/qa/rtl/digest/rtl_digest.cxx.

Since 52 <= length(data)%64 <= 55 gives wrong hash results, and
length(data) % 64 gives a value between 0 and 63, this is 64 possible values.

The broken range of these values, 52-55, gives 55-52+1 = 4 possible values, and 4 out of 64 = 6.25% of all files written with the previous wrong hash, will be unreadable if this change is made.

That's many files that we would reject, just because they were written with an older buggy version. We  really should do something to check for the bad hash and open them anyway.

But we also need to stop writing these invalid files - as soon as possible.
Comment 6 Matthias Seidel 2023-06-03 11:42:04 UTC
I remember users complaining about OpenOffice not being able to open their password protected files anymore.

Maybe these files were opened/saved with LO and then AOO couldn't verify the (new) correct hash.

For the user this appears as "data loss". I agree that this should be fixed ASAP.
Comment 7 damjan 2023-06-04 07:28:08 UTC
In main/sal/rtl/source/digest.c these are the buggy functions:

static void __rtl_digest_endMD5 (DigestContextMD5 *ctx)
static void __rtl_digest_endSHA (DigestContextSHA *ctx)

As per OpenGrok, the tree of their callers is as follows:

__rtl_digest_endMD5() is called by:
rtl_digest_getMD5() which is called by:
1. sal's digest unit tests in main/sal/qa/rtl/digest/rtl_digest.cxx.
2. main/vcl/source/gdi/pdfwriter_impl2.cxx
   PDFWriterImpl::computeEncryptionKey()
   PDFWriterImpl::computeODictionaryValue()
   PDFWriterImpl::computeUDictionaryValue()
   when writing PDFs, which are really an export-only format, so I don't think
   we need to patch anything here.
3. trunk/main/vcl/source/gdi/pdfwriter_impl.cxx
   PDFWriterImpl::computeDocumentIdentifier()
   PDFWriterImpl::emitTrailer()
   again, PDF is an export-only format, some files we previously wrote would be
   corrupt to others, but when this is fixed, they won't be.
4. main/sal/rtl/source/uuid.cxx
   rtl_createNamedUuid()
   which is called by a unit test in main/sal/qa/rtl/uuid/rtl_Uuid.cxx, but
   never called by any other code.
5. main/sdext/source/pdfimport/pdfparse/pdfentries.cxx
   PDFFile::decrypt()
   password_to_key()
   check_user_password()
   ok so maybe we import PDFs too... That means some encrypted PDFs currently
   fail to open, but will be fixed. Should we be opening corrupt encrypted PDFs
   that earlier versions wrote though?
6. main/tools/bootstrp/md5.cxx
   calc_md5_checksum()
   +-main() in main/tools/bootstrp/so_checksum.cxx, which builds the
     so_checksum binary, which checksums command line arguments and prints out
     their MD5, and never appears to be used during the build.
7. main/sdext/source/pdfimport/filterdet.cxx
   checkDocChecksum()
   +-getAdditionalStream()
     +-PDFDetector::detect()
     +-PDFIHybridAdaptor::filter()
8. main/sal/osl/unx/signal.c
   calc_md5_checksum()
   +-ReportCrash()
9. main/crashrep/source/win32/soreport.cpp
   calc_md5_checksum()
   which is unused, it's in an "#if 0" block.

So far it looks to me like for MD5 we don't need compatibility with previously written files with badly calculated checksums. Let's see SHA next.
Comment 8 damjan 2023-06-04 19:21:14 UTC
No wait, there's another file of rtl_digest_getMD5() callers that I missed:

rtl_digest_getMD5()
+-1 __rtl_digest_MD5
| | struct Direct_Impl's m_get
| +-1 rtl_digest_get()
|   +-1 main/oox/source/core/filterdetect.cxx
|   |   only for SHA1
|   +-2 main/sal/rtl/source/random.c
|   |   uses RTL_RANDOM_DIGEST, irrelevant to us.
|   +-3 OConnectionWrapper::createUniqueId()
|   |   uses SHA1, not MD5
|   +-4 OfficePipeId::operator()
|   |   really uses MD5, hashes installation path for IPC ID.
|   +-5 CreateMD5FromString()
|     +-1 OfficeIPCThread::EnableOfficeIPCThread()
|         Also hashes installation path for IPC ID.
+-2 rtl_digest_MD5()
| +-1 PDFWriterImpl::checkAndEnableStreamEncryption()
| +-2 PDFWriterImpl::enableStringEncryption()
| +-3 prepareIV() in main/svl/source/passwordcontainer/passwordcontainer.cxx
| | +- PasswordContainer::DecodePasswords()
| | +- PasswordContainer::EncodePasswords()
| +-4 calc_md5_checksum() in main/crashrep/source/win32/soreport.cpp
|   +- WriteChecksumFile()
|      Used in the crash report.
+-3 rtl_digest_initHMAC_MD5()
| +-1 __rtl_digest_HMAC_MD5
| | | struct Direct_Impl's m_init
| | +-1 rtl_digest_init()
| |   +- included in all the places seen for 1.1
| +-2 rtl_digest_HMAC_MD5()
|     only used in a test
+-4 rtl_digest_getHMAC_MD5()
  +-1 rtl_digest_HMAC_MD5()
  |   same as 3.2
  +-2 __rtl_digest_HMAC_MD5
      struct Direct_Impl's m_get
      same as 1

The only one of potential concern is the PasswordContainer, but that only appears to be used for storing website passwords. The MD5 hash is used for the initialization vector for encryption and decryption, so the changed MD5 function might mean users can't decrypt their stored (website?) passwords and have to re-enter them all.

Others are of far lesser concern, eg. IIRC the office IPC ID is opaque anyway, PDF writing was already discussed, and crash reports are currently unused.
Comment 9 damjan 2023-06-05 04:00:09 UTC
Actually wait, we have more MD5 users:

(struct) Digest_Impl __rtl_digest_MD5
+-2 rtl_digest_createMD5()
  +-1 all the places we saw already
  +-2 rtl_digest_create() with rtl_Digest_AlgorithmMD5
    +-1 BinaryCodec_RCF() in main/oox/source/core/binarycodec.cxx
    +-2 MSCodec_Std97::MSCodec_Std97()
    |   in main/filter/source/msfilter/mscodec.cxx
    +-3 GenerateStd97Key() in main/comphelper/source/misc/docpasswordhelper.cxx
    +-4 OfficePipeId::operator () ()
    |   already seen, office IPC ID
    +-5 CreateMD5FromString()
        already seen, office IPC ID

2.2.1, 2.2.2 and 2.2.3 deal with Microsoft Office documents, so it's not just ODF that's affected...
Comment 10 damjan 2023-06-10 15:02:32 UTC
Actually the Microsoft Office document code (GenerateStd97Key, BinaryCodec_RCF, etc.) all use rtl_digest_rawMD5() instead of rtl_digest_endMD5(), which doesn't have this bug.

So it's only PasswordContainer, and I have patch that can read files written with the bad MD5 sum and convert them.

Now onto __rtl_digest_endSHA().
Comment 11 damjan 2023-06-12 06:41:46 UTC
__rtl_digest_endSHA() appears to be used by both SHA and SHA1 hash functions.

Thankfully, other than in its own unit tests, SHA is completely unused in our code.

SHA-1 on the other hand, is impossible, I am giving up and trying an automated tool to generate call graphs (anybody know any?), there is simply too many to draw and go through by hand.

Here is what I have so far, finished for SHA but very incomplete for SHA-1:

__rtl_digest_endSHA()
|
+-1 rtl_digest_getSHA()
| +-1 __rtl_digest_SHA_0
| |  +-1 rtl_digest_SHA()
| |  |   only used in tests
| |  +-2 rtl_digest_createSHA()
| |    +-1 rtl_digest_create() with rtl_Digest_AlgorithmSHA
| |        only used in tests
| +-2 rtl_digest_SHA()
|     see 1.1.1
|
|
+-2 rtl_digest_getSHA1()
  +-1 SHA1DigestContext::finalizeDigestAndDispose()
  | | in main/package/source/package/zipapi/sha1context.cxx
  | |
  | +-1 ZipOutputStream::closeEntry()
  | | | in main/package/source/package/zipapi/ZipOutputStream.cxx
  | | | deals with WRITING, and should always be writing the CORRECT SHA-1:
  | | +-1 ZipPackageFolder::saveChild()
  | | +-2 ZipPackage::WriteMimetypeMagicFile()
  | | +-3 ZipPackage::WriteManifest()
  | | +-4 ZipPackage::WriteContentTypes()
  | |
  | +-2 ZipFile::StaticGetDigestContextForChecksum()
  |   +-1 ZipFile::StaticHasValidPassword()
  |   | | used when READING, needs to deal with bad SHA-1...
  |   | +-1 ZipFile::StaticGetDataFromRawStream()
  |   | +-2 ZipFile::hasValidPassword()
  |   |   +-1 ZipFile::getInputStream()
  |   |   +-2 ZipFile::getDataStream()
  |   |
  |   +-2 ZipOutputStream::putNextEntry()
  |       used when WRITING, and should always be writing the CORRECT SHA-1.
  |
  +-2 __rtl_digest_SHA_1
  | | struct Direct_Impl's m_get
  | +-1 rtl_digest_get()
  |   +-1 lclDeriveKey() in main/oox/source/core/filterdetect.cxx
  |   | +-1 lclGenerateEncryptionKey()
  |   |     see 2.2.1.3
  |   +-2 lclCheckEncryptionData()
  |   | +-1 PasswordVerifier::verifyEncryptionData()
  |   | | +-1 FilterDetect::extractUnencryptedPackage()
  |   | | |   see 2.2.1.3.1.1.3
  |   | | +-2 DocPasswordHelper::requestAndVerifyDocPassword()
  |   | |     see 2.2.1.3.1.1
  |   | +-2 lclGenerateEncryptionKey()
  |   |     see 2.2.1.3
  |   +-3 lclGenerateEncryptionKey()
  |   | +-1 PasswordVerifier::verifyPassword()
  |   |   +-1 DocPasswordHelper::requestAndVerifyDocPassword()
  |   |     | in main/comphelper/source/misc/docpasswordhelper.cxx
  |   |     +-1 ScfApiHelper::QueryEncryptionDataForMedium()
  |   |     | | in main/sc/source/filter/ftools/fapihelper.cxx
  |   |     | +-1 XclRoot::RequestEncryptionData()
  |   |     |   | in main/sc/source/filter/excel/xlroot.cxx
  |   |     |   +-1 XclImpDecryptHelper::ReadFilepass()
  |   |     |     +-1 ImportExcel::Read()
  |   |     |         used when READING .xls files
  |   |     |
  |   |     +-2 FilterBase::requestEncryptionData()
  |   |     |   in oox/source/core/filterbase.cxx and used when READING
  |   |     +-3 FilterDetect::extractUnencryptedPackage()
  |   |     |   used when READING ooxml
  |   |     +-4 SharedConfigData::requestEncryptionData()
  |   |     |   in main/oox/source/dump/dumperbase.cxx
  |   |     |   and used when READING ooxml
  |   |     +-5 CheckPasswd_Impl()
  |   |       | in main/sfx2/source/appl/appopen.cxx
  |   |       +-1 DocumentInserter::CreateMedium()
  |   |       |   ...
  |   |       +-2 SfxObjectShell::LoadOwnFormat()
  |   +-4 OConnectionWrapper::createUniqueId()
  |       used as a key for connection pooling, doesn't persist to disk.
  |
  +-3 rtl_digest_SHA1()
  +-4 rtl_digest_initHMAC_SHA1()
  +-5 rtl_digest_getHMAC_SHA1()