Bug 51851

Summary: Error in apr_atomic_xchgptr
Product: APR Reporter: Mattias Engdegård <mattiase>
Component: APRAssignee: Apache Portable Runtime bugs mailinglist <bugs>
Status: CLOSED FIXED    
Severity: normal CC: danielsh, davi
Priority: P2 Keywords: FixedInTrunk
Version: HEAD   
Target Milestone: ---   
Hardware: PC   
OS: Linux   

Description Mattias Engdegård 2011-09-20 18:23:09 UTC
The version of apr_atomic_xchgptr() in atomic/unix/ia32.c is completely broken:

APR_DECLARE(void*) apr_atomic_xchgptr(volatile void **mem, void *with)
{
    void *prev;
#if APR_SIZEOF_VOIDP == 4
    asm volatile ("xchgl %2, %1"
                  : "=a" (prev), "+m" (*mem)
                  : "0" (with));
#elif APR_SIZEOF_VOIDP == 8
    asm volatile ("xchgq %q2, %1"
                  : "=a" (prev), "+m" (*mem)
                  : "r" ((unsigned long)with));
#else
#error APR_SIZEOF_VOIDP value not supported
#endif
    return prev;
}

For x86-64, the generated asm will be something like

  a0:   48 87 37                xchg   %rsi,(%rdi)
  a3:   c3                      retq   

which is clearly wrong; the return value (%rax) isn't set, so the function returns garbage. This makes svn crash on startup when this code is used.
Suggested patch:

--- atomic/unix/ia32.c  (revision 1173061)
+++ atomic/unix/ia32.c  (arbetskopia)
@@ -117,7 +117,7 @@
 #elif APR_SIZEOF_VOIDP == 8
     asm volatile ("xchgq %q2, %1"
                   : "=a" (prev), "+m" (*mem)
-                  : "r" ((unsigned long)with));
+                  : "0" ((unsigned long)with));
 #else
 #error APR_SIZEOF_VOIDP value not supported
 #endif

The cast to unsigned long should really go away as well; it just breaks the code on IL32P64 platforms for no good reason.

By the way, the same function in atomic/unix/builtins.c is correct, but it adds an mfence instruction. I'm not sure that is needed - the XCHG instruction has an implicit memory fence.
Comment 1 Mattias Engdegård 2011-09-20 18:27:39 UTC
Related bugs: bug 50731 (wrong volatile qualifier in signature) and bug 42806 (origin of the code).
Comment 2 Mattias Engdegård 2011-09-22 09:58:27 UTC
This bug is not necessarily caught by the regression test (testatomic.c), because of a poor choice of test pointer value (NULL). The following patch makes the test stronger, and actually catches the bug:

--- test/testatomic.c	(revision 1174015)
+++ test/testatomic.c	(arbetskopia)
@@ -84,11 +84,12 @@
 static void test_xchgptr(abts_case *tc, void *data)
 {
     int a;
-    volatile void *target_ptr = NULL;
+    void *ref = "little piggy";
+    volatile void *target_ptr = ref;
     void *old_ptr;
 
     old_ptr = apr_atomic_xchgptr(&target_ptr, &a);
-    ABTS_PTR_EQUAL(tc, NULL, old_ptr);
+    ABTS_PTR_EQUAL(tc, ref, old_ptr);
     ABTS_PTR_EQUAL(tc, &a, (void *) target_ptr);
 }
 
Also note that for the bug to show up, the bad code in atomic/unix/ia32.c must be used, which is only the case when a gcc without builtin sync primitives was used for the configuration script - that is, gcc 4.0.x or older.
Comment 3 Stefan Fritsch 2013-04-21 21:32:05 UTC
Thanks for the analysis and test case.

trunk commit: r1470348
1.5: r1470349
1.4: r1470350
Comment 4 Jeff Trawick 2014-01-21 12:56:27 UTC
Fixed in apr 1.4.7 and later releases.