SMRProxy Timing Comparisons

Some timings of read access, in nanoseconds per access. In each pair, the first number is for a single reader thread and the second is for 10 reader threads.

  • smrproxy w/o membar (no shared data access) – 0.7, 1.5
  • smrproxy w/ membar (no shared data access) – 7.7, 13.6
  • smrproxy w/o membar – 0.78, 1.72
  • smrproxy w/ membar – 8.0, 14.5
  • rcu – 0.48, 0.85
  • urcu – 7.38, 12.4
  • urcu2 – 11.0, 15.1
  • rwlock – 21.6, 800.0
  • arcproxy – 12.0, 285.0 to 345.0

The RCU measurement isn’t really RCU; it’s just unsafe access without any synchronization, which is what classic RCU read access looks like. The shared data access is just a simple dependent load. Non-trivial data access would likely incur more dependent loads, which would then make up most of the read access overhead.

The narrow difference between smrproxy w/ and w/o membar seems to indicate that the membar’s effect on the CPU pipeline isn’t as large as one would expect.

The rwlock reader timings aren’t much different between reader-preference and writer-preference rwlocks. For the writers’ sake, you would want to use writer preference if there are lots of readers.

(edit) The previous set of timings was for test cases that updated statistics on data validity, so there were stores into memory and branching that showed up in the timings. I added tests for a simple dependent load (loading a pointer and dereferencing it, so 2 loads) after making sure the compiler didn’t optimize out the unused loads.
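
A minimal sketch of what such a dependent-load timing loop could look like in C. This is not the actual test code: the struct node type, the time_dependent_load function, and the use of volatile plus a result sink to keep the compiler from dropping the loads are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical payload type; the shared pointer refers to one of these. */
struct node { unsigned value; };

/* Time `iters` dependent loads (load a pointer, then dereference it) and
 * return the average nanoseconds per iteration. The shared pointer is
 * volatile so it is reloaded on every iteration, and the sum is written
 * to *sink so the loads are not dead code. */
static double time_dependent_load(struct node *volatile *shared,
                                  long iters, unsigned *sink)
{
    assert(iters > 0);
    struct timespec t0, t1;
    unsigned sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        struct node *p = *shared;            /* load 1: the pointer */
        sum += *(volatile unsigned *)&p->value; /* load 2: depends on load 1 */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sink = sum;
    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9
            + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)iters;
}
```

Calling it with a pointer to a shared struct node pointer and a large iteration count gives a per-access figure; the loop overhead itself is included, so very small numbers should be read with that in mind.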

In actual usage with non-trivial data structures, you are likely to have many more dependent loads.

(edit 2) I added a quick and dirty simulated atomic reference counted proxy (arcproxy) to the timings using 2 atomic_fetch_add calls. As soon as you move to more than 1 thread, things go bad rather quickly because the cache gets thrashed by the interlocked updates.
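
Roughly, the simulated reader path amounts to something like the following C11 sketch. The struct arc_proxy type and the arc_read function are hypothetical names for illustration, and this only simulates the cost of the two interlocked updates; a real reference counted proxy also has to acquire the pointer and the count together safely.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical proxy object: a reference count next to the payload. */
struct arc_proxy {
    atomic_long refcount;
    int         value;
};

/* Simulated reader: take a reference, read the payload, drop the
 * reference. Every reader thread does two read-modify-write operations
 * on the same counter, so the cache line holding it bounces between
 * cores once there is more than one reader. */
static int arc_read(struct arc_proxy *p)
{
    long prev = atomic_fetch_add(&p->refcount, 1);  /* acquire a reference */
    assert(prev >= 1);  /* the owner's own reference must still be held */
    (void)prev;         /* silence unused warning when NDEBUG is defined */

    int v = p->value;                               /* the actual read */

    atomic_fetch_add(&p->refcount, -1);             /* release the reference */
    return v;
}
```

With a single thread the counter's cache line stays local and the two atomic operations are the only cost. With 10 threads all hitting the same counter, each fetch-add has to pull the line over from another core first.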

(edit 3) I wrote another atomic reference counted proxy (arcproxy) and used that to get actual timings.

(edit 4) Added or updated:
  • urcu – user rcu, simulated counter based local quiescent state
  • urcu2 – user rcu, simulated counter based local quiescent state, interlocked update
  • smrproxy – updated w/ new timings

(edit 5) updated urcu timings using corrected membar
