SPO600: PROJECT II – strategy 1

I compiled the program with three different optimization flags (-O1, -O2 and -O3) and got a striking result. Compared with the build that used no flag, recompiling with -O1, -O2 and -O3 reduced the run time by over 79%, 80% and 81%, respectively. What made the performance so different?

(The code is available in my previous post SPO600: PROJECT I – Rechallenge)

Size                         NoFlag    -O1       -O2       -O3
10000                        17.094    3.579     3.258     3.162
Improved/Reduced time (%)    -         79.064    80.942    81.505

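To find out the reason, I disassembled the binaries with objdump and compared main() in the -O0 and -O3 builds. The -O3 version is clearly shorter: 24 instructions against 43. At -O0 every local variable is stored to and reloaded from the stack, while at -O3 the loop counter and the start time stay in registers (w19 and x20). Next, I will try modifying parts of the code that may affect the result and post what I find.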
$ objdump -d --source flagO0 | less

0000000000404614 <main>:
  404614:       a9bc7bfd        stp     x29, x30, [sp,#-64]!
  404618:       910003fd        mov     x29, sp
  40461c:       b9001fa0        str     w0, [x29,#28]
  404620:       f9000ba1        str     x1, [x29,#16]
  404624:       52807d00        mov     w0, #0x3e8                      // #1000
  404628:       b9003ba0        str     w0, [x29,#56]
  40462c:       97fff655        bl      401f80 <clock@plt>
  404630:       f9001ba0        str     x0, [x29,#48]
  404634:       b9003fbf        str     wzr, [x29,#60]
  404638:       b9403fa1        ldr     w1, [x29,#60]
  40463c:       b9403ba0        ldr     w0, [x29,#56]
  404640:       6b00003f        cmp     w1, w0
  404644:       540000ca        b.ge    40465c <main+0x48>
  404648:       97fff727        bl      4022e4 <_Z4testv>
  40464c:       b9403fa0        ldr     w0, [x29,#60]
  404650:       11000400        add     w0, w0, #0x1
  404654:       b9003fa0        str     w0, [x29,#60]
  404658:       17fffff8        b       404638 <main+0x24>
  40465c:       97fff649        bl      401f80 <clock@plt>
  404660:       f90017a0        str     x0, [x29,#40]
  404664:       f94017a1        ldr     x1, [x29,#40]
  404668:       f9401ba0        ldr     x0, [x29,#48]
  40466c:       cb000020        sub     x0, x1, x0
  404670:       9e620001        scvtf   d1, x0
  404674:       d0000040        adrp    x0, 40e000 <_ZNSt11__copy_moveILb0ELb0ESt26random_access_iterator_tagE8__copy_mIPcPhEET0_T_S6_S5_+0x8>
  404678:       91308000        add     x0, x0, #0xc20
  40467c:       fd400000        ldr     d0, [x0]
  404680:       1e601820        fdiv    d0, d1, d0
  404684:       9e780000        fcvtzs  x0, d0
  404688:       f90013a0        str     x0, [x29,#32]
  40468c:       b9403ba1        ldr     w1, [x29,#56]
  404690:       2a0103e0        mov     w0, w1
  404694:       531e7400        lsl     w0, w0, #2
  404698:       0b010000        add     w0, w0, w1
  40469c:       531f7800        lsl     w0, w0, #1
  4046a0:       2a0003e1        mov     w1, w0
  4046a4:       d0000040        adrp    x0, 40e000 <_ZNSt11__copy_moveILb0ELb0ESt26random_access_iterator_tagE8__copy_mIPcPhEET0_T_S6_S5_+0x8>
  4046a8:       91300000        add     x0, x0, #0xc00
  4046ac:       f94013a2        ldr     x2, [x29,#32]
  4046b0:       97fff6c0        bl      4021b0 <printf@plt>
  4046b4:       52800000        mov     w0, #0x0                        // #0
  4046b8:       a8c47bfd        ldp     x29, x30, [sp],#64
  4046bc:       d65f03c0        ret

$ objdump -d --source flagO3 | less

0000000000401f00 <main>:
  401f00:       a9be7bfd        stp     x29, x30, [sp,#-32]!
  401f04:       910003fd        mov     x29, sp
  401f08:       a90153f3        stp     x19, x20, [sp,#16]
  401f0c:       97ffff79        bl      401cf0 <clock@plt>
  401f10:       aa0003f4        mov     x20, x0
  401f14:       52807d13        mov     w19, #0x3e8                     // #1000
  401f18:       94000152        bl      402460 <_Z4testv>
  401f1c:       71000673        subs    w19, w19, #0x1
  401f20:       54ffffc1        b.ne    401f18 <main+0x18>
  401f24:       97ffff73        bl      401cf0 <clock@plt>
  401f28:       cb140014        sub     x20, x0, x20
  401f2c:       f0000042        adrp    x2, 40c000 <_ZN8picosha218hash256_hex_stringIN9__gnu_cxx17__normal_iteratorIPhSt6vectorIhSaIhEEEEEEvT_S8_RNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x6d0>
  401f30:       5284e201        mov     w1, #0x2710                     // #10000
  401f34:       9e620280        scvtf   d0, x20
  401f38:       90000060        adrp    x0, 40d000 <_ZN8picosha26detailL22initial_message_digestE+0x10>
  401f3c:       fd465041        ldr     d1, [x2,#3232]
  401f40:       91098000        add     x0, x0, #0x260
  401f44:       1e611800        fdiv    d0, d0, d1
  401f48:       9e780002        fcvtzs  x2, d0
  401f4c:       97ffffc5        bl      401e60 <printf@plt>
  401f50:       52800000        mov     w0, #0x0                        // #0
  401f54:       a94153f3        ldp     x19, x20, [sp,#16]
  401f58:       a8c27bfd        ldp     x29, x30, [sp],#32
  401f5c:       d65f03c0        ret

Results with different compile options

Flag -O1

[jbae18@bbetty test]$ c++ -O1 bench.cpp -o flagO1
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.576
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.570
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.604
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.561
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.583
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.564
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.593
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.574
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.594
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.569

[Screenshot: O1.PNG]

Flag -O2

[jbae18@bbetty test]$ c++ -O2 bench.cpp -o flagO2
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.272
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.266
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.255
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.247
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.253
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.292
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.252
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.256
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.245
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.240

[Screenshot: O2.PNG]

Flag -O3

[jbae18@bbetty test]$ c++ -O3 bench.cpp -o flagO3
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.227
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.350
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.277
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.240
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.257
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.272
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.252
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.249
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.239
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.253

[Screenshot: O3.PNG]

Update: I have updated the results to include the screenshots of my benchmark, which I had forgotten to attach.
