I compiled the program with three optimization levels, -O1, -O2 and -O3, and the result was striking. Compared with the build without any optimization flag, run time dropped by about 79%, 80% and 81%, respectively. What made the performance so different?
(The code is available in my previous post SPO600: PROJECT I – Rechallenge)
Size                      | NoFlag | -O1    | -O2    | -O3    |
10000                     | 17.094 | 3.579  | 3.258  | 3.162  |
Improved/Reduced time (%) | n/a    | 79.064 | 80.942 | 81.505 |
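To find out the reason, I looked at the binaries with objdump. I compared main() in the -O0 and -O3 builds, and the -O3 version is clearly shorter: 24 instructions against 43. main() only calls test() in a loop, so most of the speedup probably comes from test() and the code it calls rather than from main() itself. Next, I will try modifying parts of the code that may affect the result and post what I find.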
$ objdump -d --source flagO0 | less
0000000000404614 <main>:
  404614: a9bc7bfd  stp x29, x30, [sp,#-64]!
  404618: 910003fd  mov x29, sp
  40461c: b9001fa0  str w0, [x29,#28]
  404620: f9000ba1  str x1, [x29,#16]
  404624: 52807d00  mov w0, #0x3e8 // #1000
  404628: b9003ba0  str w0, [x29,#56]
  40462c: 97fff655  bl 401f80 <clock@plt>
  404630: f9001ba0  str x0, [x29,#48]
  404634: b9003fbf  str wzr, [x29,#60]
  404638: b9403fa1  ldr w1, [x29,#60]
  40463c: b9403ba0  ldr w0, [x29,#56]
  404640: 6b00003f  cmp w1, w0
  404644: 540000ca  b.ge 40465c <main+0x48>
  404648: 97fff727  bl 4022e4 <_Z4testv>
  40464c: b9403fa0  ldr w0, [x29,#60]
  404650: 11000400  add w0, w0, #0x1
  404654: b9003fa0  str w0, [x29,#60]
  404658: 17fffff8  b 404638 <main+0x24>
  40465c: 97fff649  bl 401f80 <clock@plt>
  404660: f90017a0  str x0, [x29,#40]
  404664: f94017a1  ldr x1, [x29,#40]
  404668: f9401ba0  ldr x0, [x29,#48]
  40466c: cb000020  sub x0, x1, x0
  404670: 9e620001  scvtf d1, x0
  404674: d0000040  adrp x0, 40e000 <_ZNSt11__copy_moveILb0ELb0ESt26random_access_iterator_tagE8__copy_mIPcPhEET0_T_S6_S5_+0x8>
  404678: 91308000  add x0, x0, #0xc20
  40467c: fd400000  ldr d0, [x0]
  404680: 1e601820  fdiv d0, d1, d0
  404684: 9e780000  fcvtzs x0, d0
  404688: f90013a0  str x0, [x29,#32]
  40468c: b9403ba1  ldr w1, [x29,#56]
  404690: 2a0103e0  mov w0, w1
  404694: 531e7400  lsl w0, w0, #2
  404698: 0b010000  add w0, w0, w1
  40469c: 531f7800  lsl w0, w0, #1
  4046a0: 2a0003e1  mov w1, w0
  4046a4: d0000040  adrp x0, 40e000 <_ZNSt11__copy_moveILb0ELb0ESt26random_access_iterator_tagE8__copy_mIPcPhEET0_T_S6_S5_+0x8>
  4046a8: 91300000  add x0, x0, #0xc00
  4046ac: f94013a2  ldr x2, [x29,#32]
  4046b0: 97fff6c0  bl 4021b0 <printf@plt>
  4046b4: 52800000  mov w0, #0x0 // #0
  4046b8: a8c47bfd  ldp x29, x30, [sp],#64
  4046bc: d65f03c0  ret
$ objdump -d --source flagO3 | less
0000000000401f00 <main>:
  401f00: a9be7bfd  stp x29, x30, [sp,#-32]!
  401f04: 910003fd  mov x29, sp
  401f08: a90153f3  stp x19, x20, [sp,#16]
  401f0c: 97ffff79  bl 401cf0 <clock@plt>
  401f10: aa0003f4  mov x20, x0
  401f14: 52807d13  mov w19, #0x3e8 // #1000
  401f18: 94000152  bl 402460 <_Z4testv>
  401f1c: 71000673  subs w19, w19, #0x1
  401f20: 54ffffc1  b.ne 401f18 <main+0x18>
  401f24: 97ffff73  bl 401cf0 <clock@plt>
  401f28: cb140014  sub x20, x0, x20
  401f2c: f0000042  adrp x2, 40c000 <_ZN8picosha218hash256_hex_stringIN9__gnu_cxx17__normal_iteratorIPhSt6vectorIhSaIhEEEEEEvT_S8_RNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x6d0>
  401f30: 5284e201  mov w1, #0x2710 // #10000
  401f34: 9e620280  scvtf d0, x20
  401f38: 90000060  adrp x0, 40d000 <_ZN8picosha26detailL22initial_message_digestE+0x10>
  401f3c: fd465041  ldr d1, [x2,#3232]
  401f40: 91098000  add x0, x0, #0x260
  401f44: 1e611800  fdiv d0, d0, d1
  401f48: 9e780002  fcvtzs x2, d0
  401f4c: 97ffffc5  bl 401e60 <printf@plt>
  401f50: 52800000  mov w0, #0x0 // #0
  401f54: a94153f3  ldp x19, x20, [sp,#16]
  401f58: a8c27bfd  ldp x29, x30, [sp],#32
  401f5c: d65f03c0  ret
Result with different compile options
Flag -O1
[jbae18@bbetty test]$ c++ -O1 bench.cpp -o flagO1
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.576
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.570
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.604
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.561
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.583
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.564
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.593
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.574
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.594
[jbae18@bbetty test]$ ./flagO1
Total Time: 3.569
Flag -O2
[jbae18@bbetty test]$ c++ -O2 bench.cpp -o flagO2
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.272
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.266
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.255
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.247
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.253
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.292
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.252
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.256
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.245
[jbae18@bbetty test]$ ./flagO2
Total Time: 3.240
Flag -O3
[jbae18@bbetty test]$ c++ -O3 bench.cpp -o flagO3
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.227
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.350
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.277
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.240
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.257
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.272
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.252
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.249
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.239
[jbae18@bbetty test]$ ./flagO3
Total Time: 3.253
Update: I have updated the results because I forgot to attach the screenshots of my benchmark.