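I tested the CentOS kernel for Raspberry Pi that includes the Meltdown / Spectre fixes.
Since the rt-tests Cyclictest results alone would not be very interesting, I also ran UnixBench.
In short, the impact appears to be small, as expected: most results are within a few percent of the previous kernel, and the largest drop is System Call Overhead, at 96% on the standard (non-RT) kernel.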
■Setting up the test environment
・Installing the UnixBench build prerequisites
First, install the required packages.
# yum groupinstall "Development Tools"
# yum install perl-Time-HiRes
・Downloading and building UnixBench
# cd /usr/local/src/
# wget -N https://github.com/kdlucas/byte-unixbench/tarball/61663da4fd51a0a5d514ce670884b3ed0ef81608
# tar zxvf ./61663da4fd51a0a5d514ce670884b3ed0ef81608
# chown -R admin. /usr/local/src/kdlucas-byte-unixbench-61663da/
# cd /usr/local/src/kdlucas-byte-unixbench-61663da/UnixBench/
# cp -piav Makefile Makefile.org
# sed -i -e 's/^OPTON += -march=native -mtune=native/OPTON += -mcpu=cortex-a15 -mfpu=neon-vfpv4/g' Makefile
# su - admin -c "cd /usr/local/src/kdlucas-byte-unixbench-61663da/UnixBench/ ; make all"
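One thing to watch in the build step: sed exits with status 0 even when no line matches, so if the OPTON line in the Makefile ever differs from the pattern, the build silently keeps -march=native. Below is a minimal sketch of a guard for that. The function name patch_opton is hypothetical (not part of UnixBench), and it writes the backup as Makefile.org just like the steps above.

```shell
# Hypothetical helper: apply the OPTON substitution and fail loudly if nothing changed.
patch_opton() {
    makefile="$1"
    # Keep a backup next to the original, named <file>.org.
    cp -p "$makefile" "$makefile.org" || return 1
    # Read from the backup and rewrite the Makefile (avoids relying on sed -i).
    sed -e 's/^OPTON += -march=native -mtune=native/OPTON += -mcpu=cortex-a15 -mfpu=neon-vfpv4/' \
        "$makefile.org" > "$makefile" || return 1
    # sed returns 0 even when no line matched, so compare against the backup.
    if cmp -s "$makefile" "$makefile.org"; then
        echo "OPTON line not found; $makefile unchanged" >&2
        return 1
    fi
    # Show the rewritten line for a quick visual check.
    grep -n '^OPTON += -mcpu=' "$makefile"
}
```

It would be called as `patch_opton Makefile` from the UnixBench directory, in place of the cp and sed lines.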
・Running the test
# cd /usr/local/src/kdlucas-byte-unixbench-61663da/UnixBench/
# ./Run
■Comparison before and after the Meltdown / Spectre mitigations
●4.9.68 vs 4.9.76 (Single)
Test case | Kernel-4.9.68 (Single) | Kernel-4.9.76 (Single) | Diff |
Dhrystone 2 using register variables | 4444482.7 | 4446369.6 | 100% |
Double-Precision Whetstone | 925.4 | 927.6 | 100% |
Execl Throughput | 859.1 | 859.7 | 100% |
File Copy 1024 bufsize 2000 maxblocks | 122908.2 | 122338.7 | 100% |
File Copy 256 bufsize 500 maxblocks | 34494 | 34371.4 | 100% |
File Copy 4096 bufsize 8000 maxblocks | 327961.5 | 327680.2 | 100% |
Pipe Throughput | 206251.7 | 206728.9 | 100% |
Pipe-based Context Switching | 49852.1 | 49842.8 | 100% |
Process Creation | 2185 | 2200.8 | 101% |
Shell Scripts (1 concurrent) | 1413.7 | 1412.3 | 100% |
Shell Scripts (8 concurrent) | 386.1 | 384.4 | 100% |
System Call Overhead | 359893.9 | 345096 | 96% |
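The Diff column in these tables is the right-hand score divided by the left-hand score, rounded to a whole percent (for example 345096 / 359893.9 ≈ 96%). As a rough sketch of how that column can be recomputed from rows in this layout, the following uses awk. The function name recompute_diff is made up for illustration, and it assumes every data row has the form "name | old | new | ...".

```shell
# Hypothetical helper: recompute the Diff column from "name | old | new | ..." rows on stdin.
recompute_diff() {
    awk -F'|' '
        # Skip header rows: only lines whose 2nd and 3rd fields are positive numbers are used.
        $2 + 0 > 0 && $3 + 0 > 0 {
            name = $1
            sub(/[ \t]+$/, "", name)          # trim trailing blanks from the test name
            printf "%s | %.0f%%\n", name, ($3 / $2) * 100
        }
    '
}

# Example with two rows from the table above.
recompute_diff <<'EOF'
Process Creation | 2185 | 2200.8 | 101% |
System Call Overhead | 359893.9 | 345096 | 96% |
EOF
```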
●4.9.68 vs 4.9.76 (Multi)
Test case | Kernel-4.9.68 (Multi) | Kernel-4.9.76 (Multi) | Diff |
Dhrystone 2 using register variables | 17747491.1 | 17748069.1 | 100% |
Double-Precision Whetstone | 3705 | 3704.6 | 100% |
Execl Throughput | 2059.2 | 2070.2 | 101% |
File Copy 1024 bufsize 2000 maxblocks | 234876.6 | 234681 | 100% |
File Copy 256 bufsize 500 maxblocks | 63297.1 | 63788 | 101% |
File Copy 4096 bufsize 8000 maxblocks | 594215.6 | 596785.1 | 100% |
Pipe Throughput | 818667.2 | 825419.7 | 101% |
Pipe-based Context Switching | 188870.9 | 188815.8 | 100% |
Process Creation | 4639.2 | 4714 | 102% |
Shell Scripts (1 concurrent) | 2976 | 2986.7 | 100% |
Shell Scripts (8 concurrent) | 434.9 | 433.7 | 100% |
System Call Overhead | 1394643.9 | 1344020.8 | 96% |
●4.9.68.rt60 vs 4.9.76.rt61 (Single)
Test case | Kernel-4.9.68.rt60 (Single) | Kernel-4.9.76.rt61 (Single) | Diff |
Dhrystone 2 using register variables | 4427651.6 | 4423413.4 | 100% |
Double-Precision Whetstone | 923.2 | 921.8 | 100% |
Execl Throughput | 663.7 | 667.5 | 101% |
File Copy 1024 bufsize 2000 maxblocks | 90040.7 | 88057 | 98% |
File Copy 256 bufsize 500 maxblocks | 24912.3 | 24404.4 | 98% |
File Copy 4096 bufsize 8000 maxblocks | 254216 | 250157.8 | 98% |
Pipe Throughput | 158194.7 | 159817.2 | 101% |
Pipe-based Context Switching | 42960 | 42581.9 | 99% |
Process Creation | 1570 | 1597.7 | 102% |
Shell Scripts (1 concurrent) | 1213 | 1216.6 | 100% |
Shell Scripts (8 concurrent) | 243.1 | 244.2 | 100% |
System Call Overhead | 278139.2 | 285093.3 | 103% |
●4.9.68.rt60 vs 4.9.76.rt61 (Multi)
Test case | Kernel-4.9.68.rt60 (Multi) | Kernel-4.9.76.rt61 (Multi) | Diff |
Dhrystone 2 using register variables | 17657614.4 | 17656872.1 | 100% |
Double-Precision Whetstone | 3688 | 3689.6 | 100% |
Execl Throughput | 1145.7 | 1146.8 | 100% |
File Copy 1024 bufsize 2000 maxblocks | 59137.5 | 58784.8 | 99% |
File Copy 256 bufsize 500 maxblocks | 14873 | 14858 | 100% |
File Copy 4096 bufsize 8000 maxblocks | 215322.4 | 213830.9 | 99% |
Pipe Throughput | 629543.9 | 634778.9 | 101% |
Pipe-based Context Switching | 173011.4 | 173843.6 | 100% |
Process Creation | 3207.7 | 3210.3 | 100% |
Shell Scripts (1 concurrent) | 2053.9 | 2057.6 | 100% |
Shell Scripts (8 concurrent) | 192.4 | 193 | 100% |
System Call Overhead | 1088837.3 | 1115143.2 | 102% |
■[Bonus] Comparisons that are not very informative
●Standard kernel vs RT kernel
・4.9.68 vs 4.9.68.rt60 (Single)
Test case | Kernel-4.9.68 (Single) | Kernel-4.9.68.rt60 (Single) | Diff |
Dhrystone 2 using register variables | 4444482.7 | 4427651.6 | 100% |
Double-Precision Whetstone | 925.4 | 923.2 | 100% |
Execl Throughput | 859.1 | 663.7 | 77% |
File Copy 1024 bufsize 2000 maxblocks | 122908.2 | 90040.7 | 73% |
File Copy 256 bufsize 500 maxblocks | 34494 | 24912.3 | 72% |
File Copy 4096 bufsize 8000 maxblocks | 327961.5 | 254216 | 78% |
Pipe Throughput | 206251.7 | 158194.7 | 77% |
Pipe-based Context Switching | 49852.1 | 42960 | 86% |
Process Creation | 2185 | 1570 | 72% |
Shell Scripts (1 concurrent) | 1413.7 | 1213 | 86% |
Shell Scripts (8 concurrent) | 386.1 | 243.1 | 63% |
System Call Overhead | 359893.9 | 278139.2 | 77% |
・4.9.68 vs 4.9.68.rt60 (Multi)
Test case | Kernel-4.9.68 (Multi) | Kernel-4.9.68.rt60 (Multi) | Diff |
Dhrystone 2 using register variables | 17747491.1 | 17657614.4 | 99% |
Double-Precision Whetstone | 3705 | 3688 | 100% |
Execl Throughput | 2059.2 | 1145.7 | 56% |
File Copy 1024 bufsize 2000 maxblocks | 234876.6 | 59137.5 | 25% |
File Copy 256 bufsize 500 maxblocks | 63297.1 | 14873 | 23% |
File Copy 4096 bufsize 8000 maxblocks | 594215.6 | 215322.4 | 36% |
Pipe Throughput | 818667.2 | 629543.9 | 77% |
Pipe-based Context Switching | 188870.9 | 173011.4 | 92% |
Process Creation | 4639.2 | 3207.7 | 69% |
Shell Scripts (1 concurrent) | 2976 | 2053.9 | 69% |
Shell Scripts (8 concurrent) | 434.9 | 192.4 | 44% |
System Call Overhead | 1394643.9 | 1088837.3 | 78% |
・4.9.76 vs 4.9.76.rt61 (Single)
Test case | Kernel-4.9.76 (Single) | Kernel-4.9.76.rt61 (Single) | Diff |
Dhrystone 2 using register variables | 4446369.6 | 4423413.4 | 99% |
Double-Precision Whetstone | 927.6 | 921.8 | 99% |
Execl Throughput | 859.7 | 667.5 | 78% |
File Copy 1024 bufsize 2000 maxblocks | 122338.7 | 88057 | 72% |
File Copy 256 bufsize 500 maxblocks | 34371.4 | 24404.4 | 71% |
File Copy 4096 bufsize 8000 maxblocks | 327680.2 | 250157.8 | 76% |
Pipe Throughput | 206728.9 | 159817.2 | 77% |
Pipe-based Context Switching | 49842.8 | 42581.9 | 85% |
Process Creation | 2200.8 | 1597.7 | 73% |
Shell Scripts (1 concurrent) | 1412.3 | 1216.6 | 86% |
Shell Scripts (8 concurrent) | 384.4 | 244.2 | 64% |
System Call Overhead | 345096 | 285093.3 | 83% |
・4.9.76 vs 4.9.76.rt61 (Multi)
Test case | Kernel-4.9.76 (Multi) | Kernel-4.9.76.rt61 (Multi) | Diff |
Dhrystone 2 using register variables | 17748069.1 | 17656872.1 | 99% |
Double-Precision Whetstone | 3704.6 | 3689.6 | 100% |
Execl Throughput | 2070.2 | 1146.8 | 55% |
File Copy 1024 bufsize 2000 maxblocks | 234681 | 58784.8 | 25% |
File Copy 256 bufsize 500 maxblocks | 63788 | 14858 | 23% |
File Copy 4096 bufsize 8000 maxblocks | 596785.1 | 213830.9 | 36% |
Pipe Throughput | 825419.7 | 634778.9 | 77% |
Pipe-based Context Switching | 188815.8 | 173843.6 | 92% |
Process Creation | 4714 | 3210.3 | 68% |
Shell Scripts (1 concurrent) | 2986.7 | 2057.6 | 69% |
Shell Scripts (8 concurrent) | 433.7 | 193 | 45% |
System Call Overhead | 1344020.8 | 1115143.2 | 83% |
●1 core vs 4 cores
・4.9.76 (Single) vs 4.9.76 (Multi)
Test case | Kernel-4.9.76 (Single) | Kernel-4.9.76 (Multi) | Diff |
Dhrystone 2 using register variables | 4446369.6 | 17748069.1 | 399% |
Double-Precision Whetstone | 927.6 | 3704.6 | 399% |
Execl Throughput | 859.7 | 2070.2 | 241% |
File Copy 1024 bufsize 2000 maxblocks | 122338.7 | 234681 | 192% |
File Copy 256 bufsize 500 maxblocks | 34371.4 | 63788 | 186% |
File Copy 4096 bufsize 8000 maxblocks | 327680.2 | 596785.1 | 182% |
Pipe Throughput | 206728.9 | 825419.7 | 399% |
Pipe-based Context Switching | 49842.8 | 188815.8 | 379% |
Process Creation | 2200.8 | 4714 | 214% |
Shell Scripts (1 concurrent) | 1412.3 | 2986.7 | 211% |
Shell Scripts (8 concurrent) | 384.4 | 433.7 | 113% |
System Call Overhead | 345096 | 1344020.8 | 389% |
・4.9.76.rt61 (Single) vs 4.9.76.rt61 (Multi)
Test case | Kernel-4.9.76.rt61 (Single) | Kernel-4.9.76.rt61 (Multi) | Diff |
Dhrystone 2 using register variables | 4423413.4 | 17656872.1 | 399% |
Double-Precision Whetstone | 921.8 | 3689.6 | 400% |
Execl Throughput | 667.5 | 1146.8 | 172% |
File Copy 1024 bufsize 2000 maxblocks | 88057 | 58784.8 | 67% |
File Copy 256 bufsize 500 maxblocks | 24404.4 | 14858 | 61% |
File Copy 4096 bufsize 8000 maxblocks | 250157.8 | 213830.9 | 85% |
Pipe Throughput | 159817.2 | 634778.9 | 397% |
Pipe-based Context Switching | 42581.9 | 173843.6 | 408% |
Process Creation | 1597.7 | 3210.3 | 201% |
Shell Scripts (1 concurrent) | 1216.6 | 2057.6 | 169% |
Shell Scripts (8 concurrent) | 244.2 | 193 | 79% |
System Call Overhead | 285093.3 | 1115143.2 | 391% |
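To read the single vs multi tables in terms of scaling, the ratio can be expressed as a speedup and a per-core efficiency (speedup divided by the number of cores). The sketch below assumes the Multi runs use 4 cores, as the "1 core vs 4 cores" heading suggests; the function name scaling is hypothetical.

```shell
# Hypothetical helper: speedup and per-core efficiency from single- and multi-copy scores.
scaling() {
    # $1 = single-copy score, $2 = multi-copy score, $3 = number of cores (defaults to 4, an assumption)
    awk -v s="$1" -v m="$2" -v n="${3:-4}" 'BEGIN {
        printf "speedup %.2fx, efficiency %.0f%%\n", m / s, (m / s) / n * 100
    }'
}

scaling 4423413.4 17656872.1   # Dhrystone 2 on 4.9.76.rt61
scaling 88057 58784.8          # File Copy 1024 on 4.9.76.rt61
```

A speedup below 1.00x means the multi-copy run scored lower than the single-copy run, as in the RT kernel's File Copy rows.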
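■Thoughts
・As expected, the RT kernel behaves quite differently from the standard kernel.
・Even the pattern of which tests fail to scale with more cores is different. On the RT kernel some scores actually drop when going from 1 core to 4 cores (File Copy and Shell Scripts (8 concurrent)).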