Diary

_23(Tue)

A note on how fast sequential access is on Intel CPUs.

Suppose there is a loop that sweeps straight through memory.

	for (i=0; i<s; i++) {
		RVALUE *val = &vals[i];
		if (val->flags) {
		    do_something_wrong(val);
		}
	}

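It walks through every val in vals one by one, looks at flags, and does something if it is nonzero.

I want to speed this up. Suppose a closer look shows that, under some condition, this branch can be skipped. For example, suppose it turns out that looking at every other element is enough.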

	for (i=0; i<s; i++) {
	    if (!(*skip_func)(i)) {
		RVALUE *val = &vals[i];
		if (val->flags) {
		    do_something_wrong(val);
		}
	    }
	}

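The flags check runs only when skip_func returns false, so the skipped elements are never read from memory, and it looks like this should be faster. skip_func is swappable so that I can experiment. The complete program is below.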

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

typedef struct {
    uint64_t flags;
    uint64_t data[4];
}RVALUE;

__attribute__((noinline))
static void
do_something_wrong(RVALUE *val)
{
    fprintf(stderr, "do_something_wrong: %p\n", val);
}

static int skip_0(int n){return (n%10) < 0;} /* 0/10, none */
static int skip_1(int n){return (n%10) < 1;} /* 1/10 */
static int skip_2(int n){return (n%10) < 2;} /* 2/10 */
static int skip_3(int n){return (n%10) < 3;} /* 3/10 */
static int skip_4(int n){return (n%10) < 4;} /* 4/10 */
static int skip_5(int n){return (n%10) < 5;} /* 5/10 */
static int skip_6(int n){return (n%10) < 6;} /* 6/10 */
static int skip_7(int n){return (n%10) < 7;} /* 7/10 */
static int skip_8(int n){return (n%10) < 8;} /* 8/10 */
static int skip_9(int n){return (n%10) < 9;} /* 9/10 */
static int skip_a(int n){return (n%10) < 10;} /* 10/10, all */

int
main(int argc, char *argv[])
{
    int i, j;
    const int s = 1024 * 1024 * 10; /* 10M */
    RVALUE *vals = (RVALUE *)calloc(s, sizeof(RVALUE)); /* calloc(count, size) */
    int (*skip_func)(int);

    if (argc < 2) {
	exit(1);
    }

    skip_func = NULL;
    switch (argv[1][0]) {
      case '0':
	skip_func = skip_0;
	break;
      case '1':
	skip_func = skip_1;
	break;
      case '2':
	skip_func = skip_2;
	break;
      case '3':
	skip_func = skip_3;
	break;
      case '4':
	skip_func = skip_4;
	break;
      case '5':
	skip_func = skip_5;
	break;
      case '6':
	skip_func = skip_6;
	break;
      case '7':
	skip_func = skip_7;
	break;
      case '8':
	skip_func = skip_8;
	break;
      case '9':
	skip_func = skip_9;
	break;
      case 'a':
	skip_func = skip_a;
	break;
      default:
	fprintf(stderr, "unsupported: %s\n", argv[1]);
	exit(1);
    }

    for (j=0; j<100; j++) {
	for (i=0; i<s; i++) {
	    if (!(*skip_func)(i)) {
		RVALUE *val = &vals[i];
		if (val->flags) {
		    do_something_wrong(val);
		}
	    }
	}
    }
}

Passing 0 selects skip_0(). It always returns 0, so nothing is skipped. With 1, 1 in 10 iterations is skipped. ... With 9, 9 in 10 are skipped. With a, it always returns 1, so everything is skipped.

Now for the results.

model name      : Intel(R) Core(TM) i5-3380M CPU @ 2.90GHz

       user     system      total        real
0  0.000000   0.000000   3.200000 (  3.213057)
1  0.000000   0.000000   3.980000 (  3.988034)
2  0.000000   0.000000   4.690000 (  4.680758)
3  0.000000   0.000000   5.370000 (  5.374492)
4  0.000000   0.000000   6.730000 (  6.750303)
5  0.000000   0.000000   7.080000 (  7.078628)
6  0.000000   0.000000   6.940000 (  6.946066)
7  0.000000   0.000000   6.210000 (  6.216578)
8  0.000000   0.000000   8.360000 (  8.358197)
9  0.000000   0.000000   5.480000 (  5.483612)
a  0.010000   0.000000   1.970000 (  1.965609)

model name      : Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

       user     system      total        real
0  0.000000   0.000000   2.200000 (  2.205095)
1  0.000000   0.000000   2.310000 (  2.317280)
2  0.000000   0.000000   2.460000 (  2.460013)
3  0.000000   0.000000   2.540000 (  2.538290)
4  0.000000   0.000000   2.650000 (  2.649510)
5  0.000000   0.000000   2.760000 (  2.764821)
6  0.000000   0.000000   3.280000 (  3.283949)
7  0.000000   0.000000   4.120000 (  4.124645)
8  0.000000   0.000000   7.710000 (  7.709284)
9  0.000000   0.000000   3.730000 (  3.722011)
a  0.000000   0.000000   1.440000 (  1.450049)

Raspberry Pi 3
       user     system      total        real
0  0.000000   0.000000   5.100000 (  5.119621)
1  0.000000   0.010000   5.660000 (  5.661095)
2  0.000000   0.000000   5.930000 (  5.938469)
3  0.000000   0.000000   6.430000 (  6.439446)
4  0.000000   0.000000   6.360000 (  6.391132)
5  0.000000   0.000000   5.840000 (  5.852021)
6  0.000000   0.000000   5.660000 (  5.674228)
7  0.000000   0.000000   6.890000 (  6.894989)
8  0.000000   0.000000   7.550000 (  7.564158)
9  0.000000   0.000000   6.390000 (  6.446177)
a  0.000000   0.010000   3.310000 (  3.308592)

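On every machine, accessing every element is faster than skipping some of them (skipping everything is the fastest of all, of course, but that is only there as a baseline).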
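The tables above have the shape of Ruby's Benchmark output, with the time landing under total because the work is done in a child process. A minimal sketch of a harness that would print tables like these, assuming the C program was compiled to ./a.out; the bench_skips helper is a name made up for this sketch and is not taken from the original setup:

```ruby
require 'benchmark'

# Hypothetical helper: runs `command label` once per label and prints
# one Benchmark row for each run. Returns the array of Benchmark::Tms.
def bench_skips(command, labels)
  Benchmark.bm(1) do |x|
    labels.each do |label|
      # The child's CPU time is reported in the "total" column.
      x.report(label) { system(*command, label) }
    end
  end
end

labels = %w[0 1 2 3 4 5 6 7 8 9 a]
# Assumption: the compiled program is ./a.out in the current directory.
bench_skips(['./a.out'], labels) if File.executable?('./a.out')
```

Running each variant only once, as here, is noisy; repeating the runs and taking the best time would be a reasonable refinement.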

Let's run perf on the i7-6700.

./a.out 1

 Performance counter stats for './a.out 1':

       2327.312108      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               754      page-faults               #    0.324 K/sec
     8,552,271,698      cycles                    #    3.675 GHz                      (30.39%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    25,136,427,035      instructions              #    2.94  insns per cycle          (38.13%)
     5,154,158,059      branches                  # 2214.640 M/sec                    (38.29%)
            23,945      branch-misses             #    0.00% of all branches          (38.86%)
     1,964,678,774      L1-dcache-loads           #  844.184 M/sec                    (38.97%)
       667,506,576      L1-dcache-load-misses     #   33.98% of all L1-dcache hits    (38.94%)
        27,388,843      LLC-loads                 #   11.768 M/sec                    (31.10%)
        18,668,574      LLC-load-misses           #  136.32% of all LL-cache hits     (31.05%)
   <not supported>      L1-icache-loads
           209,135      L1-icache-load-misses     #    0.090 M/sec                    (30.99%)
     1,995,968,384      dTLB-loads                #  857.628 M/sec                    (30.94%)
               517      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.78%)
               108      iTLB-loads                #    0.046 K/sec                    (30.60%)
                56      iTLB-load-misses          #   51.85% of all iTLB cache hits   (30.43%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       2.327875472 seconds time elapsed

./a.out 2

 Performance counter stats for './a.out 2':

       2479.055604      task-clock (msec)         #    1.000 CPUs utilized
                 2      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               753      page-faults               #    0.304 K/sec
     9,129,307,476      cycles                    #    3.683 GHz                      (30.37%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    24,642,643,947      instructions              #    2.70  insns per cycle          (38.56%)
     4,996,575,860      branches                  # 2015.516 M/sec                    (38.66%)
            24,533      branch-misses             #    0.00% of all branches          (38.76%)
     1,863,534,823      L1-dcache-loads           #  751.712 M/sec                    (38.86%)
       665,528,308      L1-dcache-load-misses     #   35.71% of all L1-dcache hits    (38.96%)
        30,044,469      LLC-loads                 #   12.119 M/sec                    (31.12%)
        21,559,455      LLC-load-misses           #  143.52% of all LL-cache hits     (31.07%)
   <not supported>      L1-icache-loads
           212,872      L1-icache-load-misses     #    0.086 M/sec                    (31.02%)
     1,893,886,927      dTLB-loads                #  763.955 M/sec                    (30.95%)
               971      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.78%)
               376      iTLB-loads                #    0.152 K/sec                    (30.62%)
                39      iTLB-load-misses          #   10.37% of all iTLB cache hits   (30.46%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       2.479202848 seconds time elapsed

./a.out 3

 Performance counter stats for './a.out 3':

       2537.381335      task-clock (msec)         #    1.000 CPUs utilized
                 4      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               756      page-faults               #    0.298 K/sec
     9,356,138,785      cycles                    #    3.687 GHz                      (30.48%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    24,727,249,626      instructions              #    2.64  insns per cycle          (38.20%)
     4,948,329,824      branches                  # 1950.172 M/sec                    (38.20%)
            23,307      branch-misses             #    0.00% of all branches          (38.35%)
     1,759,461,670      L1-dcache-loads           #  693.416 M/sec                    (38.88%)
       654,410,524      L1-dcache-load-misses     #   37.19% of all L1-dcache hits    (38.94%)
        39,766,013      LLC-loads                 #   15.672 M/sec                    (31.10%)
        30,325,538      LLC-load-misses           #  152.52% of all LL-cache hits     (31.05%)
   <not supported>      L1-icache-loads
           189,102      L1-icache-load-misses     #    0.075 M/sec                    (31.00%)
     1,783,276,624      dTLB-loads                #  702.802 M/sec                    (30.95%)
               634      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.90%)
               250      iTLB-loads                #    0.099 K/sec                    (30.76%)
                42      iTLB-load-misses          #   16.80% of all iTLB cache hits   (30.60%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       2.537501039 seconds time elapsed

./a.out 4

 Performance counter stats for './a.out 4':

       2653.688467      task-clock (msec)         #    1.000 CPUs utilized
                 4      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               755      page-faults               #    0.285 K/sec
     9,273,553,579      cycles                    #    3.495 GHz                      (30.81%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    24,463,594,085      instructions              #    2.64  insns per cycle          (38.50%)
     4,833,049,097      branches                  # 1821.257 M/sec                    (38.50%)
            38,561      branch-misses             #    0.00% of all branches          (38.50%)
     1,669,552,121      L1-dcache-loads           #  629.144 M/sec                    (38.50%)
       621,956,935      L1-dcache-load-misses     #   37.25% of all L1-dcache hits    (38.47%)
        54,474,331      LLC-loads                 #   20.528 M/sec                    (31.06%)
        43,355,855      LLC-load-misses           #  159.18% of all LL-cache hits     (31.01%)
   <not supported>      L1-icache-loads
           246,031      L1-icache-load-misses     #    0.093 M/sec                    (30.96%)
     1,677,351,049      dTLB-loads                #  632.083 M/sec                    (30.92%)
               515      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.87%)
               483      iTLB-loads                #    0.182 K/sec                    (30.82%)
                 3      iTLB-load-misses          #    0.62% of all iTLB cache hits   (30.78%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       2.653860646 seconds time elapsed

./a.out 5

 Performance counter stats for './a.out 5':

       2681.594573      task-clock (msec)         #    1.000 CPUs utilized
                 2      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               753      page-faults               #    0.281 K/sec
     9,783,738,826      cycles                    #    3.648 GHz                      (30.81%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    24,068,897,294      instructions              #    2.46  insns per cycle          (38.59%)
     4,699,861,998      branches                  # 1752.637 M/sec                    (38.68%)
            25,116      branch-misses             #    0.00% of all branches          (38.77%)
     1,560,676,047      L1-dcache-loads           #  581.996 M/sec                    (38.86%)
       572,066,365      L1-dcache-load-misses     #   36.66% of all L1-dcache hits    (38.90%)
        63,843,875      LLC-loads                 #   23.808 M/sec                    (31.07%)
        50,944,021      LLC-load-misses           #  159.59% of all LL-cache hits     (31.03%)
   <not supported>      L1-icache-loads
           204,683      L1-icache-load-misses     #    0.076 M/sec                    (30.88%)
     1,578,661,520      dTLB-loads                #  588.703 M/sec                    (30.73%)
               814      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.58%)
               292      iTLB-loads                #    0.109 K/sec                    (30.43%)
                16      iTLB-load-misses          #    5.48% of all iTLB cache hits   (30.43%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       2.681733196 seconds time elapsed

./a.out 6

 Performance counter stats for './a.out 6':

       3095.983614      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               753      page-faults               #    0.243 K/sec
    11,037,737,838      cycles                    #    3.565 GHz                      (30.86%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    23,859,344,685      instructions              #    2.16  insns per cycle          (38.62%)
     4,597,358,232      branches                  # 1484.943 M/sec                    (38.70%)
            27,811      branch-misses             #    0.00% of all branches          (38.78%)
     1,457,689,889      L1-dcache-loads           #  470.833 M/sec                    (38.86%)
       476,795,783      L1-dcache-load-misses     #   32.71% of all L1-dcache hits    (38.81%)
        63,475,064      LLC-loads                 #   20.502 M/sec                    (31.01%)
        58,835,138      LLC-load-misses           #  185.38% of all LL-cache hits     (30.88%)
   <not supported>      L1-icache-loads
           293,526      L1-icache-load-misses     #    0.095 M/sec                    (30.76%)
     1,477,587,652      dTLB-loads                #  477.260 M/sec                    (30.63%)
             1,026      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.50%)
             3,463      iTLB-loads                #    0.001 M/sec                    (30.49%)
                 6      iTLB-load-misses          #    0.17% of all iTLB cache hits   (30.77%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       3.096129522 seconds time elapsed

./a.out 7

 Performance counter stats for './a.out 7':

       3572.644866      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               754      page-faults               #    0.211 K/sec
    12,181,180,947      cycles                    #    3.410 GHz                      (30.56%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    23,623,060,110      instructions              #    1.94  insns per cycle          (38.59%)
     4,486,955,892      branches                  # 1255.920 M/sec                    (38.66%)
            28,584      branch-misses             #    0.00% of all branches          (38.73%)
     1,353,500,319      L1-dcache-loads           #  378.851 M/sec                    (38.80%)
       407,229,242      L1-dcache-load-misses     #   30.09% of all L1-dcache hits    (38.77%)
        81,939,252      LLC-loads                 #   22.935 M/sec                    (30.98%)
        81,433,098      LLC-load-misses           #  198.76% of all LL-cache hits     (30.95%)
   <not supported>      L1-icache-loads
           290,817      L1-icache-load-misses     #    0.081 M/sec                    (30.91%)
     1,370,395,850      dTLB-loads                #  383.580 M/sec                    (30.82%)
               755      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.71%)
             1,023      iTLB-loads                #    0.286 K/sec                    (30.60%)
                72      iTLB-load-misses          #    7.04% of all iTLB cache hits   (30.49%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       3.572793663 seconds time elapsed

./a.out 8

 Performance counter stats for './a.out 8':

       7913.908840      task-clock (msec)         #    1.000 CPUs utilized
                 4      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               754      page-faults               #    0.095 K/sec
    18,954,643,011      cycles                    #    2.395 GHz                      (30.79%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    23,582,079,752      instructions              #    1.24  insns per cycle          (38.51%)
     4,421,746,574      branches                  #  558.731 M/sec                    (38.54%)
            42,481      branch-misses             #    0.00% of all branches          (38.54%)
     1,263,967,657      L1-dcache-loads           #  159.715 M/sec                    (38.54%)
       231,031,943      L1-dcache-load-misses     #   18.28% of all L1-dcache hits    (38.51%)
       151,673,082      LLC-loads                 #   19.165 M/sec                    (30.74%)
       147,241,641      LLC-load-misses           #  194.16% of all LL-cache hits     (30.85%)
   <not supported>      L1-icache-loads
           603,917      L1-icache-load-misses     #    0.076 M/sec                    (30.84%)
     1,260,088,327      dTLB-loads                #  159.225 M/sec                    (30.82%)
               730      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.81%)
               727      iTLB-loads                #    0.092 K/sec                    (30.79%)
             1,615      iTLB-load-misses          #  222.15% of all iTLB cache hits   (30.78%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       7.913997921 seconds time elapsed

./a.out 9

 Performance counter stats for './a.out 9':

       3641.973232      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               752      page-faults               #    0.206 K/sec
    13,417,059,445      cycles                    #    3.684 GHz                      (30.81%)
   <not supported>      stalled-cycles-frontend
   <not supported>      stalled-cycles-backend
    23,405,171,543      instructions              #    1.74  insns per cycle          (38.49%)
     4,313,153,885      branches                  # 1184.290 M/sec                    (38.49%)
            31,343      branch-misses             #    0.00% of all branches          (38.49%)
     1,154,790,590      L1-dcache-loads           #  317.078 M/sec                    (38.49%)
       131,470,216      L1-dcache-load-misses     #   11.38% of all L1-dcache hits    (38.47%)
        11,868,351      LLC-loads                 #    3.259 M/sec                    (30.98%)
        11,264,153      LLC-load-misses           #  189.82% of all LL-cache hits     (30.95%)
   <not supported>      L1-icache-loads
           322,708      L1-icache-load-misses     #    0.089 M/sec                    (30.91%)
     1,150,387,637      dTLB-loads                #  315.869 M/sec                    (30.88%)
             1,060      dTLB-load-misses          #    0.00% of all dTLB cache hits   (30.85%)
               406      iTLB-loads                #    0.111 K/sec                    (30.81%)
                10      iTLB-load-misses          #    2.46% of all iTLB cache hits   (30.78%)
   <not supported>      L1-dcache-prefetches
   <not supported>      L1-dcache-prefetch-misses

       3.642148586 seconds time elapsed

Let's grep for LLC (I assumed it meant last level cache, but according to naruse it apparently stands for longest latency cache).

        27,388,843      LLC-loads                 #   11.768 M/sec                    (31.10%)
        18,668,574      LLC-load-misses           #  136.32% of all LL-cache hits     (31.05%)
        30,044,469      LLC-loads                 #   12.119 M/sec                    (31.12%)
        21,559,455      LLC-load-misses           #  143.52% of all LL-cache hits     (31.07%)
        39,766,013      LLC-loads                 #   15.672 M/sec                    (31.10%)
        30,325,538      LLC-load-misses           #  152.52% of all LL-cache hits     (31.05%)
        54,474,331      LLC-loads                 #   20.528 M/sec                    (31.06%)
        43,355,855      LLC-load-misses           #  159.18% of all LL-cache hits     (31.01%)
        63,843,875      LLC-loads                 #   23.808 M/sec                    (31.07%)
        50,944,021      LLC-load-misses           #  159.59% of all LL-cache hits     (31.03%)
        63,475,064      LLC-loads                 #   20.502 M/sec                    (31.01%)
        58,835,138      LLC-load-misses           #  185.38% of all LL-cache hits     (30.88%)
        81,939,252      LLC-loads                 #   22.935 M/sec                    (30.98%)
        81,433,098      LLC-load-misses           #  198.76% of all LL-cache hits     (30.95%)
       151,673,082      LLC-loads                 #   19.165 M/sec                    (30.74%)
       147,241,641      LLC-load-misses           #  194.16% of all LL-cache hits     (30.85%)
        11,868,351      LLC-loads                 #    3.259 M/sec                    (30.98%)
        11,264,153      LLC-load-misses           #  189.82% of all LL-cache hits     (30.95%)

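However you look at it, LLC-loads go up as more elements are skipped (at least up through 8 of 10). Misses go up as well (almost every load misses).

So why is that? My hypothesis is that it has to do with LLC prefetching.

The "Intel® 64 and IA-32 Architectures Optimization Reference Manual" says:

Streamer: This prefetcher monitors read requests from the L1 cache for ascending and descending sequences of addresses. Monitored read requests include L1 DCache requests initiated by load and store operations and by the hardware prefetchers, and L1 ICache requests for code fetch. When a forward or backward stream of requests is detected, the anticipated cache lines are prefetched. Prefetched cache lines must be in the same 4K page.

so whether or not this kicks in looks like what decides the outcome.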
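To get a feel for what the access pattern looks like at cache-line granularity, here is a back-of-the-envelope sketch. It assumes 64-byte cache lines and an array that starts on a line boundary (neither is stated above), and it only counts the line holding each flags word. sizeof(RVALUE) is 40 bytes, so 40 elements cover exactly 25 lines and the pattern repeats after that.

```ruby
RVALUE_SIZE = 40   # uint64_t flags + uint64_t data[4]
LINE_SIZE   = 64   # assumed cache line size in bytes
ELEMS       = 40   # 40 * 40 bytes = 25 whole lines, then the pattern repeats

# Cache lines whose flags word is read when the first `skipped` of every
# 10 elements are skipped, mirroring skip_N in the C program.
def touched_lines(skipped)
  (0...ELEMS)
    .reject { |i| (i % 10) < skipped }            # same test as skip_N(i)
    .map    { |i| (i * RVALUE_SIZE) / LINE_SIZE } # line index of val->flags
    .uniq
end

(0..10).each do |k|
  lines = touched_lines(k)
  puts "#{k}/10 skipped: #{lines.size} of 25 lines #{lines.inspect}"
end
```

Under these assumptions the touched lines stop forming a dense ascending run once elements are skipped. Whether that is really what keeps the streamer from helping is still only a guess.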

So, for now, what I learned is that this optimization does not pay off.

_19(Fri)

I wondered what Harada-san's Viscuit would look like if done with text, so I wrote a little something.

require 'pp'

class Cake
  def initialize texts = [' ' * 80]
    @texts = texts
    @rules = []
  end

  def <<(rule)
    @rules << rule
  end

  def kick
    @texts = @texts.map{|text|
      @rules.find{|rule|
        obj = rule.apply(text)
        # p [rule, text, obj]
        break obj if obj
      } || text
    }
  end

  def start
    while true
      kick
      pp @texts
      STDOUT.flush
      sleep 0.01
    end
  end

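  # Not used yet: queue a new line of text.
  def insert text
    (@new_texts ||= []) << text
  end

  # Rule interface: return the rewritten text, or nil if the rule does not apply.
  class Rule
    def apply text
      # do something
    end
  end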

  class GsubRule
    def initialize pattern, replace
      @pattern = pattern
      @replace = replace
    end

    def apply text
      if text.match? @pattern
        text.gsub(@pattern){ @replace }
      end
    end
  end
end

$cake = Cake.new(['>                                       '])
def r pat, txt
  $cake << Cake::GsubRule.new(pat, txt)
end
END{$cake.start}

# cake lang
r(/> /  , ' >')
r(/>\z/ , '<')
r(/ </  , '< ')
r(/\A</ , '>')

Result: https://gyazo.com/233586655aedc3462da6455b471c3a9d

It only matches in one dimension, so it's kind of boring.

I was stumped for a while because rewriting

r(/ </  , '< ')

as

r / </  , '< '

would not get through the Ruby parser. Ah, it gets read as division...
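A small sketch to check this without running the rules. The parses? helper is a name made up here; it wraps the source in a proc so the code is parsed but never executed. The %r{...} form is one way to keep the call free of parentheses while avoiding the ambiguity.

```ruby
# Hypothetical helper: true if +src+ is syntactically valid Ruby.
# Wrapping in `proc { ... }` means the code is parsed but not run.
def parses?(src)
  eval("proc { #{src} }")
  true
rescue SyntaxError
  false
end

with_parens = "r(/ </  , '< ')"
bare_slash  = "r / </  , '< '"   # `r / ` is read as a division
bare_pct_r  = "r %r{ <}, '< '"   # %r{...} is not mistaken for division

p parses?(with_parens)
p parses?(bare_slash)
p parses?(bare_pct_r)
```

The space after the opening slash is what tips the parser toward division, and this particular pattern needs that space.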

_18(Thu)

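Suppose the data you want to delete sits at the front of an array. For example, you have a sorted array and decide you don't need anything smaller than 3. The obvious choice is Array#drop_while.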

ary = [1,2,3,4,5]
ary = ary.drop_while{|e|
  e < 3
}
p ary #=> [3, 4, 5]

However, drop_while did not occur to me, so I figured I could just remove them with Array#delete_if.

ary = [1,2,3,4,5]
ary.delete_if{|e|
  e < 3
}
p ary #=> [3, 4, 5]

Now, if ary is long enough and we know that the elements to remove are only the first few, having delete_if apply the block to every element is wasteful.
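A quick sketch that counts how often each method calls its block on a 1000-element array; the counter names are only for illustration.

```ruby
ary = (1..1000).to_a

# delete_if visits every element, even after the condition stops holding.
delete_if_calls = 0
a = ary.dup
a.delete_if { |e| delete_if_calls += 1; e < 3 }

# drop_while stops at the first element for which the block is false.
drop_while_calls = 0
b = ary.drop_while { |e| drop_while_calls += 1; e < 3 }

p delete_if_calls  #=> 1000
p drop_while_calls #=> 3
p a == b           #=> true
```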

So I thought: what about doing a break as soon as the condition no longer holds?

ary = [1,2,3,4,5]
ary.delete_if{|e|
  if e < 3
    true
  else
    break
  end
}
p ary #=> ?

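So, does this work? The short answer is yes. I checked with JRuby and MRI.

But does it really work? Is this behavior that Ruby actually promises? Could it just happen to work? For example, one could imagine a specification where the block is first applied to everything and the results are collected, and then the deletions are done in one go at the end based on those results. In that case, an implementation where stopping with break leaves the array unchanged is also conceivable.

When I asked Matz, he said "either is fine", so apparently it only works by accident. He added, "I'd be more worried about writing code that depends on it."

But I just wrote exactly that...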
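One way to get the same in-place effect without depending on what break does inside delete_if is to find where the kept part starts and remove the prefix with Array#shift. This is a minimal sketch; drop_leading! is a hypothetical helper name, not an existing Array method.

```ruby
# Hypothetical helper: removes, in place, the leading elements for which
# the block returns true, and stops looking at the first false.
def drop_leading!(ary)
  keep_from = ary.index { |e| !yield(e) } || ary.size  # all match: drop everything
  ary.shift(keep_from)
  ary
end

ary = [1, 2, 3, 4, 5]
drop_leading!(ary) { |e| e < 3 }
p ary #=> [3, 4, 5]
```

If a new array is acceptable, ary = ary.drop_while { ... } as in the first example is simpler.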

_Sasada(2017-05-19 02:36:33 +0900)

 Test comment.

_16(Tue)

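I finally installed openssl for my local mswin64 ruby, so here are some notes on how it went.

Part of the reason I had deliberately not installed openssl until now is that installing it is a hassle, but the official excuse was that I was checking whether Ruby can be installed and run properly without openssl.

Ruby (MRI) does not spell out which libraries are required. In practice, though, you need openssl and zlib as soon as you try to use rubygems, so normally you would just have them. And on Windows, most people install Ruby with an installer anyway. Still, I thought there might be odd cases where someone wants to build Ruby themselves for some reason, in an environment that has no use for openssl or zlib and so does not build them, and I kept using my setup as it was (I never had to install a gem, so it did not bother me, and when I do need gems I use a Linux virtual machine...).

Some might say a Ruby interpreter developer ought to install them and run the tests, but I do that on everything other than Windows, so I figured it was fine (if something failed only on Windows, it would probably be beyond me anyway).

However, recently (over the last few years?) some tests fail and spit out a huge log when openssl or zlib is missing, and that got in the way. When I asked what the intent was, the answer was that not having them is an unusual setup, so in effect building Ruby needs openssl and zlib. Anyone who wants to go without has to take special steps (for example, explicitly skipping the tests in question).

I have no intention of being a special case, so I (finally got off my backside and) built and installed openssl and zlib. Googling turned up binary distributions too, and those might have been fine, but honestly they made me a little uneasy, so I built from source.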

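References

Both are helpful, but both are old, so the openssl 1.1.0e build I tried this time could not be done by following them as written.

zlib is easy. Build it and put the results somewhere suitable. Put the .dll in a directory that is on the PATH.

openssl. This one took a while. Install ActivePerl (with its installer). Install nasm (just download it from http://www.nasm.us/ and put it in place). Download the openssl source code from https://www.openssl.org/source/openssl-1.1.0e.tar.gz (please use the link for the current version). Extract it, then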

    $ perl Configure { VC-WIN32 | VC-WIN64A | VC-WIN64I | VC-CE } [opts]
    $ nmake
    $ nmake test
    $ nmake install

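just follow these steps as written in INSTALL. I wanted a build for mswin64 this time, so I chose VC-WIN64A (the A is for AMD, apparently; I is for Itanium). Come to think of it, I never ran nmake test.

In the [opts] part, specify --prefix and --openssldir. NOTES.WIN says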

 For VC-WIN64, the following defaults are use:

     PREFIX:      %ProgramW6432%\OpenSSL
     OPENSSLDIR:  %CommonProgramW6432%\SSL

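I did not really understand these, so I specified a directory of my own choosing.

When the build finishes, the .dll files are in [prefix]/bin, so put them somewhere on the PATH (or add that directory to the PATH).

Naruse-san's article says

The binaries you want are out32dll\libeay32.dll and out32dll\ssleay32.dll.

but I could not find them (when I said on the slack used by Ruby committers that they were missing, I got various remarks along the lines of "this is why you're hopeless"). As described below, things worked anyway, so I suppose it is fine.

Next is building ruby.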

srcdir/win32/configure.bat --target=x64-mswin64 --with-zlib-lib=... --with-zlib-include=... --with-openssl-lib=... --with-openssl-include=...

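Pass the paths like this. After that, all that is left is to build.

Or so I thought, but the openssl extension library would not build. Looking at openssl/extconf.rb, it seems to assume that CRYPTO_malloc is in either crypto or libeay32 and that SSL_new is in either ssl or ssleay32, but openssl's lib/ contains libssl.lib and libcrypto.lib. Guessing that I just needed to put lib in front, I did this: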

Index: ext/openssl/extconf.rb
===================================================================
--- ext/openssl/extconf.rb	(revision 58739)
+++ ext/openssl/extconf.rb	(working copy)
@@ -44,8 +44,8 @@
   end
 
   result = have_header("openssl/ssl.h")
-  result &&= %w[crypto libeay32].any? {|lib| have_library(lib, "CRYPTO_malloc")}
-  result &&= %w[ssl ssleay32].any? {|lib| have_library(lib, "SSL_new")}
+  result &&= %w[libcrypto].any? {|lib| have_library(lib, "CRYPTO_malloc")}
+  result &&= %w[libssl].any? {|lib| have_library(lib, "SSL_new")}
   unless result
     Logging::message "=== Checking for required stuff failed. ===\n"
     Logging::message "Makefile wasn't created. Fix the errors above.\n"

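With this change, the build went through. The tests mostly passed as well. (Note: I'm told that since https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58742 it works without this patch.)

The thought of being able to handle gems on my local Windows machine from now on is exciting.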
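A quick sanity check that the freshly built ruby can actually load both extensions might look like the sketch below. It only uses constants and methods from the openssl and zlib standard libraries; the sample data is arbitrary.

```ruby
require 'openssl'
require 'zlib'

# Version strings of the libraries the extensions were built against.
puts OpenSSL::OPENSSL_VERSION
puts Zlib::ZLIB_VERSION

# Round-trip some data through zlib.
data = 'hello' * 100
packed = Zlib::Deflate.deflate(data)
raise 'zlib round trip failed' unless Zlib::Inflate.inflate(packed) == data

# Compute a digest through openssl.
puts OpenSSL::Digest.new('SHA256').hexdigest(data)
```

If either require fails with a LoadError, the first thing to check is probably whether the .dll files are on the PATH.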

Addendum: the punchline is that the options above do not include --prefix, so various things went wrong.

K.Sasada's Home Page

Diary - 2017 May

Satsuki (May)
