
perl – Cached Schwartzian transform

I'm going through "Intermediate Perl" and it's pretty cool. I just finished the section on "The Schwartzian Transform", and after it sank in I started wondering why the transform doesn't use a cache. In a list with several duplicate values, the transform recomputes the sort key for each occurrence, so I thought: why not use a hash to cache the results? Here's some code:

# a place to keep our results
my %cache;

# the transformation we are interested in
sub foo {
  # expensive operations
}

# some data
my @unsorted_list = ....;

# sorting with the help of the cache
my @sorted_list = sort {
  ($cache{$a} //= &foo($a)) <=> ($cache{$b} //= &foo($b))
} @unsorted_list;

Am I missing something? Why isn't a cached version of the Schwartzian transform shown in the book, or more widely used in general? At first glance the cached version seems like it should be more efficient.

Edit: daxim has pointed out in the comments that this is known as the orcish maneuver. So I'm not crazy, although I don't really get the name.
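
For context, the orcish maneuver is usually written with ||= (the name is reportedly a pun on "OR-cache"). A minimal sketch, assuming the same foo and @unsorted_list as in the question:

# Orcish maneuver: "OR-cache" the computed key inside the sort block.
# Note that ||= recomputes keys that evaluate to 0 or "", which is why
# the //= form used above is a little safer.
my %cache;
my @sorted_list = sort {
  ($cache{$a} ||= foo($a)) <=> ($cache{$b} ||= foo($b))
} @unsorted_list;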

Solution

(Edited down from many other comments)

To the extent that an array lookup is more efficient than a hash lookup (i.e. $a->[1] is faster than $cache{$a}), the canonical form may be more efficient than your code, even with a lot of duplicates.
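
For reference, here is a minimal sketch of the canonical Schwartzian transform (the canonical_ST sub in the benchmark below), using the same hypothetical foo: the key is computed once per element, and the comparison reads an array slot rather than a hash entry.

# Canonical Schwartzian transform: decorate, sort on the array slot, undecorate.
my @sorted_list =
  map  { $_->[0] }              # undecorate: keep the original value
  sort { $a->[1] <=> $b->[1] }  # compare the precomputed keys
  map  { [ $_, foo($_) ] }      # decorate: pair each value with its key
  @unsorted_list;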

Benchmark results:

Here is my benchmark code:

# when does an additional layer of caching improve the performance of 
# the Schwartzian transform?

# methods:
#   1. canonical Schwartzian transform
#   2. cached transform
#   3. canonical with memoized function

# inputs:
#   1. few duplicates (rand)
#   2. many duplicates (int(rand))

# functions:
#   1. fast
#   2. slow

use Benchmark;
use Math::BigInt;
use strict qw(vars subs);
use warnings;
no warnings 'uninitialized';

# fast_foo: a cheap operation, slow_foo: an expensive operation
sub fast_foo { my $x = shift; exp($x) }
sub slow_foo { my $x = shift; my $y = new Math::BigInt(int(exp($x))); $y->bfac() }

# XXX_memo_foo: put caching optimization inside call to 'foo'
my %fast_memo = ();
sub fast_memo_foo {
  my $x = shift;
  if (exists($fast_memo{$x})) {
    return $fast_memo{$x};
  } else {
    return $fast_memo{$x} = fast_foo($x);
  }
}

my %slow_memo = ();
sub slow_memo_foo {
  my $x = shift;
  if (exists($slow_memo{$x})) {
    return $slow_memo{$x};
  } else {
    return $slow_memo{$x} = slow_foo($x);
  }
}

my @functions = qw(fast_foo slow_foo fast_memo_foo slow_memo_foo);
my @input1 = map { 5 * rand } 1 .. 1000;         # 1000 random floats with few duplicates
my @input2 = map { int } @input1;                # 1000 random ints with many duplicates

sub canonical_ST {
  my $func = shift @_;
  my @sorted = map { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map { [$_,$func->($_)] } @_;
  return;
}

sub cached_ST {
  my $func = shift @_;
  my %cache = ();
  my @sorted = sort {
    ($cache{$a} //= $func->($a)) <=> ($cache{$b} //= $func->($b))
  } @_;
  return;
}

foreach my $input ('few duplicates','many duplicates') {
  my @input = $input eq 'few duplicates' ? @input1 : @input2;
  foreach my $func (@functions) {

    print "\nInput: $input\nFunction: $func\n-----------------\n";
    Benchmark::cmpthese($func =~ /slow/ ? 30 : 1000, {
      'Canonical' => sub { canonical_ST($func, @input) },
      'Cached'    => sub { cached_ST($func, @input) },
    });
  }
}

And the results (Strawberry Perl 5.12):

Input: few duplicates
Function: fast_foo
-----------------
           Rate Canonical    Cached
Canonical 160/s        --      -18%
Cached    196/s       22%        --

Input: few duplicates
Function: slow_foo
-----------------
            Rate Canonical    Cached
Canonical 7.41/s        --       -0%
Cached    7.41/s        0%        --

Input: few duplicates
Function: fast_memo_foo
-----------------
           Rate Canonical    Cached
Canonical 153/s        --      -25%
Cached    204/s       33%        --

Input: few duplicates
Function: slow_memo_foo
-----------------
            Rate    Cached Canonical
Cached    20.2/s        --       -7%
Canonical 21.8/s        8%        --

Input: many duplicates
Function: fast_foo
-----------------
           Rate Canonical    Cached
Canonical 179/s        --      -50%
Cached    359/s      101%        --

Input: many duplicates
Function: slow_foo
-----------------
            Rate Canonical    Cached
Canonical 11.8/s        --      -62%
Cached    31.0/s      161%        --

Input: many duplicates
Function: fast_memo_foo
-----------------
           Rate Canonical    Cached
Canonical 179/s        --      -50%
Cached    360/s      101%        --

Input: many duplicates
Function: slow_memo_foo
-----------------
            Rate Canonical    Cached
Canonical 28.2/s        --       -9%
Cached    31.0/s       10%        --

I am a little shocked at these results: the canonical Schwartzian transform has only a slight edge under its most favorable conditions (expensive function call, few or no duplicates, no memoization) and is at a considerable disadvantage in the other cases. The OP's caching scheme inside the sort function even outperforms memoization outside the sort. I wasn't expecting that when I ran the benchmark, but I think the OP is on to something.
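
The two ideas can also be combined: cache the key once per distinct input value but still compare through an array slot, which is what the "canonical with memoized function" variant in the benchmark measures. A minimal sketch, again assuming a user-supplied foo:

# Memoized Schwartzian transform: foo() runs once per distinct value,
# while the sort compares precomputed array slots as in the canonical form.
my %key;
my @sorted_list =
  map  { $_->[0] }
  sort { $a->[1] <=> $b->[1] }
  map  { [ $_, $key{$_} //= foo($_) ] }
  @unsorted_list;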
