Optimizing a Python function with NumPy arrays
I have been trying to optimize a Python script I wrote over the past two days. Using a couple of profiling tools (cProfile, line_profiler, etc.), I narrowed the problem down to the function below.
df is a NumPy array with 3 columns and 1,000,000 rows (dtype float). Using line_profiler, I found that the function spends most of its time whenever it needs to access the NumPy array: the `full_length += head + df[rnd_truck,2]` and `full_weight += df[rnd_truck,1]` lines take most of the time, followed by the `full_length = df[rnd_truck,2]` and `full_weight = df[rnd_truck,1]` lines. As far as I can tell, the bottleneck is the access time whenever the function tries to get a single number out of the NumPy array.

When I run the function as MonteCarlo(df, 15., 1000.), a single call takes 37 seconds on an i7 3.40 GHz 64-bit Windows machine with 8 GB of RAM. In my application I need to run it 1,000 times to ensure convergence, which pushes the execution time past an hour. I tried using the operator.add method for the summation, but it did not help at all. It looks like I have to come up with a faster way to access this NumPy array. Any ideas are welcome!

```python
def MonteCarlo(df, head, span):
    # Pick initial truck
    rnd_truck = np.random.randint(0, len(df))
    full_length = df[rnd_truck, 2]
    full_weight = df[rnd_truck, 1]

    # Loop using other random truck until the bridge is full
    while 1:
        rnd_truck = np.random.randint(0, len(df))
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span
```

Here is part of the df NumPy array I am using:

```python
In [31]: df
Out[31]:
array([[  12. ,  220.4,  108.4],
       [  11. ,  220.3,  106.2],
       ...,
       [   4. ,   13.9,   36.8],
       [   3. ,   13.7,   33.9]])
```
As others have pointed out, this is not vectorized at all, so your slowness is really due to the slowness of the Python interpreter.
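One way to sidestep the interpreter overhead while staying in pure NumPy is to draw the random trucks in chunks and let cumsum find where the bridge fills up. The sketch below is my own rewrite of the question's loop, not code from the answer; the function name, the chunk size, and the use of numpy.random.default_rng are assumptions I'm making for illustration:

```python
import numpy as np

def montecarlo_vec(df, head, span, rng=None, chunk=256):
    # Chunked rewrite of the question's loop: since each truck is an
    # independent uniform draw, we can sample a whole chunk at once,
    # cumsum the lengths, and locate the first truck that overfills
    # the bridge without a per-truck Python loop iteration.
    rng = np.random.default_rng() if rng is None else rng
    n = len(df)
    first = rng.integers(0, n)
    full_length = df[first, 2]
    full_weight = df[first, 1]
    while True:
        idx = rng.integers(0, n, size=chunk)
        cum_len = full_length + np.cumsum(head + df[idx, 2])
        over = np.nonzero(cum_len > span)[0]
        if over.size:
            k = over[0]  # first truck that pushes the length past span
            # Only the trucks before it contribute their weight,
            # matching the break in the original loop.
            full_weight += df[idx[:k], 1].sum()
            return full_weight / span
        # The whole chunk fit on the bridge: accumulate and draw again.
        full_length = cum_len[-1]
        full_weight += df[idx, 1].sum()
```

With a large df, each call then does a handful of array operations instead of one interpreted loop iteration per truck, at the cost of occasionally sampling a few more trucks than needed.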
Cython can help you here with minimal changes:
```python
>>> %timeit MonteCarlo(df, 5, 1000)
10000 loops, best of 3: 48 us per loop

>>> %timeit MonteCarlo_cy(df, 5, 1000)
100000 loops, best of 3: 3.67 us per loop
```

where MonteCarlo_cy is just (in an IPython notebook, after %load_ext cythonmagic):

```cython
%%cython
import numpy as np
cimport numpy as np

def MonteCarlo_cy(double[:, ::1] df, double head, double span):
    # Pick initial truck
    cdef long n = df.shape[0]
    cdef long rnd_truck = np.random.randint(0, n)
    cdef double full_weight = df[rnd_truck, 1]
    cdef double full_length = df[rnd_truck, 2]

    # Loop using other random truck until the bridge is full
    while True:
        rnd_truck = np.random.randint(0, n)
        full_length += head + df[rnd_truck, 2]
        if full_length > span:
            break
        else:
            full_weight += df[rnd_truck, 1]

    # Return average weight per feet on the bridge
    return full_weight / span
```
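For the convergence requirement in the question (1,000 independent runs), all that is left is a small driver that repeats the simulation and averages the results. A minimal sketch, using a pure-Python copy of the question's function and a seeded Generator for reproducibility; the names monte_carlo, mean_load, and the seed handling are mine, not from the answer:

```python
import numpy as np

def monte_carlo(df, head, span, rng):
    # Pure-Python reference, same logic as the question's function.
    n = len(df)
    i = rng.integers(0, n)
    full_length = df[i, 2]
    full_weight = df[i, 1]
    while True:
        i = rng.integers(0, n)
        full_length += head + df[i, 2]
        if full_length > span:
            break
        full_weight += df[i, 1]
    return full_weight / span

def mean_load(df, head, span, n_runs=1000, seed=0):
    # Repeat the simulation n_runs times and average, as the
    # question's convergence requirement calls for.
    rng = np.random.default_rng(seed)
    return sum(monte_carlo(df, head, span, rng) for _ in range(n_runs)) / n_runs
```

Swapping monte_carlo for the compiled MonteCarlo_cy in this driver gives the full hour-long workload in a few milliseconds at the timings quoted above.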