
Performance of mass-evaluating expressions in IronPython

Published: 2020-12-20 13:22:32 | Category: Python | Source: collected from the web
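In a C# 4.0 application I have a dictionary of strongly typed ILists of equal length: a table of dynamically, strongly typed columns.
I want the user to supply one or more (Python) expressions over the available columns, which are then aggregated over all rows. In a static context this would be: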

IDictionary<string,IList> table;
// ...
IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
    sum += (double)a[i] / b[i]; // Expression to sum up

For n = 10^7 this runs in 0.270 s on my laptop (Win7 x64). Replacing the expression with a delegate that takes two int arguments takes 0.580 s, and with an untyped delegate 1.19 s.
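For reference, the two delegate variants could look like the sketch below. The code behind the timings above is not shown, so the shapes used here (a Func<int,int,double> for the typed case and a Func<object,object,object> for the untyped case) and all class and member names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the original benchmark.
static class DelegateVariants
{
    // Typed delegate: arguments and result stay unboxed.
    public static readonly Func<int, int, double> Typed =
        (x, y) => (double)x / y;

    // Untyped delegate: both arguments and the result are boxed.
    public static readonly Func<object, object, object> Untyped =
        (x, y) => (double)(int)x / (int)y;

    public static double SumTyped(IList<int> a, IList<int> b)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += Typed(a[i], b[i]);
        return sum;
    }

    public static double SumUntyped(IList<int> a, IList<int> b)
    {
        double sum = 0;
        for (int i = 0; i < a.Count; i++)
            sum += (double)Untyped(a[i], b[i]); // unbox the result
        return sum;
    }
}
```

The untyped variant allocates boxes on every call, which is one plausible reason for it being slower than the typed one.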
Creating the delegate with IronPython:

IDictionary<string,IList> table;
// ...
var options = new Dictionary<string,object>();
options["DivisionOptions"] = PythonDivisionOptions.New;
var engine = Python.CreateEngine(options);
string expr = "a / b";
Func<int,int,double> f = engine.Execute("lambda a,b : " + expr);

IList<int> a = table["a"] as IList<int>;
IList<int> b = table["b"] as IList<int>;
double sum = 0;
for (int i = 0; i < n; i++)
    sum += f(a[i],b[i]);

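This takes 3.2 s (and 5.1 s with Func<object,object,object>), a factor of 4 to 5.5 over the corresponding C# delegates. Is this the expected overhead for what I am doing? What could be improved?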

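If I have many columns, the approach chosen above is no longer sufficient. One solution could be to determine the columns each expression requires and pass only those as arguments. The other solution, which I tried without success, is to use a ScriptScope and resolve the columns dynamically. For this I defined a RowIterator that has a RowIndex for the active row and a property for each column.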

class RowIterator
{
    IList<int> la;
    IList<int> lb;

    public RowIterator(IList<int> a,IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }
    public int RowIndex { get; set; }

    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }
}

A ScriptScope can be created from an IDynamicMetaObjectProvider, which I expected C#'s dynamic to implement. At runtime, however, the call is bound to engine.CreateScope(IDictionary), and it fails.

dynamic iterator = new RowIterator(a,b) as dynamic;
var scope = engine.CreateScope(iterator);
var expr = engine.CreateScriptSourceFromString("a / b").Compile();

double sum = 0;
for (int i = 0; i < n; i++)
{
    iterator.RowIndex = i; // the property is named RowIndex, not Index
    sum += expr.Execute<double>(scope);
}

Next, I let RowIterator inherit from DynamicObject and got a working example, but with terrible performance: 158 s.

class DynamicRowIterator : DynamicObject
{
    Dictionary<string,object> members = new Dictionary<string,object>();
    IList<int> la;
    IList<int> lb;

    public DynamicRowIterator(IList<int> a,IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }

    public int RowIndex { get; set; }
    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }

    public override bool TryGetMember(GetMemberBinder binder,out object result)
    {
        if (binder.Name == "a") // Why does this happen?
        {
            result = this.a;
            return true;
        }
        if (binder.Name == "b")
        {
            result = this.b;
            return true;
        }
        if (base.TryGetMember(binder,out result))
            return true;
        if (members.TryGetValue(binder.Name,out result))
            return true;
        return false;
    }

    public override bool TrySetMember(SetMemberBinder binder,object value)
    {
        if (base.TrySetMember(binder,value))
            return true;
        members[binder.Name] = value;
        return true;
    }
}

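I was surprised that TryGetMember is called with the names of the declared properties. From the documentation I would have expected TryGetMember to be called only for undefined properties.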
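The documented behavior can be checked with C#'s own dynamic binder, as in the small sketch below (CountingRow and BinderDemo are hypothetical names). Under the C# binder, a declared property is bound directly and TryGetMember is consulted only when no such member exists. Other language binders are free to treat the fallback differently; whether that is what happens during IronPython's scope lookup is an assumption here, not something verified.

```csharp
using System;
using System.Dynamic;

// Hypothetical class: counts how often TryGetMember is invoked.
class CountingRow : DynamicObject
{
    public int TryGetMemberCalls { get; private set; }

    // A declared property, like the column properties above.
    public int a { get { return 42; } }

    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        TryGetMemberCalls++;
        result = -1; // marker value for dynamically resolved members
        return true;
    }
}

static class BinderDemo
{
    // Returns { value of d.a, calls after d.a, value of d.missing, total calls }.
    public static int[] Run()
    {
        var row = new CountingRow();
        dynamic d = row;

        int declared = d.a;                          // bound to the declared property
        int callsAfterDeclared = row.TryGetMemberCalls;
        int missing = d.missing;                     // no such member: TryGetMember runs

        return new[] { declared, callsAfterDeclared, missing, row.TryGetMemberCalls };
    }
}
```

With the C# binder, Run() should report no TryGetMember call for the declared property and exactly one for the missing member.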

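For sensible performance I probably need to implement IDynamicMetaObjectProvider for my RowIterator so that dynamic CallSites can be used, but I could not find a suitable example to start from. In my experiments I did not know how to handle __builtins__ in BindGetMember: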

class Iterator : IDynamicMetaObjectProvider
{
    IList<int> la;
    IList<int> lb;

    public Iterator(IList<int> a,IList<int> b)
    {
        this.la = a;
        this.lb = b;
    }
    public int RowIndex { get; set; }
    public int a { get { return la[RowIndex]; } }
    public int b { get { return lb[RowIndex]; } }

    public DynamicMetaObject GetMetaObject(Expression parameter)
    {
        return new MetaObject(parameter,this);
    }

    private class MetaObject : DynamicMetaObject
    {
        internal MetaObject(Expression parameter,Iterator self)
             : base(parameter,BindingRestrictions.Empty,self) { }

        public override DynamicMetaObject BindGetMember(GetMemberBinder binder)
        {
            switch (binder.Name)
            {
                case "a":
                case "b":
                    Type type = typeof(Iterator);
                    string methodName = binder.Name;
                    Expression[] parameters = new Expression[]
                    {
                        Expression.Constant(binder.Name)
                    };
                    return new DynamicMetaObject(
                        Expression.Call(
                            Expression.Convert(Expression,LimitType),
                            type.GetMethod(methodName),
                            parameters),
                        BindingRestrictions.GetTypeRestriction(Expression,LimitType));
                default:
                    return base.BindGetMember(binder);
            }
        }
    }
}

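I am sure my code above is suboptimal; at the very least it does not yet handle the IDictionary of columns. I would appreciate any advice on how to improve the design and/or the performance.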
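The first idea mentioned earlier, passing only the columns an expression actually uses, might be sketched as follows. This is a rough illustration under stated assumptions, not a tested part of the question: ColumnDetection, RequiredColumns and BuildLambdaSource are hypothetical names, and the identifier scan is a plain regular expression rather than a real Python parser, so it would also match column names that appear inside string literals or as attribute names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are for illustration only.
static class ColumnDetection
{
    // Returns the column names that occur as identifiers in expr,
    // in order of first occurrence and without duplicates.
    public static List<string> RequiredColumns(string expr, ICollection<string> columnNames)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match m in Regex.Matches(expr, @"[A-Za-z_][A-Za-z0-9_]*"))
        {
            if (columnNames.Contains(m.Value) && seen.Add(m.Value))
                result.Add(m.Value);
        }
        return result;
    }

    // Builds Python lambda source that takes only the required columns,
    // in the same "lambda a,b : expr" form used above.
    public static string BuildLambdaSource(string expr, ICollection<string> columnNames)
    {
        List<string> used = RequiredColumns(expr, columnNames);
        return "lambda " + string.Join(",", used) + " : " + expr;
    }
}
```

The resulting source string could then be handed to engine.Execute as before, with the matching columns supplied as arguments in the same order.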

Solution

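I also compared the performance of IronPython with a C# implementation. The expression is simple: it just adds the values of two arrays at a given index. Accessing the arrays directly provides the baseline and the theoretical optimum. Accessing the values through a symbol dictionary still performs acceptably.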

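The third test creates a delegate from a naive (and deliberately bad) expression tree, without anything fancy such as call-site caching, and it is still faster than IronPython.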
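For contrast, a less naive tree might resolve the symbols once, outside the compiled delegate, and index typed arrays directly so that nothing is boxed. The sketch below is an illustration of that idea only. It is not the code that was benchmarked, no timings were measured for it, and TypedExpressionSketch, BuildAdd and Sum are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper; not part of the benchmark.
static class TypedExpressionSketch
{
    // Builds and compiles (a, b, i) => a[i] + b[i] over int arrays.
    public static Func<int[], int[], int, int> BuildAdd()
    {
        ParameterExpression a = Expression.Parameter(typeof(int[]), "a");
        ParameterExpression b = Expression.Parameter(typeof(int[]), "b");
        ParameterExpression i = Expression.Parameter(typeof(int), "i");

        BinaryExpression body = Expression.Add(
            Expression.ArrayIndex(a, i),
            Expression.ArrayIndex(b, i));

        return Expression.Lambda<Func<int[], int[], int, int>>(body, a, b, i).Compile();
    }

    public static int Sum(IDictionary<string, object> symbols, int count)
    {
        // Look the symbols up once instead of once per row.
        var a = (int[])symbols["a"];
        var b = (int[])symbols["b"];
        Func<int[], int[], int, int> add = BuildAdd();

        int sum = 0;
        for (int i = 0; i < count; i++)
            sum += add(a, b, i);
        return sum;
    }
}
```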

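Scripting the expression through IronPython takes the most time. My profiler shows that most of the time is spent in PythonOps.GetVariable, PythonDictionary.TryGetValue and PythonOps.TryGetBoundAttr. I think there is room for improvement.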

Timings:

> Direct: 00:00:00.0052680
> via Dictionary: 00:00:00.5577922
> Compiled Delegate: 00:00:03.2733377
> Scripted: 00:00:09.0485515

Here is the code:

public static void PythonBenchmark()
    {
        var engine = Python.CreateEngine();

        int iterations = 1000;
        int count = 10000;

        int[] a = Enumerable.Range(0,count).ToArray();
        int[] b = Enumerable.Range(0,count).ToArray();

        Dictionary<string,object> symbols = new Dictionary<string,object> { { "a",a },{ "b",b } };

        Func<int,object> calculate = engine.Execute("lambda i: a[i] + b[i]",engine.CreateScope(symbols));

        var sw = Stopwatch.StartNew();

        int sum = 0;

        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += a[i] + b[i];
            }
        }

        Console.WriteLine("Direct: " + sw.Elapsed);



        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += ((int[])symbols["a"])[i] + ((int[])symbols["b"])[i];
            }
        }

        Console.WriteLine("via Dictionary: " + sw.Elapsed);



        var indexExpression = Expression.Parameter(typeof(int),"index");
        var indexerMethod = typeof(IList<int>).GetMethod("get_Item");
        var lookupMethod = typeof(IDictionary<string,object>).GetMethod("get_Item");
        // Looks a symbol up in the dictionary on every call (deliberately naive).
        Func<string,Expression> getSymbolExpression = symbol =>
            Expression.Call(Expression.Constant(symbols),lookupMethod,Expression.Constant(symbol));
        var addExpression = Expression.Add(
            Expression.Call(Expression.Convert(getSymbolExpression("a"),typeof(IList<int>)),indexerMethod,indexExpression),
            Expression.Call(Expression.Convert(getSymbolExpression("b"),typeof(IList<int>)),indexerMethod,indexExpression));
        var compiledFunc = Expression.Lambda<Func<int,object>>(
            Expression.Convert(addExpression,typeof(object)),indexExpression).Compile();

        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += (int)compiledFunc(i);
            }
        }

        Console.WriteLine("Compiled Delegate: " + sw.Elapsed);



        sw.Restart();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            for (int i = 0; i < count; i++)
            {
                sum += (int)calculate(i);
            }
        }

        Console.WriteLine("Scripted: " + sw.Elapsed);
        Console.WriteLine(sum); // make sure cannot be optimized away
    }

(Editor: 李大同)
