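C# – How to partition a LINQ to Objects query?
This is a resource allocation problem. My goal is to run a query that fetches the top-priority shift for any time slot.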
The data set is very large. For this example, assume 1,000 companies with 100 shifts each (the real data set is even larger). They are all loaded into memory, and I need to run a LINQ to Objects query against them:

    var topShifts =
        (from s in shifts
         where (from s2 in shifts
                where s2.CompanyId == s.CompanyId && s.TimeSlot == s2.TimeSlot
                orderby s2.Priority
                select s2).First().Equals(s)
         select s).ToList();

The problem is that, without optimization, LINQ to Objects compares every object in both sets, performing a full cross join of 1,000 x 100 against 1,000 x 100, which amounts to 10 billion (10,000,000,000) comparisons. What I want is to compare only the objects within each company (as if the company were indexed in a SQL table). That would yield 1,000 sets of 100 x 100 objects, for a total of 10 million (10,000,000) comparisons. As the number of companies grows, the latter scales linearly rather than quadratically.

Something like i4o would let me do this, but unfortunately I don't have the luxury of using a custom collection in the environment where I run this query. Also, I only expect to run this query once on any given data set, so a persistent index is of limited value. I would like to use an extension method that groups the data by company and then runs the expression on each group.

Full example code:

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    public struct Shift
    {
        // Counts every read of CompanyId, as a rough measure of comparisons.
        public static long Iterations;

        private int companyId;
        public int CompanyId
        {
            get { Iterations++; return companyId; }
            set { companyId = value; }
        }

        public int Id;
        public int TimeSlot;
        public int Priority;
    }

    class Program
    {
        static void Main(string[] args)
        {
            const int Companies = 1000;
            const int Shifts = 100;
            Console.WriteLine(string.Format("{0} Companies x {1} Shifts", Companies, Shifts));

            var timer = Stopwatch.StartNew();
            Console.WriteLine("Populating data");
            var shifts = new List<Shift>();
            for (int companyId = 0; companyId < Companies; companyId++)
            {
                for (int shiftId = 0; shiftId < Shifts; shiftId++)
                {
                    shifts.Add(new Shift()
                    {
                        CompanyId = companyId,
                        Id = shiftId,
                        TimeSlot = shiftId / 3,
                        Priority = shiftId % 5
                    });
                }
            }
            Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
            timer.Restart();

            Console.WriteLine("Computing Top Shifts");
            var topShifts =
                (from s in shifts
                 where (from s2 in shifts
                        where s2.CompanyId == s.CompanyId && s.TimeSlot == s2.TimeSlot
                        orderby s2.Priority
                        select s2).First().Equals(s)
                 select s).ToList();
            Console.WriteLine(string.Format("Completed in {0:n}ms", timer.ElapsedMilliseconds));
            timer.Restart();

            Console.WriteLine("\nShifts:");
            foreach (var shift in shifts.Take(20))
            {
                Console.WriteLine(string.Format("C {0} Id {1} T {2} P{3}",
                    shift.CompanyId, shift.Id, shift.TimeSlot, shift.Priority));
            }

            Console.WriteLine("\nTop Shifts:");
            foreach (var shift in topShifts.Take(10))
            {
                // All four format arguments are required; passing only Priority throws a FormatException.
                Console.WriteLine(string.Format("C {0} Id {1} T {2} P{3}",
                    shift.CompanyId, shift.Id, shift.TimeSlot, shift.Priority));
            }

            // Each comparison in the nested query reads CompanyId twice, hence the division by 2.
            Console.WriteLine(string.Format("\nTotal Comparisons: {0:n}", Shift.Iterations / 2));
            Console.WriteLine("Any key to continue");
            Console.ReadKey();
        }
    }
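The two comparison estimates in the question can be checked with a few lines of arithmetic. The following is only a sketch: the class name `ComparisonCounts` and the methods `CrossJoin` and `Partitioned` are hypothetical names introduced for illustration, and they count idealized pairwise comparisons rather than measuring what LINQ actually does.

```csharp
using System;

// Hypothetical helper (not part of the original post): idealized comparison counts.
public static class ComparisonCounts
{
    // Full cross join: every shift is compared against every shift.
    public static long CrossJoin(long companies, long shiftsPerCompany)
    {
        long totalShifts = companies * shiftsPerCompany;
        return totalShifts * totalShifts;
    }

    // Partitioned by company: each shift is compared only against
    // the shifts that belong to the same company.
    public static long Partitioned(long companies, long shiftsPerCompany)
    {
        return companies * shiftsPerCompany * shiftsPerCompany;
    }

    public static void Main()
    {
        Console.WriteLine(CrossJoin(1000, 100));   // 10000000000 (10 billion)
        Console.WriteLine(Partitioned(1000, 100)); // 10000000 (10 million)
    }
}
```

Doubling the number of companies doubles the partitioned count but quadruples the cross-join count, which is why partitioning matters more as the data set grows.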
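Question: How can I partition the query (while still executing it as a single LINQ query) so that the number of comparisons drops from 10 billion to 10 million?

Solution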
How about:

    var topShifts = from companyGroup in shifts.GroupBy(s => s.CompanyId)
                    from slotGroup in companyGroup.GroupBy(s => s.TimeSlot)
                    select slotGroup.OrderBy(s => s.Priority).First();

This seems to give the same output, but with only 100,015 comparisons. With @Geoff's edit, he just cut down my comparisons :-)
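The claim that the grouped query gives the same output as the nested one can be checked on a small data set. The sketch below makes some assumptions: `Shift` is simplified (no iteration counter), and `BuildShifts`, `TopShiftsNested`, `TopShiftsGrouped`, and `GroupedQueryCheck` are hypothetical names introduced here. The test data follows the same `shiftId / 3` and `shiftId % 5` pattern as the question, so there are no priority ties within a time slot.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified version of the question's struct (assumption: counter removed).
public struct Shift
{
    public int CompanyId;
    public int Id;
    public int TimeSlot;
    public int Priority;
}

public static class GroupedQueryCheck
{
    // Same data pattern as the question, with configurable sizes.
    public static List<Shift> BuildShifts(int companies, int shiftsPerCompany)
    {
        var shifts = new List<Shift>();
        for (int companyId = 0; companyId < companies; companyId++)
            for (int shiftId = 0; shiftId < shiftsPerCompany; shiftId++)
                shifts.Add(new Shift
                {
                    CompanyId = companyId,
                    Id = shiftId,
                    TimeSlot = shiftId / 3,
                    Priority = shiftId % 5
                });
        return shifts;
    }

    // The nested query from the question: rescans all shifts for every shift.
    public static List<Shift> TopShiftsNested(List<Shift> shifts) =>
        (from s in shifts
         where (from s2 in shifts
                where s2.CompanyId == s.CompanyId && s.TimeSlot == s2.TimeSlot
                orderby s2.Priority
                select s2).First().Equals(s)
         select s).ToList();

    // The grouped query from the answer: group by company, then by time slot,
    // and take the lowest Priority value in each (company, time slot) group.
    public static List<Shift> TopShiftsGrouped(List<Shift> shifts) =>
        (from companyGroup in shifts.GroupBy(s => s.CompanyId)
         from slotGroup in companyGroup.GroupBy(s => s.TimeSlot)
         select slotGroup.OrderBy(s => s.Priority).First()).ToList();

    public static void Main()
    {
        var shifts = BuildShifts(3, 12);
        var nested = TopShiftsNested(shifts);
        var grouped = TopShiftsGrouped(shifts);

        Console.WriteLine(grouped.Count);                 // 12 (3 companies x 4 time slots)
        Console.WriteLine(nested.SequenceEqual(grouped)); // True
    }
}
```

The element order matches here because the input is already sorted by company and shift id. For unsorted input, comparing the two results as sets would be the safer check.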