Sorting in Perl
John Klassa / Raleigh Perl Mongers / June,2000
?
Introduction
- Perl has a built-in "sort" function.
- It uses a quicksort algorithm,which has good (in fact,O(n log n)) performance. (Note,however,that a simple bubble sort can be faster than most other sorts for very short lists.)
- It's easy to use. You say:
@s = sort @a;
and you've got a sorted array...
?
So Why Do We Need a Talk?
- In order to do useful sorts,you need to add a bit of code to your "sort" calls.
- If you haven't had the need to do a complex sort,you will.
- The built-in "sort" compares the sortkeys O(n log n) times. How (and how often) you generate the sortkeys,and what you sort them with,makes a huge difference.
?
The Basics
?
More Basics
?
Variations
?
Combination Sorts
?
Sort Subroutines
?
Advanced Sorting: Motivation
- Everything that happens in your sort subroutine/clause happens O(n log n) times.
- If you do something expensive (like an extraction via a regexp,or perhaps a "stat" on a file),this is a Bad Thing.
- The Goal: Extract the sortkeys just once.
?
Solution #1: "Orcish" Maneuver
?
Problems with the OM
- Performs an extra test after each sortkey lookup.
- False values are recomputed each time.
?
A Better Way: The Schwartzian Transform
- The Schwartzian Transform creates a sorted list by transforming the original list into an intermediate form,where the sortkeys are cached,and then pulling the original list back out.
?
But First,A Digression...
- Just as @a = (1,2,3) creates an array,$aref = [1,3] creates a reference to an anonymous array.
- Just as $a[0] is the first element of @a,$aref->[0] is the first element of the anonymous array to which $aref refers.
- Understanding this is central to understanding the ST.
?
The Schwartzian Transform
?
ST: Mechanics
- Map the list into a new one that contains the extracted sortkeys and the original values.
- Sort on the sortkeys.
- Map the resulting list into a new one that contains the original values in the sorted order.
?
ST: Verbose Approach
- Verbosely,in code:
# @a exists,and contains filenames
@x = map { [ $_,-M ] } @a; # transform: value,sortkey
@sx = sort { $a->[1] <=> $b->[1] } @x; # sort
@s = map { $_->[0] } @sx; # restore original values
?
ST: Final
- Put it all together? The key is to read it backwards.
# @a exists,and contains filenames
@s = map { $_->[0] } # restore original values
sort { $a->[1] <=> $b->[1] } # sort
map { [$_,-M] } @a; # transform: value,sortkey
?
Can We Do Better?
- I didn't think so until I read the Guttman-Rosler paper.
- Turns out,yes. Use packed sortkeys and the default sort.
?
The "Packed Default" Sort
- So-named because it uses packed sortkeys,then sorts them with the default "sort" (i.e. no sort subroutine or sort clause. just the native,all-in-C comparison routine).
- The benefits: fast,one-time sortkey generation; fast comparison; fast extraction.
?
PD: The Mechanics
- Pack the sortkeys into a single string (tack on subkeys,if any).
- Tack on the original values (or an index,if the original values are complex data structures).
- Sort.
- Retrieve original values via "substr","split" or whatever.
?
PD: Example
?
Conclusion
- Using "sort" is always O(n log n).
- For complicated sorts,how you pull out the sortkeys and how you compare them is what matters.
- The ST is my personal favorite. It's easy to remember,and it's fast.
- The PD sort is faster,but it's also a bit more cryptic (unless you're a natural with "pack",and have a desire to really understand your data). By the way,Uri Guttman started on a Sort::Records module that does a PD sort under the covers,but did not finish it or publish it to CPAN. He has,offered to give us the current source,design ideas,help,etc. if anyone in raleigh.pm would like to pick it up.?
References
- More About the Schwartzian Transform (Joseph Hall),at:?http://www.stllinux.org/meeting_notes/1997/0918/schwtr.html
- A Fresh Look at Efficient Perl Sorting (Uri Guttman and Larry Rosler),at:?http://www.sysarch.com/perl/sort_paper.html
Revisions
- November 20,2002: Rob West: Update based on feedback from Uri Guttman.
- August 22,2003: Rob West: Updated References links.
(编辑:李大同)
【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!
|