Array#sortBy will sort an array based on a given field or mapping function, or directly on the elements of the array itself if none is passed. If the field being sorted on is a string, it will attempt an alphanumeric sort.
The problem of alphanumeric sorting raises a number of tricky issues concerning ordering (also called "collation"). Sugar here attempts a compromise between convenience and flexibility, with some sensible defaults applied up front. When using sortBy on an alphanumeric field, four customizable settings will determine the resulting order. All four settings are set directly on the Array class.
| Name | Type | Description |
|---|---|---|
| AlphanumericSortOrder | String | Base character order to apply when sorting. |
| AlphanumericSortIgnore | RegExp | Anything that matches is ignored when sorting. |
| AlphanumericSortIgnoreCase | Boolean | Convert all strings to lowercase before comparing. |
| AlphanumericSortEquivalents | Object | Table of characters to be treated as equivalent when sorting. |
All four settings have defaults that were chosen with linguistic care to give the best possible results for most major world languages. By default, AlphanumericSortOrder is the following string:
AÁÀÂÃĄBCĆČÇDĎÐEÉÈĚÊËĘFGĞHıIÍÌİÎÏJKLŁMNŃŇÑOÓÒÔPQRŘSŚŠŞTŤUÚÙŮÛÜVWXYÝZŹŻŽÞÆŒØÕÅÄÖ
AlphanumericSortIgnore is null by default, so punctuation and other special characters are also counted when sorting. AlphanumericSortIgnoreCase is true by default, so case is ignored. Note, however, that the above ordering actually includes lowercase characters directly after their uppercase equivalents (not shown above), so if AlphanumericSortIgnoreCase is set to false, their order will be properly respected.
Finally, AlphanumericSortEquivalents defaults to a table of diacritic letters (letters with accents over them) that are known to be equivalent to standard letters in major Western European languages, most notably French and German. The exact equivalents are as follows (lowercase equivalents are also included):
| Letters | Equivalent |
|---|---|
| Á À Â Ã Ä | A |
| Ç | C |
| É È Ê Ë | E |
| Í Ì İ Î Ï | I |
| Ó Ò Ô Õ Ö | O |
| ß | S |
| Ú Ù Û Ü | U |
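To make the interaction of the four settings concrete, here is a minimal, self-contained sketch of a comparator driven by an order string, an ignore pattern, a case flag, and an equivalents table. It is an illustrative model only, not Sugar's actual implementation: `makeComparator` and its option names are hypothetical, and the shortened order string and equivalents table are stand-ins for the real defaults.

```javascript
// Illustrative model of the four settings; not Sugar's actual implementation.
// makeComparator and its option names are hypothetical.
function makeComparator({ order = '', ignore = null, ignoreCase = true, equivalents = {} } = {}) {
  const table = ignoreCase ? order.toLowerCase() : order;

  // Strip ignored matches, fold case, then apply the equivalents table.
  function prepare(value) {
    let s = String(value);
    if (ignore) s = s.replace(ignore, '');
    if (ignoreCase) s = s.toLowerCase();
    return Array.from(s, (ch) => equivalents[ch] || ch);
  }

  return function compare(a, b) {
    const x = prepare(a);
    const y = prepare(b);
    const len = Math.min(x.length, y.length);
    for (let i = 0; i < len; i++) {
      if (x[i] === y[i]) continue;
      const xi = table.indexOf(x[i]);
      const yi = table.indexOf(y[i]);
      // Use the custom order only when both characters appear in it;
      // otherwise fall back to code point order.
      if (xi !== -1 && yi !== -1) return xi - yi;
      return x[i].codePointAt(0) - y[i].codePointAt(0);
    }
    return x.length - y.length;
  };
}

const compareSketch = makeComparator({
  order: 'AÁBCDEÉ',                    // shortened stand-in for the default order
  ignore: /[-']/g,                     // skip hyphens and apostrophes
  equivalents: { 'á': 'a', 'é': 'e' }, // lowercase keys, since case is folded first
});

const sortedSketch = ['Élan', "d'accord", 'apple', 'Ábaco', 'dabble'].sort(compareSketch);
console.log(sortedSketch); // → ['Ábaco', 'apple', 'dabble', "d'accord", 'Élan']
```

Without the ignore pattern, the apostrophe in "d'accord" would take part in the comparison and place it before "dabble".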
Changing ordering behavior is as easy as changing any of these four defaults. If, for example, you are ordering on fields you know to be Russian, changing AlphanumericSortOrder to a string of correctly ordered Russian characters is all that is required for Array#sortBy to order Russian correctly. Note that this is only required if the standard Unicode order is not sufficient. If a character in a string being sorted does not exist in AlphanumericSortOrder, it will fall back to standard Unicode order.
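As a rough illustration of why a custom order string can matter for Russian, the sketch below sorts a few words by plain code unit order and then by an explicit alphabet string. The letter Ё sits before А in Unicode but after Е in the Russian alphabet. The helper `byOrder` is hypothetical and stands in for what Array#sortBy would do once the string is assigned to AlphanumericSortOrder; it is not Sugar code.

```javascript
// Sketch only: byOrder is a hypothetical helper, not part of Sugar.
// In Sugar, the string would be assigned to Array.AlphanumericSortOrder instead.
const russianOrder = 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ';

function byOrder(order) {
  const table = order.toLowerCase();
  return (a, b) => {
    const x = Array.from(a.toLowerCase());
    const y = Array.from(b.toLowerCase());
    for (let i = 0; i < Math.min(x.length, y.length); i++) {
      if (x[i] === y[i]) continue;
      const xi = table.indexOf(x[i]);
      const yi = table.indexOf(y[i]);
      // Characters missing from the order fall back to code point order.
      if (xi === -1 || yi === -1) return x[i].codePointAt(0) - y[i].codePointAt(0);
      return xi - yi;
    }
    return x.length - y.length;
  };
}

const russianWords = ['Ёлка', 'Дом', 'Жук', 'Еда'];

// Default sort compares UTF-16 code units, so Ё (U+0401) lands first.
const unicodeSorted = [...russianWords].sort();
console.log(unicodeSorted); // → ['Ёлка', 'Дом', 'Еда', 'Жук']

// With the explicit alphabet, Ё is placed between Е and Ж.
const russianSorted = [...russianWords].sort(byOrder(russianOrder));
console.log(russianSorted); // → ['Дом', 'Еда', 'Ёлка', 'Жук']
```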
AlphanumericSortEquivalents can be modified simply by changing the contents of the table, or removing it entirely if no equivalents are desired. Note that both uppercase and lowercase variants must be included in the table to work on both.
With these defaults, surprisingly few collisions exist among most major world languages, but three are noteworthy. Polish orders the character Ç as a separate letter, while French treats it as a C. Likewise, Scandinavian languages like Swedish and Norwegian order Ä and Ö after Z, where German orders them as A and O. In such cases, simply removing that entry from the equivalents table will result in the correct order being preserved, so setting Ä and Ö to null in AlphanumericSortEquivalents will result in perfect Scandinavian sort order.
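The effect of dropping entries from the equivalents table can be sketched as follows. This is a simplified model under stated assumptions, not Sugar's implementation: `collate`, `sketchOrder`, and `sketchEquivalents` are hypothetical names, the order string is a trimmed-down stand-in that ends in å, ä, ö, and every character in the sample data is assumed to appear in it.

```javascript
// Sketch only: collate is a hypothetical helper, not Sugar's implementation.
// sketchOrder and sketchEquivalents are trimmed-down stand-ins for
// Array.AlphanumericSortOrder and Array.AlphanumericSortEquivalents.
const sketchOrder = 'abcdefghijklmnopqrstuvwxyzåäö';
const sketchEquivalents = { 'ä': 'a', 'Ä': 'A', 'ö': 'o', 'Ö': 'O' };

function collate(a, b) {
  // Fold case, then replace each character by its equivalent if one is set.
  const key = (s) => Array.from(s.toLowerCase(), (ch) => sketchEquivalents[ch] || ch);
  const x = key(a);
  const y = key(b);
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    // Assumes every character is present in sketchOrder.
    if (x[i] !== y[i]) return sketchOrder.indexOf(x[i]) - sketchOrder.indexOf(y[i]);
  }
  return x.length - y.length;
}

const names = ['Öl', 'Zebra', 'Ärlig', 'Apa', 'Orm'];

// With the equivalents in place, Ä and Ö sort as A and O (German style).
const germanStyle = [...names].sort(collate);
console.log(germanStyle); // → ['Apa', 'Ärlig', 'Öl', 'Orm', 'Zebra']

// Null out the entries so Ä and Ö keep their own positions after Z.
['ä', 'Ä', 'ö', 'Ö'].forEach((ch) => { sketchEquivalents[ch] = null; });
const swedishStyle = [...names].sort(collate);
console.log(swedishStyle); // → ['Apa', 'Orm', 'Zebra', 'Ärlig', 'Öl']
```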
Proper collation algorithms (which are massively complex) can handle other situations which Sugar cannot. Most notable are expansions and contractions of multiple characters, such as sorting ll as a separate character between l and m in Spanish. If such cases must be handled, Array#sortBy itself may provide a workaround: the mapping function passed to it can modify the value used for sorting. For example, replacing ll with a distinct character or token and including that token in AlphanumericSortOrder would solve the contraction issue for a strict Spanish sort. This works because the return value of the mapping function is used only for sort ordering and has no effect on the actual content of the array. Expansions could likewise be added on a per-language basis.
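The contraction workaround can be sketched in plain JavaScript as below. This is a minimal model under stated assumptions rather than Sugar code: `LL`, `spanishOrder`, `sortKey`, and `compareKeys` are hypothetical names, the placeholder character is an arbitrary choice, and the order string covers only unaccented lowercase letters. With Sugar, `sortKey` would play the role of the mapping function passed to Array#sortBy, and the order string would be assigned to AlphanumericSortOrder.

```javascript
// Sketch only: LL, spanishOrder, sortKey and compareKeys are hypothetical names.
const LL = '\u0001'; // placeholder character standing in for the "ll" contraction
const spanishOrder = 'abcdefghijkl' + LL + 'mnñopqrstuvwxyz';

// Plays the role of the mapping function: it only produces a sort key
// and never changes the value stored in the array.
const sortKey = (word) => word.toLowerCase().replace(/ll/g, LL);

function compareKeys(a, b) {
  const x = Array.from(a);
  const y = Array.from(b);
  for (let i = 0; i < Math.min(x.length, y.length); i++) {
    // Assumes every character of the key is present in spanishOrder.
    if (x[i] !== y[i]) return spanishOrder.indexOf(x[i]) - spanishOrder.indexOf(y[i]);
  }
  return x.length - y.length;
}

const palabras = ['llama', 'luz', 'lobo', 'mano', 'calle', 'calor'];
const spanishSorted = [...palabras].sort((a, b) => compareKeys(sortKey(a), sortKey(b)));
console.log(spanishSorted); // → ['calor', 'calle', 'lobo', 'luz', 'llama', 'mano']
```

Because the placeholder exists only in the sort keys, the sorted array still contains the original words with ll intact.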