My Project
stxxl::sort -- Sorting Comparison-Based
Author
Roman Dementiev (2006)

stxxl::sort is an external memory equivalent to STL std::sort. The design and implementation of the algorithm is described in detail in [DemSan03].

Prototype

template < typename ExtIterator,
typename StrictWeakOrdering
>
void sort ( ExtIterator first,
ExtIterator last,
StrictWeakOrdering cmp,
unsigned_type M
)

Description

Requirements on Types

StrictWeakOrdering Comparison Concept

Model of StrictWeakOrdering Comparison concept must:

Examples

A comparator class for integers: my_less_int.

struct my_less_int
{
bool operator() (int a, int b) const
{
return a < b;
}
int min_value() const { return std::numeric_limits<int>::min(); };
int max_value() const { return std::numeric_limits<int>::max(); };
};

A comparator class my_less, that could be instantiated as e.g. my_less<int>, my_less<unsigned long>, etc.

template <typename ValueType>
struct my_less
{
typedef ValueType value_type;
bool operator() (const value_type & a, const value_type & b) const
{
return a < b;
}
value_type min_value() const { return std::numeric_limits<value_type>::min(); };
value_type max_value() const { return std::numeric_limits<value_type>::max(); };
};

Preconditions

[first, last) is a valid range.

Complexity

stxxl::sort chooses the block size (parameter B) equal to the block size of the container, the last and first iterators pointing to (e.g. stxxl::vector's block size).

The second term in the I/O complexity accounts for the merge phases of the external memory sorting algorithm [DemSan03]. Avoiding multiple merge phases speeds up the sorting. In practice one should choose the block size B$of the container to be sorted such that there is only one merge phase needed: $ \lceil {\log}_{M/B}(2N/M) \rceil) = 1 $. This is possible for $ M > DB $ and $ N < M^2/2DB $. But still this restriction gives a freedom to choose a variety of blocks sizes. The study [DemSan03] has shown that optimal B for sorting lies in the range $ [M^2/(4N),3M^2/(8N)] $. With such choice of the parameters the stxxl::sort always performs $ 4N/DB $ I/Os.

Internal Memory Consumption

The stxxl::sort consumes slightly more than M bytes of internal memory.

External Memory Consumption

The stxxl::sort is not in-place. It requires about N bytes of external memory to store the sorted runs during the sorting process [DemSan03]. After the sorting this memory is freed.

Example

struct MyCmp: public std::less<int> // ascending order
{
static int min_value() const
{ return std::numeric_limits<int>::min(); }
static int max_value() const
{ return std::numeric_limits<int>::max(); }
};
typedef stxxl::VECTOR_GENERATOR<int>::result vec_type;
vec_type V;
// ... fill here the vector with some values
// Sort in ascending order use 512 MiB of main memory
stxxl::sort(V.begin(), V.end(), MyCmp(), 512*1024*1024);
// sorted

Sorted Order Checking

STXXL gives an ability to automatically check the order in the output of STXXL sorters and intermediate results of sorting (the order and a meta information in the sorted runs). The check is switched on if the source codes and the library are compiled with the option -DSTXXL_CHECK_ORDER_IN_SORTS and the option -DNDEBUG is not used. For details see the compiler.make file in the STXXL tar ball. Note, that the checking routines require more internal work as well as additional $ N/DB $ I/Os to read the sorted runs. Therefore for the final non-debug version of a user application on should switch this option off.

This checker checks the stxxl::sort, stxxl::ksort, and the pipelined sorter.