What's New in Asio - Performance

Article 2 of n

The forthcoming release of Asio, which will ship as part of Boost 1.74, includes a great number of new features and improvements. In this series of articles we will preview some of these changes.

The new release of Asio includes a number of performance improvements, primarily for Linux systems. These improvements fall mainly into the following four categories, each described in its own section below:

- Single-buffer vs scatter-gather operations
- Polymorphic executors
- Happy path optimisations
- Native I/O executor detection

All of these performance changes apply automatically, without the need to modify most application source code.

Single-buffer vs scatter-gather

Asio has long provided scatter-gather support for all of its read and write operations, building on the concepts of const buffer sequences and mutable buffer sequences. These operations are implemented in terms of the scatter-gather system calls, namely readv, writev, recvmsg, and sendmsg.

However, on Linux the single-buffer system calls, read, write, recv, and send, can be significantly faster. To exploit this, Asio's implementation now employs template specialisation to detect, at compile time, whether the caller has passed a single buffer. If so, the implementation automatically selects the single-buffer system call rather than its scatter-gather equivalent.
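
The sketch below (with illustrative socket and buffer names, not taken from this article's benchmark) shows the difference from the caller's point of view: the first call passes a single buffer and can now be serviced by a plain read or recv, while the second passes a two-element buffer sequence and so continues to use the scatter-gather system calls.

#include <asio.hpp>
#include <array>

std::size_t read_examples(asio::ip::tcp::socket& socket)
{
  std::array<char, 128> header;
  std::array<char, 1024> body;

  // A single buffer: eligible for the single-buffer fast path.
  std::size_t n = socket.read_some(asio::buffer(header));

  // A buffer sequence with two elements: uses readv/recvmsg as before.
  std::array<asio::mutable_buffer, 2> buffers =
    { asio::buffer(header), asio::buffer(body) };
  n += socket.read_some(buffers);

  return n;
}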

Polymorphic executors

Since Asio 1.13.0, I/O objects have defaulted to using a polymorphic wrapper type as their associated I/O executor. This polymorphic executor was called asio::executor, and it used a reference counting implementation strategy. In benchmarks, the cost of this reference counting could be significant.

Although asio::executor now uses a lighter-weight reference counting approach, the new default polymorphic executor, asio::any_io_executor, instead utilises a small buffer optimisation to avoid this cost altogether in the typical case. This has the added benefit of avoiding hidden sharing of executor objects between threads.
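
A minimal sketch of what this looks like from user code, assuming the standalone Asio 1.17.0 headers: I/O objects now report asio::any_io_executor as their executor type, and copies of that wrapper hold small executors such as io_context::executor_type inline rather than in a shared, reference-counted allocation.

#include <asio.hpp>
#include <type_traits>

int main()
{
  asio::io_context ctx;

  // Sockets (and other I/O objects) now default to the new polymorphic
  // wrapper as their associated I/O executor type.
  asio::ip::tcp::socket socket(ctx);
  static_assert(std::is_same_v<
      asio::ip::tcp::socket::executor_type,
      asio::any_io_executor>);

  // Copying the wrapper stores io_context::executor_type in its small
  // buffer, so no reference count is shared between the copies.
  asio::any_io_executor ex1 = socket.get_executor();
  asio::any_io_executor ex2 = ex1;

  (void)ex2;
}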

Happy path optimisations

Previously, Asio's implementation took a conservative approach to error handling: it cleared errno prior to every system call, and tested for well-known errors (such as EINTR or EAGAIN) before deciding whether an operation was successful.

The system call wrappers in Asio's implementation have been rewritten to return as early as possible, touching errno and error codes only when the call is actually on an error path.
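
The following is not Asio's actual code, just a simplified sketch of the technique: the system call's return value is checked first, and errno is consulted, and an error code constructed, only once the call is known to have failed.

#include <cerrno>
#include <cstddef>
#include <system_error>
#include <unistd.h>

// Hypothetical wrapper illustrating the "happy path" style.
std::size_t write_some(int fd, const void* data, std::size_t size,
    std::error_code& ec)
{
  ssize_t result = ::write(fd, data, size);
  if (result >= 0)
  {
    // Success: return immediately without touching errno.
    ec.clear();
    return static_cast<std::size_t>(result);
  }

  // Failure: only now is errno read and an error code constructed.
  ec.assign(errno, std::generic_category());
  return 0;
}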

Native I/O executor detection

Starting with Asio 1.13.0, users can construct I/O objects like sockets to use arbitrary executor types. This means that, in addition to io_context, sockets can be created to dispatch directly to asio::thread_pool, asio::system_executor, or even user-defined executors.

However, there can be a small cost associated with dispatching to an unknown executor type. As io_context is currently the only "native" I/O execution context, Asio's asynchronous operations now use a fast path for completion handler dispatch when the native I/O executor type, io_context::executor_type, is detected.
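
For example (a minimal sketch, assuming the standalone Asio headers), both of the sockets below are valid, but only the one whose I/O executor wraps io_context::executor_type benefits from the fast path for completion handler dispatch.

#include <asio.hpp>

int main()
{
  asio::io_context io_ctx;
  asio::thread_pool pool(2);

  // I/O executor wraps the "native" io_context::executor_type, so
  // completion handler dispatch can take the new fast path.
  asio::ip::tcp::socket native_socket(io_ctx);

  // I/O executor wraps the thread pool's executor: fully supported,
  // but dispatch goes through the general mechanism.
  asio::ip::tcp::socket pool_socket(pool);

  pool.join();
}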

Benchmark results

The short, simple benchmark below gives us some idea of the impact of these performance improvements in practice. This program creates 100 pairs of connected UNIX domain sockets, and transfers 100,000 short 64-byte messages back and forth between them. It uses the default, polymorphic I/O executor.

When run using the Asio 1.16.1 release (shipped with Boost 1.73), the benchmark run time is as follows:

$ time ./a.out

real    0m41.418s
user    0m11.447s
sys     0m29.969s

Using the Asio 1.17.0 release (which is shipped with the Boost 1.74 beta), we get:

$ time ./a.out

real    0m34.323s
user    0m9.311s
sys     0m25.011s

This is a reduction in run time of roughly 17%.

The full listing of the benchmark follows:

#include <asio.hpp>
#include <array>
#include <memory>

using socket_type = asio::local::stream_protocol::socket;
constexpr std::size_t buffer_size = 64;
constexpr int connections = 100;
constexpr int iterations = 100000;

struct echo_session
{
  socket_type socket;
  std::array<char, buffer_size> data{0};
  int count = 0;

  explicit echo_session(asio::io_context& ctx)
    : socket(ctx)
  {
  }

  // Read some data from the peer, then echo it back.
  friend void do_read(
      std::unique_ptr<echo_session> self)
  {
    auto& socket = self->socket;
    auto buffer = asio::buffer(self->data);
    socket.async_read_some(buffer,
        [self = std::move(self)](auto error, auto n) mutable
        {
          if (!error)
            do_write(std::move(self), n);
        }
      );
  }

  // Echo n bytes back to the peer, then wait for the next message.
  friend void do_write(
      std::unique_ptr<echo_session> self,
      std::size_t n)
  {
    auto& socket = self->socket;
    auto buffer = asio::buffer(self->data, n);
    asio::async_write(socket, buffer,
        [self = std::move(self)](auto error, auto) mutable
        {
          if (!error && self->count++ < iterations)
            do_read(std::move(self));
        }
      );
  }
};

int main()
{
  asio::io_context ctx(1);

  // Create the connected socket pairs and start each exchange.
  for (int i = 0; i < connections; ++i)
  {
    auto session1 = std::make_unique<echo_session>(ctx);
    auto session2 = std::make_unique<echo_session>(ctx);
    connect_pair(session1->socket, session2->socket);
    do_read(std::move(session1));
    do_write(std::move(session2), buffer_size);
  }

  ctx.run();
}

It was compiled with the following command line options:

g++-10 -std=c++20 -Wall -Wextra -Iinclude -O3 -flto -pthread bench.cpp

and run on an x86-64 Debian Linux system with the 5.6.0 kernel.