Simple Bandwidth Scanner technical details

Author:Matt Traudt, juga
Date:29 March 2018
Last Update:19 Sep 2018
Status:Draft

Note

this document may become an specification of the sbws method to measure relays and/or scale the measuremensts

Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all capitals, as shown here.

Background

Some of the Tor directory authorities run bandwidth scanners to measure the bandwidth of relays and include their measurements in their network status votes. Clients use the consensus of these weights to inform their path selection process with the hope that every circuit they build will have roughly equal performance, regardless of the relays chosen. This achieves a form of load balancing.

Historically, the directory authorities that ran bandwidth scanners (bandwidth authorities), ran torflow. Time passed, it slowly become less maintained, and the collective knowledge of how it worked slipped away.

The bandwidth authorities became increasingly unhappy having to run torflow. Twice yearly Tor Project meetings came and went. Everyone agreed a replacement was needed, but it was harder to reach consensus on what it should look like and what its goals where.

Simple Bandwidth Scanner (sbws) is a project motivated by discussions at the Rome 2018 Tor Project meeting. It aims to be a quick to implement, easy to maintain replacement for torflow. It should not receive many fancy features; instead, developer time should be spent on finding and implementing a better load balancing solution than bandwidth scanning, such as peerflow.

This document describes the implementation contained within the accompanying sbws package.

Anatomy of a Tor network using sbws

First and foremost, there needs to be one or more webservers that exist somewhere on the Internet. They MUST serve up a file of at least some minimum size, and MUST support both HTTP GET and HEAD requests on that file. For both HTTP verbs, they MUST support requests with the Range header and MUST support not compressing responses. Beyond these requirements, the webservers MAY support TLS connections, optionally with a valid certificate. Both apache and nginx fit these requirements.

Every directory authority that wishes to also vote on relay bandwidth (AKA a bandwidth authority) MUST run one or more sbws scanner clients (or trust someone to run one or more sbws scanner clients for them). The scanners run continuously, constantly building two-hop circuits to the previously described webservers and measuring the amount of bandwidth relays are capable of handling on these measurement circuit. Over these circuits it collects RTT data (by repeatedly requesting a single byte from the webserver) and available bandwidth data (by starting small and progressively requesting larger amounts of data until the request takes long enough to fulfill, and then requesting that amount many times).

Periodically the operator of an sbws scanner MUST run the sbws generate command in order to generate a v3bw file. This aggregates the previous few days’ worth of measurement results into one RTT and one bandwidth per relay ever measured within the validity period into a single file for the tor process the bandwidth authority is running to read. The bandwidth authority includes these aggregated results in its votes.

Sbws dependencies

Sbws scanners run tor for themselves and do not require a system tor process to exist. Tor MUST be installed, and it SHOULD be any up-to-date version supported by the network team.

Sbws uses the python library Stem to launch and control tor and the python library Requests to make HTTP(S) requests. Both are generally packaged for most major Linux distributions, and are always available in PyPI.

How it all works

We now describe various core parts of sbws.

Simple relay prioritization

This may be the most complex part of sbws.

Sbws makes an effort to prioritize measurements of relays that don’t have many recent results. For example: relays that just joined the Tor network or relays that haven’t been online in the past few days. This goal is achieved using a min-priority queue and the concept of freshness.

Freshness is defined as the amount of time between when the measurement was made and the time sbws will stop considering it valid. Thus, a measurement made more recently will have more time until it is no longer valid (higher freshness) and a measurement made a long time ago will have very little time until it is no longer valid (lower freshness).

Over time, sbws will make many measurements for a given relay. The sum of these measurements’ freshnesses is the relay’s priority. As we are prioritizing like a min-priority queue, a higher sum of freshnesses means worse priority while a lower sum of freshnesses means better priority.

Example: AlphaRelay33 joined the network yesterday and sbws has measured it once so far. BetaRelay87 has been in the network for years and has been getting measured regularly approximately once a day. BetaRelay87 has five measurements that are still valid, with freshnesses 10, 100, 500, 1000, and 1500. AlphaRelay33’s one measurement has freshness 3000. Because the sum of BetaRelay87’s 5 measurements is greater than AlphaRelay33’s one measurement, AlphaRelay33 has better priority and will be measured next before BetaRelay87.

Example: AlphaRelay33 is still a brand new relay with its one measurement in the last day with freshness 3000. CharlieRelay9 has been in the network for a long time, but had technical issues last week and hasn’t been online in many days. When CharlieRelay9’s operator finally gets him back online, he still has one valid measurement with freshness 10. Because AlphaRelay33’s measurement is fresher, CharlieRelay9 has better priority and will get measured first.

Sometimes measurements fail. Hopefully they fail because of transient issues, and with that hope in mind, it would be nice if a relay with a failed measurement didn’t have to wait a long time to have another chance at a successful measurement. For this reason, when summing the freshnesses of results for a given relay, sbws will artificially reduce the freshness for measurements that were not successful. This makes the sum of freshnesses lower for that relay, and therefore the priority better so it can be measured again sooner.

Simple result storage

Internally, sbws has a hierarchy of Result classes for easy managing of different types of result (success, error-because-of-circuit-error, error-because-[…] etc.). These results get converted into JSON strings and stored – one per line – in text files in a data directory.

The text files are simply named after the date. For example: 2018-03-20.txt.

The sbws scanner only appends to these files, and it automatically starts a new file when the system’s clock ticks past midnight UTC.

To avoid any weird timezone-related issues, consumers of sbws scanner data (such as the generate and stats scripts) SHOULD read more files than strictly necessary. For example, if the validity period is 5 days, they should read 6 days of files. Because all results have a Unix timestamp, consumers of sbws data can easily determine which results are just outside the validity period as they are reading them in.

Simple result processing

Every hour the directory authorities vote to come to a consensus about the state of the Tor network. The bandwidth authorities need to use the results that have been gathered to inform their vote about relays’ bandwidths. To do this they use sbws generate.

The relays’ bandwidth measurements (Results) to be added to the Bandwidth File MUST be first selected and MUST be then then scaled.

Selecting bandwidth measurements

Each relay bandwidth measurements are selected in the following way:

  1. At least two bandwidth measurements (Result s) MUST have been obtained within an arbitrary number of seconds (currently one day). If they are not, the relay MUST NOT be included in the Bandwith File.
  2. The measurements than are are older than an arbitrary number of senconds in the past MUST be discarded. Currently this number is the same as data_period (5 days).

If the number of relays to include in the Bandwidth File are less than a percententage (currently 60%) than the number of relays in the consensus, additional Header Lines MUST be added (see XXX) to the Bandwith File and the relays SHOULD NOT be included.

Scaling bandwidth measurements

Consensus bandwidth obtained by new implementations MUST be comparable to the consensus bandwidth, therefore they MUST implement torflow_scaling. The bandwidth_file_spec appendix B describes torflow scaling and a linear scaling method.