HOPR Network Design and Flaws — A System Analysis
In recent months, a project called HOPR has been advertising their project as an “open incentivized mixnet’’. Given the many nominal…
In recent months, a project called HOPR has been advertising their project as an “open incentivized mixnet’’. Given the many nominal similarities of this project to that of ours at Nym technologies; similar claims, messaging and usage of same literature ie mixnet, relays, metadata protection, incentivized nodes etc. and the fact that we have been asked time and time again by the community on how the two projects compare, we decided to carry a comprehensive analysis to answer these questions.
While we hope the HOPR team succeeds in this venture, we have found serious privacy and security concerns that as privacy researchers decided to address as otherwise there can be consequential harm to the end users.
It’s worth mentioning that we have raised these concerns with HOPR team many times, on Telegram and in person, including their AMA where all our technical questions were left unanswered and later regarded as spam. This article was also shared with them but we have not heard a response. So we would like to invite HOPR to engage and answer these questions to remove the technical doubts concerning privacy claims of their design as otherwise users are put in danger.
In this post we first describe each system architecture. Then we review attacks known to be of concern to privacy networks and compare their impact on HOPR and the Nym mixnet and how each network architecture can defend against them. We believe there are foreseeable hard challenges that the HOPR network will run into due to its underlying design, and for which no technical solutions currently exist.
So let’s get to it.
HOPR calls their design a mixnet, similar to that of Nym Technologies, and relies on the Sphinx packet format, again similar to Nym, to encrypt data packets multiple times. That is, however, where similarities end, as there are fundamental differences at the architectural level between the two systems that are highly consequential for users’ privacy and network scalability.
In general, the first major decision when designing an anonymity network is whether to use a p2p or a client-server architecture. At first sight, p2p architectures may look like a great choice for a privacy network given the general scalability properties of p2p networks and their suitability for other use cases. On a closer examination however, such designs have been found to suffer from serious shortcomings when applied to anonymous routing, both in terms of privacy and scalability. Let’s explain that in the context of the two projects.
In a nutshell, HOPR is a p2p network where participants are peers that act as both end users and relay nodes that route the data packets of others. New participants join by contacting one or more “bootstrap nodes” that give to new users a list of peers they can connect to to become part of the HOPR network. A gossip protocol is used to discover more of the network: peers share their list of contacts with others to enable them to connect to each other. Knowledge of many other peers and existing connections between them is necessary to send data via HOPR. Peers encapsulate data in Sphinx packets together with per-hop probabilistic payments and route these packets through a few intermediaries to a receiving peer.
Example of a p2p network
In contrast, the Nym network is a decentralized mixnet that is designed as a client-server architecture. In Nym there is a clear distinction of roles between the end users, who send and receive anonymous data packets, and the relays or mix nodes that provide the service to those users. The expectations from these two classes of participants are significantly different: while the end users may be offline most of the time and have small devices or flaky connections, the mix nodes run on a set of well-maintained decentralized servers that are incentivized to reliably provide high quality of service to thousands of users at any time. In Nym all end users obtain the full list of available mix nodes with their corresponding keys. This list is published at regular intervals in the Nym blockchain, ensuring that all participants have a verifiable, global, up-to-date and consistent view on the mix nodes forming the network. Instead of establishing ad hoc connections to a set of arbitrary peers (as in HOPR), in Nym the mixnodes are structured in a layered topology — which is known to be the optimal mixnet topology in terms of both privacy and scalability. All users choose routes through the mixnet layers following the same probabilistic algorithm, a feature that maximizes privacy by not making users stand out based on their routing choices.
Nym client-server mixnet architecture
One of the principles of good privacy design is to ensure that the system protects the privacy of end users while making the infrastructure providers publicly verifiable and accountable. This is why making a clear separation of roles between end users and mix nodes (or relays) is important: it allows deploying verification mechanisms to publicly assess if mix nodes are following the protocols and providing a good quality of service to the end users — without diminishing privacy for end users.
In Nym this public scrutiny of nodes is possible because (1) mix nodes have no privacy requirements themselves because they are only infrastructure providing a service to others; and (2) the size of the mixnet (number of active mix nodes) is orders of magnitude smaller than the user base, which makes it practical to verify all nodes.
In contrast, by conflating the two roles and having all peers be both things, HOPR gets the worst of all worlds. On one hand, peers have to disseminate information about themselves (eg, their IP address and contacts) to random strangers to be part of the network infrastructure and thus able to send and receive messages. This in itself compromises the privacy of peers in their role as end users. Then, on the other hand, since the network comprises all end users linked by ad hoc connections subject to churn, the system lacks a global network view. This makes it infeasible to implement publicly trusted verification mechanisms to evaluate the behavior of peers in their role as relays, meaning that as the network grows there are no good solutions for the detection of adversarial peers.
HOPR claims to provide privacy guarantees towards global passive network adversaries and adversaries who run a few malicious peers. These claims, however, don’t stand up to scrutiny, as there are various attacks that such adversaries can carry out that the HOPR architecture simply is not adequate to defend against.
Below we review these attacks and describe if and how they can be deployed on each of the two system architectures.
Eclipse attacks
The Eclipse attack is a well known attack that affects many p2p designs and is particularly devastating on networks that attempt to provide private communications. A key feature exploited by this attack is the fact that each peer only knows a small subset of other peers in the network, rather than having a global view on the full network. This is by design in such networks, as requiring global knowledge would make it impossible for these networks to scale. Note that the fraction of the network known to any individual peer diminishes as the network grows, meaning that a global up-to-date view on the full network becomes ever more elusive as the network scales up.
The Eclipse attack is conducted by an adversary who runs malicious peers. The malicious peers befriend a victim peer and provide contacts of further malicious peers, populating the connections of the victim with peers controlled by the adversary. When later on the victim selects routes for “anonymously” relaying data packets, she will be selecting from a list filled with mostly (or only) adversarial relays. In this situation it becomes possible for the adversary to capture the full route and compromise the victim’s communications, identifying the sender and receiver and thus breaking the privacy promises. Even more concerning, this is a stealthy attack where the victim will never know that she was choosing adversarial relays that were monitoring her communications. Note that in this respect Eclipse attacks are more devastating on mixnets than they are on networks such as Bitcoin, where the victim may mitigate or at least detect the attack by seeking alternative ways of obtaining the main chain.
Depending on the adversary’s motivation, this attack can be deployed in a targeted fashion against selected users, used to confirm that a pair of suspects are communicating, or more generally deployed opportunistically against as many HOPR users as possible. A malicious “bootstrap node” can deploy the attack by itself, meaning that bootstrap nodes are single points of failure when it comes to Eclipse attacks. Even if bootstrap nodes are honest, adversaries may strategically position their peers near bootstrap nodes to facilitate the attack. Eclipse attacks are a fundamental problem of this type of network that’s genuinely hard to solve and for which no practical effective solutions are available.
Eclipse attacks are however not a concern in client-server architectures such as the one chosen by Nym, because the full list of active mix nodes and their keys is available to all users via the public Nym blockchain and updated at regular intervals. All users choose routes from the full set of mix nodes (relays) in exactly the same way. Thus, it is no longer possible for the adversary to poison the contact lists of targeted victims by giving them a version of the network that is plagued by adversaries.
Furthermore, in Nym the mixnet is structured in layers and the placement of mixes in layers is randomized in a way that can’t be biased, which prevents adversarial nodes from choosing their position in the network in order to optimize attacks or target selected users. To compromise a significant amount of packets in Nym, an adversary would need to control a large fraction of the mix nodes, and even then, this would affect the packets of random users, rather than specific targets of interest to the adversary.
Finally, each of the Nym mix nodes requires substantial stake, community support, and reliable quality of service over time to be part of the Nym network — meaning that there is a high cost to compromising them. In contrast, in the case of HOPR, it’s enough for the adversary to spawn cheap malicious peers and then advertise those to targeted victims to trick them into choosing malicious peers for relaying their data packets.
Predecessor attacks
In networks such as HOPR, sender privacy properties are based on the uncertainty of an intermediate relay about whether its predecessor is the original sender of the message (or just an intermediate relay). Similarly, uncertainty on whether a relay’s successor is the final destination (or just another intermediary) is what provides privacy for the message recipient.
This privacy concept, first proposed in the late 90s by a system called Crowds, was designed for networks that use hop-by-hop routing and where the network is fully connected, meaning that all peers have direct connections to all other peers. Neither of these characteristics apply to HOPR, which relies on source routing (their use of the Sphinx packet format requires the sender to select a full route through the network and encrypt the packet with the keys of the chosen intermediaries, which must be locally available when preparing the packet) and sparse connectivity (because it is expensive for a peer to maintain network connections, which involve committing collateral for setting up payment channels, with a vast number of other peers).
As HOPR scales up, we can expect the network to become more sparse due to limits on the number of connections that each peer maintains simultaneously. We can also expect the amount of contacts known by each peer to remain stable and thus represent a smaller fraction of the full network as the network grows.
Source routing that is dependent on recipient location in the network, combined with sparsely connected networks, is ill-suited to achieve sender privacy. This is because an intermediary can determine with high confidence that its predecessor is the sender of the packet. Even worse, if the first and last intermediaries collaborate, they may easily be able to determine the sender-receiver pair of the packet, completely breaking the communication privacy that was promised.
The reason for the bad privacy properties of this network design is that a peer who wants to send a message to a recipient will have very few routing choices in a sparse network, as well as incentives to choose the shortest route (particularly if payments are per hop). Given the stringent routing constraints imposed by network sparsity and partial knowledge, added to the cost of longer routes, and to the fact that routing choices are dependent on the recipient, there is very little room for uncertainty about the choice of route that a peer has to take to send data to a given recipient. This determinism is what allows intermediaries to confidently infer whether their predecessor is the original sender or their successor is the end recipient of a packet they are routing and therefore de-anonymize who is talking to who.
That peers only have partial knowledge of the network also makes HOPR users vulnerable to Epistemic Attacks, further diminishing communication privacy. This attack relies on exploiting the simple fact that in source routing, a victim cannot choose relays it is not aware of. Thus, an adversary who knows the list of HOPR relays known to a set of victims can use this knowledge as a fingerprint to identify which route could have been chosen by which victim and who is likely to be the final recipient.
In contrast, in client-server mixnets with a layered architecture such as Nym, anonymity is based on a different principle. First-layer nodes know they are receiving packets from senders, but because all senders may choose all existing routes with the same probability independently of the receiver, routing choices leak no information about the possible sender or destination of a packet. Nodes situated after the first layer learn no information about the possible sender of a packet they are routing. In Nym, a single honest node in a packet’s route is enough to prevent this attack and provide a high degree of privacy.
Tracing packets through timing attacks
The key principle of communication privacy is to aggregate packets from many users into a big anonymity set, so that it becomes impossible to figure out who sent data to whom. If too few packets are being “mixed” together at each intermediary, then the network cannot guarantee adequate communication privacy against network adversaries who observe packets arriving and departing from peers. Such an adversary can exploit the timing of arrival and departure of packets to link the inputs to the outputs of intermediaries and therefore trace packets from sender to recipient.
In a mixnet, the anonymity provided by each mix node is a function of the number of output packets that may correspond to each input packet. This is determined by two parameters: (1) the volume of packets arriving at the node and (2) the random delay added to packets for mixing (reordering) purposes.
Unfortunately, networks where all participants act as relays have the effect of “thin traffic” going through each node, because (by design) traffic is spread out thin through the network. This is a simple back-of-the-envelope calculation best illustrated with an example. Consider a million users each sending on average a packet per minute with an average path length of 3 hops. In HOPR, such a situation implies that each peer is routing on average 4 packets per minute: its own, and 3 from others that have chosen it as relay. If each packet is randomly delayed on average 1 second per hop, in the vast majority of cases a packet will not coincide with any other packet while traversing the relay, meaning that it will have no anonymity whatsoever towards a passive network adversary. If the average delay per hop is 1 minute, then the packet will coincide with a handful of other packets, obtaining a tiny anonymity set.
In contrast, in Nym a user base of a million where each user generates a packet per minute (about 17,000 packets per second) is served by a network with about 60 mix nodes organized in 3 layers (each mix node can process more than a thousand of packets per second). Whenever a packet enters a node where it’s delayed a second on average, there are thousands of outputs that may correspond to that packet sent by the node over the next few seconds. As the packet traverses various mixnet layers, its anonymity set grows to encompass messages sent by the whole user base, i.e., tens of thousands of messages for a 1 second latency per hop (compared to at best a handful messages for HOPR), and several million for a latency of 1 minute per hop (compared to a few dozen for HOPR). As we can see, for the same end-to-end latency, aggregating traffic in fewer nodes leads to privacy protection that is orders of magnitude stronger than spreading traffic over all peers. Note that HOPR cannot reach anonymity sets comparable to Nym’s by adding cover traffic, unless it sends thousands of dummy messages per real message, which is unsustainable.
This is because of the following: in Nym, once a packet enters the mixnet, the number of outputs that may correspond to that packet depends on the traffic being generated by the full user base of Nym. In HOPR, the number of relay outputs that may correspond to an input packet is much smaller as it is only influenced by the (small) fraction of traffic generated in a local region of the network that happens to go through the same relays at the same time.
Note that in terms of scalability, in Nym a growing user base results in stronger privacy for all packets, as the Nym design aggregates all users’ packets into one huge anonymity set. In HOPR on the other hand, the traffic per node remains constant as the network grows. This is because all new peers are both senders and relays, meaning that any increase in total traffic load is absorbed by the increased number of relays. This means that as HOPR network scales, the privacy offered by the network does not become better.
Client enumeration attacks
Conflating end users and relays in a network (by having all peers fulfill both functions) has the particularly pernicious effect of exposing all participants, i.e., all end users, to client enumeration attacks. In HOPR this is a very low-cost attack that can be deployed by an adversary that runs a few peers — or even just one peer. By simply engaging in the peer discovery protocol with as many other HOPR peers as possible, the adversary can compile over time an extensive list of HOPR user addresses and who they have payment channels (available routes) with.
Again, this has no simple solution in a network like HOPR, where by design all peers have to advertise themselves in order to establish ad hoc connections to other peers and thus provide the network connectivity needed for relaying data packets from senders to receivers.
Enumeration not only exposes the IP addresses of all end users but also enables target selection for further attacks. For example, the adversary can use the knowledge obtained through enumeration to position the adversarial nodes around target victims and then compromise their communications.
In contrast, in Nym only the mix node addresses are publicly exposed. These are the infrastructure providers who relay data packets for others (end users). End users only expose their IP address to the mixnet gateway of their choosing. Gateways relay packets from end users to the first layer of the mixnet, shielding end user addresses from the set of nodes forming the mixnet. In Nym there is no need for end users to advertise their network address or known contacts to other users. Exposure of their IP address is limited to their entry point to the mixnet, which is a gateway node freely chosen by the user based on their own criteria. Given an adversary that can run a few nodes (gateways) and users in the network, in Nym this adversary can at most learn the addresses of those few users that the adversary can persuade to choose a malicious gateway.
Disincentives to delay (mix) packets
What differentiates mixnets from onion routing networks like Tor is the reordering (mixing) of packets at each intermediate node to make packets untraceable as they are routed. HOPR has been unclear about how packet mixing is implemented by relays. According to HOPR’s whitepaper, they are developing a “Chaumian mixnet”, while the algorithm currently implemented seems to work rather differently from a Chaumian mix. Instead of batching and shuffling a fixed number of packets, in HOPR a node adds its incoming packets to a single queue from which it picks packets one at the time (uniformly at random) to process and send on to the next hop.
The mixing algorithm as currently implemented has undesirable dependencies on network load. When the traffic load in the network is high the nodes’ queues grow and so does the variance of latency per hop, implying that the sender has no guarantees on bounds for delivery time of sent packets. On the other hand, when the traffic load in the network is low, the queue will be empty most of the time, and packets are reordered minimally or even not at all, making the node ineffective in providing any privacy.
Furthermore, note that if a sender and receiver pair exchange multiple packets, small anonymity sets per packet can lead to full identification via statistical disclosure attacks. In contrast to Nym, HOPR does not allow clients to control the desired anonymity by adjusting how long each packet is kept at a mix node. As a result, HOPR’s current mixing technique can neither tune the level of offered anonymity nor bound end-to-end latency.
Assuming that HOPR upgrades their design to use concepts of Poisson mixing similar to those already implemented by Nym, we find that the HOPR design embeds implicit disincentives for intermediate relays to respect sender-specified delays. At the same time, HOPR lacks secure verification mechanisms for checking the latency applied by intermediaries.
In essence, mix nodes have two tasks: processing and retaining packets for an amount of time, and relaying them to their next destination. While HOPR has a mechanism to check that a node relayed the message to the next hop, it is impossible in HOPR (for the sender or anyone else) to verify that nodes are correctly delaying the packets they relay.
When looking closely at HOPR, we find that the design incentivizes relay nodes to not delay packets at all beyond the strict minimum required to perform the Sphinx processing. The disincentive is due to the fact that respecting a requested delay increases the risk of the next hop going offline or closing the payment channel, which would result in an economic loss for the relay currently keeping the packet. Sending packets immediately increases the chances of the relay receiving a payment for routing the packet. Note that if relays send packets immediately, it is trivial for a passive network observer to link inputs to outputs and trace packets through the network. In this sense, economically motivated selfish behavior on the part of relays facilitates packet tracing for passive network adversaries.
Client-server architectures such as Nym on the other hand can benefit from using verifiable pseudo-randomly created measurement packets that sample whether mix nodes are delaying packets according to the protocol. Such a verification technique is practical in Nym because the number of mix nodes is orders of magnitude smaller than the user base and thus full coverage and a consistent collective view is feasible to obtain.
These verification techniques cannot be applied in HOPR for two reasons. First, because in HOPR traffic is spread out too thin to obtain enough samples to produce reliable measurements per hop that are not easily biased by an adversary running a subset of peers. Second, the fact that different peers have knowledge of different subsets of the network makes it impossible to obtain a global reputation score on relay behaviour that reliably indicates to the full user base the extent to which nodes are following the routing protocol and respecting sending-specified delays.
Payment protocol leaks
Both projects incentivise nodes to relay traffic. However, while in Nym payments are based on publicly verifiable network-wide measurements, HOPR relies on probabilistic micropayment channels established ad hoc between pairs of nodes.
When combining a payment protocol with a private routing network it is crucial that the payments do not undermine the privacy properties offered by the network. We find that this is not the case with the payment protocol adopted by HOPR.
The Sphinx packet format, used by HOPR to encapsulate and route data packets, provides privacy properties that allow concealing towards a relay its position in the packet’s route. Thanks to this property of Sphinx, an intermediate node cannot determine whether its successor is an intermediary or the final destination, which provides some degree of recipient privacy. Similarly, not revealing the position in the path contributes to sender privacy.
In HOPR payments are performed hop-by-hop along a packet’s route, i.e., payments are conducted between adjacent nodes via a shared payment channel. To send a packet through HOPR, the sender locks in its channel with the first relay the aggregate payment for all intermediaries in the packet’s route. When processing the packet, the first relay takes its share from that aggregate payment and forwards to the second relay the remaining funds, along with the packet. In return, the second relay reveals a secret needed by the first to prove that the packet was relayed and thus redeem the payment. This is repeated for each intermediate relay, which is incentivised to forward the packet to the next hop in order to receive the secret needed to redeem their payment.
This payment protocol leaks information that allows nodes to determine their position in the route. In particular, relays can see the payment amount that is transferred forward for paying the subsequent intermediaries, which trivially reveals the number of intermediaries between a relay and the final destination. In the case of the last relay in the route, the payment amount transferred to the recipient is zero, which means that the last intermediary has no uncertainty whatsoever about the fact that the successor is the end recipient of the data packet rather than an intermediary. Similarly, a high distance from the destination indicates to a relay that the predecessor is more likely to be the original sender of the packet. In this way, the privacy protections afforded by Sphinx are severely undermined by the information leaked via the payment protocol. In addition, note that the information leaked by the payment protocol further aggravates predecessor attacks allowing an adversary running some relays to better infer sender-recipient pairs.
Summary
We hope to have clarified the differences between Nym and HOPR. Although the latter seems to advertise similar claims and goals to the ones that motivate Nym, the designs differ on many core points. While Nym was designed to offer strong privacy to its users by using the best available technologies, HOPR’s design relies on techniques that suffer from known flaws.
We would like to stress that we have raised our concerns with the HOPR team several times, including this article before publishing it. The described attacks in this article fall out of scope of responsible disclosure as they cannot be easily fixed or patched, since they are inherent to peer-to-peer networks, and despite decades of research no practical solutions are currently available. We hope that HOPR does not overlook these issues and that they will adequately communicate the privacy limitations of their system to the users, as otherwise users may have a false sense of security which could endanger them.