Fixing(*) Wireguard
Introduction
I must admit the title is a bit misleading. Wireguard is a very nice tunneling implementation by J. Donenfeld, and there is nothing fundamentally wrong with it. However, where I live, the protocol is crippled and somewhat blocked by the provider firewalls. Join me while I prescribe a solution to the problem and prevail as a Wireguard fan!
The Protocol
Wireguard has several packet types. Let's start by describing the protocol.
Following the Noise protocol, the initiator first sends the following packet to the responder.
msg = handshake_initiation {
    u8 message_type
    u8 reserved_zero[3]
    u32 sender_index
    u8 unencrypted_ephemeral[32]
    u8 encrypted_static[AEAD_LEN(32)]
    u8 encrypted_timestamp[AEAD_LEN(12)]
    u8 mac1[16]
    u8 mac2[16]
}
Then, the responder replies with the following:
msg = handshake_response {
    u8 message_type
    u8 reserved_zero[3]
    u32 sender_index
    u32 receiver_index
    u8 unencrypted_ephemeral[32]
    u8 encrypted_nothing[AEAD_LEN(0)]
    u8 mac1[16]
    u8 mac2[16]
}
After this point, it is possible to send data packets in the following form:
msg = packet_data {
    u8 message_type
    u8 reserved_zero[3]
    u32 receiver_index
    u64 counter
    u8 encrypted_encapsulated_packet[]
}
One last packet type is the cookie reply. Cookies are basically a DoS-protection mechanism: under load, the responder can require the initiator to prove ownership of its source IP address before spending CPU on the expensive handshake computations.
msg = packet_cookie_reply {
    u8 message_type
    u8 reserved_zero[3]
    u32 receiver_index
    u8 nonce[24]
    u8 encrypted_cookie[AEAD_LEN(16)]
}
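On the wire, all four formats share the same first four bytes: a one-byte type and three reserved zero bytes. The kernel reads them as a single little-endian 32-bit word; messages.h describes the header roughly like this (paraphrased from memory, so check your own tree):

struct message_header {
	/* The logical layout is:
	 *   u8 type
	 *   u8 reserved_zero[3]
	 * but encoding it as one little-endian u32 yields the same bytes
	 * on the wire and makes comparisons cheap.
	 */
	__le32 type;
};

This is why the kernel code quoted later compares SKB_TYPE_LE32(skb) against cpu_to_le32(MESSAGE_*) constants instead of touching individual bytes.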
Blocking Wireguard
It is rather easy to block the protocol just by looking at the first 4 bytes. In my case, the handshake packets (initiation + response) are getting dropped by matching rules. Let's see the details.
All packet types start with a field called u8 message_type, a single byte value, followed by three reserved zero bytes. Here are the constants defined for each packet type:
Packet Type | u8 message_type | message_type l.s. nibble |
---|---|---|
initiation | 1 | 0001 |
response | 2 | 0010 |
data | 4 | 0100 |
cookie | 3 | 0011 |
So all we need to look for is UDP traffic on the default Wireguard port 51820 whose payload starts with the bytes 01 00 00 00. The type field is little-endian on the wire, so this is simply the value 1 on my little-endian system.
No decryption or signature verification is required for this kind of observation. Now, as a firewall, just memoize this flow and look for a packet with message_type 2, a response, in the other direction. If you see such a packet, drop it, and keep dropping for 5 minutes.
This will make the Wireguard handshake (key exchange) fail. The initiator will never get the response and will keep retransmitting initiation packets.
Simple, huh?
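To make this concrete, here is a rough user-space sketch of the kind of matching logic such a firewall could implement. Everything here (the function names, the per-flow flag) is illustrative and not taken from any real firewall product:

#include <stdint.h>
#include <stddef.h>

enum verdict { PASS, DROP };

/* Return the Wireguard message_type if the UDP payload starts with a
 * plausible header (little-endian type in 1..4 followed by three zero
 * bytes), or 0 otherwise. */
uint32_t wg_message_type(const uint8_t *payload, size_t len)
{
	uint32_t type;

	if (len < 4)
		return 0;
	type = (uint32_t)payload[0] | ((uint32_t)payload[1] << 8) |
	       ((uint32_t)payload[2] << 16) | ((uint32_t)payload[3] << 24);
	return (type >= 1 && type <= 4) ? type : 0;
}

/* Per-flow decision: remember that an initiation was seen towards port
 * 51820, then drop the handshake response flowing back the other way. */
enum verdict classify(uint16_t dst_port, const uint8_t *payload, size_t len,
		      int *flow_saw_initiation)
{
	uint32_t type = wg_message_type(payload, len);

	if (type == 1 && dst_port == 51820)
		*flow_saw_initiation = 1;	/* keep this state for ~5 minutes */
	if (type == 2 && *flow_saw_initiation)
		return DROP;			/* the handshake never completes */
	return PASS;
}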
Internals
Before proceeding, let’s look into the details of the implementation in the kernel.
The first structure is called struct wg_peer, defined in drivers/net/wireguard/peer.h. It describes a peer and holds the relevant state.
struct wg_peer {
struct wg_device *device;
struct prev_queue tx_queue, rx_queue;
struct sk_buff_head staged_packet_queue;
int serial_work_cpu;
bool is_dead;
struct noise_keypairs keypairs;
struct endpoint endpoint;
struct dst_cache endpoint_cache;
rwlock_t endpoint_lock;
struct noise_handshake handshake;
atomic64_t last_sent_handshake;
struct work_struct transmit_handshake_work, clear_peer_work, transmit_packet_work;
struct cookie latest_cookie;
struct hlist_node pubkey_hash;
u64 rx_bytes, tx_bytes;
struct timer_list timer_retransmit_handshake, timer_send_keepalive;
struct timer_list timer_new_handshake, timer_zero_key_material;
struct timer_list timer_persistent_keepalive;
unsigned int timer_handshake_attempts;
u16 persistent_keepalive_interval;
bool timer_need_another_keepalive;
bool sent_lastminute_handshake;
struct timespec64 walltime_last_handshake;
struct kref refcount;
struct rcu_head rcu;
struct list_head peer_list;
struct list_head allowedips_list;
struct napi_struct napi;
u64 internal_id;
};
The initiation packet is generated in noise.c:
bool
wg_noise_handshake_create_initiation(struct message_handshake_initiation *dst,
struct noise_handshake *handshake)
{
u8 timestamp[NOISE_TIMESTAMP_LEN];
u8 key[NOISE_SYMMETRIC_KEY_LEN];
bool ret = false;
/* We need to wait for crng _before_ taking any locks, since
* curve25519_generate_secret uses get_random_bytes_wait.
*/
wait_for_random_bytes();
down_read(&handshake->static_identity->lock);
down_write(&handshake->lock);
if (unlikely(!handshake->static_identity->has_identity))
goto out;
dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION);
handshake_init(handshake->chaining_key, handshake->hash,
handshake->remote_static);
/* e */
curve25519_generate_secret(handshake->ephemeral_private);
if (!curve25519_generate_public(dst->unencrypted_ephemeral,
handshake->ephemeral_private))
.. skipped for brevity ..
The response packet is generated similarly.
bool wg_noise_handshake_create_response(struct message_handshake_response *dst,
struct noise_handshake *handshake)
{
u8 key[NOISE_SYMMETRIC_KEY_LEN];
bool ret = false;
/* We need to wait for crng _before_ taking any locks, since
* curve25519_generate_secret uses get_random_bytes_wait.
*/
wait_for_random_bytes();
down_read(&handshake->static_identity->lock);
down_write(&handshake->lock);
if (handshake->state != HANDSHAKE_CONSUMED_INITIATION)
goto out;
dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE);
dst->receiver_index = handshake->remote_index;
/* e */
curve25519_generate_secret(handshake->ephemeral_private);
.. skipped for brevity ..
The packet validation function is in receive.c.
static size_t validate_header_len(struct sk_buff *skb)
{
if (unlikely(skb->len < sizeof(struct message_header)))
return 0;
if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA) &&
skb->len >= MESSAGE_MINIMUM_LENGTH)
return sizeof(struct message_data);
if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION) &&
skb->len == sizeof(struct message_handshake_initiation))
return sizeof(struct message_handshake_initiation);
if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE) &&
skb->len == sizeof(struct message_handshake_response))
return sizeof(struct message_handshake_response);
if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE) &&
skb->len == sizeof(struct message_handshake_cookie))
return sizeof(struct message_handshake_cookie);
return 0;
}
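The SKB_TYPE_LE32() helper used above simply reads the little-endian type word from the start of the packet data. As far as I recall, queueing.h defines it roughly as follows (quoted from memory, so double-check your tree):

/* Read the le32 header type directly out of the skb's payload. */
#define SKB_TYPE_LE32(skb) (((struct message_header *)(skb)->data)->type)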
Similarly, in the same file, the incoming packet sieve can be found in wg_receive_handshake_packet():
static void wg_receive_handshake_packet(struct wg_device *wg,
struct sk_buff *skb)
{
enum cookie_mac_state mac_state;
struct wg_peer *peer = NULL;
/* This is global, so that our load calculation applies to the whole
* system. We don't care about races with it at all.
*/
static u64 last_under_load;
bool packet_needs_cookie;
bool under_load;
if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) {
net_dbg_skb_ratelimited("%s: Receiving cookie response from %pISpfsc\n",
wg->dev->name, skb);
wg_cookie_message_consume(
(struct message_handshake_cookie *)skb->data, wg);
return;
}
under_load = atomic_read(&wg->handshake_queue_len) >=
MAX_QUEUED_INCOMING_HANDSHAKES / 8;
if (under_load) {
last_under_load = ktime_get_coarse_boottime_ns();
} else if (last_under_load) {
under_load = !wg_birthdate_has_expired(last_under_load, 1);
if (!under_load)
last_under_load = 0;
}
mac_state = wg_cookie_validate_packet(&wg->cookie_checker, skb,
under_load);
if ((under_load && mac_state == VALID_MAC_WITH_COOKIE) ||
(!under_load && mac_state == VALID_MAC_BUT_NO_COOKIE)) {
packet_needs_cookie = false;
} else if (under_load && mac_state == VALID_MAC_BUT_NO_COOKIE) {
packet_needs_cookie = true;
} else {
net_dbg_skb_ratelimited("%s: Invalid MAC of handshake, dropping packet from %pISpfsc\n",
wg->dev->name, skb);
return;
}
switch (SKB_TYPE_LE32(skb)) {
case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION): {
struct message_handshake_initiation *message =
(struct message_handshake_initiation *)skb->data;
if (packet_needs_cookie) {
wg_packet_send_handshake_cookie(wg, skb,
message->sender_index);
return;
}
peer = wg_noise_handshake_consume_initiation(message, wg);
if (unlikely(!peer)) {
net_dbg_skb_ratelimited("%s: Invalid handshake initiation from %pISpfsc\n",
wg->dev->name, skb);
return;
}
wg_socket_set_peer_endpoint_from_skb(peer, skb);
net_dbg_ratelimited("%s: Receiving handshake initiation from peer %llu (%pISpfsc)\n",
wg->dev->name, peer->internal_id,
&peer->endpoint.addr);
wg_packet_send_handshake_response(peer);
break;
}
case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE): {
struct message_handshake_response *message =
(struct message_handshake_response *)skb->data;
if (packet_needs_cookie) {
wg_packet_send_handshake_cookie(wg, skb,
message->sender_index);
return;
}
peer = wg_noise_handshake_consume_response(message, wg);
if (unlikely(!peer)) {
net_dbg_skb_ratelimited("%s: Invalid handshake response from %pISpfsc\n",
wg->dev->name, skb);
return;
}
wg_socket_set_peer_endpoint_from_skb(peer, skb);
net_dbg_ratelimited("%s: Receiving handshake response from peer %llu (%pISpfsc)\n",
wg->dev->name, peer->internal_id,
&peer->endpoint.addr);
if (wg_noise_handshake_begin_session(&peer->handshake,
&peer->keypairs)) {
wg_timers_session_derived(peer);
wg_timers_handshake_complete(peer);
/* Calling this function will either send any existing
* packets in the queue and not send a keepalive, which
* is the best case, Or, if there's nothing in the
* queue, it will send a keepalive, in order to give
* immediate confirmation of the session.
*/
wg_packet_send_keepalive(peer);
}
break;
}
}
if (unlikely(!peer)) {
WARN(1, "Somehow a wrong type of packet wound up in the handshake queue!\n");
return;
}
local_bh_disable();
update_rx_stats(peer, skb->len);
local_bh_enable();
wg_timers_any_authenticated_packet_received(peer);
wg_timers_any_authenticated_packet_traversal(peer);
wg_peer_put(peer);
}
This should be enough to understand the patch. Let's look at the solution.
The Solution
The solution is pretty simple. The most significant bit of the least significant nibble of the message_type (the bit with value 8) is unused. If we flip that bit, a firewall matching on the known type values no longer recognizes the UDP traffic as Wireguard and lets the handshake complete. After that point, data transmission follows without any problems.
Here are the new u8 message_type values:
Packet Type | u8 message_type | message_type l.s. nibble |
---|---|---|
initiation | 9 | 1001 |
response | 10 | 1010 |
data | 12 | 1100 |
cookie | 11 | 1011 |
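The change boils down to OR-ing the original constants with 8 on the sending side; the receiving side in the patch below simply accepts both the original and the flipped values. A small stand-alone C illustration of the arithmetic (the constants mirror the kernel's enum message_type; masking with ~8 is shown only to make the relationship obvious):

#include <stdint.h>
#include <stdio.h>

/* Original Wireguard message types, as in the first table. */
enum { MESSAGE_HANDSHAKE_INITIATION = 1, MESSAGE_HANDSHAKE_RESPONSE = 2,
       MESSAGE_HANDSHAKE_COOKIE = 3, MESSAGE_DATA = 4 };

#define FLIP_BIT 8u /* most significant bit of the low nibble */

int main(void)
{
	unsigned types[] = { MESSAGE_HANDSHAKE_INITIATION, MESSAGE_HANDSHAKE_RESPONSE,
			     MESSAGE_DATA, MESSAGE_HANDSHAKE_COOKIE };

	for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++) {
		unsigned sent = types[i] | FLIP_BIT;   /* what a patched sender emits */
		unsigned original = sent & ~FLIP_BIT;  /* the value it corresponds to */
		printf("original %u -> on the wire %u -> maps back to %u\n",
		       types[i], sent, original);
	}
	return 0;
}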
The Patch
Let's look at a patch against Linux 5.16.3 (and beyond) that flips the bit.
--- ./drivers/net/Kconfig 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/Kconfig 2022-02-01 16:23:30.534927386 +0300
@@ -116,6 +116,14 @@
Say N here unless you know what you're doing.
+config WIREGUARD_EVRIM
+ bool "A fix for blocking wireguard traffic"
+ depends on WIREGUARD
+ default n
+ help
+ This will fix the problem of blocking wireguard traffic.
+ Say N to censorship.
+
config EQUALIZER
tristate "EQL (serial line load balancing) support"
help
First, we need a switch to activate the fix. Enable CONFIG_WIREGUARD_EVRIM=y in your config.
--- ./drivers/net/wireguard/peer.h 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/peer.h 2022-02-01 16:23:30.538927403 +0300
@@ -40,6 +40,7 @@
struct sk_buff_head staged_packet_queue;
int serial_work_cpu;
bool is_dead;
+ bool is_blocked;
struct noise_keypairs keypairs;
struct endpoint endpoint;
struct dst_cache endpoint_cache;
Add a flag to the peer structure (it can be accessed via PACKET_PEER(skb)) to record whether the peer is blocked or not. A blocked peer sends its initiation packets with u8 message_type 9.
--- ./drivers/net/wireguard/peer.c 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/peer.c 2022-02-01 16:23:30.538927403 +0300
@@ -61,6 +61,7 @@
INIT_LIST_HEAD(&peer->allowedips_list);
wg_pubkey_hashtable_add(wg->peer_hashtable, peer);
++wg->num_peers;
+ peer->is_blocked = false; // unblocked by default
pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id);
return peer;
The peer is unblocked by default.
--- ./drivers/net/wireguard/noise.c 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/noise.c 2022-02-01 16:23:30.537927399 +0300
@@ -499,7 +499,11 @@
if (unlikely(!handshake->static_identity->has_identity))
goto out;
- dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION);
+#if IS_ENABLED(CONFIG_WIREGUARD_EVRIM)
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION | 8);
+#else
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION);
+#endif
handshake_init(handshake->chaining_key, handshake->hash,
handshake->remote_static);
If CONFIG_WIREGUARD_EVRIM is enabled, initiation packets are sent with u8 message_type 9.
@@ -632,7 +636,8 @@
}
bool wg_noise_handshake_create_response(struct message_handshake_response *dst,
- struct noise_handshake *handshake)
+ struct noise_handshake *handshake,
+ bool is_blocked)
{
u8 key[NOISE_SYMMETRIC_KEY_LEN];
bool ret = false;
@@ -648,7 +653,11 @@
if (handshake->state != HANDSHAKE_CONSUMED_INITIATION)
goto out;
- dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE);
+ if (is_blocked)
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE | 8);
+ else
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE);
+
dst->receiver_index = handshake->remote_index;
/* e */
Send the response with u8 message_type 10 if the peer is blocked.
--- ./drivers/net/wireguard/receive.c 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/receive.c 2022-02-01 16:25:33.512454687 +0300
@@ -36,16 +36,20 @@
{
if (unlikely(skb->len < sizeof(struct message_header)))
return 0;
- if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA) &&
+ if (((SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA)) ||
+ (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA | 8))) &&
skb->len >= MESSAGE_MINIMUM_LENGTH)
return sizeof(struct message_data);
- if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION) &&
+ if (((SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION)) ||
+ (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION | 8))) &&
skb->len == sizeof(struct message_handshake_initiation))
return sizeof(struct message_handshake_initiation);
- if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE) &&
+ if (((SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE)) ||
+ (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE | 8))) &&
skb->len == sizeof(struct message_handshake_response))
return sizeof(struct message_handshake_response);
- if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE) &&
+ if (((SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) ||
+ (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE | 8))) &&
skb->len == sizeof(struct message_handshake_cookie))
return sizeof(struct message_handshake_cookie);
return 0;
@@ -108,7 +112,8 @@
bool packet_needs_cookie;
bool under_load;
- if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) {
+ if ((SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) ||
+ (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE | 8))) {
net_dbg_skb_ratelimited("%s: Receiving cookie response from %pISpfsc\n",
wg->dev->name, skb);
wg_cookie_message_consume(
@@ -139,6 +144,7 @@
}
switch (SKB_TYPE_LE32(skb)) {
+ case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION | 8):
case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION): {
struct message_handshake_initiation *message =
(struct message_handshake_initiation *)skb->data;
@@ -158,9 +164,13 @@
net_dbg_ratelimited("%s: Receiving handshake initiation from peer %llu (%pISpfsc)\n",
wg->dev->name, peer->internal_id,
&peer->endpoint.addr);
+ if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION | 8)) {
+ peer->is_blocked = true;
+ }
wg_packet_send_handshake_response(peer);
break;
}
+ case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE | 8):
case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE): {
struct message_handshake_response *message =
(struct message_handshake_response *)skb->data;
@@ -551,8 +561,11 @@
if (unlikely(prepare_skb_header(skb, wg) < 0))
goto err;
switch (SKB_TYPE_LE32(skb)) {
+ case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION | 8):
case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION):
case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE):
+ case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE | 8):
+ case cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE | 8):
case cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE): {
int cpu, ret = -EBUSY;
@@ -578,6 +591,7 @@
&per_cpu_ptr(wg->handshake_queue.worker, cpu)->work);
break;
}
+ case cpu_to_le32(MESSAGE_DATA | 8):
case cpu_to_le32(MESSAGE_DATA):
PACKET_CB(skb)->ds = ip_tunnel_get_dsfield(ip_hdr(skb), skb);
wg_packet_consume_data(wg, skb);
This is a backward-compatible way of handling incoming packets. Note that if we receive an initiation with u8 message_type 9, we set peer->is_blocked = true.
--- ./drivers/net/wireguard/send.c 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/send.c 2022-02-01 16:23:30.541927415 +0300
@@ -91,7 +91,7 @@
peer->device->dev->name, peer->internal_id,
&peer->endpoint.addr);
- if (wg_noise_handshake_create_response(&packet, &peer->handshake)) {
+ if (wg_noise_handshake_create_response(&packet, &peer->handshake, peer->is_blocked)) {
wg_cookie_add_mac_to_packet(&packet, sizeof(packet), peer);
if (wg_noise_handshake_begin_session(&peer->handshake,
&peer->keypairs)) {
The above adds the peer's blocking status as a parameter to the wg_noise_handshake_create_response() function.
@@ -166,6 +166,9 @@
struct message_data *header;
struct sk_buff *trailer;
int num_frags;
+#if !IS_ENABLED(CONFIG_WIREGUARD_EVRIM)
+ struct wg_peer *peer = keypair->entry.peer;
+#endif
/* Force hash calculation before encryption so that flow analysis is
* consistent over the inner packet.
@@ -203,7 +206,16 @@
*/
skb_set_inner_network_header(skb, 0);
header = (struct message_data *)skb_push(skb, sizeof(*header));
- header->header.type = cpu_to_le32(MESSAGE_DATA);
+
+#if IS_ENABLED(CONFIG_WIREGUARD_EVRIM)
+ header->header.type = cpu_to_le32(MESSAGE_DATA | 8);
+#else
+ if (peer && peer->is_blocked)
+ header->header.type = cpu_to_le32(MESSAGE_DATA | 8);
+ else
+ header->header.type = cpu_to_le32(MESSAGE_DATA);
+#endif
+
header->key_idx = keypair->remote_index;
header->counter = cpu_to_le64(PACKET_CB(skb)->nonce);
pskb_put(skb, trailer, trailer_len);
Fix the data packets.
--- ./drivers/net/wireguard/cookie.c 2022-01-27 14:03:05.000000000 +0300
+++ ../linux-5.16.3/drivers/net/wireguard/cookie.c 2022-02-01 16:23:30.535927390 +0300
@@ -9,6 +9,7 @@
#include "messages.h"
#include "ratelimiter.h"
#include "timers.h"
+#include "queueing.h"
#include <crypto/blake2s.h>
#include <crypto/chacha20poly1305.h>
@@ -185,7 +186,11 @@
((u8 *)skb->data + skb->len - sizeof(*macs));
u8 cookie[COOKIE_LEN];
- dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE);
+ if (PACKET_PEER(skb)->is_blocked)
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE | 8);
+ else
+ dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE);
+
dst->receiver_index = index;
get_random_bytes_wait(dst->nonce, COOKIE_NONCE_LEN);
Fix the cookie packets' u8 message_type as well, OR-ing it with 8.
Conclusion
In this article, I've gone over a simple patch to bypass firewall rules that block Wireguard. A more modern patch, against Linux 5.18.0 and later, can be found here. Just enable the WIREGUARD_FLIP_MSB config option (it was called WIREGUARD_EVRIM before) and recompile. You can also adjust MESSAGE_FLIP_CONSTANT in net/wireguard/messages.h.
/* Constant used by WIREGUARD_FLIP_MSB */
#define MESSAGE_FLIP_CONSTANT MESSAGE_FLIP8
I wish the u8 message_type field were removed from the protocol in future releases, since it makes Wireguard rather easy to identify over the wire. I did not bother to update the UDP checksum, since the data is already encrypted with an AEAD.
I hope nobody else really needs this patch ;) Finally, if you need custom crypto algorithm implementations or out-of-tree modules, just let me know. PQC schemes are pretty popular these days.