Learn Zig Series (#95) - Mini Project: Chat Server - Protocol Design

scipio

Published: 01 Jul 2026 › Updated: 01 Jul 2026

What will I learn?

  • Why a real multi-user program starts with the messages, not the sockets -- a protocol is a contract written in bytes, and getting it right up front saves you from rewrites later;
  • How to model every message a chat server and its clients exchange as a single tagged union, so the compiler forces you to handle each variant;
  • How to pick a wire format (length-prefixed binary frames) and why that beats newline-delimited text the moment usernames or messages can contain surprises;
  • How to write an encoder that serialises any message to bytes, and a decoder that reconstructs it while treating every length field as hostile;
  • How Zig's error unions turn a truncated or malformed frame into a named error instead of a silent buffer overrun;
  • How to unit-test the whole codec with a round-trip property -- encode, decode, assert you got back exactly what you put in -- without ever opening a socket;
  • How this hand-rolled protocol compares to reaching for protobuf, JSON, or a framework in C, Rust, or Go.

Requirements

  • A working modern computer running macOS, Windows or Ubuntu;
  • An installed Zig 0.14+ distribution (download from ziglang.org);
  • The ambition to learn Zig programming.

Difficulty

  • Intermediate

Curriculum (of the Learn Zig Series):

Learn Zig Series (#95) - Mini Project: Chat Server - Protocol Design

Solutions to Episode 94 Exercises

Last episode we defeated the NAT -- STUN to discover our public mapping, a rendezvous server to introduce two strangers, and a UDP hole punch to get them talking directly. The three exercises pushed that toward something deployable: detecting your own NAT type, keeping the punched hole alive, and building the rendezvous itself. Each solution below is complete and compilable, and every one of them tests its logic in isolation -- no live sockets, the same discipline we've kept since episode 12.

Exercise 1: Detect your NAT type

The trick is that you send a STUN binding request to two different public servers from the same local socket, then compare the public ports they each report. If the ports match, your NAT reuses one mapping regardless of destination (a cone NAT, punchable). If they differ, it mints a fresh port per destination (symmetric, the one that ruins everything). The network part is just two parseBindingResponse calls from episode 94; the part worth testing is the classification:

const std = @import("std");

const NatClass = enum { cone, symmetric };

// Two different STUN servers, one local socket. Equal public ports -> the NAT
// keeps a single mapping (cone, hole punching will work). Different ports ->
// a new mapping per destination (symmetric, the punch almost always fails).
fn classify(port_from_server_a: u16, port_from_server_b: u16) NatClass {
    return if (port_from_server_a == port_from_server_b) .cone else .symmetric;
}

test "same public port means cone, different means symmetric" {
    try std.testing.expectEqual(NatClass.cone, classify(49152, 49152));
    try std.testing.expectEqual(NatClass.symmetric, classify(49152, 52001));
}

The key insight is that the whole verdict collapses to a single equality check on two u16 values. All the hard work happened in episode 94's XOR-MAPPED-ADDRESS parser; the classifier itself is trivial, which is exactly why it deserves its own tiny test instead of being buried inside the networking code where you can't reach it.

Exercise 2: Add a keepalive timer

A punched hole is not permanent -- NATs expire idle UDP mappings, often after 30 seconds. So a long-lived peer connection sends a small keepalive whenever the line has been quiet too long. The production version feeds std.time.Timer (episode 70) into the predicate, but the predicate is what carries the bug risk, so that is what we test:

const std = @import("std");

const KeepAlive = struct {
    interval_ns: u64,
    last_send_ns: u64,

    // Only fire when the connection has been idle for at least the interval.
    fn due(self: KeepAlive, now_ns: u64) bool {
        return now_ns - self.last_send_ns >= self.interval_ns;
    }
    fn markSent(self: *KeepAlive, now_ns: u64) void {
        self.last_send_ns = now_ns;
    }
};

test "keepalive fires only after the idle interval elapses" {
    const sec = std.time.ns_per_s;
    var ka = KeepAlive{ .interval_ns = 15 * sec, .last_send_ns = 0 };

    try std.testing.expect(!ka.due(10 * sec)); // too soon, stay quiet
    try std.testing.expect(ka.due(15 * sec)); // idle long enough, fire
    ka.markSent(15 * sec); // real traffic OR a keepalive resets the clock
    try std.testing.expect(!ka.due(20 * sec)); // only 5s since, quiet again
}

By passing now_ns in as a parameter instead of calling the clock inside due, the timer logic becomes a pure function of two numbers -- and a pure function is a testable function. In the real loop you call markSent on every actual outbound message too, not just keepalives, so a chatty connection never wastes a byte on redundant pings.
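To make the "real traffic resets the clock too" point concrete, here is a minimal sketch of that loop driven by a simulated clock. The countKeepalives helper and its one-tick-per-second loop are hypothetical, and the saturating subtraction (-|) in due is my own tweak rather than part of the solution above: it makes a clock reading that is earlier than last_send_ns count as zero idle time instead of tripping an overflow check.

```zig
const std = @import("std");

const KeepAlive = struct {
    interval_ns: u64,
    last_send_ns: u64,

    fn due(self: KeepAlive, now_ns: u64) bool {
        // Saturating subtraction: a clock that reads earlier than the last
        // send yields 0 idle time instead of an integer-overflow panic.
        return now_ns -| self.last_send_ns >= self.interval_ns;
    }
    fn markSent(self: *KeepAlive, now_ns: u64) void {
        self.last_send_ns = now_ns;
    }
};

// Hypothetical simulation: one tick per second for total_s seconds, with an
// optional real chat message going out at second chat_at_s. Returns how many
// keepalives the loop had to send.
fn countKeepalives(total_s: u64, chat_at_s: ?u64) u32 {
    const sec = std.time.ns_per_s;
    var ka = KeepAlive{ .interval_ns = 15 * sec, .last_send_ns = 0 };
    var sent: u32 = 0;
    var t: u64 = 1;
    while (t <= total_s) : (t += 1) {
        const now = t * sec;
        if (chat_at_s != null and chat_at_s.? == t) {
            ka.markSent(now); // real message: resets the idle clock for free
        } else if (ka.due(now)) {
            sent += 1; // idle too long: send a keepalive
            ka.markSent(now);
        }
    }
    return sent;
}
```

With a 15-second interval and a silent minute, the loop fires at seconds 15, 30, 45 and 60. A single real message at second 10 shifts the schedule to 25, 40 and 55, so one keepalive is saved.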

Exercise 3: Write the rendezvous server

The rendezvous is the machine both peers can reach, whose only job is to pair them up and hand each the other's endpoint. The socket plumbing is the accept loop from episode 51; the brain is a HashMap (episode 22) keyed by a shared room name. First peer in waits; second peer in completes the pair:

const std = @import("std");

// The flat six-byte endpoint form STUN gave us in episode 94: 4 addr + 2 port.
const Endpoint = [6]u8;

const Rendezvous = struct {
    waiting: std.StringHashMap(Endpoint),

    fn init(alloc: std.mem.Allocator) Rendezvous {
        return .{ .waiting = std.StringHashMap(Endpoint).init(alloc) };
    }
    fn deinit(self: *Rendezvous) void {
        self.waiting.deinit();
    }

    // Returns the counterpart's endpoint once two peers share a room, else null
    // (meaning "you are first, sit tight until a partner shows up").
    fn arrive(self: *Rendezvous, room: []const u8, me: Endpoint) !?Endpoint {
        if (self.waiting.fetchRemove(room)) |kv| {
            return kv.value; // second peer: the pair is complete
        }
        try self.waiting.put(room, me); // first peer: register and wait
        return null;
    }
};

test "two peers in one room receive each other's endpoint" {
    var rv = Rendezvous.init(std.testing.allocator);
    defer rv.deinit();

    const a: Endpoint = .{ 203, 0, 113, 5, 0xC3, 0x50 }; // 203.0.113.5:50000
    const b: Endpoint = .{ 198, 51, 100, 9, 0xC3, 0x51 };

    try std.testing.expect(try rv.arrive("room-zig", a) == null); // A waits
    const partner = try rv.arrive("room-zig", b); // B completes the pair
    try std.testing.expectEqual(a, partner.?); // B learns A's endpoint
}

The honest detail is fetchRemove: the moment the second peer arrives, the first peer's entry is pulled out of the map in the same call that reads it. No stale entries pile up, no room ever holds three peers, and the pairing is atomic from the map's point of view. Test the pairing in isolation like this and the socket layer wrapped around it becomes almost boring -- which is precisely what you want from the part that faces the network.
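One thing the test above hides: std.StringHashMap stores the key slice it is given, not a copy. A string literal lives forever, but a room name parsed out of a receive buffer does not. The sketch below is one way to handle that, assuming the room name arrives in a buffer that will be overwritten; OwningRendezvous is a hypothetical variant, not the solution itself.

```zig
const std = @import("std");

const Endpoint = [6]u8;

// Hypothetical variant of the rendezvous that owns a copy of every room name.
const OwningRendezvous = struct {
    alloc: std.mem.Allocator,
    waiting: std.StringHashMap(Endpoint),

    fn init(alloc: std.mem.Allocator) OwningRendezvous {
        return .{ .alloc = alloc, .waiting = std.StringHashMap(Endpoint).init(alloc) };
    }

    fn deinit(self: *OwningRendezvous) void {
        // Free the names of any peers still waiting, then the map itself.
        var it = self.waiting.keyIterator();
        while (it.next()) |key| self.alloc.free(key.*);
        self.waiting.deinit();
    }

    fn arrive(self: *OwningRendezvous, room: []const u8, me: Endpoint) !?Endpoint {
        if (self.waiting.fetchRemove(room)) |kv| {
            self.alloc.free(kv.key); // release the map's own copy of the name
            return kv.value;
        }
        const owned = try self.alloc.dupe(u8, room);
        errdefer self.alloc.free(owned);
        try self.waiting.put(owned, me);
        return null;
    }
};

test "room name survives the caller's buffer being overwritten" {
    var rv = OwningRendezvous.init(std.testing.allocator);
    defer rv.deinit();

    var name_buf = [_]u8{ 'l', 'o', 'b', 'b', 'y' };
    const a: Endpoint = .{ 203, 0, 113, 5, 0xC3, 0x50 };
    const b: Endpoint = .{ 198, 51, 100, 9, 0xC3, 0x51 };

    try std.testing.expect(try rv.arrive(&name_buf, a) == null);
    @memset(&name_buf, 'x'); // the receive buffer gets reused
    const partner = try rv.arrive("lobby", b);
    try std.testing.expectEqual(a, partner.?);
}
```

The cost is one small allocation per waiting peer, which is freed the moment the pair completes.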

Learn Zig Series (#95) - Mini Project: Chat Server - Protocol Design

Here we go ;-) For the last dozen-odd episodes we've been assembling a networking toolkit one sharp tool at a time -- raw TCP and UDP (21, 81), name resolution (82), our own framed formats (90, 91), a proxy (93), and finally NAT traversal (94) so two machines behind home routers can find each other. At the end of episode 94 I promised we'd stop building isolated pieces and put them together into something you could actually run with your friends. This is where that starts: a multi-user chat server and its clients.

And a chat system, like every honest networked program, does not begin with a socket. It begins with a question that has nothing to do with code: what exactly do these machines say to one another? Get that contract wrong and every later episode fights it. Get it right and the server, the client, and the history all fall out of it almost for free. So this episode writes not one line of accept -- instead we design the protocol, the set of messages the participants exchange, and we build the codec that turns those messages into bytes and back. This is the foundation the whole project stands on.

Why the messages come first

There's a temptation, when you sit down to build a chat server, to open a socket, read a line of text, and print it. It even works -- for about five minutes, until someone's message contains a newline, or a username has a space in it, or two messages arrive glued together in one TCP read (episode 21 taught us TCP is a byte stream, not a sequence of messages). Suddenly you're patching a "protocol" you never actually designed, and each patch contradicts the last.

The cure is to treat the protocol as a first-class artefact. Before any I/O, you enumerate every distinct thing a participant can say. In our chat that's a small, closed list:

  • a client wants to join with a chosen nickname;
  • a client wants to send a line of chat;
  • a client is leaving;
  • the server broadcasts somebody's chat line to everyone;
  • the server announces that someone joined or left;
  • the server reports an error (nickname taken, message too long, and so on).

That closed list is the whole contract. The beauty of doing this in Zig is that "a small, closed list of alternatives, each carrying its own data" is the exact definition of a tagged union (episodes 6 and 33), and Zig's compiler will refuse to let us forget a case. The aforementioned handful of messages is, quite literally, the entire vocabulary of the program. So the design and the datatype are the same act.

The message set as a tagged union

Let me write the two halves of the conversation as two tagged unions -- one for what clients send, one for what the server sends. Each variant carries precisely the fields that message needs and nothing more:

const std = @import("std");

// Everything a CLIENT can say to the server.
const ClientMsg = union(enum) {
    join: struct { nick: []const u8 },
    say: struct { text: []const u8 },
    leave: void, // no payload -- "I'm closing the connection cleanly"
};

// Everything the SERVER can say back to the clients.
const ServerMsg = union(enum) {
    welcome: struct { nick: []const u8 }, // your join was accepted
    chat: struct { nick: []const u8, text: []const u8 }, // someone said something
    joined: struct { nick: []const u8 }, // someone new arrived
    parted: struct { nick: []const u8 }, // someone left
    err: struct { code: ErrCode }, // your request was refused
};

const ErrCode = enum(u8) {
    nick_taken = 1,
    nick_invalid = 2,
    message_too_long = 3,
    not_joined = 4,
};

Notice how much design is captured here that a text protocol would leave implicit. The server can never send a say and a client can never send a chat -- the type system says so, at compile time. The leave variant is void because "goodbye" needs no data. And ErrCode is an explicit enum(u8), so an error travels as a single documented byte rather than a free-form English string a client would have to parse (or, worse, display raw to a user). This is the same philosophy as the reply codes in the SOCKS5 proxy (episode 93): a machine-readable code first, human text layered on top only where a human will read it.
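As a small illustration of "machine-readable code first, human text on top", a client could keep the wording entirely on its own side. This is a sketch: describe, errCodeFromByte and the message strings are hypothetical and not part of the protocol.

```zig
const std = @import("std");

const ErrCode = enum(u8) {
    nick_taken = 1,
    nick_invalid = 2,
    message_too_long = 3,
    not_joined = 4,
};

// Hypothetical client-side helper: the wire carries only the byte, the wording
// lives here and can change (or be translated) without touching the protocol.
fn describe(code: ErrCode) []const u8 {
    // No else branch: adding a fifth ErrCode makes this switch fail to compile.
    return switch (code) {
        .nick_taken => "that nickname is already in use",
        .nick_invalid => "that nickname is not allowed",
        .message_too_long => "message too long",
        .not_joined => "join before you speak",
    };
}

// A byte from the network is only a claim until it has been checked against
// the values the enum actually defines.
fn errCodeFromByte(b: u8) ?ErrCode {
    return switch (b) {
        1 => .nick_taken,
        2 => .nick_invalid,
        3 => .message_too_long,
        4 => .not_joined,
        else => null,
    };
}
```

The two switches pull in opposite directions on purpose: the one over our own enum has no else, so the compiler polices it, while the one over a foreign byte must have an else, because the sender is not bound by our type system.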

Choosing a wire format: length-prefixed frames

Now, how do these values become bytes on a TCP connection? We have prior art from this very series. Episode 90 gave us protobuf's tag-length-value discipline; episode 91 gave us MessagePack. Either would work. But for a project whose whole point is that we understand every byte, I'll roll a deliberately small framing of our own, built from one idea we've leaned on since the key-value store and the file-sync tool: length-prefixed frames.

Every message on the wire is:

[ u16 total-length ][ u8 kind ][ ...payload... ]

The leading u16 (big-endian, always -- byte order is part of the contract, and cross-compilation from episode 35 must never reinterpret it) is the total size of the frame in bytes, counting the two length bytes themselves. That lets the reader pull one complete message off the stream even when TCP hands it a half-message or two-and-a-half messages at once. The kind byte selects the union variant. Variable-length strings inside the payload each get their own u16 length prefix, for the same reason: so the decoder never has to guess where a nickname ends.
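To see the layout in actual bytes, here is a join for the nickname "zig" assembled by hand. The kind value 1 is the one the encoder further down assigns to join; everything else follows from the layout.

```zig
const std = @import("std");

// Hand-assembled frame: join with nick "zig".
const join_zig = [_]u8{
    0x00, 0x08, // total length: 2 (len) + 1 (kind) + 2 (nick len) + 3 (nick) = 8
    0x01, // kind: join
    0x00, 0x03, // nickname length, big-endian
    'z', 'i', 'g', // nickname bytes
};

test "the declared total matches the bytes on the wire" {
    try std.testing.expectEqual(@as(usize, 8), join_zig.len);
    try std.testing.expectEqual(@as(u16, 8), std.mem.readInt(u16, join_zig[0..2], .big));
    try std.testing.expectEqualStrings("zig", join_zig[5..]);
}
```

Five bytes of framing around three bytes of payload looks heavy here only because the nickname is tiny; the overhead stays fixed as the text grows.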

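Two hard limits fall straight out of using a u16 for length: no frame exceeds 65535 bytes, and no string can be longer than its own u16 prefix can describe. In practice a string gets slightly less, because it has to fit inside the frame next to the three-byte header and its own two-byte prefix, which leaves 65530 bytes in a single-string message. For a chat protocol that's not a constraint, it's a feature -- it caps how much a hostile client can make us buffer, and we'll enforce even tighter application limits (a sane nickname is not 60 kilobytes long).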
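Those tighter application limits could look something like the sketch below. The numbers (32 and 1024) and the "printable ASCII, no spaces" rule are assumptions picked for illustration; the error names simply mirror the ErrCode values nick_invalid and message_too_long.

```zig
const std = @import("std");

// Assumed application limits; nothing in the protocol fixes these numbers.
const MAX_NICK_LEN = 32;
const MAX_TEXT_LEN = 1024;

const LimitError = error{ NickInvalid, MessageTooLong };

fn checkNick(nick: []const u8) LimitError!void {
    if (nick.len == 0 or nick.len > MAX_NICK_LEN) return error.NickInvalid;
    for (nick) |c| {
        // Assumed rule: printable ASCII only, and no space, so no control
        // bytes and no newlines can hide inside a name.
        if (c <= ' ' or c >= 0x7F) return error.NickInvalid;
    }
}

fn checkText(text: []const u8) LimitError!void {
    if (text.len > MAX_TEXT_LEN) return error.MessageTooLong;
}
```

Checks like these belong on the server, after decoding, because the wire format itself will happily carry anything up to the u16 cap.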

The encoder

Serialisation is the easy direction, because we control the input -- we're turning our own well-formed values into bytes. I'll write into a growable buffer and back-patch the total length once I know it, a trick straight from the protobuf episode. Here's the client-side encoder:

const KIND_JOIN: u8 = 1;
const KIND_SAY: u8 = 2;
const KIND_LEAVE: u8 = 3;

fn writeStr(buf: *std.ArrayListUnmanaged(u8), alloc: std.mem.Allocator, s: []const u8) !void {
    // A string too long for a u16 prefix is reported, not silently cast.
    const len = std.math.cast(u16, s.len) orelse return error.StringTooLong;
    var len_be: [2]u8 = undefined;
    std.mem.writeInt(u16, &len_be, len, .big);
    try buf.appendSlice(alloc, &len_be);
    try buf.appendSlice(alloc, s);
}

// Encode a ClientMsg into a fresh, caller-owned byte slice: [len][kind][payload].
fn encodeClient(alloc: std.mem.Allocator, msg: ClientMsg) ![]u8 {
    var buf: std.ArrayListUnmanaged(u8) = .{};
    errdefer buf.deinit(alloc);

    try buf.appendSlice(alloc, &[_]u8{ 0, 0 }); // placeholder for total length
    switch (msg) {
        .join => |m| {
            try buf.append(alloc, KIND_JOIN);
            try writeStr(&buf, alloc, m.nick);
        },
        .say => |m| {
            try buf.append(alloc, KIND_SAY);
            try writeStr(&buf, alloc, m.text);
        },
        .leave => try buf.append(alloc, KIND_LEAVE),
    }

    // Back-patch the real length now that the frame is complete. The total
    // counts the two length bytes too, and must itself fit in a u16.
    const total = std.math.cast(u16, buf.items.len) orelse return error.FrameTooLong;
    std.mem.writeInt(u16, buf.items[0..2], total, .big);
    return buf.toOwnedSlice(alloc);
}

The switch (msg) is doing the load-bearing work: because ClientMsg is a tagged union, Zig forces the switch to be exhaustive. The day I add a fourth client message and forget to encode it, this file will not compile. That is the compiler catching the exact bug -- "we added a message type and half-updated the code" -- that turns into a mysterious runtime protocol mismatch in a language that lets you forget. It's episode 4's error-handling philosophy applied to a design decision instead of a return value.

The decoder: parse every length like the sender means you harm

Deserialisation is where the danger lives, because now the bytes come from someone else -- possibly a buggy client, possibly a malicious one. Every single length field in the incoming frame is a claim we must verify before we trust it. This is the same posture as the STUN attribute walk in episode 94 and the readExact discipline from the DNS episodes: a network-facing parser talks to hostile peers by definition.

First a tiny cursor that hands out bytes and refuses to read past the end:

const Reader = struct {
    data: []const u8,
    pos: usize = 0,

    fn take(self: *Reader, n: usize) ![]const u8 {
        if (self.pos + n > self.data.len) return error.Truncated;
        const out = self.data[self.pos .. self.pos + n];
        self.pos += n;
        return out;
    }
    // Named readU16 rather than u16: Zig rejects a declaration that shadows a primitive type.
    fn readU16(self: *Reader) !u16 {
        const b = try self.take(2);
        return std.mem.readInt(u16, b[0..2], .big);
    }
    fn str(self: *Reader) ![]const u8 {
        const n = try self.readU16();
        return self.take(n); // length is checked by take() -- a lie becomes error.Truncated
    }
};

Every access goes through take, and take bounds-checks once, in one place. A string whose declared length runs off the end of the frame does not corrupt memory; it becomes error.Truncated. Now the decoder itself, which reads the kind byte and reconstructs the union:

fn decodeClient(frame: []const u8) !ClientMsg {
    var r = Reader{ .data = frame };
    const total = try r.readU16();
    if (total != frame.len) return error.LengthMismatch; // frame lied about its size
    const kind = (try r.take(1))[0];

    return switch (kind) {
        KIND_JOIN => .{ .join = .{ .nick = try r.str() } },
        KIND_SAY => .{ .say = .{ .text = try r.str() } },
        KIND_LEAVE => .leave,
        else => error.UnknownKind, // never trust an unknown tag byte
    };
}

Three things earn their keep here. The total != frame.len check rejects a frame whose self-declared length disagrees with what we actually received -- a classic desync smoke signal. The else => error.UnknownKind branch means a garbage or future-version kind byte gets a named refusal, not a wild jump. And the returned slices (nick, text) point into the original frame buffer -- zero copies -- which is fast and perfectly safe as long as the caller uses the message before freeing the frame (episode 8's lifetime thinking, made concrete). If the server wants to keep a message around longer, it dupes the strings deliberately; the codec never hides an allocation from you.
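One further check a stricter decoder might add: the decoder above stops reading as soon as the variant is complete, so a frame with extra bytes after a valid payload is still accepted. A sketch of a "nothing left over" check follows, using a cut-down cursor with the same shape as Reader. The finish method and error.TrailingBytes are hypothetical additions, not part of the codec as written.

```zig
const std = @import("std");

// Cut-down cursor with the same shape as Reader, plus one hypothetical
// extra: finish(), which insists that the whole frame was consumed.
const Cursor = struct {
    data: []const u8,
    pos: usize = 0,

    fn take(self: *Cursor, n: usize) ![]const u8 {
        // pos never exceeds data.len, so this subtraction cannot wrap.
        if (n > self.data.len - self.pos) return error.Truncated;
        const out = self.data[self.pos..][0..n];
        self.pos += n;
        return out;
    }

    fn finish(self: *const Cursor) !void {
        if (self.pos != self.data.len) return error.TrailingBytes;
    }
};
```

A decoder would call finish() once, after the switch has built the message. Whether trailing bytes should be an error is a design choice; refusing them keeps the encoding canonical, while tolerating them leaves room for future fields.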

Testing the codec: the round-trip property

Here's the quiet superpower of designing the protocol as pure functions over byte slices: we can test the entire thing with zero sockets. The property we want is simple and strong -- for any message, decode(encode(m)) gives back m. Encode it, decode it, assert equality of every field:

test "client message round-trips through encode and decode" {
    const alloc = std.testing.allocator;

    const original = ClientMsg{ .join = .{ .nick = "scipio" } };
    const bytes = try encodeClient(alloc, original);
    defer alloc.free(bytes);

    const back = try decodeClient(bytes);
    try std.testing.expectEqualStrings("scipio", back.join.nick);
}

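test "a truncated frame is rejected, not read past" {
    // The total length is honest (5 bytes), but the nickname claims 9 bytes
    // that never arrive. take() must catch it. (A wrong total would be caught
    // earlier, as error.LengthMismatch, before take() ever saw the string.)
    const evil = [_]u8{ 0x00, 0x05, KIND_JOIN, 0x00, 0x09 };
    try std.testing.expectError(error.Truncated, decodeClient(&evil));
}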

test "an unknown kind byte gets a named error" {
    // Well-formed length (4), but kind 0x7F is not one we speak.
    const frame = [_]u8{ 0x00, 0x04, 0x7F, 0x00 };
    try std.testing.expectError(error.UnknownKind, decodeClient(&frame));
}

The first test is the happy path; the other two are the ones that matter for a program that faces the network. We prove that a lying length field and an unknown tag both fail loudly and safely, rather than hoping they do. This is episode 12's TDD applied to a protocol: the malicious inputs are as much a part of the spec as the valid ones, so they get tests too. Run zig test and the whole contract is verified before a single byte ever touches a real connection.
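The adversarial side can be pushed one step further with a small property: every strict prefix of a valid frame must be refused. The sketch below uses a stand-in decoder, decodeJoinNick, that handles only the join frame so the block stays self-contained; it is hypothetical and merely applies the same checks in the same order as the real decoder.

```zig
const std = @import("std");

// Stand-in for the real decoder: parses only a join frame
// ([u16 total][u8 kind = 1][u16 nick len][nick bytes]).
fn decodeJoinNick(frame: []const u8) ![]const u8 {
    if (frame.len < 2) return error.Truncated;
    const total = std.mem.readInt(u16, frame[0..2], .big);
    if (total != frame.len) return error.LengthMismatch;
    if (frame.len < 5) return error.Truncated;
    if (frame[2] != 1) return error.UnknownKind;
    const n = std.mem.readInt(u16, frame[3..5], .big);
    if (n > frame.len - 5) return error.Truncated;
    return frame[5..][0..n];
}

test "every strict prefix of a valid frame is refused" {
    const good = [_]u8{ 0x00, 0x08, 0x01, 0x00, 0x03, 'z', 'i', 'g' };
    try std.testing.expectEqualStrings("zig", try decodeJoinNick(&good));

    var cut: usize = 0;
    while (cut < good.len) : (cut += 1) {
        // Any named error is fine here; a successful decode is not.
        _ = decodeJoinNick(good[0..cut]) catch continue;
        return error.AcceptedTruncatedFrame;
    }
}
```

The loop does not care which error comes back, only that one does. That is usually the right level of strictness for this kind of test, since the exact error depends on where the cut lands.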

Reading a frame off a stream

The codec above works on a complete frame in memory, which keeps it pure and testable. The bridge to a live TCP socket (which we build properly in the next episode) is one small function whose only job is to turn the byte stream back into discrete frames -- read the two length bytes, then read exactly that many more:

fn readFrame(stream: anytype, buf: []u8) ![]u8 {
    var hdr: [2]u8 = undefined;
    try readExact(stream, &hdr);
    const total = std.mem.readInt(u16, &hdr, .big);
    if (total < 3 or total > buf.len) return error.BadFrameLength; // 2 len + 1 kind minimum
    buf[0] = hdr[0];
    buf[1] = hdr[1];
    try readExact(stream, buf[2..total]);
    return buf[0..total];
}

fn readExact(stream: anytype, out: []u8) !void {
    var n: usize = 0;
    while (n < out.len) {
        const got = try stream.read(out[n..]);
        if (got == 0) return error.UnexpectedEof; // peer closed mid-frame
        n += got;
    }
}

That readExact is the same loop we've written since the DNS work, and it's here for the same reason: a single read on a TCP socket can return fewer bytes than you asked for, so any code that assumes one read equals one message is quietly broken. The total > buf.len guard means a client can't announce a frame bigger than the buffer we're willing to give it -- an attacker's oversized-length trick meets a flat refusal, not an allocation storm. Because readFrame takes anytype, the exact same function drives a real std.net.Stream in the server and a FixedBufferStream in a test (the shim trick from episode 94), so even the stream-reading layer gets tested without a socket.
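Here is roughly what such a socket-free test can look like. It repeats the two functions so the block compiles on its own and feeds them two frames glued together, the way a single TCP read might deliver them. It assumes Zig 0.14's std.io.fixedBufferStream; newer releases have been reworking the I/O interfaces, so the stream type may need adjusting there.

```zig
const std = @import("std");

fn readExact(stream: anytype, out: []u8) !void {
    var n: usize = 0;
    while (n < out.len) {
        const got = try stream.read(out[n..]);
        if (got == 0) return error.UnexpectedEof; // source ran dry mid-frame
        n += got;
    }
}

fn readFrame(stream: anytype, buf: []u8) ![]u8 {
    var hdr: [2]u8 = undefined;
    try readExact(stream, &hdr);
    const total = std.mem.readInt(u16, &hdr, .big);
    if (total < 3 or total > buf.len) return error.BadFrameLength;
    buf[0] = hdr[0];
    buf[1] = hdr[1];
    try readExact(stream, buf[2..total]);
    return buf[0..total];
}

test "two frames glued into one buffer come out one at a time" {
    // A leave frame (kind 3) followed directly by say "hi" (kind 2).
    const wire = [_]u8{ 0x00, 0x03, 0x03, 0x00, 0x07, 0x02, 0x00, 0x02, 'h', 'i' };
    var fbs = std.io.fixedBufferStream(&wire);
    var buf: [64]u8 = undefined;

    const first = try readFrame(&fbs, &buf);
    try std.testing.expectEqualSlices(u8, wire[0..3], first);
    const second = try readFrame(&fbs, &buf); // reuses buf, so first is now stale
    try std.testing.expectEqualSlices(u8, wire[3..], second);

    // Nothing left: the next header read hits end-of-stream.
    try std.testing.expectError(error.UnexpectedEof, readFrame(&fbs, &buf));
}
```

Note that each returned slice aliases buf, so a caller has to decode (or copy) one frame before asking for the next.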

Performance and design considerations

A chat protocol is not a high-frequency trading feed, so raw throughput is almost never the constraint -- a human types a few messages a minute, and even a thousand users is a trickle by network standards. The costs that do matter are different ones. Allocation: our decoder returns slices into the received buffer, so parsing an incoming message allocates nothing at all; only when the server chooses to retain a message (for history, a future episode) does it copy. Framing overhead: three bytes per message (two length, one kind) plus two per string is negligible against the text payload, and far leaner than wrapping every message in JSON. Head-of-line safety: the length prefix means one slow or malformed client can be handled and disconnected without desyncing the parser, because we always know precisely where the next frame begins.
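The "copy only when you choose to retain" rule might look like this in practice. Kept is a hypothetical history entry, sketched here only to show where the allocation would live; the real history design is left for a later episode.

```zig
const std = @import("std");

// Hypothetical history entry that owns its bytes, unlike the decoded slices,
// which merely point into the receive buffer.
const Kept = struct {
    nick: []u8,
    text: []u8,

    fn init(alloc: std.mem.Allocator, nick: []const u8, text: []const u8) !Kept {
        const n = try alloc.dupe(u8, nick);
        errdefer alloc.free(n); // do not leak the nick if the text copy fails
        const t = try alloc.dupe(u8, text);
        return .{ .nick = n, .text = t };
    }

    fn deinit(self: Kept, alloc: std.mem.Allocator) void {
        alloc.free(self.nick);
        alloc.free(self.text);
    }
};

test "a kept message survives the frame buffer being reused" {
    var frame = [_]u8{ 'z', 'i', 'g', 'h', 'i' };
    const kept = try Kept.init(std.testing.allocator, frame[0..3], frame[3..5]);
    defer kept.deinit(std.testing.allocator);

    @memset(&frame, 0); // the receive buffer is recycled for the next frame
    try std.testing.expectEqualStrings("zig", kept.nick);
    try std.testing.expectEqualStrings("hi", kept.text);
}
```

The allocation is visible at the call site, which is the point: the codec stays allocation-free, and the code that wants a longer lifetime pays for it explicitly.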

The design choice I'd defend hardest is the u16 length cap. It looks like a limitation, but it's a security boundary: it bounds, at the protocol level, how much memory a single frame can ever force us to touch. Episode 34's lesson was to profile before optimising a CPU; the network equivalent is to bound your inputs before an attacker does it for you. A protocol without a maximum message size is a denial-of-service bug wearing a friendly face.

How this compares to C, Rust, and Go

In C, this exact design is a well-trodden path -- a struct with a uint16_t length, a switch on a kind byte, memcpy into buffers. It works, and it's fast. What C cannot give you is a guaranteed exhaustive switch: add a message type and forget one case, and the most you can hope for is an optional compiler warning (and not even that when you switch on a raw byte). The bounds check in take is also something you'd have to write, and remember to write, at every single field access -- and the CVE history of C parsers is largely the history of the one place someone forgot.

In Go, you'd likely reach for encoding/gob or protobuf and let reflection or a code generator produce the marshalling. Less code, and goroutines make "one reader per connection" pleasant. The trade is that the wire format becomes something the library owns rather than something you can see byte-for-byte -- fine for most work, contrary to the whole point of this project.

In Rust, enum plus serde (with a compact format like bincode) gives you almost exactly our tagged-union-over-the-wire design with derive macros doing the encoding. Rust's exhaustiveness matches Zig's, and the borrow checker enforces the same "slices point into the buffer, don't outlive it" discipline that we're holding by hand and by convention. It's the closest cousin -- the main difference is that Zig makes the framing so explicit that you learn it, whereas serde is designed to let you not think about it.

Zig lands, as it keeps landing in this arc, in the sweet spot for understanding: a couple hundred readable lines, every byte and every length check ours, the compiler enforcing that we handle each message variant, and a codec so pure we tested the entire protocol -- happy path and hostile path both -- without opening a socket. You now hold the complete contract the rest of the project speaks.

Where this is heading

Step back and look at what we built without writing a single line of socket code. We enumerated the full set of messages the two sides of a chat exchange, modelled them as tagged unions so the compiler polices every case, chose a length-prefixed binary framing that survives the realities of TCP, and wrote an encoder and a bounds-checked decoder that we verified with round-trip and adversarial tests. That is the contract. Everything from here plugs into it.

With the protocol nailed down, the next stretch of this mini-project builds the machinery that speaks it: the piece that accepts many connections at once and relays each client's words to all the others, then the piece a human actually sits in front of, and finally the memory that lets a room remember what was said before you walked in. Each of those leans on tools we already own -- the accept loop from episode 51, the threads and atomics from episode 30, the hash maps from episode 22 -- and each of them is easy precisely because we spent this episode getting the messages right first.

The thread running through the whole networking arc hasn't budged since episode 21: a protocol is a contract written in bytes. Name your endianness, bound every length, and let tagged unions carry the shapes so the compiler catches the case you'd otherwise forget. We've now written that contract for a program you'll genuinely want to run. Next time, we make it come alive.

Thanks, and until next time!
