SocketChat is a production-grade, distributed real-time chat system designed as a technical case study in System Design. It demonstrates how to build a scalable messaging platform that handles persistent connections, synchronized state, and high availability.
The project implements a distributed architecture to allow horizontal scaling across multiple backend instances.
```mermaid
graph TD
    Client1[Client A] <--> Nginx[Nginx Load Balancer]
    Client2[Client B] <--> Nginx
    Nginx -- "Sticky Session (IP Hash)" --> Server1[Chat Server 1]
    Nginx -- "Sticky Session (IP Hash)" --> Server2[Chat Server 2]
    Server1 <--> Redis[("Redis Pub/Sub & Presence")]
    Server2 <--> Redis
    Server1 --> DB[(PostgreSQL)]
    Server2 --> DB
```
- Socket.IO: Used for low-latency, bi-directional communication.
- Sticky Sessions: Implemented via Nginx `ip_hash` to ensure the HTTP handshake and the WebSocket upgrade happen on the same physical server.
- Redis Adapter: Bridges multiple backend instances. A message sent to Server A is published to Redis and broadcast by Server B to its connected clients.
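The cross-instance fan-out can be illustrated with a toy model: two in-memory "servers" share a pub/sub bus that stands in for Redis, with no real Redis or Socket.IO involved. All names here are illustrative, not the project's actual classes or the adapter's API.

```typescript
// In-memory stand-in for Redis Pub/Sub.
type Handler = (room: string, payload: string) => void;

class Bus {
  private subscribers: Handler[] = [];
  subscribe(h: Handler) { this.subscribers.push(h); }
  publish(room: string, payload: string) {
    for (const h of this.subscribers) h(room, payload);
  }
}

// A chat server instance: tracks locally connected clients per room and
// re-broadcasts anything it hears on the bus to its own clients.
class ChatServer {
  delivered: string[] = []; // messages this instance delivered locally
  private rooms = new Map<string, string[]>(); // room -> local client ids

  constructor(private bus: Bus) {
    bus.subscribe((room, payload) => {
      for (const client of this.rooms.get(room) ?? []) {
        this.delivered.push(`${client}:${payload}`);
      }
    });
  }

  join(client: string, room: string) {
    const members = this.rooms.get(room) ?? [];
    members.push(client);
    this.rooms.set(room, members);
  }

  // A client sends a message: publish to the bus so *every* instance
  // (including this one) fans it out to its local members of the room.
  send(room: string, payload: string) { this.bus.publish(room, payload); }
}

const bus = new Bus();
const serverA = new ChatServer(bus);
const serverB = new ChatServer(bus);
serverA.join("alice", "general");
serverB.join("bob", "general");

serverA.send("general", "hi"); // sent via Server A...
// ...but Bob, connected to Server B, still receives it.
```

In production the `@socket.io/redis-adapter` package plays the role of this bus, so application code keeps using `io.to(room).emit(...)` unchanged.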
- Authentication & Security
- C-S-A Pattern (Cookie-Session-Auth): JWTs are stored in `httpOnly` cookies to mitigate XSS attacks.
- Handshake Authentication: The WebSocket connection is authenticated during the initial HTTP upgrade by parsing the cookie header.
- Transport Strategy: Optimized for the `websocket` transport to ensure an immediate protocol upgrade and avoid common sticky-session issues in distributed environments.
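A minimal sketch of the handshake-authentication step: parse the raw `Cookie` header that arrives with the HTTP upgrade request and extract the JWT. The cookie name (`token`) and the middleware shape are assumptions for illustration, not the project's actual code.

```typescript
// Parse a raw Cookie header ("a=1; b=2") into a name -> value map.
function parseCookies(header: string | undefined): Record<string, string> {
  const out: Record<string, string> = {};
  if (!header) return out;
  for (const pair of header.split(";")) {
    const idx = pair.indexOf("=");
    if (idx === -1) continue; // skip malformed fragments
    out[pair.slice(0, idx).trim()] = decodeURIComponent(pair.slice(idx + 1).trim());
  }
  return out;
}

// In a Socket.IO middleware this would be used roughly like:
//   io.use((socket, next) => {
//     const { token } = parseCookies(socket.handshake.headers.cookie);
//     if (!token) return next(new Error("unauthorized"));
//     // verify the JWT here (e.g. jsonwebtoken's verify), then call next()
//   });
```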
- Write-Through Persistence: Messages are persisted to PostgreSQL before being broadcast to ensure durability.
- Idempotency: Every message carries a `client_message_id` (UUID). The backend uses `INSERT ... ON CONFLICT DO NOTHING` to prevent duplicate messages during network retries.
- Atomic Room Switching: Implements a "Clean Exit" strategy where clients explicitly leave previous rooms before joining new ones, ensuring the `io.to(room)` fan-out is deterministic and synchronized.
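The idempotent write can be sketched as follows. The SQL mirrors the `ON CONFLICT` strategy described above, though the table and column names are assumptions; the in-memory model below demonstrates the same guarantee, that a retry with the same `client_message_id` must not create a second row.

```typescript
// Illustrative SQL (table/column names assumed, not taken from the repo).
const INSERT_MESSAGE = `
  INSERT INTO messages (client_message_id, channel_id, sender_id, body)
  VALUES ($1, $2, $3, $4)
  ON CONFLICT (client_message_id) DO NOTHING
`;

// In-memory model of the conflict behavior.
const seen = new Map<string, string>(); // client_message_id -> body

function insertIdempotent(clientMessageId: string, body: string): boolean {
  if (seen.has(clientMessageId)) return false; // conflict: silently drop
  seen.set(clientMessageId, body);
  return true; // row inserted
}

const first = insertIdempotent("3d1f-uuid", "hello"); // true: row written
const retry = insertIdempotent("3d1f-uuid", "hello"); // false: duplicate retry is a no-op
```

Because the conflict target is the client-generated UUID, the client can safely re-send on timeout without the server needing to distinguish "lost request" from "lost acknowledgment".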
- Atomic Counters: Uses Redis `HINCRBY` to track the number of active socket connections per user across the entire cluster.
- Multi-Tab Awareness: Status only changes to `offline` when the global connection count for a user reaches zero, preventing "status flickering" during tab refreshes.
- Frontend: Next.js 15, Tailwind CSS, Lucide React, Socket.io-client.
- Backend: Node.js, Express, TypeScript, Socket.io.
- Infrastructure:
- PostgreSQL: Persistent message and user storage.
- Redis: Distributed coordination, Pub/Sub, and presence state.
- Nginx: Reverse proxy and Load Balancer.
- Client: Generates a `client_message_id` and emits `message.send`.
- Server: Validates the JWT session from the handshake.
- Database: Attempts idempotent insert.
- Redis: Publishes message to the cluster.
- Cluster: All servers emit `message.new` to the relevant room.
- Connect: `HINCRBY presence:user_id 1`. If the result is `1`, emit `user.status: online`.
- Disconnect: `HINCRBY presence:user_id -1`. If the result is `0`, emit `user.status: offline`.
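The presence lifecycle above can be sketched with a `Map` standing in for the Redis hash (`HINCRBY` is atomic in Redis; the map here only models the arithmetic and the status transitions):

```typescript
const connections = new Map<string, number>(); // userId -> live socket count

type StatusEvent = "user.status: online" | "user.status: offline" | null;

function onConnect(userId: string): StatusEvent {
  const n = (connections.get(userId) ?? 0) + 1; // HINCRBY presence:<userId> 1
  connections.set(userId, n);
  return n === 1 ? "user.status: online" : null; // first connection -> online
}

function onDisconnect(userId: string): StatusEvent {
  const n = (connections.get(userId) ?? 0) - 1; // HINCRBY presence:<userId> -1
  connections.set(userId, n);
  return n === 0 ? "user.status: offline" : null; // last connection gone -> offline
}

// Two tabs open, one closes: the count drops from 2 to 1,
// no event fires, and the user stays online (no flicker).
const firstTab = onConnect("alice");   // "user.status: online"
const secondTab = onConnect("alice");  // null
const closeOne = onDisconnect("alice"); // null
const closeLast = onDisconnect("alice"); // "user.status: offline"
```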
- Redis Pub/Sub vs. Kafka: Redis was chosen for its sub-millisecond latency for real-time fan-out, whereas Kafka would be preferred for long-term event sourcing and massive message replayability.
- IP Hash vs. Global Session: `ip_hash` provides a simpler implementation for Socket.IO handshake affinity without needing a complex global session store for the initial upgrade.
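The approximate shape of that Nginx setup is sketched below; the repo's `nginx.conf` is authoritative, and the upstream name and ports here are assumptions that follow the dev setup described later (backends on 4000/4001).

```nginx
upstream chat_backend {
    ip_hash;                  # same client IP -> same backend instance
    server 127.0.0.1:4000;
    server 127.0.0.1:4001;
}

server {
    listen 80;

    location /socket.io/ {
        proxy_pass http://chat_backend;
        proxy_http_version 1.1;                  # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;  # forward the upgrade request
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

The `Upgrade`/`Connection` headers are what let the WebSocket upgrade pass through the proxy; without them, Socket.IO silently falls back to polling.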
- Sharding: Partitioning the `messages` table by `channel_id` to handle billions of rows.
- Global Presence: Replacing the simple Redis Hash with Redis Sorted Sets for "Last Seen" timestamps.
- Offline Queuing: Utilizing a task queue (like BullMQ) to handle push notifications when a recipient's connection count is zero.
- Infrastructure: `docker-compose up -d` (starts Postgres & Redis)
- Backend Instances: `cd backend && npm run dev:1` (port 4000), `cd backend && npm run dev:2` (port 4001)
- Frontend: `cd frontend && npm run dev` (port 3000)
- Nginx: Point Nginx to the provided `nginx.conf` and visit `localhost`.