Scaling WebSocket Connections: SignalR in Enterprise Systems

The Problem: Real-Time at Scale

When I joined the FLEDEM project, the fleet management platform needed to stream real-time CAN bus telemetry from thousands of vehicles to dashboard clients. The requirements were demanding:

Thousands of concurrent vehicle connections each sending telemetry data
Sub-100ms latency for critical event detection
Automatic reconnection for vehicles with intermittent connectivity
Graceful degradation during high load or network issues

Why SignalR?

Given that FLEDEM's backend is built on ASP.NET Core 8, SignalR was the natural choice for real-time communication. SignalR provides:

Automatic fallback from WebSockets to Server-Sent Events to long polling
Built-in reconnection logic
Scale-out support with Azure SignalR Service or Redis backplane
Strongly typed hubs on the server (Hub<T>) and an official TypeScript client

Hub Architecture

I designed a hub-per-domain architecture to separate concerns:

1. TelemetryHub - Vehicle Data Streaming

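// TelemetryHub.cs
using Microsoft.AspNetCore.Authorization;
using Microsoft.AspNetCore.SignalR;

[Authorize] // Context.UserIdentifier is only set for authenticated callers
public class TelemetryHub : Hub
{
    private readonly ITelemetryService _telemetryService;
    private readonly IFleetAccessService _fleetAccess;

    // Hubs are created per invocation, so dependencies come from DI
    public TelemetryHub(
        ITelemetryService telemetryService,
        IFleetAccessService fleetAccess)
    {
        _telemetryService = telemetryService;
        _fleetAccess = fleetAccess;
    }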

    public async Task SubscribeToVehicle(Guid vehicleGuid)
    {
        // Verify user has access to this vehicle
        var hasAccess = await _fleetAccess
            .UserHasVehicleAccess(Context.UserIdentifier, vehicleGuid);
        
        if (!hasAccess) 
            throw new HubException("Access denied");

        // Add to vehicle-specific group
        await Groups.AddToGroupAsync(Context.ConnectionId, 
            $"vehicle_{vehicleGuid}");
            
        // Send last known state immediately
        var lastState = await _telemetryService
            .GetLastKnownState(vehicleGuid);
        await Clients.Caller.SendAsync("InitialState", lastState);
    }

    public async Task UnsubscribeFromVehicle(Guid vehicleGuid)
    {
        await Groups.RemoveFromGroupAsync(Context.ConnectionId, 
            $"vehicle_{vehicleGuid}");
    }
}

2. Background Service for Data Broadcasting

Instead of having vehicles push directly to hubs (which would create connection bottlenecks), I put a queue and a background service between ingestion and broadcast. The pipeline:

Receives vehicle telemetry via API endpoints
Processes and validates the data
Broadcasts to subscribed clients via SignalR groups

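// VehicleTelemetryBackgroundService.cs
using System.Threading.Channels;
using Microsoft.AspNetCore.SignalR;

public class VehicleTelemetryBackgroundService : BackgroundService
{
    private static readonly TimeSpan BatchWindow =
        TimeSpan.FromMilliseconds(50);
    private const int MaxBatchSize = 100;

    private readonly IHubContext<TelemetryHub> _hubContext;
    // A channel can be awaited, so the loop never blocks a thread while
    // the queue is empty; API endpoints write to the matching ChannelWriter
    private readonly ChannelReader<TelemetryMessage> _queue;

    public VehicleTelemetryBackgroundService(
        IHubContext<TelemetryHub> hubContext,
        ChannelReader<TelemetryMessage> queue)
    {
        _hubContext = hubContext;
        _queue = queue;
    }

    protected override async Task ExecuteAsync(
        CancellationToken stoppingToken)
    {
        var batch = new List<TelemetryMessage>(MaxBatchSize);

        while (await _queue.WaitToReadAsync(stoppingToken))
        {
            // Collect for up to 50ms or 100 messages, whichever is first
            using var window = CancellationTokenSource
                .CreateLinkedTokenSource(stoppingToken);
            window.CancelAfter(BatchWindow);
            try
            {
                while (batch.Count < MaxBatchSize)
                    batch.Add(await _queue.ReadAsync(window.Token));
            }
            catch (OperationCanceledException)
                when (!stoppingToken.IsCancellationRequested)
            {
                // Batch window elapsed; send what was collected
            }
            catch (ChannelClosedException)
            {
                // Writer completed; flush the final partial batch
            }

            // Group by vehicle and send only the latest message for each
            var tasks = batch
                .GroupBy(m => m.VehicleGuid)
                .Select(group => _hubContext.Clients
                    .Group($"vehicle_{group.Key}")
                    .SendAsync("TelemetryUpdate",
                        group.MaxBy(m => m.Timestamp), stoppingToken));

            await Task.WhenAll(tasks);
            batch.Clear();
        }
    }
}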

Client-Side Architecture (React + TypeScript)

On the frontend, I created a custom hook for managing SignalR connections:

useSignalR Hook

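// hooks/useSignalR.ts
import { useEffect, useState } from "react";
import {
  HubConnection,
  HubConnectionBuilder,
  LogLevel,
} from "@microsoft/signalr";

export function useSignalR() {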
  const [connection, setConnection] = 
    useState<HubConnection | null>(null);
  const [isConnected, setIsConnected] = useState(false);

  useEffect(() => {
    const newConnection = new HubConnectionBuilder()
      .withUrl("/hubs/telemetry", {
        accessTokenFactory: () => getAccessToken(),
      })
      .withAutomaticReconnect({
        nextRetryDelayInMilliseconds: (retryContext) => {
          // Stepped backoff: 0s, 2s, 10s, 30s, then 60s for every later attempt
          const delays = [0, 2000, 10000, 30000, 60000];
          return delays[Math.min(retryContext.previousRetryCount, 4)];
        },
      })
      .configureLogging(LogLevel.Warning)
      .build();

    newConnection.onreconnecting(() => {
      console.log("SignalR reconnecting...");
      setIsConnected(false);
    });

    newConnection.onreconnected(() => {
      console.log("SignalR reconnected!");
      setIsConnected(true);
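      // Re-subscribe to previous groups; group membership is tied to the
      // old connection ID and is not restored automatically
      // (see "Automatic Resubscription" below)
      resubscribeToGroups(newConnection);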
    });

    newConnection.onclose(() => {
      setIsConnected(false);
    });

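    setConnection(newConnection);

    // Nothing connects until start() is called
    newConnection
      .start()
      .then(() => setIsConnected(true))
      .catch((err) => console.error("SignalR failed to start:", err));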

    return () => {
      newConnection.stop();
    };
  }, []);

  return { connection, isConnected };
}
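
The retry policy is the part of this hook that is easiest to reason about in isolation. As a minimal sketch, the function below reproduces the stepped delays from the hook and adds a give-up budget. The 10-minute limit and the names `nextRetryDelay`, `RetryContextLike`, and `GIVE_UP_AFTER_MS` are assumptions for illustration, not part of the FLEDEM code.

```typescript
// Local copy of the fields SignalR passes to nextRetryDelayInMilliseconds,
// declared here so the sketch has no dependencies.
interface RetryContextLike {
  previousRetryCount: number;
  elapsedMilliseconds: number;
}

const RETRY_DELAYS_MS = [0, 2000, 10000, 30000, 60000];

// Hypothetical budget: stop retrying after 10 minutes offline.
const GIVE_UP_AFTER_MS = 10 * 60 * 1000;

function nextRetryDelay(ctx: RetryContextLike): number | null {
  if (ctx.elapsedMilliseconds >= GIVE_UP_AFTER_MS) {
    // Returning null tells the client to stop reconnecting.
    return null;
  }
  // Clamp to the last entry so every later attempt waits 60s.
  const step = Math.min(ctx.previousRetryCount, RETRY_DELAYS_MS.length - 1);
  return RETRY_DELAYS_MS[step];
}
```

A function with this shape can be passed as `nextRetryDelayInMilliseconds`. Keeping it outside the hook makes the delay schedule unit-testable without a live connection.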

Vehicle Telemetry Component

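// components/VehicleTelemetry.tsx
import { useEffect, useState } from "react";
import { useSignalR } from "../hooks/useSignalR";

// TelemetryData is the payload type sent by the hub (speed, rpm, ...)
interface Props {
  vehicleGuid: string;
}

export function VehicleTelemetry({ vehicleGuid }: Props) {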
  const { connection, isConnected } = useSignalR();
  const [telemetry, setTelemetry] = 
    useState<TelemetryData | null>(null);

  useEffect(() => {
    if (!connection || !isConnected) return;

    const handleUpdate = (data: TelemetryData) => {
      setTelemetry(data);
    };

    connection.on("TelemetryUpdate", handleUpdate);
    connection.on("InitialState", handleUpdate);

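    // Subscribe to vehicle; invoke rejects if the hub throws
    // (for example "Access denied")
    connection
      .invoke("SubscribeToVehicle", vehicleGuid)
      .catch((err) => console.error("Subscribe failed:", err));

    return () => {
      connection.off("TelemetryUpdate", handleUpdate);
      connection.off("InitialState", handleUpdate);
      // The connection may already be closed during cleanup
      connection
        .invoke("UnsubscribeFromVehicle", vehicleGuid)
        .catch(() => {});
    };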
  }, [connection, isConnected, vehicleGuid]);

  if (!telemetry) return <div>Loading...</div>;

  return (
    <div>
      <h3>Speed: {telemetry.speed} km/h</h3>
      <h3>RPM: {telemetry.rpm}</h3>
      {/* ... more telemetry data ... */}
    </div>
  );
}
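
The component above registers handlers by string name, so a typo in "TelemetryUpdate" would only show up at runtime. One way to catch that at compile time is a small typed wrapper. This is a sketch under assumptions: `HubLike`, `TelemetryEvents`, and `onEvent` are illustrative names, and `TelemetryData` is reduced to the two fields the component displays.

```typescript
// Minimal stand-in for the part of HubConnection used here.
interface HubLike {
  on(event: string, handler: (...args: any[]) => void): void;
  off(event: string, handler: (...args: any[]) => void): void;
}

interface TelemetryData {
  speed: number;
  rpm: number;
}

// Maps each server-to-client event name to its payload type.
interface TelemetryEvents {
  TelemetryUpdate: TelemetryData;
  InitialState: TelemetryData;
}

// Registers a handler and returns a function that removes it again,
// which fits the cleanup function of a React effect.
function onEvent<K extends keyof TelemetryEvents>(
  connection: HubLike,
  event: K,
  handler: (payload: TelemetryEvents[K]) => void
): () => void {
  connection.on(event, handler);
  return () => connection.off(event, handler);
}
```

With this in place, `onEvent(connection, "TelemetryUpdat", ...)` fails to compile, and the handler's parameter type is inferred from the event name.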

Performance Optimizations

1. Message Batching

Instead of sending every telemetry update immediately, I implemented batching:

50ms window - Collect updates for 50ms before broadcasting
Max 100 messages per batch - Prevents memory buildup
Latest value wins - Only send the most recent message per vehicle in each batch

This reduced outgoing message volume by 80% while maintaining perceived real-time performance.
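
The three rules above can be sketched in a few lines. The production service is C#; this version is TypeScript to match the client examples, and it takes the current time as a parameter so the behavior is deterministic. `TelemetryBatcher`, `push`, and `tick` are illustrative names, and the message shape is an assumption.

```typescript
interface TelemetryMessage {
  vehicleGuid: string;
  timestamp: number; // epoch milliseconds
  speed: number;
}

const WINDOW_MS = 50;
const MAX_BATCH = 100;

class TelemetryBatcher {
  private buffer: TelemetryMessage[] = [];
  private windowStart = 0;

  // Adds a message; returns a coalesced batch once the buffer is full,
  // otherwise null.
  push(msg: TelemetryMessage, nowMs: number): TelemetryMessage[] | null {
    if (this.buffer.length === 0) this.windowStart = nowMs;
    this.buffer.push(msg);
    return this.buffer.length >= MAX_BATCH ? this.flush() : null;
  }

  // Called on a timer; flushes once the 50ms window has elapsed.
  tick(nowMs: number): TelemetryMessage[] | null {
    const due =
      this.buffer.length > 0 && nowMs - this.windowStart >= WINDOW_MS;
    return due ? this.flush() : null;
  }

  // Latest value wins: keep only the newest message per vehicle.
  private flush(): TelemetryMessage[] {
    const latest = new Map<string, TelemetryMessage>();
    for (const msg of this.buffer) {
      const seen = latest.get(msg.vehicleGuid);
      if (!seen || msg.timestamp >= seen.timestamp) {
        latest.set(msg.vehicleGuid, msg);
      }
    }
    this.buffer = [];
    return Array.from(latest.values());
  }
}
```

The savings come from the coalescing step: a vehicle that reports ten times inside one window produces a single outgoing message.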

2. Connection Pooling

SignalR connections are expensive: each one holds an open socket and per-connection state on the server. I optimized by:

Reusing connections across components via React Context
Implementing connection keepalive pings every 30 seconds
Gracefully closing idle connections after 5 minutes
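
Sharing one connection and closing it after an idle period comes down to reference counting plus a timer. The sketch below shows that idea without React or SignalR. `SharedConnection`, `Closable`, and `Schedule` are illustrative names, and the scheduler is injectable purely so the idle timeout can be tested without waiting five minutes.

```typescript
// Minimal stand-in for a connection that can be closed.
interface Closable {
  stop(): void;
}

// Schedules a callback and returns a function that cancels it.
type Schedule = (fn: () => void, ms: number) => () => void;

const IDLE_CLOSE_MS = 5 * 60 * 1000;

const timerSchedule: Schedule = (fn, ms) => {
  const id = setTimeout(fn, ms);
  return () => clearTimeout(id);
};

class SharedConnection<T extends Closable> {
  private instance: T | null = null;
  private users = 0;
  private cancelIdle: (() => void) | null = null;
  private readonly create: () => T;
  private readonly schedule: Schedule;

  constructor(create: () => T, schedule: Schedule = timerSchedule) {
    this.create = create;
    this.schedule = schedule;
  }

  // Returns the shared instance, creating it on first use and
  // cancelling any pending idle close.
  acquire(): T {
    if (this.cancelIdle) {
      this.cancelIdle();
      this.cancelIdle = null;
    }
    const conn = this.instance ?? this.create();
    this.instance = conn;
    this.users++;
    return conn;
  }

  // Called when a consumer unmounts; the last release starts the idle timer.
  release(): void {
    if (this.users === 0) return;
    this.users--;
    if (this.users > 0) return;
    this.cancelIdle = this.schedule(() => {
      if (this.instance) this.instance.stop();
      this.instance = null;
      this.cancelIdle = null;
    }, IDLE_CLOSE_MS);
  }
}
```

In a React app, a context provider could hold one `SharedConnection` and have each component call `acquire` in an effect and `release` in its cleanup.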

3. Selective Subscriptions

Instead of streaming all telemetry channels, clients subscribe only to the signals they need:

await connection.invoke("SubscribeToChannels", vehicleGuid, [
  "Vehicle_Speed",
  "Engine_RPM",
  "Battery_Voltage"
]);

This reduced bandwidth by 60% for typical dashboard views.
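
The server side of `SubscribeToChannels` is not shown here. As a rough picture of the filtering step, the sketch below strips a reading down to the channels a client asked for. It is TypeScript for consistency with the client code; `selectChannels`, the flat `Record<string, number>` reading shape, and the `Coolant_Temp` channel in the usage example are assumptions.

```typescript
type TelemetryReading = Record<string, number>;

// Returns a copy of the reading that contains only subscribed channels.
function selectChannels(
  reading: TelemetryReading,
  subscribed: ReadonlySet<string>
): TelemetryReading {
  const selected: TelemetryReading = {};
  for (const channel of Object.keys(reading)) {
    if (subscribed.has(channel)) {
      selected[channel] = reading[channel];
    }
  }
  return selected;
}

// Usage: a dashboard that only shows speed and RPM.
const dashboardChannels = new Set(["Vehicle_Speed", "Engine_RPM"]);
const trimmed = selectChannels(
  { Vehicle_Speed: 82, Engine_RPM: 2400, Battery_Voltage: 13.8, Coolant_Temp: 91 },
  dashboardChannels
);
```

Here `trimmed` carries two of the four values, which is where the bandwidth saving for narrow dashboard views comes from.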

Handling Reconnection Gracefully

Network interruptions are common in fleet management. I implemented two strategies:

1. Automatic Resubscription

// Track active subscriptions in a ref so they survive re-renders
const subscriptionsRef = useRef<Set<string>>(new Set());

connection.onreconnected(async () => {
  // Resubscribe to all previous groups
  for (const vehicleGuid of subscriptionsRef.current) {
    await connection.invoke("SubscribeToVehicle", vehicleGuid);
  }
});
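
One weakness of a plain loop is that a single failed `invoke` (for example, access to one vehicle was revoked while offline) aborts the remaining resubscriptions. A possible refinement, sketched here with a hypothetical `SubscriptionRegistry` class and a minimal `Invoke` type standing in for `HubConnection.invoke`, is to keep going and report which GUIDs failed.

```typescript
// Minimal stand-in for HubConnection.invoke.
type Invoke = (method: string, ...args: unknown[]) => Promise<unknown>;

class SubscriptionRegistry {
  private readonly vehicles = new Set<string>();

  add(vehicleGuid: string): void {
    this.vehicles.add(vehicleGuid);
  }

  remove(vehicleGuid: string): void {
    this.vehicles.delete(vehicleGuid);
  }

  list(): string[] {
    return Array.from(this.vehicles);
  }

  // Replays every subscription in order and returns the GUIDs that
  // failed, so one rejected call does not block the rest.
  async resubscribeAll(invoke: Invoke): Promise<string[]> {
    const failed: string[] = [];
    for (const guid of this.list()) {
      try {
        await invoke("SubscribeToVehicle", guid);
      } catch {
        failed.push(guid);
      }
    }
    return failed;
  }
}
```

The returned list gives the UI something concrete to act on, such as showing a vehicle as unavailable instead of leaving a stale tile on screen.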

2. Backfilling Missed Data

When reconnecting, request data missed during disconnection:

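connection.onreconnected(async () => {
  try {
    const missedData = await connection.invoke<TelemetryData[]>(
      "GetDataSince",
      vehicleGuid,
      lastReceivedTimestamp
    );
    // Assumes telemetry is kept as an array (a history view), unlike
    // the single-value state in VehicleTelemetry
    setTelemetry((prev) => [...prev, ...missedData]);
  } catch (err) {
    console.error("Backfill failed:", err);
  }
});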
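
Live updates can start arriving on the new connection before the backfill response returns, so the two sets of data may overlap. A minimal sketch of a merge step that handles this, assuming each point carries a numeric `timestamp` that identifies it (`TelemetryPoint` and `mergeBackfill` are illustrative names):

```typescript
interface TelemetryPoint {
  timestamp: number; // epoch milliseconds
  speed: number;
}

// Merges backfilled points into the existing history, dropping
// duplicates by timestamp and keeping chronological order.
function mergeBackfill(
  history: TelemetryPoint[],
  missed: TelemetryPoint[]
): TelemetryPoint[] {
  const byTimestamp = new Map<number, TelemetryPoint>();
  for (const point of history) byTimestamp.set(point.timestamp, point);
  for (const point of missed) byTimestamp.set(point.timestamp, point);
  return Array.from(byTimestamp.values()).sort(
    (a, b) => a.timestamp - b.timestamp
  );
}
```

In the handler above, this would replace the plain spread: `setTelemetry((prev) => mergeBackfill(prev, missedData))`.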

Monitoring & Observability

Real-time systems need comprehensive monitoring:

Metrics I Track

Connection count - Active WebSocket connections
Message throughput - Messages per second
Latency - Time from telemetry API call to client receipt
Reconnection rate - How often clients reconnect
Error rate - Failed message deliveries
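
For the latency metric, an average hides the slow tail, so a percentile is usually more informative. The sketch below computes a nearest-rank percentile over latency samples in milliseconds. `percentile` is an illustrative helper and the sample values are made up; note that measuring from API call to client receipt also assumes the server and client clocks are reasonably in sync.

```typescript
// Nearest-rank percentile over latency samples (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile requires at least one sample");
  }
  // Copy before sorting so the caller's array is not mutated.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Usage with made-up samples.
const latencySamplesMs = [20, 35, 40, 55, 60, 70, 80, 90, 95, 120];
const p50 = percentile(latencySamplesMs, 50);
const p95 = percentile(latencySamplesMs, 95);
```

Tracking p95 against the 100ms target makes it visible when a minority of clients are falling behind even though the median looks healthy.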

Custom Middleware

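// SignalR calls this kind of component a hub filter; register it with
// builder.Services.AddSignalR(o => o.AddFilter<SignalRMetricsMiddleware>());
public class SignalRMetricsMiddleware : IHubFilter
{
    // ISignalRMetrics stands in for the app's own metrics abstraction
    private readonly ISignalRMetrics _metrics;
    private readonly ILogger<SignalRMetricsMiddleware> _logger;

    public SignalRMetricsMiddleware(
        ISignalRMetrics metrics,
        ILogger<SignalRMetricsMiddleware> logger)
    {
        _metrics = metrics;
        _logger = logger;
    }
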
    public async Task OnConnectedAsync(HubLifetimeContext context, 
        Func<HubLifetimeContext, Task> next)
    {
        _metrics.IncrementConnections();
        _logger.LogInformation(
            "SignalR connection established: {ConnectionId}", 
            context.Context.ConnectionId
        );
        
        await next(context);
    }

    public async Task OnDisconnectedAsync(
        HubLifetimeContext context, 
        Exception exception, 
        Func<HubLifetimeContext, Exception, Task> next)
    {
        _metrics.DecrementConnections();
        _logger.LogInformation(
            "SignalR connection closed: {ConnectionId}", 
            context.Context.ConnectionId
        );
        
        await next(context, exception);
    }
}

Lessons Learned

1. Don't Stream Everything

Early versions streamed all telemetry channels to all clients. This was wasteful. Let clients subscribe to only what they need.

2. Batch Aggressively

Every WebSocket frame has overhead. Batching 10-100 messages together dramatically reduces network load without impacting perceived latency.

3. Handle Reconnection Gracefully

Network interruptions will happen. Build reconnection into the architecture from day one, and always resubscribe to groups after reconnecting.

4. Monitor Everything

Real-time systems fail silently. Comprehensive metrics and logging are essential for debugging production issues.

5. Test with Load

I used SignalR's test server to simulate 5,000+ concurrent connections locally. Load testing revealed bottlenecks before they hit production.

Results

The optimized SignalR architecture for FLEDEM achieved:

Sub-100ms latency - From API ingestion to client display
3,000+ concurrent connections - On a single ASP.NET Core server
99.9% uptime - Automatic reconnection handles network blips
80% bandwidth reduction - Through batching and selective subscriptions

Conclusion

Building real-time systems at scale requires careful architecture. SignalR provides excellent primitives, but you need to:

Batch messages to reduce overhead
Use groups for efficient broadcasting
Implement robust reconnection logic
Monitor connection health and message throughput
Let clients subscribe selectively

The result is a production-ready real-time system that scales to thousands of connections while keeping end-to-end latency under 100ms.
