When scaling matters

Most Discord bots never need to worry about scaling. A bot serving one guild or a handful of communities runs comfortably on minimal resources. Scaling becomes relevant when your bot joins enough guilds that resource consumption, API rate limits, or Discord's gateway connection limits start affecting performance.

Discord requires bots in more than 2,500 guilds to use sharding. This is not optional. Beyond that threshold, Discord will not let your bot connect with a single gateway connection. Planning for this before you hit the limit saves emergency refactoring later.
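The arithmetic behind that threshold can be sketched as a ceiling division. This is a minimal illustration, not library code: the function name is hypothetical, and it assumes guilds spread evenly across shards, which is only approximately true. Libraries can also ask Discord for a recommended shard count, which may be higher than this minimum.

```python
import math

MAX_GUILDS_PER_SHARD = 2500  # per-shard limit described above


def minimum_shards(guild_count: int) -> int:
    """Return the smallest shard count that keeps each shard within the limit.

    Hypothetical helper: assumes guilds are spread evenly across shards.
    """
    if guild_count < 0:
        raise ValueError("guild_count must not be negative")
    return max(1, math.ceil(guild_count / MAX_GUILDS_PER_SHARD))
```

A bot in 2,501 guilds needs at least two shards under this rule, so treat the result as a floor rather than a target.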

Understanding Discord's limits

Limit | Threshold | What happens
Gateway connection | 2,500 guilds per shard | Must implement sharding
Global rate limit | 50 requests per second | Requests queued or rejected
Identify limit | 1 identify per 5 seconds | Shard startup must be staggered
Message content intent | 100+ guilds (unverified) | Must apply for privileged intent
Guild member cache | Scales with guild count | Memory usage increases linearly
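To make the identify limit concrete, here is a small sketch of how shard start times could be staggered. The function name is hypothetical. The `max_concurrency` parameter stands for the value Discord reports in its gateway bot information, and it is assumed to be 1 here unless you pass something else. discord.js and discord.py normally handle this staggering for you, so this only illustrates the timing.

```python
IDENTIFY_INTERVAL_SECONDS = 5  # one identify per 5 seconds, as in the table above


def identify_delays(num_shards: int, max_concurrency: int = 1) -> dict[int, int]:
    """Map each shard ID to the number of seconds to wait before it identifies.

    Shards are started in groups of `max_concurrency`; each later group
    waits one more interval than the group before it.
    """
    if num_shards < 1 or max_concurrency < 1:
        raise ValueError("num_shards and max_concurrency must be at least 1")
    return {
        shard_id: (shard_id // max_concurrency) * IDENTIFY_INTERVAL_SECONDS
        for shard_id in range(num_shards)
    }
```

With three shards and the default concurrency, the last shard starts ten seconds after the first, which is why large bots take noticeably longer to come online.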

Resource planning

Before adding infrastructure, understand what your bot actually consumes. Resource usage depends on your bot's features, caching strategy, and the number of events it processes.

Memory estimation

A rough guide for memory planning:

Guild count | Estimated RAM (Node.js) | Estimated RAM (Python) | Hosting recommendation
1–50 | 50–100 MB | 60–120 MB | Free hosting (MonkeyBytes)
50–500 | 100–300 MB | 120–400 MB | Free hosting (MonkeyBytes)
500–2,500 | 300–800 MB | 400 MB–1 GB | Free hosting or budget VPS
2,500–10,000 | 800 MB–2 GB | 1–3 GB | VPS ($5–$10/mo)
10,000+ | 2–8 GB+ | 3–10 GB+ | Dedicated server or multi-VPS

These are estimates based on typical bot configurations with standard caching. Bots that cache member lists or message histories use significantly more memory. Bots with minimal caching use less.

MonkeyBytes provides 1 GB of dedicated RAM per bot instance, which comfortably handles bots up to roughly 500–2,500 guilds depending on your caching configuration. For cost comparisons at each tier, see our hosting cost breakdown.

Sharding

Sharding splits your bot's guild list across multiple gateway connections. Each shard handles a subset of guilds independently. Discord assigns guilds to shards using a simple formula: shard_id = (guild_id >> 22) % num_shards.
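The formula translates directly into code. The sketch below uses a hypothetical function name and made-up guild IDs. The right shift by 22 bits drops the low bits of the snowflake ID and leaves its timestamp portion, which is what gets divided among the shards.

```python
def shard_for_guild(guild_id: int, num_shards: int) -> int:
    """Return the shard ID responsible for a guild, per the formula above."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return (guild_id >> 22) % num_shards


# Illustrative (made-up) guild ID: timestamp portion 5, arbitrary low bits.
example_guild_id = (5 << 22) | 12345
print(shard_for_guild(example_guild_id, 4))  # 5 % 4 -> 1
```

Because the result depends on `num_shards`, changing the shard count moves guilds between shards. Any per-shard state therefore has to be rebuilt after a reshard.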

Internal sharding

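The simplest approach runs every shard on a single host and lets the library manage them. discord.py's AutoShardedBot runs all shards inside one process. discord.js can do the same through the client's shards: 'auto' option, while its ShardingManager, shown below, spawns each shard as a child process on the same machine by default.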

discord.js ShardingManager:

// shard.js (launcher file)
const { ShardingManager } = require('discord.js');

const manager = new ShardingManager('./bot.js', {
    token: process.env.DISCORD_TOKEN,
    totalShards: 'auto' // Discord determines shard count
});

manager.on('shardCreate', shard => {
    console.log(`Launched shard ${shard.id}`);
});

manager.spawn();

discord.py AutoShardedBot:

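import os

import discord
from discord.ext import commands

# In discord.py, AutoShardedBot lives in discord.ext.commands
# and requires a command prefix.
bot = commands.AutoShardedBot(
    command_prefix='!',  # Placeholder prefix; use your own
    intents=discord.Intents.default(),
    shard_count=None  # Auto-determined
)

@bot.event
async def on_ready():
    print(f'Ready with {bot.shard_count} shards')

# os.environ raises a clear KeyError if the token is missing
bot.run(os.environ['DISCORD_TOKEN'])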

External sharding

For large bots (10,000+ guilds), external sharding runs each shard as a separate process or on separate servers. This allows horizontal scaling across multiple machines. External sharding is more complex to set up but provides better isolation and resource distribution.
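One piece of external sharding is deciding which process owns which shard IDs. The sketch below shows a simple round-robin split. The function name is hypothetical, and the coordinator that would hand these lists to each process is left out. In discord.py, each process could then pass its own list as `shard_ids` together with the total `shard_count` when constructing an auto-sharded client.

```python
def assign_shards(total_shards: int, num_processes: int) -> list[list[int]]:
    """Split shard IDs 0..total_shards-1 across processes in round-robin order.

    Hypothetical helper: returns one list of shard IDs per process.
    """
    if total_shards < 1 or num_processes < 1:
        raise ValueError("total_shards and num_processes must be at least 1")
    return [
        list(range(process_index, total_shards, num_processes))
        for process_index in range(num_processes)
    ]
```

Round-robin keeps the lists within one shard of each other in length. A contiguous split would work just as well, as long as every process agrees on the same total shard count.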

Optimisation before scaling

Before adding more hardware or shards, optimise what you have. Many bots scale prematurely when the real problem is inefficient code.

Cache management

The biggest memory consumer in most Discord bots is the guild and member cache. By default, Discord libraries cache most of the objects they receive. Reduce memory usage by caching only what you need:

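// discord.js - selective caching
const { Client, GatewayIntentBits, Options } = require('discord.js');

const client = new Client({
    intents: [GatewayIntentBits.Guilds],
    makeCache: Options.cacheWithLimits({
        ...Options.DefaultMakeCacheSettings, // Keep library defaults for managers not listed here
        MessageManager: 50,      // Cache last 50 messages per channel
        GuildMemberManager: {
            maxSize: 200,        // Cache 200 members per guild
            // Never evict the bot's own member entry
            keepOverLimit: member => member.id === member.client.user.id,
        },
        PresenceManager: 0,      // Don't cache presence data
        ReactionManager: 0,      // Don't cache reactions
    })
});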

Database optimisation

If your bot queries a database on every command, slow queries become bottlenecks under load. Index your most-queried columns, use connection pooling, and cache frequently accessed data in memory with a TTL (time to live) to avoid stale data.
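A TTL cache can be very small. The following is a minimal in-memory sketch using only the standard library. The class name, the injectable `clock` parameter, and the example key are all assumptions made for illustration, and a production bot might prefer an existing caching library.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """In-memory cache whose entries expire after a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        """Store a value together with its expiry time."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        """Return the cached value, or `default` if missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return default
        return value
```

A command handler would call `get` first and query the database only on a miss, then store the result with `set`. Expired entries are removed only when they are read, so a cache with many keys that are never read again would also need periodic cleanup.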

Rate limit handling

discord.js and discord.py both handle rate limits automatically, but wasteful API calls still slow your bot. Batch operations where possible, use bulk endpoints for mass actions, and avoid making API calls in tight loops.
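Batching usually starts with splitting work into chunks that fit a bulk endpoint. The sketch below is a generic chunking helper with a hypothetical name. The default size of 100 reflects the per-request limit of Discord's bulk message delete endpoint, so adjust it for other endpoints.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")

BULK_DELETE_MAX = 100  # per-request limit of the bulk message delete endpoint


def chunked(items: Sequence[T], size: int = BULK_DELETE_MAX) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Deleting 250 messages this way takes three requests instead of 250. Bulk delete expects at least two IDs per request, so a final chunk containing a single item would need an ordinary single delete.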

For more optimisation techniques, read our performance optimisation guide.

Infrastructure scaling path

  1. Start free: Deploy on MonkeyBytes with 1 GB RAM. Suitable for most bots up to roughly 500 guilds, and more with lean caching.
  2. Optimise code: Reduce caching, fix memory leaks, optimise database queries. This can substantially increase your capacity without changing hosting.
  3. Add sharding: When approaching 2,500 guilds, implement internal sharding. This runs on the same hosting instance.
  4. Upgrade hosting: When internal sharding maxes out your RAM, move to a VPS with more resources.
  5. External sharding: For very large bots, distribute shards across multiple servers with a coordinator process.

When to upgrade

Upgrade your hosting when you observe these signals:

  • RAM usage consistently above 80% of available memory
  • Increasing frequency of out-of-memory crashes
  • Noticeable delay in command responses under normal load
  • Discord forcing you to add shards (>2,500 guilds)
  • Database queries taking more than 100ms consistently
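Two of these signals are easy to check mechanically once you collect samples. The sketch below uses a hypothetical function name and signal labels, and it assumes that "consistently" means every sample in the window exceeds the threshold. You may prefer a percentile or an average instead.

```python
RAM_THRESHOLD_FRACTION = 0.8   # 80% of available memory, as listed above
SLOW_QUERY_MS = 100            # query latency threshold, as listed above


def upgrade_signals(ram_used_mb: list[float], ram_limit_mb: float,
                    query_ms: list[float]) -> list[str]:
    """Return the names of upgrade signals triggered by the sampled metrics.

    Assumption: a signal fires only if every sample in the window
    exceeds its threshold.
    """
    signals = []
    ram_threshold = RAM_THRESHOLD_FRACTION * ram_limit_mb
    if ram_used_mb and all(sample > ram_threshold for sample in ram_used_mb):
        signals.append("ram_above_80_percent")
    if query_ms and all(sample > SLOW_QUERY_MS for sample in query_ms):
        signals.append("slow_queries")
    return signals
```

Feeding this a day of samples rather than a single reading avoids reacting to short spikes.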

Do not upgrade preemptively. Monitor your actual resource usage with the tools described in our uptime monitoring guide and upgrade based on data, not assumptions.

For hosting options at each scale, compare costs in our hosting cost breakdown or evaluate VPS options in our VPS vs free hosting comparison.
