Scaling Spring Boot on ECS with a Shared PostgreSQL Database: Patterns for Safe Concurrency

9 min read

Scaling Spring Boot on ECS with a Shared PostgreSQL Database: Patterns for Safe Concurrency

Multiple identical application containers talk to a single Postgres DB. Without coordination you risk:

  • conflicting concurrent updates (lost updates),
  • race conditions on sequence/unique values,
  • duplicate side-effects (sending twice, double-charges),
  • deadlocks or serialization failures when transactions overlap,
  • exhausting DB connections from many containers.

Goal: ensure correctness (no data corruption or duplication), good performance and predictable scaling.


High-level architecture (recommended)

  1. Primary RDS PostgreSQL (single writer). Consider read-replicas for reads.
  2. RDS Proxy (optional but recommended) — central connection pooling to protect DB connection limits.
  3. ECS Service (multiple tasks/containers) running identical Spring Boot images.
  4. ElastiCache (Redis) — used for distributed locks, leader election, cache, idempotency tokens.
  5. SQS (optional) — push work to a queue when you want single-worker processing or smooth backpressure.
  6. Flyway for DB migrations run as part of deploy (or separately via migration job).
  7. CloudWatch / Prometheus + alarms for metrics and deadlocks/locks.

ASCII diagram:

+-------------+    +-----------------+    +-------------------+
| ECS Tasks   |    |   RDS Proxy     |    |  Postgres (RDS)   |
| SpringBoot  | -> | (connection     | -> |  Primary (+Replicas)|
| (many)      |    |  pooling)       |    |                   |
+------+------+    +-----------------+    +-------------------+
       |
       v
 +------------+
 | ElastiCache|
 |   Redis    |
 +------------+

 Optional: SQS between ECS tasks for background jobs

Principles & patterns to avoid collisions

  1. Use transactions for all multi-step DB operations.
  2. Use DB constraints (unique constraints, FK) as source of truth — rely on them to reject bad data rather than trying to prevent every race in the app.
  3. Optimistic locking (@Version) for normal concurrent updates where conflicts are rare.
  4. Pessimistic locking (SELECT FOR UPDATE, @Lock(PESSIMISTIC_WRITE)) for hot rows that must be updated serially.
  5. Advisory locks (Postgres pg_try_advisory_lock) or Redis distributed locks for coarse-grained or cross-operation coordination.
  6. Idempotency & upserts (INSERT ... ON CONFLICT DO UPDATE) for safe retries and deduplication.
  7. Leader election (Redis / DynamoDB / etc.) to ensure exactly-one container runs a scheduled job.
  8. Connection pooling + RDS Proxy to avoid hitting connection limits.
  9. Retry policies for serialization failures (retry on SerializationFailure, OptimisticLockingFailureException).
  10. Use read-replicas for scaling reads; route writes to the primary.
  11. Limit transaction scope and keep transactions short (do not call external services while holding DB locks).

Concrete examples

Below are runnable code snippets you can drop into a Spring Boot app. I’ll show:

A. Entity with optimistic locking (JPA @Version)
B. Pessimistic locking with @Lock / SELECT FOR UPDATE
C. Advisory lock via JdbcTemplate (Postgres)
D. Distributed lock with Redisson (Redis) — brief (you choose Redis or advisory locks)
E. Idempotent upsert (Postgres ON CONFLICT)
F. Retry wrapper for optimistic lock failures


A — Optimistic locking (typical safe updates)

Entity:

import jakarta.persistence.*;
import java.time.Instant;

@Entity
@Table(name = "orders")
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String status;

    private Instant updatedAt;

    @Version
    private Long version; // optimistic lock

    // getters/setters
}

Repository:

import org.springframework.data.jpa.repository.JpaRepository;

public interface OrderRepository extends JpaRepository<Order, Long> {}

Service usage with retry:

import org.springframework.dao.OptimisticLockingFailureException;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {
    private final OrderRepository repo;

    public OrderService(OrderRepository repo) { this.repo = repo; }

    @Transactional
    public void updateOrderStatusWithRetry(Long orderId, String newStatus) {
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                Order o = repo.findById(orderId).orElseThrow();
                o.setStatus(newStatus);
                o.setUpdatedAt(Instant.now());
                repo.save(o); // will check @Version
                return;
            } catch (OptimisticLockingFailureException ex) {
                if (attempts >= 3) throw ex;
                // backoff
                try { Thread.sleep(50L * attempts); } catch (InterruptedException ignored) {}
            }
        }
    }
}

When to use: good when conflicts are expected to be rare. Low overhead.


B — Pessimistic locking (SELECT FOR UPDATE) for hot rows

Repository with lock:

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Lock;
import org.springframework.data.jpa.repository.Query;
import jakarta.persistence.LockModeType;
import java.util.Optional;

public interface AccountRepository extends JpaRepository<Account, Long> {
    @Lock(LockModeType.PESSIMISTIC_WRITE)
    @Query("select a from Account a where a.id = :id")
    Optional<Account> findByIdForUpdate(Long id);
}

Service:

@Service
public class AccountService {
    private final AccountRepository repo;

    public AccountService(AccountRepository repo) { this.repo = repo; }

    @Transactional
    public void transfer(Long fromId, Long toId, BigDecimal amount) {
        Account from = repo.findByIdForUpdate(fromId).orElseThrow();
        Account to   = repo.findByIdForUpdate(toId).orElseThrow();

        if (from.getBalance().compareTo(amount) < 0) throw new IllegalStateException("insufficient");

        from.setBalance(from.getBalance().subtract(amount));
        to.setBalance(to.getBalance().add(amount));

        repo.save(from);
        repo.save(to);
    }
}

Notes: SELECT FOR UPDATE will lock rows for the duration of the transaction. Keep transaction short.


C — PostgreSQL Advisory Locks (DB-level distributed lock)

Use when you want to coordinate across containers without external Redis.

Utility:

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

import java.util.Optional;

@Component
public class PgAdvisoryLock {
    private final JdbcTemplate jdbc;

    public PgAdvisoryLock(JdbcTemplate jdbc) { this.jdbc = jdbc; }

    // Try acquire (non-blocking)
    public boolean tryLock(long key) {
        Boolean ok = jdbc.queryForObject("SELECT pg_try_advisory_lock(?)", Boolean.class, key);
        return Boolean.TRUE.equals(ok);
    }

    // Release
    public boolean unlock(long key) {
        Boolean ok = jdbc.queryForObject("SELECT pg_advisory_unlock(?)", Boolean.class, key);
        return Boolean.TRUE.equals(ok);
    }

    // Blocking acquire (be careful)
    public void lock(long key) {
        jdbc.execute("SELECT pg_advisory_lock(" + key + ")");
    }
}

Usage:

public void doOneAtATime(String resource) {
    long key = Math.abs(resource.hashCode()); // 32-bit; for safety you may map to bigint
    if (!pgAdvisoryLock.tryLock(key)) {
        throw new IllegalStateException("Could not obtain lock");
    }
    try {
        // do the work in transaction
    } finally {
        pgAdvisoryLock.unlock(key);
    }
}

Pros: no external components; locks tied to DB session (locks released if connection dies).
Cons: uses DB connection; ensure lock key collisions avoided and uses long space.


D — Redis distributed lock (Redisson) — for larger scale / speed

Use Redis when you already have ElastiCache and need fast distributed locks.

Brief example (pom: org.redisson:redisson-spring-boot-starter):

import org.redisson.api.RedissonClient;
import org.redisson.api.RLock;
import java.util.concurrent.TimeUnit;

@Service
public class RedisLockService {
    private final RedissonClient redisson;

    public RedisLockService(RedissonClient redisson) { this.redisson = redisson; }

    public void doWithLock(String lockName, Runnable task) {
        RLock lock = redisson.getLock(lockName);
        boolean acquired = false;
        try {
            acquired = lock.tryLock(500, 10_000, TimeUnit.MILLISECONDS); // wait 500ms, lease 10s
            if (!acquired) throw new IllegalStateException("lock not acquired");
            task.run();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            if (acquired) lock.unlock();
        }
    }
}

Note: Use proven libs (Redisson) for safety (auto-renewal, etc.).


E — Idempotent upsert (INSERT … ON CONFLICT)

SQL approach (JdbcTemplate):

String sql = "INSERT INTO payments (idempotency_key, amount, status) " +
             "VALUES (?, ?, ?) " +
             "ON CONFLICT (idempotency_key) DO UPDATE SET amount = EXCLUDED.amount RETURNING id;";

Long id = jdbcTemplate.queryForObject(sql, Long.class, key, amount, status);

Pattern: generate client or server idempotency key for requests. Store it in a table (with unique constraint) so retries do not produce duplicates.


F — Retry wrapper for serialization errors

Postgres SERIALIZABLE or concurrent updates may throw exceptions. Wrap critical transactions in a retry:

import org.springframework.dao.CannotAcquireLockException;
import org.springframework.transaction.TransactionSystemException;

public  T withRetries(Supplier tx) {
    int attempts = 0;
    while (true) {
        attempts++;
        try {
            return tx.get();
        } catch (ObjectOptimisticLockingFailureException | PessimisticLockingFailureException |
                 CannotAcquireLockException | TransactionSystemException ex) {
            if (attempts >= 5) throw ex;
            try { Thread.sleep(50L * attempts); } catch (InterruptedException ignored) {}
        }
    }
}

DB & infra configuration recommendations

  • Connection pool (HikariCP): configure maximumPoolSize per container so total containers × pool size < DB connection limit. Example: RDS has default connection limits — use RDS Proxy to pool connections across containers and avoid connection storms.
  • RDS Proxy: reduces pressure on DB by pooling at the proxy layer.
  • Read replicas: offload read-heavy queries to replicas. Ensure eventual consistency acceptable for reads.
  • Migrations: run Flyway migrations from a dedicated job or ensure one container runs migrations (leader election or sequential deploy).
  • Monitoring: capture DB wait events, deadlocks, long transactions, connection counts.
  • Transaction isolation: Postgres default READ COMMITTED is usually fine. Use SERIALIZABLE only where strict correctness required; be prepared to handle serialization failures.
  • Schema design: use constraints (unique indexes), foreign keys, check constraints. Let DB enforce invariants.
  • Indexing: maintain indexes for expected query patterns to avoid full-table scans and long locks.
  • Short transactions: never hold locks while calling external HTTP services — persist intent, release lock, then call external services or move to outbox pattern.
  • Outbox pattern: persist events to an outbox table inside same transaction and have a separate process publish them to message queue (guarantees exactly-once semantics when combined with idempotent consumers).

Handling scheduled / singleton tasks in ECS

If you have scheduled jobs that must run on exactly one container (e.g., nightly batch):

Options:

  1. Leader election using Redis or DynamoDB — containers compete and only leader runs scheduled tasks. Use Redisson RLock or RLeaderElection.
  2. Use a separate ECS scheduled task (preferred) — configure an ECS scheduled task or AWS Batch / Lambda that runs as a single task on schedule. This decouples from scaling the app.
  3. Use SQS + single consumer service — push job messages onto queue and run a single worker.

Recommended: use AWS scheduler / ECS scheduled task or an external scheduler to avoid ad-hoc leader election complexity.


Common pitfalls & how to avoid them

  • Too many connections: each container with Hikari default pool * number of containers > DB max. Use RDS Proxy or tune pool sizes.
  • Holding locks too long: avoid long transactions and external calls within transactions.
  • Assuming DB sequence offsets are globally unique for business logic: prefer UUIDs or central sequence with careful concurrency handling.
  • Using optimistic locking everywhere: it’s fine when conflicts are rare; for hot counters use SELECT FOR UPDATE or atomic DB operations UPDATE ... RETURNING.
  • Missing idempotency keys: write idempotency into APIs to handle retries safely.
  • Migrations race: ensure migrations run once (CI/CD pre-deploy job or leader-only migration step).

Full small example (combined flow)

Scenario: We want to create a Payment and ensure no duplicate payments if multiple containers receive the same request.

Schema:

CREATE TABLE payments (
  id BIGSERIAL PRIMARY KEY,
  idempotency_key TEXT UNIQUE NOT NULL,
  amount NUMERIC NOT NULL,
  status VARCHAR(20) NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now()
);

Service code (JdbcTemplate upsert):

@Service
public class PaymentService {
    private final JdbcTemplate jdbcTemplate;

    public PaymentService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    public long createPaymentIfAbsent(String idempotencyKey, BigDecimal amount) {
        String sql = "INSERT INTO payments (idempotency_key, amount, status) " +
                     "VALUES (?, ?, 'CREATED') " +
                     "ON CONFLICT (idempotency_key) DO UPDATE SET status = payments.status " +
                     "RETURNING id";

        return jdbcTemplate.queryForObject(sql, Long.class, idempotencyKey, amount);
    }
}

This single statement is atomic and prevents duplicates regardless of concurrent callers.

If post-insert processing requires non-DB actions (e.g., calling external payment gateway), use the outbox pattern:

  1. Within transaction insert payment plus outbox event row.
  2. Separate publisher reads outbox and sends messages (exactly-once semantics via idempotency).

Testing strategies

  • Integration tests: use Testcontainers with Postgres to reproduce concurrency scenarios. Simulate multiple threads/transactions and assert final state.
  • Chaos testing: kill containers or simulate connection drops to ensure advisory lock and lock-release behavior.
  • Load testing: verify connection pool sizing and RDS CPU/IO under realistic load.
  • Unit tests: mock repositories and test retry behavior for optimistic lock exceptions.

Example: testing optimistic locking with Testcontainers (sketch)

You can use Java multithreaded test that starts two transactions simultaneously and expects one to fail and retry.

(I’ll skip full test code here to keep this message focused — I can provide a Testcontainers test on demand.)


Operational checklist for deployment (ECS-specific)

  • Use environment variables or Secrets Manager for DB credentials.
  • Deploy RDS Proxy and point containers to it.
  • Configure HikariCP pool size based on RDS max connections and expected number of containers:
    maxPoolSize <= floor((rds_max_connections - reserved)/num_containers).
  • Use CloudWatch alarms for DB connection count, locks, deadlocks, high query duration.
  • Run Flyway migrations from CI/CD pipeline or a single migration task.
  • Put scheduled singleton jobs into an ECS scheduled task (preferred) or use leader election.
  • Use ElastiCache Redis for distributed locking if you need fast non-blocking locks at scale.

Short decision guide: which locking approach to pick?

  • Rare conflicts → Optimistic locking (@Version) + retries.
  • Hot rows / counters / wallet balance → Pessimistic locking (SELECT FOR UPDATE) within short transactions OR use atomic SQL updates (UPDATE ... WHERE ... RETURNING).
  • Cross-entity operations spanning many rows → Advisory lock (coarse-grained) or queue the operation to a single worker.
  • Singleton scheduled tasks → ECS scheduled task or leader election via Redis/DynamoDB.
  • High-scale distributed locks → Redis (Redisson) with proper safety settings (watch timeouts, renewal).

Final notes

  • Use the DB as the authoritative source, backed by constraints and transactions.
  • Use optimistic locking for normal updates, pessimistic locks for hot items.
  • For cross-instance coordination use Postgres advisory locks or Redis locks (both are valid; choose based on your infra).
  • Prevent connection storms with RDS Proxy and tune connection pools.
  • Make operations idempotent and use ON CONFLICT for safe dedup/upsert.
  • Prefer running scheduled/one-off jobs as dedicated ECS tasks rather than leader-elected code inside app containers.

🤞 Never miss a story from us, get weekly updates to your inbox!

Leave a Reply

Your email address will not be published. Required fields are marked *