CDC replication

CDC replication only for Bulk Syncs

CDC replication from MySQL is only available for Bulk Syncs.

When bulk-syncing data from MySQL into your data warehouse, we recommend (but do not require) that Polytomic use CDC (change data capture) replication. Without CDC, Polytomic has to run full table scans to calculate changes since the last sync. With CDC, Polytomic captures changes in real time without scanning your tables.

Requirements

To enable CDC replication, configure your MySQL database as follows:

  1. The Polytomic MySQL user needs replication privileges. Grant them with the following query (replace <username> with your Polytomic MySQL user):

    GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO <username>@'%';

For example, if your Polytomic MySQL user is polytomic, then your query would be:

    GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO polytomic@'%';
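To confirm that the privileges took effect, a quick check like the one below may help. It assumes the example user polytomic with host '%' from above; adjust both to match your own setup.

```sql
-- List the privileges currently granted to the example user.
-- The result should include SELECT, REPLICATION CLIENT, and
-- REPLICATION SLAVE on *.* if the GRANT above succeeded.
SHOW GRANTS FOR 'polytomic'@'%';
```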

  2. Set the following on your database:

    binlog_format: ROW
    binlog_row_image: full
    binlog_row_metadata: FULL
    slave_parallel_type: LOGICAL_CLOCK

How you set these depends on your MySQL hosting platform. If you host MySQL yourself, edit your my.cnf file. If you're on AWS RDS, edit your parameter group, as shown in this screenshot:
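Whichever platform you use, you can read the values back from the server to check that they were applied. The query below is a minimal sketch of that check. It is an assumption on our part that your server exposes all of these names: recent MySQL 8.0 releases rename slave_parallel_type to replica_parallel_type, so both are listed and only the ones your version knows about will be returned.

```sql
-- Read back the settings required for CDC replication.
-- Expected values: binlog_format = ROW, binlog_row_image = FULL,
-- binlog_row_metadata = FULL, slave_parallel_type = LOGICAL_CLOCK.
SHOW VARIABLES
WHERE Variable_name IN (
    'binlog_format',
    'binlog_row_image',
    'binlog_row_metadata',
    'slave_parallel_type',
    'replica_parallel_type'  -- newer name for slave_parallel_type (assumption: MySQL 8.0.26+)
);
```

If a value differs from the expected one, the parameter group or my.cnf change may not have been applied yet; some platforms require a restart before static parameters take effect.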

  3. Set a log retention period of at least 1 day. We recommend 7 days.
  4. Be sure to check the "Use replication for bulk syncs" box in your Polytomic MySQL connection configuration:

  5. Click Save.
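For the log retention step above, the following sketch shows one way to inspect and set retention. It assumes the retention period refers to the MySQL binary log, and that you are on either self-hosted MySQL 8.0 (which measures retention in seconds) or Amazon RDS for MySQL (which measures it in hours through its own stored procedures). Other platforms and older MySQL versions use different settings.

```sql
-- Self-hosted MySQL 8.0 (assumption): check current binlog retention in seconds.
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

-- Set retention to the recommended 7 days (7 * 86400 = 604800 seconds).
SET PERSIST binlog_expire_logs_seconds = 604800;

-- Amazon RDS for MySQL (assumption): show the current configuration,
-- then set binlog retention to 7 days (7 * 24 = 168 hours).
CALL mysql.rds_show_configuration;
CALL mysql.rds_set_configuration('binlog retention hours', 168);
```

Run only the statements that match your platform; the RDS procedures do not exist on self-hosted servers.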