
Flink CDC usage and parameter settings (HD0do)


Monitoring MySQL with Flink SQL via CDC

CREATE TABLE order_source_ms (
    id BIGINT,
    deal_amt DOUBLE,
    shop_id STRING,
    customer_id STRING,
    city_id BIGINT,
    product_count DOUBLE,
    order_at TIMESTAMP(3),
    last_updated_at TIMESTAMP(3),
    pay_at TIMESTAMP,
    refund_at TIMESTAMP,
    tenant_id STRING,
    order_category STRING,
    h AS HOUR(last_updated_at),
    pay_hour AS HOUR(pay_at),
    refund_hour AS HOUR(refund_at),
    m AS MINUTE(last_updated_at),
    dt AS TO_DATE(CAST(last_updated_at AS STRING)),
    pay_dt AS TO_DATE(CAST(pay_at AS STRING)),
    refund_dt AS TO_DATE(CAST(refund_at AS STRING)),
    PRIMARY KEY (id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'ip',
    'port' = '3306',
    'username' = 'username',
    'password' = 'password',
    'database-name' = 'databasename',
    'scan.startup.mode' = 'latest-offset',
    'debezium.skipped.operations' = 'd',
    'table-name' = 'tablename'
)

Running the SQL statement above through the SQL Client establishes the connection to the corresponding MySQL table. The prerequisite is that the required dependency jar, flink-sql-connector-mysql-cdc-2.2-SNAPSHOT.jar, has been placed in Flink's lib directory.

The table definition in this example uses Flink SQL's built-in time functions to extract the hour, the minute, and the date from each row:

h AS HOUR(last_updated_at),
pay_hour AS HOUR(pay_at),
refund_hour AS HOUR(refund_at),
m AS MINUTE(last_updated_at),
dt AS TO_DATE(CAST(last_updated_at AS STRING)),
pay_dt AS TO_DATE(CAST(pay_at AS STRING)),
refund_dt AS TO_DATE(CAST(refund_at AS STRING)),
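To make concrete what these computed columns produce, here is a minimal plain-Java sketch using java.time (not Flink code; the class name and sample timestamp are illustrative only) that mirrors what HOUR, MINUTE, and TO_DATE yield for a given TIMESTAMP value:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

public class ComputedColumnDemo {
    public static void main(String[] args) {
        // A sample value standing in for last_updated_at (TIMESTAMP(3) in the DDL)
        LocalDateTime lastUpdatedAt = LocalDateTime.of(2022, 5, 1, 14, 30, 45);

        int h = lastUpdatedAt.getHour();            // analogous to HOUR(last_updated_at)
        int m = lastUpdatedAt.getMinute();          // analogous to MINUTE(last_updated_at)
        LocalDate dt = lastUpdatedAt.toLocalDate(); // analogous to TO_DATE(CAST(... AS STRING))

        System.out.println(h + " " + m + " " + dt); // prints "14 30 2022-05-01"
    }
}
```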

See also: Flink SQL built-in functions.

Parameter reference:

The common parameters are documented on the official site:

connector (required, default: none, String): Specify what connector to use; here it should be 'mysql-cdc'.

hostname (required, default: none, String): IP address or hostname of the MySQL database server.

username (required, default: none, String): Username to use when connecting to the MySQL database server.

password (required, default: none, String): Password to use when connecting to the MySQL database server.

database-name (required, default: none, String): Database name of the MySQL server to monitor. The database-name also supports regular expressions to monitor multiple databases matching the regular expression.

table-name (required, default: none, String): Table name of the MySQL database to monitor. The table-name also supports regular expressions to monitor multiple tables matching the regular expression.

port (optional, default: 3306, Integer): Integer port number of the MySQL database server.

server-id (optional, default: none, Integer): A numeric ID or a numeric ID range of this database client. The numeric ID syntax is like '5400'; the numeric ID range syntax is like '5400-5408'. The range syntax is recommended when 'scan.incremental.snapshot.enabled' is enabled. Every ID must be unique across all currently running database processes in the MySQL cluster. This connector joins the MySQL cluster as another server (with this unique ID) so it can read the binlog. By default, a random number between 5400 and 6400 is generated, though setting an explicit value is recommended.

scan.incremental.snapshot.enabled (optional, default: true, Boolean): Incremental snapshot is a new mechanism to read the snapshot of a table. Compared to the old snapshot mechanism, it has many advantages: (1) the source can read the snapshot in parallel, (2) the source can perform checkpoints at chunk granularity during snapshot reading, and (3) the source does not need to acquire a global read lock (FLUSH TABLES WITH READ LOCK) before snapshot reading. If you would like the source to run in parallel, each parallel reader should have a unique server id, so 'server-id' must be a range like '5400-6400', and the range must be larger than the parallelism. See the Incremental Snapshot Reading section for details.

scan.incremental.snapshot.chunk.size (optional, default: 8096, Integer): The chunk size (number of rows) of the table snapshot; captured tables are split into multiple chunks when reading the snapshot.

scan.snapshot.fetch.size (optional, default: 1024, Integer): The maximum fetch size per poll when reading the table snapshot.

scan.startup.mode (optional, default: initial, String): Startup mode for the MySQL CDC consumer; valid values are "initial" and "latest-offset". See the Startup Reading Position section for details.

server-time-zone (optional, default: UTC, String): The session time zone of the database server, e.g. "Asia/Shanghai". It controls how the TIMESTAMP type in MySQL is converted to STRING.

debezium.min.row.count.to.stream.result (optional, default: 1000, Integer): During a snapshot operation, the connector queries each included table to produce a read event for all rows in that table. This parameter determines whether the MySQL connection pulls all results for a table into memory (fast, but memory-hungry) or streams them instead (slower, but works for very large tables). The value is the minimum number of rows a table must contain before the connector streams results. Set it to '0' to skip all table size checks and always stream all results during a snapshot.

connect.timeout (optional, default: 30s, Duration): The maximum time the connector should wait after trying to connect to the MySQL database server before timing out.

debezium.* (optional, default: none, String): Pass-through Debezium properties for the Debezium Embedded Engine, which is used to capture data changes from the MySQL server. For example: 'debezium.snapshot.mode' = 'never'. See the Debezium MySQL Connector properties for more.
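One rule from this table is easy to get wrong: with incremental snapshotting, the 'server-id' range must contain at least as many IDs as the source parallelism. The plain-Java sketch below (the class and method names are hypothetical helpers, not part of Flink) shows the arithmetic behind that check:

```java
public class ServerIdRangeCheck {
    // Hypothetical helper: checks that a 'server-id' range such as "5400-6400"
    // provides enough unique IDs for the desired source parallelism
    // (one unique server id per parallel reader).
    static boolean coversParallelism(String serverIdRange, int parallelism) {
        String[] parts = serverIdRange.split("-");
        int start = Integer.parseInt(parts[0]);
        int end = Integer.parseInt(parts[1]);
        return (end - start + 1) >= parallelism;
    }

    public static void main(String[] args) {
        System.out.println(coversParallelism("5400-5408", 4)); // true: 9 IDs for 4 readers
        System.out.println(coversParallelism("5400-5401", 4)); // false: only 2 IDs
    }
}
```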

One parameter setting deserves special mention:

'debezium.skipped.operations'='d',

This parameter tells the connector to skip delete operations when reading the MySQL binlog. It took a long time to find: the business requirement was to filter out deletes, and no Flink SQL parameter for this seemed to exist, until I noticed:

the last row of the parameter table in the official documentation, which allows extending the configuration with Debezium's own properties. Through the link provided there you can find whatever additional capture parameters you need; that is where I found the parameter that filters out delete operations.
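For intuition: Debezium change events carry an "op" field whose value is "d" for deletes, and 'debezium.skipped.operations' = 'd' drops those events at the source. The same effect can be approximated downstream by filtering the JSON payload. The sketch below is a deliberately naive plain-Java illustration (the class and method are hypothetical, and a simple substring check stands in for real JSON parsing, since the JDK has no built-in JSON parser):

```java
public class DeleteEventFilter {
    // Naive sketch: treat a change event as a delete if its JSON payload
    // contains "op":"d". A real job would parse the JSON with a proper library
    // rather than matching substrings.
    static boolean isDelete(String changeEventJson) {
        return changeEventJson.contains("\"op\":\"d\"");
    }

    public static void main(String[] args) {
        String deleteEvent = "{\"before\":{\"id\":1},\"after\":null,\"op\":\"d\"}";
        String insertEvent = "{\"before\":null,\"after\":{\"id\":1},\"op\":\"c\"}";
        System.out.println(isDelete(deleteEvent)); // true
        System.out.println(isDelete(insertEvent)); // false
    }
}
```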

Monitoring MySQL CDC from code (DataStream API)

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import com.ververica.cdc.connectors.mysql.source.MySqlSource;

public class MySqlSourceExample {
    public static void main(String[] args) throws Exception {
        MySqlSource<String> mySqlSource = MySqlSource.<String>builder()
                .hostname("yourHostname")
                .port(yourPort)
                .databaseList("yourDatabaseName") // set captured database
                .tableList("yourDatabaseName.yourTableName") // set captured table
                .username("yourUsername")
                .password("yourPassword")
                .deserializer(new JsonDebeziumDeserializationSchema()) // converts SourceRecord to JSON String
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // enable checkpoint
        env.enableCheckpointing(3000);

        env.fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySQL Source")
                // set 4 parallel source tasks
                .setParallelism(4)
                .print().setParallelism(1); // use parallelism 1 for sink to keep message ordering

        env.execute("Print MySQL Snapshot + Binlog");
    }
}

One thing to add: when creating the table in Flink SQL, I initially used DOUBLE for the amount columns to map MySQL's DECIMAL type. Table creation succeeds without error, but arithmetic on the amounts then suffers precision loss, so DECIMAL should be used in the table definition instead.
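The precision loss described above is easy to reproduce in plain Java, where double corresponds to Flink's DOUBLE and BigDecimal models DECIMAL semantics (the class name here is illustrative only):

```java
import java.math.BigDecimal;

public class AmountPrecisionDemo {
    public static void main(String[] args) {
        // Binary floating point cannot represent 0.10 exactly, so sums drift:
        double d = 0.10 + 0.20;
        System.out.println(d); // prints 0.30000000000000004

        // Exact base-10 arithmetic (what DECIMAL provides) avoids the drift:
        BigDecimal b = new BigDecimal("0.10").add(new BigDecimal("0.20"));
        System.out.println(b); // prints 0.30
    }
}
```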

For more on using Flink SQL, see: a detailed, systematic guide to Flink SQL and its use in production.


