
Apache SeaTunnel 2.1.0 Deployment and Pitfalls, by 若小鱼


Introduction

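SeaTunnel, formerly named Waterdrop, was renamed SeaTunnel on October 12, 2021. It is an easy-to-use, high-performance distributed data integration platform that supports real-time synchronization of massive amounts of data. It can reliably and efficiently synchronize tens of billions of records per day and is used in production at nearly 100 companies.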

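Features

- Easy to use, with flexible configuration and low-code development
- Real-time streaming
- Offline multi-source data analysis
- High performance and the capacity to process massive amounts of data
- Modular, plug-in architecture that is easy to extend
- Data processing and aggregation via SQL
- Support for Spark Structured Streaming
- Support for Spark 2.x

This is where we hit our first pitfall: the Spark environment we tested on had already been upgraded to 3.x, but SeaTunnel currently supports only 2.x, so we had to deploy a separate Spark 2.x installation.

[Figure: SeaTunnel workflow]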
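Because SeaTunnel 2.1.0 only works with Spark 2.x, a quick version check before deploying can save a failed first run. The following is a minimal Python sketch of such a pre-flight check, not part of SeaTunnel itself. It assumes the Spark version can be read from text such as the RELEASE file in a Spark binary distribution (for example `Spark 2.4.8 built for Hadoop 2.7.3`); all function names are hypothetical.

```python
import os
import re
from typing import Optional, Tuple

# SeaTunnel 2.1.0 runs on Spark 2.x only, so only major version 2 is accepted.
SUPPORTED_SPARK_MAJOR = 2


def parse_spark_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Return the first X.Y.Z version found in `text`, or None if there is none."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_supported_spark(text: str) -> bool:
    """True if the first version found in `text` is a Spark 2.x release."""
    version = parse_spark_version(text)
    return version is not None and version[0] == SUPPORTED_SPARK_MAJOR


def read_release_file(spark_home: str) -> str:
    """Read $SPARK_HOME/RELEASE (assumed to be present in binary distributions)."""
    path = os.path.join(spark_home, "RELEASE")
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

For example, `is_supported_spark(read_release_file(os.environ["SPARK_HOME"]))` should return False on a Spark 3.x installation like the one we started with, assuming its RELEASE file follows the format above.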

Installation

Installation documentation:

https://seatunnel.incubator.apache.org/docs/2.1.0/spark/installation

Environment preparation: install a JDK and Spark.

Download the installation package:
https://www.apache.org/dyn/closer.lua/incubator/seatunnel/2.1.0/apache-seatunnel-incubating-2.1.0-bin.tar.gz

After extracting it, edit config/seatunnel-env.sh to set the required environment configuration, for example SPARK_HOME (the directory where Spark was downloaded and extracted).

1. Test jdbc-to-jdbc

Create a new file config/spark.batch.jdbc.to.jdbc.conf:

    env {
      spark.app.name = "SeaTunnel"
      spark.executor.instances = 1
      spark.executor.cores = 1
      spark.executor.memory = "1g"
    }

    source {
      jdbc {
        driver = "com.mysql.jdbc.Driver"
        url = "jdbc:mysql://0.0.0.0:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false"
        table = "table_name"
        result_table_name = "result_table_name"
        user = "root"
        password = "password"
      }
    }

    transform {
      # split data by specific delimiter
      # you can also use other filter plugins, such as sql
      # sql {
      #   sql = "select * from accesslog where request_time > 1000"
      # }
      # If you would like to get more information about how to configure seatunnel and see full list of filter plugins,
      # please go to https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Sql
    }

    sink {
      # choose stdout output plugin to output data to console
      # Console {}
      jdbc {
        # The driver parameter must be set here, otherwise the data transfer fails
        driver = "com.mysql.jdbc.Driver",
        saveMode = "update",
        url = "jdbc:mysql://ip:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false",
        user = "userName",
        password = "***********",
        dbTable = "tableName",
        customUpdateStmt = "INSERT INTO tableName (column1, column2, created, modified, yn) values(?, ?, now(), now(), 1) ON DUPLICATE KEY UPDATE column1 = IFNULL(VALUES (column1), column1), column2 = IFNULL(VALUES (column2), column2)"
      }
    }

Launch command:

    ./bin/start-seatunnel-spark.sh --master 'yarn' --deploy-mode client --config ./config/spark.batch.jdbc.to.jdbc.conf

Pitfall: an earlier run failed with "please specify [driver] as non-empty". Tracking it down showed that the sink block also needs the driver parameter:

    ERROR Seatunnel:121 - Plugin[org.apache.seatunnel.spark.sink.Jdbc] contains invalid config, error: please specify [driver] as non-empty
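The customUpdateStmt in the jdbc sink is a MySQL upsert: a row with a new key is inserted, while a row whose key already exists updates each column only when the incoming value is non-NULL, because IFNULL(VALUES(col), col) falls back to the stored value. The sketch below reproduces that behavior with Python's built-in sqlite3 module so it can be tried without a MySQL server. It is an analogue, not the statement SeaTunnel runs: SQLite's `ON CONFLICT ... DO UPDATE` with `excluded.col` stands in for MySQL's `ON DUPLICATE KEY UPDATE` with `VALUES(col)` (this needs SQLite 3.24 or later), and the `demo` table with its `id` key is hypothetical.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

Row = Tuple[int, Optional[str], Optional[str]]

# Hypothetical table; "id" plays the role of the unique key that
# MySQL's ON DUPLICATE KEY UPDATE relies on.
SCHEMA = "CREATE TABLE demo (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT)"

# SQLite analogue of: col = IFNULL(VALUES(col), col)
# "excluded.col" is the incoming value; the bare "col" is the stored value,
# so a NULL in the incoming row keeps what is already in the table.
UPSERT = """
INSERT INTO demo (id, column1, column2) VALUES (?, ?, ?)
ON CONFLICT(id) DO UPDATE SET
    column1 = IFNULL(excluded.column1, column1),
    column2 = IFNULL(excluded.column2, column2)
"""


def upsert_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Apply the upsert to every row and commit."""
    conn.executemany(UPSERT, rows)
    conn.commit()


def run_demo() -> List[Row]:
    """Insert one row, then upsert a partial update plus a new row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        upsert_rows(conn, [(1, "a", "b")])
        # Key 1 exists: column1 stays "a" (incoming NULL), column2 becomes "b2".
        # Key 2 is new: it is simply inserted.
        upsert_rows(conn, [(1, None, "b2"), (2, "c", None)])
        return conn.execute(
            "SELECT id, column1, column2 FROM demo ORDER BY id"
        ).fetchall()
    finally:
        conn.close()
```

Calling `run_demo()` returns `[(1, 'a', 'b2'), (2, 'c', None)]`: the second write to key 1 keeps column1 = 'a' because the incoming value was NULL.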

2. Test jdbc-to-hive

Create a new file config/spark.batch.jdbc.to.hive.conf:

    env {
      spark.app.name = "SeaTunnel"
      spark.executor.instances = 1
      spark.executor.cores = 1
      spark.executor.memory = "1g"
      # Required because the sink writes to Hive
      spark.sql.catalogImplementation = "hive"
    }

    source {
      jdbc {
        driver = "com.mysql.jdbc.Driver"
        url = "jdbc:mysql://0.0.0.0:3306/database?useUnicode=true&characterEncoding=utf8&useSSL=false"
        table = "table_name"
        result_table_name = "result_table_name"
        user = "root"
        password = "password"
      }
    }

    transform {
      # split data by specific delimiter
      # you can also use other filter plugins, such as sql
      # sql {
      #   sql = "select * from accesslog where request_time > 1000"
      # }
      # If you would like to get more information about how to configure seatunnel and see full list of filter plugins,
      # please go to https://seatunnel.apache.org/docs/spark/configuration/transform-plugins/Sql
    }

    sink {
      Hive {
        sql = "insert overwrite table seatunnel.test1 partition(province) select name,age,province from result_table_name"
      }
    }

Launch command:

    ./bin/start-seatunnel-spark.sh --master 'yarn' --deploy-mode client --config ./config/spark.batch.jdbc.to.hive.conf

Pitfall: the first run failed. The cause was that the conf file did not set spark.sql.catalogImplementation = "hive", so Spark did not use the Hive metastore and could not find the seatunnel database. Error output:

    ......
    ERROR Seatunnel:191 - Exception StackTrace:org.apache.spark.sql.AnalysisException: Table or view not found: ......
    Caused by: org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'seatunnel' not found;
    ......
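In the Hive sink SQL, partition(province) declares a dynamic partition: Hive takes the partition value from the last column of the SELECT list and writes each row under a province=<value> directory, so only name and age end up in the data files. The Python sketch below mimics that routing on in-memory tuples to show which rows land in which partition. It illustrates the layout only, not how Spark or Hive actually write files; the function name and sample rows are hypothetical, and the NULL handling reflects Hive's default partition name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hive's default name for a NULL dynamic-partition value
# (configurable via hive.exec.default.partition.name).
DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"

# (name, age, province): same column order as the SELECT in the sink SQL.
Row = Tuple[str, int, Optional[str]]


def route_dynamic_partitions(
    rows: Iterable[Row], partition_col: str = "province"
) -> Dict[str, List[Tuple[str, int]]]:
    """Group rows by their trailing column, the way
    INSERT ... PARTITION(province) SELECT name, age, province assigns them."""
    partitions: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, age, province in rows:
        value = DEFAULT_PARTITION if province is None else province
        # The partition value becomes part of the directory name,
        # so only the non-partition columns remain in the row data.
        partitions[f"{partition_col}={value}"].append((name, age))
    return dict(partitions)
```

With rows such as ("alice", 30, "zhejiang") and ("dave", 35, None), alice is placed under province=zhejiang and dave under province=__HIVE_DEFAULT_PARTITION__.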

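I have to grumble about the official documentation once more: the spark.sql.catalogImplementation = "hive" requirement is not mentioned in the sink-plugins documentation, only in the source-plugins documentation.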


