ClickHouse String Functions (Lara1111)

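1. String search

These functions find the position of a substring within a string. By default the search is case-sensitive and positions are counted in bytes.

Use the derived *CaseInsensitive functions for case-insensitive search, and the derived *UTF8 functions to count positions in UTF-8 code points instead of bytes.

To search case-insensitively and in UTF-8 code points at the same time, use the derived *CaseInsensitiveUTF8 functions.

Positions are 1-based; a result of 0 means the substring was not found.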

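(1) Single search

Function: position(haystack, needle), locate(haystack, needle)
Description: Searches the string haystack for the substring needle, counting positions in bytes.
Example: SELECT position('Hello, world!', '!'); Returns: 13

Function: positionCaseInsensitive
Description: Case-insensitive substring search.
Example: SELECT positionCaseInsensitive('Hello, world!', 'L'); Returns: 3

Function: positionUTF8
Description: Substring search that counts positions in UTF-8 code points.
Example: SELECT positionUTF8('中国北京', '北'); Returns: 3

Function: positionCaseInsensitiveUTF8
Description: Case-insensitive substring search that counts positions in UTF-8 code points.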
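To make the difference between byte positions and UTF-8 positions concrete, the following sketch runs the same searches through several of the variants above. It assumes the strings are UTF-8 encoded, so each Chinese character used here occupies three bytes; the sample strings and column aliases are illustrative only.

```sql
SELECT
    position('中国北京', '北')                        AS byte_pos,        -- 7: '中' and '国' take 3 bytes each
    positionUTF8('中国北京', '北')                    AS char_pos,        -- 3: third code point
    position('Hello, world!', 'L')                    AS case_sensitive,  -- 0: no uppercase 'L' in the string
    positionCaseInsensitiveUTF8('中国Beijing', 'BEI') AS ci_utf8;         -- 3: 'B' is the third code point
```

The byte-based result (7) is only useful for byte-oriented processing; when the position is meant as a character offset, the *UTF8 variant is the one to use.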

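(2) multiSearch

Function: multiSearchAllPositions(haystack, [needle1, needle2, ..., needlen])
Description: Searches the string for each element of the needle array and returns an array of the corresponding positions (0 for a needle that is not found).
Example: SELECT multiSearchAllPositions('Hello, World!', ['Hello', '!', 'world']); Returns: [1,13,0]

Function: multiSearchAllPositionsUTF8
Description: Same as multiSearchAllPositions, except that positions are counted in UTF-8 code points.

Function: multiSearchFirstPosition(haystack, [needle1, needle2, …, needlen])
Description: Returns the leftmost offset in the string at which any of the needles matches. The return type is UInt64. Derived case and UTF-8 variants: multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8, multiSearchFirstPositionCaseInsensitiveUTF8.
Example: SELECT multiSearchFirstPosition('Hello, World!', ['Hello', '!', 'world']); Returns: 1

Function: multiSearchFirstIndex(haystack, [needle1, needle2, …, needlen])
Description: Searches the string for the elements of the needle array and returns the array index of the first needle that matches. Derived case and UTF-8 variants: multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8, multiSearchFirstIndexCaseInsensitiveUTF8.
Example: SELECT multiSearchFirstIndex('Hello, World!', ['AB', '!', 'He']); Returns: 2, because the second array element, "!", is the first one found in the string.

Function: multiSearchAny(haystack, [needle1, needle2, …, needlen])
Description: Searches the string for the elements of the needle array. Returns 1 if at least one needle matches, otherwise 0. Derived case and UTF-8 variants: multiSearchAnyCaseInsensitive, multiSearchAnyUTF8, multiSearchAnyCaseInsensitiveUTF8.
Example: SELECT multiSearchAny('Hello, World!', ['AB', '!', 'He']); Returns: 1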
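The multiSearch functions are easy to confuse because some return positions in the string and others return indexes into the needle array. As a small sketch, the query below applies several of them to the same inputs; the column aliases are illustrative, and multiSearchAllPositionsCaseInsensitive is the case-insensitive variant of multiSearchAllPositions.

```sql
SELECT
    -- positions in the string, one per needle; 'world' is not found case-sensitively
    multiSearchAllPositions('Hello, World!', ['Hello', '!', 'world'])                AS all_cs,  -- [1,13,0]
    -- ignoring case, 'world' is found at position 8
    multiSearchAllPositionsCaseInsensitive('Hello, World!', ['Hello', '!', 'world']) AS all_ci,  -- [1,13,8]
    -- smallest position in the string at which any needle is found ('He' at 1)
    multiSearchFirstPosition('Hello, World!', ['AB', '!', 'He'])                     AS first_position,  -- 1
    -- index in the needle array of the first needle that is found ('!')
    multiSearchFirstIndex('Hello, World!', ['AB', '!', 'He'])                        AS first_index;     -- 2
```

Note that first_position and first_index answer different questions: the first is an offset into the haystack, the second is an offset into the needle array.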

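2. Regular expression matching

These functions check whether a string matches a regular expression.

Regular expressions use re2 syntax.

The backslash (\) is the escape character in regular expressions, and it is also the escape character in string literals. To escape a symbol in a pattern, you must therefore write two backslashes in the string literal.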
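The double-backslash rule is easiest to see with a pattern that has to match a literal dot. The sample strings and column aliases in this sketch are illustrative only.

```sql
SELECT
    match('a.b', 'a\\.b') AS literal_dot,   -- 1: the regex received is a\.b, which matches a literal '.'
    match('axb', 'a\\.b') AS escaped_dot,   -- 0: 'x' is not a literal '.'
    match('axb', 'a.b')   AS wildcard_dot;  -- 1: an unescaped '.' matches any character
```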

For plain substring search, prefer LIKE or position, which are faster.
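As an illustration, the three expressions below all test whether a string contains the fixed substring 'World'. The sample string and column aliases are illustrative; no timings are claimed here beyond the general advice above.

```sql
SELECT
    match('Hello, World!', 'World')        AS via_regex,     -- 1
    'Hello, World!' LIKE '%World%'         AS via_like,      -- 1
    position('Hello, World!', 'World') > 0 AS via_position;  -- 1
```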

re2 regular expression syntax: Syntax · google/re2 Wiki · GitHub.

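Function: match(haystack, pattern)
Description: Checks whether the string matches the regular expression. Returns 1 if it matches, otherwise 0.
Example: select match('Hello','ll'); Returns: 1

Function: multiMatchAny(haystack, [pattern1, pattern2, …, patternn])
Description: Matches the string against each pattern in the array. Returns 1 if any pattern matches, or 0 if none does.
Example: select multiMatchAny('Hello World',['W','a','o']); Returns: 1

Function: multiMatchAnyIndex(haystack, [pattern1, pattern2, …, patternn])
Description: Matches the string against each pattern in the array. Returns the array index of a matching pattern, or 0 if none matches.
Example: select multiMatchAnyIndex('Hello World',['W','e']); Returns: 2 (both patterns match here, and the function may return the index of any matching pattern)

Function: multiMatchAllIndices(haystack, [pattern1, pattern2, …, patternn])
Description: Matches the string against each pattern in the array and returns an array of the indexes of all matching patterns.
Example: select multiMatchAllIndices('Hello World',['W','a','o']); Returns: [3,1] (the order of the indexes is not guaranteed)

Function: extract(haystack, pattern)
Description: Extracts a fragment of the string using a regular expression. If nothing matches, an empty string is returned. If the regex contains no subpattern, the fragment matching the whole regex is returned; otherwise, the fragment matching the first subpattern is returned.
Example 1: select extract('China,Chinese','Chin[a|e]'); Returns: China
Example 2: select extract('China Node,Chinese Note','(Chin[a|e]) (No[t|d]e)'); Returns: China

Function: extractAll(haystack, pattern)
Description: Extracts all fragments of the string that match a regular expression and returns them as an array. If nothing matches, an empty array is returned. If the regex contains no subpattern, each fragment is the match of the whole regex; otherwise, it is the match of the first subpattern.
Example 1: select extractAll('China,Chinese','Chin[a|e]'); Returns: ['China','Chine']
Example 2: select extractAll('China-Node,Chinese-Note','(Chin[a|e])-'); Returns: ['China']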
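The subpattern rule for extract and extractAll can be sketched as follows. The patterns here are illustrative and use [ae] and [td] rather than the [a|e] and [t|d] of the examples above: inside a character class, | is a literal character, so [a|e] also matches the character '|', as the last column shows.

```sql
SELECT
    -- no subpattern: the fragment matching the whole regex is returned
    extract('China Node', 'Chin[ae] No[td]e')                       AS whole_match,    -- 'China Node'
    -- one subpattern: only the fragment matching the group is returned
    extract('China Node', 'Chin[ae] (No[td]e)')                     AS first_group,    -- 'Node'
    -- extractAll applies the same rule to every match
    extractAll('China-Node,Chinese-Note', 'Chin[ae]s?e?-(No[td]e)') AS all_groups,     -- ['Node','Note']
    -- '|' inside [...] is an ordinary character, not alternation
    match('Chin|', 'Chin[a|e]')                                     AS pipe_in_class;  -- 1
```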

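3. LIKE functions

These functions check whether a string matches a pattern. The pattern may contain two metasymbols: _ and %.

_ matches any single byte.

% matches any number of bytes, including zero.

Function: like(haystack, pattern)
Description: Equivalent operator: "haystack LIKE pattern".
Example: select like('I Love China','%Love%'); Returns: 1

Function: notLike(haystack, pattern)
Description: The negation of like.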
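The following sketch exercises both metasymbols and the negated form. The sample strings and column aliases are illustrative; the point is that the pattern must cover the whole string and that matching is case-sensitive.

```sql
SELECT
    like('abc', 'a_c')                AS one_placeholder,  -- 1: '_' stands for 'b'
    like('abc', 'a%')                 AS any_suffix,       -- 1: '%' stands for 'bc'
    like('abc', 'a_')                 AS too_short,        -- 0: the pattern covers only two bytes
    'I Love China' LIKE '%love%'      AS wrong_case,       -- 0: 'love' does not match 'Love'
    notLike('I Love China', '%Love%') AS negated;          -- 0: the opposite of like(...) = 1
```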