COUNT_DISTINCT

count_distinct(expr)

功能描述

COUNT_DISTINCT
COUNT_DISTINCT
函数用于计算指定列中不同值(去重后)的数量。这是一个聚合函数,等价于
COUNT(DISTINCT expr)
COUNT(DISTINCT expr)
,但在某些场景下优化器可能会做特殊优化。

参数说明

  • expr
    expr
    :任意类型的表达式,要计算其不同值的数量。

返回类型

  • 返回
    BIGINT
    BIGINT
    类型(非空)。
  • 返回不同值的数量。

注意事项

  • 函数计算过程中,
    NULL
    NULL
    值会被忽略,不计入结果。
  • 如果所有值都是
    NULL
    NULL
    ,返回 0。
  • 如果输入为空集,返回 0。
  • COUNT_DISTINCT
    COUNT_DISTINCT
    需要维护所有不同值的集合,内存开销较大。对于大数据集,如果只需要近似值,考虑使用
    APPROX_COUNT_DISTINCT
    APPROX_COUNT_DISTINCT

使用示例

  1. 基本用法:计算不同值的数量

SELECT count_distinct(city) FROM VALUES ('Beijing'), ('Shanghai'), ('Beijing'), ('Shenzhen') AS t(city); +------------------------+ | count_distinct(city) | +------------------------+ | 3 | +------------------------+

  1. 等价于 COUNT(DISTINCT ...)

SELECT count_distinct(city), COUNT(DISTINCT city) FROM VALUES ('Beijing'), ('Shanghai'), ('Beijing'), ('Shenzhen') AS t(city); +------------------------+----------------------+ | count_distinct(city) | COUNT(DISTINCT city) | +------------------------+----------------------+ | 3 | 3 | +------------------------+----------------------+

  1. NULL 值被忽略

SELECT count_distinct(value) FROM VALUES (1), (2), (NULL), (1), (NULL), (3) AS t(value); +-------------------------+ | count_distinct(value) | +-------------------------+ | 3 | +-------------------------+

  1. 所有值都是 NULL 时返回 0

SELECT count_distinct(value) FROM VALUES (NULL), (NULL), (NULL) AS t(value); +-------------------------+ | count_distinct(value) | +-------------------------+ | 0 | +-------------------------+

  1. 按分组计算不同值数量

SELECT dept, count_distinct(name) as unique_employees FROM VALUES ('Sales', 'Alice'), ('Sales', 'Bob'), ('Sales', 'Alice'), ('IT', 'Charlie'), ('IT', 'David'), ('IT', 'Charlie') AS employees(dept, name) GROUP BY dept; +-------+------------------+ | dept | unique_employees | +-------+------------------+ | Sales | 2 | | IT | 2 | +-------+------------------+

联系我们
预约咨询
微信咨询
电话咨询
邮件咨询