使用样例数据快速开始查询分析

教程概述

通过本教程,您将了解到如何利用Lakehouse平台内置的样例数据集,无需提前准备数据即可快速使用SQL进行查询分析已评估SQL功能及性能。

样例数据集由云器平台通过名称为CLICKZETTA_SAMPLE_DATA的数据集以共享方式开放给所有账户查询。本教程将以其中的TPC-H 100gb数据集为例,介绍如何在Lakehouse中快速完成TPC-H测试集的查询测试,评估处理性能。

以下将通过以下步骤完成教程内容:

  • 环境准备:通过样例数据集检查原始数据,创建测试使用的计算集群
  • 发起查询:使用Studio Web环境创建SQL查询完成22个TPC-H SQL查询
  • 变更集群大小:调整集群大小,扩大至之前集群大小的4倍
  • 发起查询:使用调整大小后的集群再次完成22个TPC-H SQL查询
  • 观察不同集群规格下,查询时延的变化

Step01. 准备工作

首先,在您登录Lakehouse Web控制台并进入指定的工作空间后。可访问”数据”模块,检查数据管理下的数据对象列表中”clickzetta_sample_data.tpch_100gb“下的相关表是否存在。

其次,我们将为本次测试临时一个独立的计算集群用于查询分析。您可以通过访问Lakehouse Web控制台的”计算→集群”菜单,通过页面向导新建集群。

同时,您也可以通过SQL命令来创建集群。通过SQL命令操作时,您可以不离开SQL开发上下文,在Ad-hoc或者ETL开发过程中通过SQL命令控制集群创建、扩/缩容、暂停/恢复、销毁动作,往往可以提高计算资源的操作效率。

本教程通过在”开发”模块新建SQL脚本任务方式,执行以下脚本即可快速创建集群。

-- 创建分析型虚拟计算资源
create vcluster if not exists TPCH_100GB vcluster_size='Medium' vcluster_type='Analytics'  AUTO_RESUME=TRUE AUTO_SUSPEND_IN_SECOND=300 min_replicas=1 max_replicas=1 comment 'TPCH 100GB TEST';

Step2. 对样例数据进行TPC-H的查询

在”开发”模块新建SQL脚本任务,将TPC-H的22个查询语句录入,在”集群”下拉列表中选择刚才创建的测试集群,然后选中任务中的全部脚本后点击”运行”按钮进行串行查询。

查询脚本为:

-- 执行使用TPCH_100GB计算资源运行作业
use vcluster TPCH_100GB;

-- 设置查询的标签
set query_tag='tpch100g_benchmark';

-- Q1
select
        l_returnflag,
        l_linestatus,
        sum(l_quantity) as sum_qty,
        sum(l_extendedprice) as sum_base_price,
        sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
        sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
        avg(l_quantity) as avg_qty,
        avg(l_extendedprice) as avg_price,
        avg(l_discount) as avg_disc,
        count(*) as count_order
from
        clickzetta_sample_data.tpch_100g.lineitem
where
        l_shipdate <= date '1998-12-01' - interval '85' day
group by
        l_returnflag,
        l_linestatus
order by
        l_returnflag,
        l_linestatus;

-- Q2
select
s_acctbal, -- 帐户余额
s_name, -- 名字
n_name, -- 国家
p_partkey, -- 零件的号码
p_mfgr, -- 生产者
s_address, -- 供应者的地址
s_phone, -- 电话号码
s_comment -- 备注信息
from
clickzetta_sample_data.tpch_100g.part,
clickzetta_sample_data.tpch_100g.supplier,
clickzetta_sample_data.tpch_100g.partsupp,
clickzetta_sample_data.tpch_100g.nation,
clickzetta_sample_data.tpch_100g.region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and p_size = 15 -- 指定大小,在区间[1, 50]内随机选择
and p_type like '%BRASS' -- 指定类型,在TPC-H标准指定的范围内随机选择
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE' -- 指定地区,在TPC-H标准指定的范围内随机选择
and ps_supplycost = (
select
min(ps_supplycost) --聚集函数
from   -- 与父查询的表有重叠
clickzetta_sample_data.tpch_100g.partsupp,
clickzetta_sample_data.tpch_100g.supplier,
clickzetta_sample_data.tpch_100g.nation,
clickzetta_sample_data.tpch_100g.region
where
p_partkey = ps_partkey
and s_suppkey = ps_suppkey
and s_nationkey = n_nationkey
and n_regionkey = r_regionkey
and r_name = 'EUROPE'
)
order by
s_acctbal desc,
n_name,
s_name,
p_partkey
limit 100;

-- Q3
select
    l_orderkey,
    sum(l_extendedprice * (1 - l_discount)) as revenue, -- 潜在的收入,聚集操作
    o_orderdate,
    o_shippriority
from
    clickzetta_sample_data.tpch_100g.customer,
    clickzetta_sample_data.tpch_100g.orders,
    clickzetta_sample_data.tpch_100g.lineitem
where
    c_mktsegment = 'BUILDING' -- 在TPC-H标准指定的范围内随机选择
    and c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and o_orderdate < '1995-03-15' -- 指定日期段,在在[1995-03-01, 1995-03-31]中随机选择
    and l_shipdate > '1995-03-15'  -- 指定日期段,在在[1995-03-01, 1995-03-31]中随机选择
group by
    l_orderkey, -- 订单标识
    o_orderdate,  -- 订单日期
    o_shippriority -- 运输优先级
order by
    revenue desc, -- 降序排序,把潜在最大收入列在前面
    o_orderdate
limit 10;

-- Q4
select
        o_orderpriority,
        count(*) as order_count
from
        clickzetta_sample_data.tpch_100g.orders
where
        o_orderdate >= date '1993-10-01'
        and o_orderdate < date '1993-10-01' + interval '3' month
        and exists (
                select
                        *
                from
                        clickzetta_sample_data.tpch_100g.lineitem
                where
                        l_orderkey = o_orderkey
                        and l_commitdate < l_receiptdate
        )
group by
        o_orderpriority
order by
        o_orderpriority;

-- Q5
select
    n_name,
    sum(l_extendedprice * (1 - l_discount)) as revenue
from
    clickzetta_sample_data.tpch_100g.customer,
    clickzetta_sample_data.tpch_100g.orders,
    clickzetta_sample_data.tpch_100g.lineitem,
    clickzetta_sample_data.tpch_100g.supplier,
    clickzetta_sample_data.tpch_100g.nation,
    clickzetta_sample_data.tpch_100g.region
where
    c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA' -- 指定地区,在TPC-H标准指定的范围内随机选择
    and o_orderdate >= '1994-01-01' -- DATE是从1993年到1997年中随机选择的一年的1月1日
    and o_orderdate < date '1996-01-01' + interval '1' year
group by
    n_name -- 按名字分组
order by
    revenue desc; -- 按收入降序排序,注意分组和排序子句不同

-- Q6
select
    sum(l_extendedprice * l_discount) as revenue -- 潜在的收入增加量
from
    clickzetta_sample_data.tpch_100g.lineitem
where
    l_shipdate >= '1994-01-01' -- DATE是从[1993, 1997]中随机选择的一年的1月1日
    and l_shipdate < date '1996-01-01' + interval '1' year -- 一年内
    and l_discount between 0.06 - 0.01 and 0.06 + 0.01
    and l_quantity < 24; -- QUANTITY在区间[24, 25]中随机选择

-- Q7
select
    supp_nation, -- 供货商国家
    cust_nation, -- 顾客国家
    l_year,
    sum(volume) as revenue -- 年度、年度的货运收入
from
    (
        select
            n1.n_name as supp_nation,
            n2.n_name as cust_nation,
            extract(year from l_shipdate) as l_year,
            l_extendedprice * (1 - l_discount) as volume
        from
            clickzetta_sample_data.tpch_100g.supplier,
            clickzetta_sample_data.tpch_100g.lineitem,
            clickzetta_sample_data.tpch_100g.orders,
            clickzetta_sample_data.tpch_100g.customer,
            clickzetta_sample_data.tpch_100g.nation n1,
            clickzetta_sample_data.tpch_100g.nation n2
        where
            s_suppkey = l_suppkey
            and o_orderkey = l_orderkey
            and c_custkey = o_custkey
            and s_nationkey = n1.n_nationkey
            and c_nationkey = n2.n_nationkey
            and ( -- NATION2和NATION1的值不同,表示查询的是跨国的货运情况
                (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
                or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
            )
            and l_shipdate between '1995-01-01' and '1996-12-31'
    ) as shipping
group by
    supp_nation,
    cust_nation,
    l_year
order by
    supp_nation,
    cust_nation,
    l_year;

-- Q8
select
    o_year, -- 年份
    sum(case
        when nation = 'BRAZIL' then volume -- 指定国家,在TPC-H标准指定的范围内随机选择
        else 0
    end) / sum(volume) as mkt_share -- 市场份额:特定种类的产品收入的百分比;聚集操作
from
    (
        select
            extract(year from o_orderdate) as o_year, -- 分解出年份
            l_extendedprice * (1 - l_discount) as volume, -- 特定种类的产品收入
            n2.n_name as nation
        from
            clickzetta_sample_data.tpch_100g.part,
            clickzetta_sample_data.tpch_100g.supplier,
            clickzetta_sample_data.tpch_100g.lineitem,
            clickzetta_sample_data.tpch_100g.orders,
            clickzetta_sample_data.tpch_100g.customer,
            clickzetta_sample_data.tpch_100g.nation n1,
            clickzetta_sample_data.tpch_100g.nation n2,
            clickzetta_sample_data.tpch_100g.region
        where
            p_partkey = l_partkey
            and s_suppkey = l_suppkey
            and l_orderkey = o_orderkey
            and o_custkey = c_custkey
            and c_nationkey = n1.n_nationkey
            and n1.n_regionkey = r_regionkey
            and r_name = 'AMERICA' -- 指定地区,在TPC-H标准指定的范围内随机选择
            and s_nationkey = n2.n_nationkey
            and o_orderdate between '1995-01-01' and '1996-12-31' -- 只查95、96年的情况
            and p_type = 'ECONOMY ANODIZED STEEL' -- 指定零件类型,在TPC-H标准指定的范围内随机选择
    ) as all_nations
group by
    o_year -- 按年分组
order by
    o_year; -- 按年排序

-- Q9
select
    nation,
    o_year,
    sum(amount) as sum_profit --每个国家每一年所有被定购的零件在一年中的总利润
from
    (
        select
            n_name as nation, -- 国家
            extract(year from o_orderdate) as o_year, -- 取出年份
            l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount --利润
        from
            clickzetta_sample_data.tpch_100g.part,
            clickzetta_sample_data.tpch_100g.supplier,
            clickzetta_sample_data.tpch_100g.lineitem,
            clickzetta_sample_data.tpch_100g.partsupp,
            clickzetta_sample_data.tpch_100g.orders,
            clickzetta_sample_data.tpch_100g.nation
        where
            s_suppkey = l_suppkey
            and ps_suppkey = l_suppkey
            and ps_partkey = l_partkey
            and p_partkey = l_partkey
            and o_orderkey = l_orderkey
            and s_nationkey = n_nationkey
            and p_name like '%green%' -- LIKE操作,查询优化器可能进行优化
    ) as profit
group by
    nation, -- 国家
    o_year  -- 年份
order by
    nation, 
    o_year desc;

-- Q10
select
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
from
        clickzetta_sample_data.tpch_100g.customer,
        clickzetta_sample_data.tpch_100g.orders,
        clickzetta_sample_data.tpch_100g.lineitem,
        clickzetta_sample_data.tpch_100g.nation
where
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1993-04-01'
        and o_orderdate < date '1993-04-01' + interval '3' month
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
group by
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
order by
        revenue desc
limit 20;

-- Q11
select
        ps_partkey,
        sum(ps_supplycost * ps_availqty) as value
from
        clickzetta_sample_data.tpch_100g.partsupp,
        clickzetta_sample_data.tpch_100g.supplier,
        clickzetta_sample_data.tpch_100g.nation
where
        ps_suppkey = s_suppkey
        and s_nationkey = n_nationkey
        and n_name = 'CANADA'
group by
        ps_partkey having
                sum(ps_supplycost * ps_availqty) > (
                        select
                                sum(ps_supplycost * ps_availqty) * 0.000001000000
                        from
                                clickzetta_sample_data.tpch_100g.partsupp,
                                clickzetta_sample_data.tpch_100g.supplier,
                                clickzetta_sample_data.tpch_100g.nation
                        where
                                ps_suppkey = s_suppkey
                                and s_nationkey = n_nationkey
                                and n_name = 'CANADA'
                )
order by
        value desc;

-- Q12
select
        l_shipmode,
        sum(case
                when o_orderpriority = '1-URGENT'
                        or o_orderpriority = '2-HIGH'
                        then 1
                else 0
        end) as high_line_count,
        sum(case
                when o_orderpriority <> '1-URGENT'
                        and o_orderpriority <> '2-HIGH'
                        then 1
                else 0
        end) as low_line_count
from
        clickzetta_sample_data.tpch_100g.orders,
        clickzetta_sample_data.tpch_100g.lineitem
where
        o_orderkey = l_orderkey
        and l_shipmode in ('RAIL', 'SHIP')
        and l_commitdate < l_receiptdate
        and l_shipdate < l_commitdate
        and l_receiptdate >= date '1994-01-01'
        and l_receiptdate < date '1994-01-01' + interval '1' year
group by
        l_shipmode
order by
        l_shipmode;

-- Q13
select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey) as c_count
        from  -- 子查询中包括左外连接操作
            clickzetta_sample_data.tpch_100g.customer left outer join clickzetta_sample_data.tpch_100g.orders on
                c_custkey = o_custkey
                and o_comment not like '%special%requests%'
                -- WORD1 为以下四个可能值中任意一个:special、pending、unusual、express
                -- WORD2 为以下四个可能值中任意一个:packages、requests、accounts、deposits
        group by
            c_custkey
    ) c_orders
group by
    c_count
order by
    custdist desc,
    c_count desc;

-- Q14
select
        100.00 * sum(case
                when p_type like 'PROMO%'  -- 促销零件
                        then l_extendedprice * (1 - l_discount)  -- 某一特定时间的收入
                else 0
        end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue
from
        clickzetta_sample_data.tpch_100g.lineitem,
        clickzetta_sample_data.tpch_100g.part
where
        l_partkey = p_partkey
        and l_shipdate >= date '1994-04-01' -- DATE是从1993年到1997年中任一年的任一月的一号
        and l_shipdate < date '1994-04-01' + interval '1' month;    

-- Q15
select
        s_suppkey,
        s_name,
        s_address,
        s_phone,
        total_revenue
from
        clickzetta_sample_data.tpch_100g.supplier,
    (
        select
            l_suppkey supplier_no,
            sum(l_extendedprice * (1 - l_discount)) total_revenue
        from
            clickzetta_sample_data.tpch_100g.lineitem
        where
                l_shipdate >= date '1994-05-01'
          and l_shipdate < date '1994-05-01' + interval '3' month
        group by
            l_suppkey
    ) as revenue0
where
        s_suppkey = supplier_no
        and total_revenue = (
                select
                        max(total_revenue)
                from 
                        (
        select
            l_suppkey supplier_no,
            sum(l_extendedprice * (1 - l_discount)) total_revenue
        from
            clickzetta_sample_data.tpch_100g.lineitem
        where
                l_shipdate >= date '1994-05-01'
          and l_shipdate < date '1994-05-01' + interval '3' month
        group by
            l_suppkey
        ) as revenue0
        )
order by
        s_suppkey;

-- Q16
select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt -- 聚集、去重操作
from
    clickzetta_sample_data.tpch_100g.partsupp,
    clickzetta_sample_data.tpch_100g.part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#45' --BRAND=Brand  --MN ,M和N是两个字母,代表两个数值,相互独立,取值在1到5之间
    and p_type not like 'MEDIUM POLISHED%' -- 消费者不感兴趣的类型和尺寸
    and p_size in (49, 14, 23, 45, 19, 3, 36, 9) -- TYPEX是在1到50之间任意选择的一组八个不同的值
    and ps_suppkey not in ( -- NOT IN子查询,消费者排除某些供货商
        select
            s_suppkey
        from
            clickzetta_sample_data.tpch_100g.supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by  -- 按数量降序排列,按品牌、种类、尺寸升序排列
    supplier_cnt desc,  
    p_brand,
    p_type,
    p_size;

-- Q17
select
        sum(l_extendedprice) / 7.0 as avg_yearly
from
        clickzetta_sample_data.tpch_100g.lineitem,
        clickzetta_sample_data.tpch_100g.part
where
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container = 'WRAP BAG'
        and l_quantity < (
                select
                        0.2 * avg(l_quantity)
                from
                        clickzetta_sample_data.tpch_100g.lineitem
                where
                        l_partkey = p_partkey
        );

-- Q18
select
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice,
    sum(l_quantity) --订货总数
from
    clickzetta_sample_data.tpch_100g.customer,
    clickzetta_sample_data.tpch_100g.orders,
    clickzetta_sample_data.tpch_100g.lineitem
where
    o_orderkey in (
        select
            l_orderkey
        from
            clickzetta_sample_data.tpch_100g.lineitem
        group by
            l_orderkey having
                sum(l_quantity) > 300 -- QUANTITY是位于312到315之间的任意值
    )
    and c_custkey = o_custkey
    and o_orderkey = l_orderkey
group by
    c_name,
    c_custkey,
    o_orderkey,
    o_orderdate,
    o_totalprice
order by
    o_totalprice desc,
    o_orderdate
limit 100;

-- Q19
select
    sum(l_extendedprice* (1 - l_discount)) as revenue
from
    clickzetta_sample_data.tpch_100g.lineitem,
    clickzetta_sample_data.tpch_100g.part
where
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#12' 
        and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG') 
        and l_quantity >= 1 and l_quantity <= 1 + 10 
        and p_size between 1 and 5 
        and l_shipmode in ('AIR', 'AIR REG') 
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')
        and l_quantity >= 10 and l_quantity <= 10 + 10 
        and p_size between 1 and 10
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    )
    or
    (
        p_partkey = l_partkey
        and p_brand = 'Brand#34'
        and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')
        and l_quantity >= 20 and l_quantity <= 20 + 10 
        and p_size between 1 and 15
        and l_shipmode in ('AIR', 'AIR REG')
        and l_shipinstruct = 'DELIVER IN PERSON'
    );

-- Q20
select
        s_name,
        s_address
from
        clickzetta_sample_data.tpch_100g.supplier,
        clickzetta_sample_data.tpch_100g.nation
where
        s_suppkey in (
                select
                        ps_suppkey
                from
                        clickzetta_sample_data.tpch_100g.partsupp
                where
                        ps_partkey in (
                                select
                                        p_partkey
                                from
                                        clickzetta_sample_data.tpch_100g.part
                                where
                                        p_name like 'antique%'
                        )
                        and ps_availqty > (
                                select
                                        0.5 * sum(l_quantity)
                                from
                                        clickzetta_sample_data.tpch_100g.lineitem
                                where
                                        l_partkey = ps_partkey
                                        and l_suppkey = ps_suppkey
                                        and l_shipdate >= date '1993-01-01'
                                        and l_shipdate < date '1993-01-01' + interval '1' year
                        )
        )
        and s_nationkey = n_nationkey
        and n_name = 'IRAQ'
order by
        s_name;

-- Q21
select
    s_name,
    count(*) as numwait
from
    clickzetta_sample_data.tpch_100g.supplier,
    clickzetta_sample_data.tpch_100g.lineitem l1,
    clickzetta_sample_data.tpch_100g.orders,
    clickzetta_sample_data.tpch_100g.nation
where
    s_suppkey = l1.l_suppkey
    and o_orderkey = l1.l_orderkey
    and o_orderstatus = 'F'
    and l1.l_receiptdate > l1.l_commitdate
    and exists ( -- EXISTS子查询
        select
            *
        from
            clickzetta_sample_data.tpch_100g.lineitem l2
        where
            l2.l_orderkey = l1.l_orderkey
            and l2.l_suppkey <> l1.l_suppkey
    )
    and not exists (-- NOT EXISTS子查询
        select
            *
        from
           clickzetta_sample_data.tpch_100g.lineitem l3
        where
            l3.l_orderkey = l1.l_orderkey
            and l3.l_suppkey <> l1.l_suppkey
            and l3.l_receiptdate > l3.l_commitdate
    )
    and s_nationkey = n_nationkey
    and n_name = 'SAUDI ARABIA' -- TPC-H标准定义的任意值
group by
    s_name
order by
    numwait desc,
    s_name
limit 100;

-- Q22
select
    cntrycode,
    count(*) as numcust,
    sum(c_acctbal) as totacctbal
from
    ( -- 第一层子查询
        select
            substring(c_phone from 1 for 2) as cntrycode,
            c_acctbal
        from
            clickzetta_sample_data.tpch_100g.customer
        where
            substring(c_phone from 1 for 2) in
                ('13', '31', '23', '29', '30', '18', '17') -- I1…I7是在TPC-H中定义国家代码的可能值中不重复的任意值
            and c_acctbal > ( -- 第二层聚集子查询
                select
                    avg(c_acctbal)
                from
                    clickzetta_sample_data.tpch_100g.customer
                where
                    c_acctbal > 0.00
                    and substring(c_phone from 1 for 2) in
                        ('13', '31', '23', '29', '30', '18', '17')
            )
            and not exists ( -- 第二层NOT EXISTS子查询
                select
                    *
                from
                    clickzetta_sample_data.tpch_100g.orders
                where
                    o_custkey = c_custkey
            )
    ) as custsale
group by
    cntrycode
order by
    cntrycode;

执行完毕后,可以通过当前SQL Editor运行历史查看本次任务的运行耗时。

如果您希望进行性能测试,可连续执行2次以上的查询,以便计算集群能够充分Cache数据发挥最佳性能。同时,云器Lakehouse也提供了系统主动Cache功能,本教程中暂不展开这部分介绍。以下是第二次运行,计算集群Cache数据后的运行结果,较第一次无Cache有明显提升。

如果希望22个查询中每个查询的执行详情,可通过访问”计算→作业历史”,根据查询标签”tpch100g_benchmark”对查询历史进行过滤后查看。

Step3. 扩大集群规格,再次对样例数据进行TPC-H的查询

通过”计算→集群”管理页面,您可以对刚才创建的测试集群修改规格大小,从M修改至L,L规格大小为M的2倍。

或者在SQL脚本中执行以下命令进行修改:

-- 修改集群大小
alter vcluster TPCH_100GB SET VCLUSTER_SIZE = 'LARGE';

修改后,使用调整大小后的集群再次进行查询测试。

新扩容出的计算节点充分Cache后,性能会继续提升。

通过作业的运行时间可以观察到,相同的数据规模和查询任务,通过扩大计算集群规格大小,任务的总体运行时间大为缩短。2次执行之后,随着数据cache,能提升查询性能。

联系我们
预约咨询
微信咨询
电话咨询