Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

关于多表关联查询的优化思路 #26

Open
superleeyom opened this issue Mar 26, 2021 · 0 comments
Open

关于多表关联查询的优化思路 #26

superleeyom opened this issue Mar 26, 2021 · 0 comments
Assignees

Comments

@superleeyom
Copy link
Owner

superleeyom commented Mar 26, 2021

问题分析

最近在帮同事优化一个慢查询,这张主表的数量在 100w+,它具体的问题就是,查询条件非常多,大约有 30 多个可选的查询条件,这些查询的字段分散在数据库的各个表中,导致 left join 的表特别多,大约 left join 七八张表,这种情况下分页查询,查询时间在 5~6 秒,非常的影响查询体验。

-- 示例伪 sql
select  o.id
        t1.name,
        t2.name,
        ...       
from order o
    left join table_1 t1
    left join table_2 t2
    left join table_3 t3
    left join table_4 t4
    left join table_5 t5
    ...
where o.id = 1323 
    and t1.id = 2323
    and t2.name = 'xxx'
    ...

解决方案

动态 left join

通常情况下,用户使用的查询条件只会有两到三个,所以就可以根据用户实际的查询条件,动态的 left join 相关的表,比如 mybatis 里可以这样编写:

<if test="(ticketWebQueryDto.snCode != null and ticketWebQueryDto.snCode != '')">
    left join ticket_product_detail tpd on t.ticket_id = tpd.ticket_id
</if>
<if test="ticketWebQueryDto.pickingUserName != null and ticketWebQueryDto.pickingUserName != ''">
    left join ticket_picking tp on tp.ticket_id = t.ticket_id
</if>

这样一来,left join 的表就可以减少不少。

主查询只返回主表主键 id

select id from xxx 直接使用 index 里面的值就返回结果的。但是一旦用了 select *,就会有其他列需要读取,这时在读完 index 以后还需要去读 data 才会返回结果。这两种处理方式性能差异非常大,特别是返回行数比较多,并且读数据需要 IO 的时候,可能会有几十上百倍的差异。主查询只返回主表 id 情况下,充分利用索引的优势,通常我们的主表存放的都是其他表的 id 字段,但是页面展示的都是 name,这时候如果我们为了省事,一次性将所需要的字段的 name select 出来,势必会降低查询效率,增加回表的次数,降低索引的命中率,所以大数据量情况下,将查询分散到应用层面,而非数据库层面,整体的效率会提升很大。

-- 示例伪 sql
select  o.id     
from order o
    left join table_1 t1
    left join table_2 t2
    left join table_3 t3
    left join table_4 t4
    left join table_5 t5
    ...
where o.id = 1323 
    and t1.id = 2323
    and t2.name = 'xxx'
    ...

复杂查询拆解为多次单表查询

核心思路就是,将多表关联查询,拆解为多个单表查询,然后在进行数据整合,由于是分页查询,所以主键 id 数量肯定是有限制的,通常是 10~20 个,所以在代码层面,我们批量查询主表(单表查询):

// 以下代码均为伪代码,只讲解下思路
// 分页联合查询,但是 select 的 column 只有主表的主键 id
IPage<TicketWebListDto> page = ticketMapper.listByPage(pageParam, ticketWebQueryDto);
List<BigInteger> ticketIdList = list.stream().map(TicketWebListDto::getTicketId).collect(Collectors.toList());
// 批量单表查询主表 
List<Ticket> ticketPageList = ticketService.getTicketPageListByTicketIds(ticketIdList);
Map<BigInteger, Ticket> ticketMap = ticketPageList.stream().collect(Collectors.toMap(Ticket::getTicketId, ticket -> ticket));

收集其他需要 left join 表的主键id,并进行多次单表查询:

// 收集其他表的主键id
List<String> workerIdList = new ArrayList<>(ticketPageList.size());
List<String> providerIdList = new ArrayList<>(ticketPageList.size());
List<String> customerAddressIdList = new ArrayList<>(ticketPageList.size());
for (Ticket ticket : ticketPageList) {
    workerIdList.add(ticket.getWorkerId());
    providerIdList.add(ticket.getProviderId());
    customerAddressIdList.add(ticket.getCustomerAddressId());
}

// 单表查询1
List<ProviderWorker> providerWorkerList = CollUtil.emptyIfNull(providerWorkerService.getWorkerInfoByWorkerIds(workerIdList));
// 单表查询2
List<ProviderObj> providerObjList =CollUtil.emptyIfNull(providerObjService.getProviderInfoByProviderIds(providerIdList));
// 单表查询3
List<CustomerInfoVO> addressList = CollUtil.emptyIfNull(customerRecvAddressService.getCustomerInfoByAddressIds(customerAddressIdList));

将查询到的单表数据进行内存映射,构建k-v键值对,key 是单表主键id,value 就是我们查询到的数据,这一步的目的是为了接下来循环的构建返回前端视图层 VO 的时候,直接就可以从内存里面获取我们的单表数据:

Map<String, ProviderWorker> workerMap = providerWorkerList.stream().collect(Collectors.toMap(ProviderWorker::getWorkerId, providerWorker -> providerWorker));
Map<String, ProviderObj> providerObjMap = providerObjList.stream().collect(Collectors.toMap(ProviderObj::getProviderId, providerObj -> providerObj));
Map<String, CustomerInfoVO> customerInfoVOMap = addressList.stream().collect(Collectors.toMap(CustomerInfoVO::getAddressId, customerInfoVO -> customerInfoVO));

循环遍历,开始构建视图层 VO :

// 构建视图层
List<TicketWebListDto> list = page.getRecords();
list.forEach(t ->{
		
    // 从内存中获取构建的数据
    Ticket ticketPage = ticketMap.get(t.getTicketId());
    ProviderObj providerObj = ObjectUtil.defaultIfNull(providerObjMap.get(ticketPage.getProviderId()), new ProviderObj());
    ProviderWorker providerWorker = ObjectUtil.defaultIfNull(workerMap.get(ticketPage.getWorkerId()), new ProviderWorker());
    CustomerInfoVO customerInfoVO = ObjectUtil.defaultIfNull(customerInfoVOMap.get(ticketPage.getCustomerAddressId()), new CustomerInfoVO());
    
    // 设置value
    t.setProviderName(providerObj.getProviderName());
    t.setWorkerName(providerWorker.getWorkerName());
    t.setCustomerName(customerInfoVO.getName());
    // ...
});
return page;

总结

  1. 在数据量比较大的情况下,多表关联查询,拆解为多个单表查询,可以提高整体的查询效率
  2. for 循环里面,不能有查询 DB 的操作,应该在 for 循环外批量从 DB 取出后,映射到内存中,然后 for 循环从内存中读取
@superleeyom superleeyom self-assigned this Mar 26, 2021
@superleeyom superleeyom changed the title 关于多表关联查询的一次优化 关于多表关联查询的优化思路 Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant