layout
default

![xiaowang(王霄)]({{ site.url }}./IMG_2356.JPG){:class="img-responsive"}

Hello! I am a full-time software engineer at MaxCompute (previously called ODPS), Alibaba's large-scale distributed computing and storage platform. I currently work on building and optimizing a high-performance SQL execution engine for MaxCompute SQL.

Before that, I received my master's degree from the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (SKL, ICT, CAS), under the supervision of Prof. Yunquan Zhang. I obtained my B.E. from Yunnan University, where I was extensively involved in data mining and virtual machine migration projects.

I am interested in high-performance computing (performance tuning and modeling, SIMD, GPGPU, GEMM, FFT) and in building large-scale, high-performance distributed computing systems.

Education

  • 2017.9-2020.7, Institute of Computing Technology, Chinese Academy of Sciences, Master, High-Performance Computing.
  • 2013.9-2017.7, Yunnan University, B.Eng., Computer Science (3/121).

Industry Experience

  • Full-time Software engineer at MaxCompute, Alibaba Cloud

    • July.2020-July.2021 Investigate other OLAP systems (ClickHouse) and manage the release process and quality for one sprint covering multiple release regions and core users.
    • Aug.2021-Dec.2021 Design adaptive distributed aggregation algorithms that switch between hash-based and sort-based strategies at runtime depending on the statistical characteristics of the data.
    • Jan.2022-Mar.2022 Refactor high-performance vectorized aggregation for 30 kinds of aggregate functions.
    • Mar.2022-July.2022 Design and implement hash-based distributed distinct aggregation algorithms for queries with multiple distinct functions.
    • April.2022-July.2022 Optimize the performance of aggregate functions for queries without group-by keys using loop unrolling and SIMD.
    • July.2022-Sept.2022 Improve data locality for hash aggregation by designing and implementing row stores for intermediate partial aggregation states.
    • Aug.2022- Optimize the vectorized HashTable implementation for distributed HashJoin.
  • Software engineer intern at Amazon AWS AI Lab (Shanghai)

    • July.2019~Aug.2019 Contribute NumPy-compatible operators to MXNet.
  • Software engineer intern at PerfxLab (Beijing)

    • Mar.2019~May.2019 Implement a row-column separable Gaussian filter based on OpenCL.
    • Jun.2018-Aug.2018 Tune performance specifically for AMD GCN architecture GPUs.
  • Research Assistant at Institute of Computing Technology

    • May.2017-Aug.2017 Develop and optimize high-performance libraries for ARM CPUs that are compatible with Intel IPP.

Research Projects & Publications

  • [Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform]({{ site.url }}./ICA3PP18.pdf) ICA3PP18.

    • Sept.2017-Sept.2019
    • Authors: Xiao Wang, Haipeng Jia, Zhihao Li, Yunquan Zhang.
    • Brief introduction: RealFFT is a high-performance real-number FFT library that supports 11 kinds of 1-3 dimensional float/double algorithms on both ARMv8 and x86 CPUs. It outperformed FFTW on ARMv8 CPUs at that time.
  • [Efficient parallel optimizations of a high-performance SIFT on GPUs]({{ site.url }}./JPDC.pdf) JPDC19.

    • July.2017-Sept.2017
    • Authors: Zhihao Li, Haipeng Jia, Yunquan Zhang, Shice Liu, Shigang Li, Xiao Wang, et al.
    • Brief introduction: HartSift is a high-performance SIFT implementation that achieves higher performance than its counterparts in OpenCV, SiftGPU, and CudaSift.
  • [AutoFFT: A Template-Based FFT Codes Auto-Generation Framework for ARM and X86 CPUs]({{ site.url }}./SC19.pdf) SC19.

    • July.2018-July.2019
    • Authors: Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, Xiao Wang.
    • Brief introduction: AutoFFT is a template-based FFT auto-generation framework that achieves significant performance improvements on various CPUs and has contributed to the libraries of many Chinese vendors.
  • [MVUC: An Interactive System for Mining and Visualizing Urban Co-locations]({{ site.url }}./WAIM16.pdf) WAIM16.

    • Oct.2014-July.2014
    • Authors: Xiao Wang, Hongmei Chen, Qing Xiao.
    • Brief introduction: MVUC proposes a data mining visualization method for spatial co-location patterns mined from urban datasets.
  • Virtual machine placement strategy using cluster-based genetic algorithm, Neurocomputing.

    • Oct.2015-July.2016
    • Authors: Binbin Zhang, Xiao Wang, Hao Wang.
    • Brief introduction: This work proposes a cluster-based genetic algorithm to optimize migration strategies for virtual machines.
  • Performance modeling and tuning of OpenBLAS on TianHe-3 supercomputer FT2000 CPUs

    • Oct.2018-Nov.2018
    • Brief introduction: Develop and tune the four GEMM routines (SGEMM, CGEMM, ZGEMM, DGEMM) of Level-3 BLAS from OpenBLAS on FT2000+ CPUs, achieving near-peak theoretical performance.

Traveling

  • I like traveling and observing what is happening on our planet. I hope to leave my footprints on every continent in the future. Places I have visited include:
    • China: Beijing, Shanghai, Chongqing, Tianjin, Shenzhen, Zhuhai, Guangzhou, Huhhot, Kunming, Dali, Hefei, Zhengzhou, Shangqiu, Xinxiang, Kaifeng, Nanyang.
    • The United States: Denver, Colorado, Los Angeles, Chicago.
    • Italy: Rome, Venice.
    • France: Paris.
    • Germany: Luxembourg, Heidelberg.
    • Switzerland: Interlaken.

Contact

Resume

Here is the long version [Resume_Long_pdf_version]({{ site.url }}./wangxiao.pdf) with more details.