layout |
---|
default |
![xiaowang(王霄)]({{ site.url }}./IMG_2356.JPG){:class="img-responsive"}
Hello! I am a full-time software engineer at MaxCompute (previously called ODPS), Alibaba's large-scale distributed computing and storage platform. I am currently building and optimizing a high-performance SQL execution engine for MaxCompute-SQL.
Before that, I received my master's degree from the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (SKL, ICT, CAS), under the supervision of Prof. Yunquan Zhang. I obtained my B.E. from Yunnan University, where I was extensively involved in data mining and virtual machine migration projects.
I am interested in high-performance computing (performance tuning and modeling, SIMD, GPGPU, GEMM, FFT) and in building large-scale, high-performance distributed computing systems.
- 2017.9-2020.7, Institute of Computing Technology, Chinese Academy of Sciences, Master, High-Performance Computing.
- 2013.9-2017.7, Yunnan University, B.Eng, Computer Science (3/121).
-
Full-time Software engineer at MaxCompute, Alibaba Cloud
- July.2020-July.2021 Investigate other OLAP systems (ClickHouse) and manage the release procedure and quality of one sprint covering different release regions and core users.
- Aug.2021-Dec.2021 Design adaptive distributed aggregation algorithms that can switch between hash-based and sort-based aggregation at runtime depending on the statistical characteristics of the data.
- Jan.2022-Mar.2022 Refactor high-performance vectorized aggregation for 30 kinds of aggregate functions.
- Mar.2022-July.2022 Design and implement hash-based distributed distinct aggregation algorithms for queries with multiple distinct functions.
- April.2022-July.2022 Optimize the performance of aggregate functions for queries without group-by keys using loop unrolling and SIMD.
- July.2022-Sept.2022 Improve data locality for hash-aggregation by designing and implementing row stores for aggregated intermediate partial states.
- Aug.2022- Optimize HashTable vectorization implementations of distributed HashJoin.
-
Software engineer intern at Amazon AWS AI Lab (Shanghai)
- July.2019-Aug.2019 Contribute NumPy-compatible operators to MXNet.
-
Software engineer intern at PerfxLab (Beijing)
- Mar.2019-May.2019 Implement a row-column separable Gaussian filter based on OpenCL.
- Jun.2018-Aug.2018 Tune performance specifically for AMD GCN architecture GPUs.
-
Research Assistant at Institute of Computing Technology
- May.2017-Aug.2017 Develop and optimize high-performance libraries on ARM CPUs that are compatible with Intel IPP.
-
[Implementation and Optimization of Multi-dimensional Real FFT on ARMv8 Platform]({{ site.url }}./ICA3PP18.pdf) ICA3PP18.
- Sept.2017-Sept.2019
- Authors: Wang Xiao and Jia Haipeng and Li Zhihao and Zhang Yunquan.
- Brief introduction: RealFFT is a high-performance real-number FFT library that supports 11 kinds of 1- to 3-dimensional float/double algorithms on both ARMv8 and x86 CPUs. It outperformed FFTW on ARMv8 CPUs at that time.
-
[Efficient parallel optimizations of a high-performance SIFT on GPUs]({{ site.url }}./JPDC.pdf) JPDC19.
-
[AutoFFT: A Template-Based FFT Codes Auto-Generation Framework for ARM and X86 CPUs]({{ site.url }}./SC19.pdf) SC19.
- July.2018-July.2019
- Authors: Zhihao Li, Haipeng Jia, Yunquan Zhang, Tun Chen, Liang Yuan, Luning Cao, Xiao Wang.
- Brief introduction: AutoFFT is a template-based FFT auto-generation framework that achieves significant performance improvements on various CPUs and contributes to many Chinese vendors' libraries.
-
[MVUC: An Interactive System for Mining and Visualizing Urban Co-locations]({{ site.url }}./WAIM16.pdf) WAIM16.
- Oct.2014-July.2014
- Authors: Xiao Wang, Hongmei Chen, Qing Xiao.
- Brief introduction: MVUC proposes a data mining visualization method for spatial co-location patterns mined from urban datasets.
-
Virtual machine placement strategy using cluster-based genetic algorithm Neurocomputing.
- Oct.2015-July.2016
- Authors: Binbin Zhang, Xiao Wang, Hao Wang.
- Brief introduction: This work proposes a cluster-based genetic algorithm to optimize migration strategies for virtual machines.
-
Performance modeling and tuning of OpenBLAS on TianHe-3 supercomputer FT2000 CPUs
- Oct.2018-Nov.2018
- Brief introduction: Develop and tune the four GEMM routines (SGEMM, CGEMM, ZGEMM, DGEMM) of Level-3 BLAS in OpenBLAS on FT2000+ CPUs, achieving near-peak theoretical performance.
- I like traveling and observing what is happening on our planet, and I hope to leave my footprints on every continent in the future. Places I have visited include:
- China: Beijing, Shanghai, Chongqing, Tianjin, Shenzhen, Zhuhai, Guangzhou, Hohhot, Kunming, Dali, Hefei, Zhengzhou, Shangqiu, Xinxiang, Kaifeng, Nanyang.
- The United States: Denver, Colorado, Los Angeles, Chicago.
- Italy: Rome, Venice.
- France: Paris.
- Germany: Luxembourg, Heidelberg.
- Switzerland: Interlaken.
- Email Address: [email protected]
Here is a longer version with more details: [Resume_Long_pdf_version]({{ site.url }}./wangxiao.pdf).