From 10ece5a89dd239b36a3a3dc2aa100d95559b7a27 Mon Sep 17 00:00:00 2001
From: Sagar Sumit
Date: Tue, 18 Jan 2022 17:33:57 +0530
Subject: [PATCH 1/4] Add RFC for async metadata indexing

Add more details
---
 rfc/rfc-45/async_metadata_index.png | Bin 0 -> 39199 bytes
 rfc/rfc-45/rfc-45.md                | 229 ++++++++++++++++++++++++++++
 2 files changed, 229 insertions(+)
 create mode 100644 rfc/rfc-45/async_metadata_index.png
 create mode 100644 rfc/rfc-45/rfc-45.md

diff --git a/rfc/rfc-45/async_metadata_index.png b/rfc/rfc-45/async_metadata_index.png
new file mode 100644
index 0000000000000000000000000000000000000000..cc044d6c8f3fabafaf0b51997d8c7b1548213a8f
GIT binary patch
literal 39199
[binary image data for rfc/rfc-45/async_metadata_index.png omitted]

+
+# RFC-45: Asynchronous Metadata Indexing
+
+## Proposers
+
+- @codope
+- @manojpec
+
+## Approvers
+
+- @nsivabalan
+- @vinothchandar
+
+## Status
+
+JIRA: [HUDI-2488](https://issues.apache.org/jira/browse/HUDI-2488)
+
+## Abstract
+
+Metadata indexing (aka metadata bootstrapping) is the process of creating one
+or more metadata-based indexes, e.g. the data-partitions-to-files index, that
+are stored in the Hudi metadata table. Currently, the metadata table (referred
+to as MDT hereafter) supports a single partition, which is created synchronously
+with the corresponding data table, i.e. commits are first applied to the
+metadata table and then to the data table. Our goal for MDT is to support
+multiple partitions to boost the performance of existing index and record
+lookups. However, the synchronous manner of metadata indexing does not scale
+well as we add more partitions to the MDT, because the regular writers (writing
+to the data table) have to wait until the MDT commit completes. In this RFC, we
+propose a design to support asynchronous metadata indexing.
+
+## Background
+
+We can read more about the MDT design
+in [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements).
Here is a quick summary of the current state (Hudi v0.10.1). MDT is an +internal Merge-on-Read (MOR) table that has a single partition called `files` +which stores the data partitions to files index that is used in file listing. +MDT is co-located with the data table (inside `.hoodie/metadata` directory under +the basepath). In order to handle multi-writer scenario, users configure lock +provider and only one writer can access MDT in read-write mode. Hence, any write +to MDT is guarded by the data table lock. This ensures only one write is +committed to MDT at any point in time and thus guarantees serializability. +However, locking overhead adversely affects the write throughput and will reach +its scalability limits as we add more partitions to the MDT. + +## Goals + +- Support indexing one or more partitions in MDT while regular writers and table + services (such as cleaning or compaction) are in progress. +- Locking to be as lightweight as possible. +- Keep required config changes to a minimum to simplify deployment / upgrade in + production. +- Do not require specific ordering of how writers and table service pipelines + need to be upgraded / restarted. +- If an external long-running process is being used to initialize the index, the + process should be made idempotent so it can handle errors from previous runs. +- To re-initialize the index, make it as simple as running the external + initialization process again without having to change configs. + +## Implementation + +### A new Hudi action: INDEX + +We introduce a new action `index` which will denote the index building process, +the mechanics of which is as follows: + +1. From an external process, users can issue a CREATE INDEX or similar statement + to trigger indexing for an existing table. + 1. This will add a `.index.requested` to the timeline, which + contains the indexing plan. + 2. 
From here on, the index building process will continue to build an index + up to instant time `t`, where `t` is the latest completed instant time on + the timeline without any + "holes" i.e. no pending async operations prior to it. + 3. The indexing process will write these out as base files within the + corresponding metadata partition. A metadata partition cannot be used if + there is any pending indexing action against it. + +2. Any inflight writers (i.e. with instant time `t'` > `t`) will check for any + new indexing request on the timeline prior to preparing to commit. + 1. Such writers will proceed to additionally add log entries corresponding + to each such indexing request into the metadata partition. + 2. There is always a TOCTOU issue here, where the inflight writer may not + see an indexing request that was just added and proceed to commit without + that. We will correct this during indexing action completion. In the + average case, this may not happen and the design has liveness. + +3. When the indexing process is about to complete, it will check for all + completed commit actions to ensure each of them added entries per its + indexing plan, otherwise simply abort after a configurable timeout. Let's + call this the **indexing check**. + 1. The corner case here would be that the indexing check does not factor in + the inflight writer just about to commit. But given indexing would take + some finite amount of time to go from requested to completion (or we can + add some, configurable artificial delays here say 60 seconds), an + inflight writer, that is just about to commit concurrently, has a very + high chance of seeing the indexing plan and aborting itself. + +We can just introduce a lock for adding events to the timeline and these races +would vanish completely, still providing great scalability and asynchrony for +these processes. 
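The rule for picking instant `t` (the latest completed instant with no pending instant before it) can be sketched as follows. This is a minimal illustration, not Hudi code: the `Instant` class, the integer instant times, and the state names are assumptions made for the example, and all actions are treated uniformly.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instant:
    """Hypothetical stand-in for a timeline instant (not the Hudi class)."""
    time: int    # instant time; integers are used here only for simplicity
    action: str  # e.g. "commit", "deltacommit", "index"
    state: str   # "REQUESTED", "INFLIGHT" or "COMPLETED"


def latest_instant_without_holes(timeline: List[Instant]) -> Optional[Instant]:
    """Return the last completed instant that has no pending instant before it."""
    base = None
    for instant in sorted(timeline, key=lambda i: i.time):
        if instant.state != "COMPLETED":
            break  # first hole: nothing at or after this point is safe to index
        base = instant
    return base


timeline = [
    Instant(1, "commit", "COMPLETED"),
    Instant(2, "commit", "COMPLETED"),
    Instant(3, "commit", "INFLIGHT"),   # the hole
    Instant(4, "commit", "COMPLETED"),
]
base = latest_instant_without_holes(timeline)
print(base.time)  # 2
```

Instant 4 is completed but sits behind the pending instant 3, so base files would be built only up to instant 2; later instants are left to the writers' log entries and the final check.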
+ +### Multi-writer scenario + +![](./async_metadata_index.png) + +Let us walk through a concrete multi-writer scenario to understand the above +indexing mechanism. In this scenario, let instant `t0` be the last completed +instant on the timeline. Suppose user triggered index building from an external +process at `t3`. This will create `t3.index.requested` file with the indexing +plan. The plan contains the metadata partitions that need to be created and the +last completed instant, e.g. + +``` +[ + {MetadataPartitionType.FILES.partitionPath(), t0}, + {MetadataPartitionType.BLOOM_FILTER.partitionPath(), t0}, + {MetadataPartitionType.COLUMN_STATS.partitionPath(), t0} +] +``` + +Further, suppose there were two inflight writers Writer1 and Writer2 (with +inflight instants `t1` and `t2` respectively) while the indexing was requested +or inflight. In this case, the writers will check for pending index action and +find a pending instant `t3`. Now, if the metadata index creation is inflight and +a basefile is already being written under the metadata partition, then each +writer will create log files in the same filegroup for the metadata index +update. This will happen within the existing data table lock. However, if the +indexing has still not started and instant `t3` is still in requested state, +then writer will still continue to log entries but indexer will handle this +scenario and assign the same filegroup id when `t3` transitions to inflight. + +The indexer runs in a loop until the metadata for data upto `t0` plus the data +written due to `t1` and `t2` has been indexed, or the indexing timed out. After +timeout, indexer will abort writing the instant upto which indexing was done +in `t3.index.inflight` file in the timeline. At this point, user can trigger the +index process again, however, this time `t2` will become the last completed +instant. This design ensures that the regular writers do not fail due to +indexing.
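The writer-side check in this scenario could look roughly like the sketch below: before committing, a writer scans the timeline for pending index instants and collects the metadata partitions named in their plans. The `TimelineEntry` class, the state names, and the partition path strings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TimelineEntry:
    """Hypothetical timeline entry; plan_partitions is only set for index actions."""
    time: str
    action: str
    state: str  # "REQUESTED", "INFLIGHT" or "COMPLETED"
    plan_partitions: List[str] = field(default_factory=list)


def pending_index_partitions(timeline: List[TimelineEntry]) -> Set[str]:
    """Collect metadata partitions that appear in any pending index plan."""
    partitions: Set[str] = set()
    for entry in timeline:
        if entry.action == "index" and entry.state != "COMPLETED":
            partitions.update(entry.plan_partitions)
    return partitions


timeline = [
    TimelineEntry("t0", "commit", "COMPLETED"),
    TimelineEntry("t1", "commit", "INFLIGHT"),   # Writer1
    TimelineEntry("t2", "commit", "INFLIGHT"),   # Writer2
    TimelineEntry("t3", "index", "REQUESTED", ["bloom_filters", "column_stats"]),
]
extra = pending_index_partitions(timeline)
print(sorted(extra))  # ['bloom_filters', 'column_stats']
```

In this sketch Writer1 and Writer2 would each add log entries for the two returned partitions in addition to their regular metadata updates.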
+ +### Error Handling + +**Case 1: Writer fails while indexer is inflight** + +This means index update due to writer did not complete. Indexer continues to +build the index ignoring the failed instant due to writer. The next update by +the writer will trigger a rollback of the failed instant, which will also +rollback incomplete updates in metadata table. + +**Case 2: Indexer fails while writer is inflight** + +Writer will commit adding log entries to the metadata partition. Indexer will +fetch the last instant for which indexing was done from `.index.inflight` file. +It will start indexing again from the instant thereafter. + +**Case 3: Race conditions** + +a) Writer went inflight just after an indexing request was added but indexer has +not yet started executing. + +In this case, writer will continue to log updates in metadata partition. At the +time of execution, indexer will see there are already some log files and handle +that in the indexing check. + +b) Inflight writer about to commit, but indexing completed just before that. + +In this case, since the indexer completed before the writer, so it has already +missed the index updates due to the writer. We can let async compaction on the +metadata table handle this scenario so that the log files written by the writer +are merged into a base file. But what if the async compaction has not even +completed and there is another indexing request? What will be the latest +completed instant then? + +Or, we can introduce a lock for adding events to the metadata timeline. + +**Case 4: Async table services** + +The metadata partition cannot be used if there is any pending index action +against it. So, async compaction/cleaning/clustering will ignore the metadata +partition for which indexing is inflight. + +## Rollout/Adoption Plan + +- What impact (if any) will there be on existing users? 
+ +There can be two kinds of existing users: + +a) Enabling metadata for the first time: There should not be any impact on such +users. When they enable metadata, they can trigger indexing process. b) Metadata +already enabled: Such users already have metadata table with at least one +partition. If they trigger indexing process, then the indexer should take into +account the existing metadata and ignore instants upto which MDT is in sync with +the data table. + +- If we are changing behavior how will we phase out the older behavior? + +The changes will be backward-compatible and if the async indexing is disabled +then the existing behavior of MDT creation and updates will be used. + +- If we need special migration tools, describe them here. + +Not required. + +- When will we remove the existing behavior + +Not required + +## Test Plan + +- Extensive unit tests to cover all scenarios including conflicts and + error-handling. +- Run a long-running test on EMR cluster with async indexing enabled. From ee2a53e6ba4a54f3f247b58a020a3df53632fa8e Mon Sep 17 00:00:00 2001 From: Sagar Sumit Date: Wed, 9 Mar 2022 21:41:23 +0530 Subject: [PATCH 2/4] Add changes since last discussion --- rfc/rfc-45/rfc-45.md | 99 ++++++++++++++++++++++++++++++-------------- 1 file changed, 67 insertions(+), 32 deletions(-) diff --git a/rfc/rfc-45/rfc-45.md b/rfc/rfc-45/rfc-45.md index 3f6848da6cc2a..92ccf6f0639f7 100644 --- a/rfc/rfc-45/rfc-45.md +++ b/rfc/rfc-45/rfc-45.md @@ -83,15 +83,21 @@ the mechanics of which is as follows: 1. From an external process, users can issue a CREATE INDEX or similar statement to trigger indexing for an existing table. - 1. This will add a `.index.requested` to the timeline, which - contains the indexing plan. + 1. This will schedule INDEX action and add + a `.index.requested` to the timeline, which contains the + indexing plan. Index scheduling will also initialize the filegroup for + the partitions for which indexing is planned. 2.
From here on, the index building process will continue to build an index up to instant time `t`, where `t` is the latest completed instant time on the timeline without any "holes" i.e. no pending async operations prior to it. 3. The indexing process will write these out as base files within the corresponding metadata partition. A metadata partition cannot be used if - there is any pending indexing action against it. + there is any pending indexing action against it. As and when indexing is + completed for a partition, then table config (`hoodie.properties`) will + be updated to indicate that partition is available for reads or + synchronous updates. Hudi table config will be the source of truth for + the current state of metadata index. 2. Any inflight writers (i.e. with instant time `t'` > `t`) will check for any new indexing request on the timeline prior to preparing to commit. @@ -102,10 +108,14 @@ the mechanics of which is as follows: that. We will correct this during indexing action completion. In the average case, this may not happen and the design has liveness. -3. When the indexing process is about to complete, it will check for all - completed commit actions to ensure each of them added entries per its - indexing plan, otherwise simply abort after a configurable timeout. Let's - call this the **indexing check**. +3. When the indexing process is about to complete (i.e. indexing upto + instant `t` is done but before completing indexing commit), it will check for + all completed commit instants after `t` to ensure each of them added entries + per its indexing plan, otherwise simply abort after a configurable timeout. + Let's call this the **indexing check**. So, the indexer will only write base + files but ensure that log entries due to instants after `t` are in the same + filegroup i.e. no new filegroup is initialized by writers while indexing is + in progress. 1. 
The corner case here would be that the indexing check does not factor in the inflight writer just about to commit. But given indexing would take some finite amount of time to go from requested to completion (or we can @@ -115,7 +125,8 @@ the mechanics of which is as follows: We can just introduce a lock for adding events to the timeline and these races would vanish completely, still providing great scalability and asynchrony for -these processes. +these processes. The indexer will error out if there is no lock provider +configured. ### Multi-writer scenario @@ -139,21 +150,19 @@ last completed instant, e.g. Further, suppose there were two inflight writers Writer1 and Writer2 (with inflight instants `t1` and `t2` respectively) while the indexing was requested or inflight. In this case, the writers will check for pending index action and -find a pending instant `t3`. Now, if the metadata index creation is inflight and -a basefile is already being written under the metadata partition, then each -writer will create log files in the same filegroup for the metadata index -update. This will happen within the existing data table lock. However, if the -indexing has still not started and instant `t3` is still in requested state, -then writer will still continue to log entries but indexer will handle this -scenario and assign the same filegroup id when `t3` transitions to inflight. +find a pending instant `t3`. Now, if the metadata index creation is pending, +which means indexer has already initialized a filegroup, then each writer will +create log files in the same filegroup for the metadata index update. This will +happen within the existing data table lock. The indexer runs in a loop until the metadata for data upto `t0` plus the data -written due to `t1` and `t2` has been indexed, or the indexing timed out. After -timeout, indexer will abort writing the instant upto which indexing was done -in `t3.index.inflight` file in the timeline.
At this point, user can trigger the -index process again, however, this time `t2` will become the last completed -instant. This design ensures that the regular writers do not fail due to -indexing. +written due to `t1` and `t2` has been indexed, or the indexing timed out. +Whether indexing timed out or not, table config would be updated with any MDT +partition(s) for which indexing was complete till `t2`. In case of timeout +indexer will abort. At this point, user can trigger the index process again, +however, this time indexer will check for available partitions in table config +and skip those partitions. This design ensures that the regular writers do not +fail due to indexing. ### Error Handling @@ -166,9 +175,10 @@ rollback incomplete updates in metadata table. **Case 2: Indexer fails while writer is inflight** -Writer will commit adding log entries to the metadata partition. Indexer will -fetch the last instant for which indexing was done from `.index.inflight` file. -It will start indexing again from the instant thereafter. +Writer will commit adding log entries to the metadata partition. However, table +config will indicate that partition is not ready to use. When indexer is +re-triggered, it will check the plan and table config to figure out which MDT +partitions to index and start indexing for those partitions. **Case 3: Race conditions** @@ -176,17 +186,24 @@ a) Writer went inflight just after an indexing request was added but indexer has not yet started executing. In this case, writer will continue to log updates in metadata partition. At the -time of execution, indexer will see there are already some log files and handle -that in the indexing check. +time of execution, indexer will see there are already some log files and ensure +that the indexing check passes. b) Inflight writer about to commit, but indexing completed just before that. 
-In this case, since the indexer completed before the writer, so it has already -missed the index updates due to the writer. We can let async compaction on the -metadata table handle this scenario so that the log files written by the writer -are merged into a base file. But what if the async compaction has not even -completed and there is another indexing request? What will be the latest -completed instant then? +Ideally, the indexing check in the indexer should have failed. But this could +happen in the following sequence of events: + +1. No pending data commit. Indexing check passed, indexing commit not + completed (table config yet to be updated). +2. Writer went inflight knowing that MDT partition is not ready for use. +3. Indexing commit done, table config updated. + +In this case, the writer will continue to write log files under the latest base +filegroup in the MDT partition. Even though the indexer missed the updates due +to writer, there is no "index loss" as such i.e. metadata due to writer is still +updated in the MDT partition. Async compaction on the MDT will eventually merge +the updates into another base file. Or, we can introduce a lock for adding events to the metadata timeline. @@ -196,6 +213,24 @@ The metadata partition cannot be used if there is any pending index action against it. So, async compaction/cleaning/clustering will ignore the metadata partition for which indexing is inflight. +**Case 5: Data timeline with holes** + +Let's say the data timeline when indexer is started looks +like: `C1, C2,.... C5 (inflight), C6, C7, C8`. In this case the latest completed +instant without any hole is `C4`. So, indexer will continue to index upto `C4`. +Instants `C5-C8` will go through the indexing check. If `C5` does not complete +before the timeout, then indexer will abort. The indexer will run through the +same process again when re-triggered. + +## Summary of key proposals + +- New INDEX action on data timeline. 
+- Async indexer to handle state change for the new action. +- Concept of "indexing check" to reconcile instants that went inflight after + indexer started. +- Table config to be the source of truth for available MDT partitions. +- Indexer will error out if lock provider not configured. + ## Rollout/Adoption Plan - What impact (if any) will there be on existing users? From 9705e18e3379d707733ced26749e3d3b68f3175c Mon Sep 17 00:00:00 2001 From: Sagar Sumit Date: Fri, 11 Mar 2022 17:33:37 +0530 Subject: [PATCH 3/4] Add another race condition handling --- rfc/rfc-45/rfc-45.md | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/rfc/rfc-45/rfc-45.md b/rfc/rfc-45/rfc-45.md index 92ccf6f0639f7..beac4ddfb3521 100644 --- a/rfc/rfc-45/rfc-45.md +++ b/rfc/rfc-45/rfc-45.md @@ -86,7 +86,8 @@ the mechanics of which is as follows: 1. This will schedule INDEX action and add a `.index.requested` to the timeline, which contains the indexing plan. Index scheduling will also initialize the filegroup for - the partitions for which indexing is planned. + the partitions for which indexing is planned. The creation of filegroups + will be done within a lock. 2. From here on, the index building process will continue to build an index up to instant time `t`, where `t` is the latest completed instant time on the timeline without any @@ -112,10 +113,10 @@ the mechanics of which is as follows: instant `t` is done but before completing indexing commit), it will check for all completed commit instants after `t` to ensure each of them added entries per its indexing plan, otherwise simply abort after a configurable timeout. - Let's call this the **indexing check**. So, the indexer will only write base - files but ensure that log entries due to instants after `t` are in the same - filegroup i.e. no new filegroup is initialized by writers while indexing is - in progress. + Let's call this the **indexing check**. 
So, the indexer will not only write + base files but also ensure that log entries due to instants after `t` are in + the same filegroup i.e. no new filegroup is initialized by writers while + indexing is in progress. 1. The corner case here would be that the indexing check does not factor in the inflight writer just about to commit. But given indexing would take some finite amount of time to go from requested to completion (or we can @@ -207,6 +208,22 @@ the updates into another base file. Or, we can introduce a lock for adding events to the metadata timeline. +c) Inflight writer about to commit but index is still being scheduled + +Consider the following scenario: + +1. Writer is in inflight mode. +2. Indexer is starting and creating the file-groups. Suppose there are 100 + file-groups to be created. +3. Writer just finished and tries to write log blocks - it only sees a subset of + file-groups created yet (as step 2 above has not completed yet). + This will cause writer to incorrectly write updates to fewer + shards. + +In this case, we ensure that scheduling for metadata index always happens within +a lock. Since the initialization of filegroups happens at the time of scheduling, +indexer will hold the lock until all the filegroups are created. + **Case 4: Async table services** The metadata partition cannot be used if there is any pending index action From ee05ae22e767b1c62862f9d041ef2835d4667f63 Mon Sep 17 00:00:00 2001 From: Sagar Sumit Date: Fri, 1 Apr 2022 20:58:04 +0530 Subject: [PATCH 4/4] Update rfc --- rfc/README.md | 2 +- rfc/rfc-45/rfc-45.md | 135 ++++++++++++++++++++++++++++++++++++------- 2 files changed, 116 insertions(+), 21 deletions(-) diff --git a/rfc/README.md b/rfc/README.md index 5ec12dc666ecb..4d8aba38014d1 100644 --- a/rfc/README.md +++ b/rfc/README.md @@ -68,7 +68,7 @@ The list of all RFCs can be found here.
| 42 | [Consistent Hashing Index](./rfc-42/rfc-42.md) | `UNDER REVIEW` | | 43 | [Compaction / Clustering Service](./rfc-43/rfc-43.md) | `UNDER REVIEW` | | 44 | [Hudi Connector for Presto](./rfc-44/rfc-44.md) | `UNDER REVIEW` | -| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `UNDER REVIEW` | +| 45 | [Asynchronous Metadata Indexing](./rfc-45/rfc-45.md) | `IN PROGRESS` | | 46 | [Optimizing Record Payload Handling](./rfc-46/rfc-46.md) | `UNDER REVIEW` | | 47 | [Add Call Produce Command for Spark SQL](./rfc-47/rfc-47.md) | `UNDER REVIEW` | | 48 | [LogCompaction for MOR tables](./rfc-48/rfc-48.md) | `UNDER REVIEW` | \ No newline at end of file diff --git a/rfc/rfc-45/rfc-45.md b/rfc/rfc-45/rfc-45.md index beac4ddfb3521..f79dd896a09e6 100644 --- a/rfc/rfc-45/rfc-45.md +++ b/rfc/rfc-45/rfc-45.md @@ -76,14 +76,16 @@ its scalability limits as we add more partitions to the MDT. ## Implementation -### A new Hudi action: INDEX +### High Level Design + +#### A new Hudi action: INDEXING We introduce a new action `index` which will denote the index building process, the mechanics of which is as follows: -1. From an external process, users can issue a CREATE INDEX or similar statement - to trigger indexing for an existing table. - 1. This will schedule INDEX action and add +1. From an external process, users can issue a CREATE INDEX or run a job to + trigger indexing for an existing table. + 1. This will schedule INDEXING action and add a `.index.requested` to the timeline, which contains the indexing plan. Index scheduling will also initialize the filegroup for the partitions for which indexing is planned. The creation of filegroups @@ -113,14 +115,14 @@ the mechanics of which is as follows: instant `t` is done but before completing indexing commit), it will check for all completed commit instants after `t` to ensure each of them added entries per its indexing plan, otherwise simply abort after a configurable timeout. - Let's call this the **indexing check**. 
So, the indexer will not only write + Let's call this the **indexing catchup**. So, the indexer will not only write base files but also ensure that log entries due to instants after `t` are in the same filegroup i.e. no new filegroup is initialized by writers while indexing is in progress. - 1. The corner case here would be that the indexing check does not factor in - the inflight writer just about to commit. But given indexing would take - some finite amount of time to go from requested to completion (or we can - add some, configurable artificial delays here say 60 seconds), an + 1. The corner case here would be that the indexing catchup does not factor + in the inflight writer just about to commit. But given indexing would + take some finite amount of time to go from requested to completion (or we + can add some, configurable artificial delays here say 60 seconds), an inflight writer, that is just about to commit concurrently, has a very high chance of seeing the indexing plan and aborting itself. @@ -129,7 +131,7 @@ would vanish completely, still providing great scalability and asynchrony for these processes. The indexer will error out if there is no lock provider configured. -### Multi-writer scenario +#### Multi-writer scenario ![](./async_metadata_index.png) @@ -165,6 +167,77 @@ however, this time indexer will check for available partitions in table config and skip those partitions. This design ensures that the regular writers do not fail due to indexing. +### Low Level Design + +#### Schedule Indexing + +The scheduling initializes the file groups for metadata partitions in a lock. It +does not update any table config. 
+ +``` +1 Run pre-scheduling validation (valid index requested, lock provider configured, idempotent checks) +2 Begin transaction + 2.a Get the base instant + 2.b Start initializing file groups for each partition + 2.c Create index plan and save indexing.requested instant to the timeline +3 End transaction +``` + +If there is failure in any of the above steps, then we abort gracefully i.e. +delete the metadata partition if it was initialized. + +#### Run Indexing + +This is a separate executor, which reads the plan and builds the index. + +``` +1 Run pre-indexing checks (lock provider configured, indexing.requested exists, idempotent checks) +2 Read the indexing plan and if any of the requested partition is inflight or already completed then error out and return early +3 Transition indexing.requested to inflight +4 Build metadata partitions + 4.a Build the base file in the metadata partition to index upto instant as per the plan + 4.b Update inflight partitions config in hoodie.properties +5 Determine the catchup start instant based on write and non-write timeline +6 Start indexing catchup in a separate thread (that can be interrupted upon timeout) + 6.a For each instant to catchup + 6.a.i if instant is completed and has corresponding deltacommit in metadata timeline then continue + 6.a.ii if instant is inflight, then reload active timeline periodically until completed or timed out + 6.a.iii update metadata table, if needed, within a lock +7 Build indexing commit metadata with the partition info and caught upto instant +8 Begin transaction + 8.a update completed metadata partitions in table config + 8.b save indexing commit metadata to the timeline transition indexing.inflight to completed. +9 End transaction +``` + +If there is failure in any of the above steps, then we abort gracefully i.e. +delete the metadata partition if it exists and revert the table config updates. 
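Step 6 of the indexing run (the catchup) can be sketched as below. This is an illustrative outline under stated assumptions, not the actual executor: it checks a deadline inside the loop instead of running in a separate interruptible thread, and `get_state`, `already_in_metadata`, and `apply_to_metadata` are hypothetical hooks standing in for timeline reloads, the metadata timeline lookup, and the metadata table update.

```python
import threading
import time
from typing import Callable, Iterable, List, Optional, Set


def run_catchup(
    instants_to_catch_up: Iterable[str],
    get_state: Callable[[str], Optional[str]],
    already_in_metadata: Set[str],
    apply_to_metadata: Callable[[str], None],
    timeout_s: float,
    poll_interval_s: float = 0.01,
) -> bool:
    """Return True if every instant was caught up, False if the timeout hit."""
    deadline = time.monotonic() + timeout_s
    for instant in instants_to_catch_up:
        # 6.a.ii: a pending instant is polled until it completes or we time out.
        while get_state(instant) != "COMPLETED":
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_s)
        # 6.a.i: nothing to do if the metadata timeline already has this instant.
        if instant in already_in_metadata:
            continue
        # 6.a.iii: otherwise update the metadata table (the hook takes the lock).
        apply_to_metadata(instant)
    return True


states = {"t1": "COMPLETED", "t2": "COMPLETED", "t5": "INFLIGHT"}
applied: List[str] = []
lock = threading.Lock()


def apply(instant: str) -> None:
    with lock:  # stands in for the table-level lock
        applied.append(instant)


ok = run_catchup(["t1", "t2"], states.get, {"t1"}, apply, timeout_s=1.0)
stuck = run_catchup(["t5"], states.get, set(), apply, timeout_s=0.05)
print(ok, stuck, applied)
```

In the first call `t1` is skipped because it is already reflected in the metadata timeline and `t2` is applied; in the second call `t5` never completes, so the catchup gives up at the timeout and the caller would abort the indexing run.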
+ +#### Configs + +``` +# enable metadata +hoodie.metadata.enable=true +# enable asynchronous metadata indexing +hoodie.metadata.index.async=true +# enable column stats index +hoodie.metadata.index.column.stats.enable=true +# set indexing catchup timeout +hoodie.metadata.index.check.timeout.seconds=60 +# set OCC concurrency mode +hoodie.write.concurrency.mode=optimistic_concurrency_control +# set lock provider +hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider +``` + +#### Table upgrade/downgrade + +While upgrading from a previous version to the current version, if metadata is +enabled and `files` partition exists then completed partitions in +`hoodie.properties` will be updated to `files` partition. While downgrading to a +previous version, if metadata table exists then it is deleted because metadata +table in current version has a schema that is not forward compatible. + ### Error Handling **Case 1: Writer fails while indexer is inflight** @@ -188,11 +261,11 @@ not yet started executing. In this case, writer will continue to log updates in metadata partition. At the time of execution, indexer will see there are already some log files and ensure -that the indexing check passes. +that the indexing catchup passes. b) Inflight writer about to commit, but indexing completed just before that. -Ideally, the indexing check in the indexer should have failed. But this could +Ideally, the indexing catchup in the indexer should have failed. But this could happen in the following sequence of events: 1. No pending data commit. Indexing check passed, indexing commit not @@ -233,19 +306,41 @@ partition for which indexing is inflight. **Case 5: Data timeline with holes** Let's say the data timeline when indexer is started looks -like: `C1, C2,.... C5 (inflight), C6, C7, C8`. In this case the latest completed -instant without any hole is `C4`. So, indexer will continue to index upto `C4`. -Instants `C5-C8` will go through the indexing check.
If `C5` does not complete -before the timeout, then indexer will abort. The indexer will run through the -same process again when re-triggered. +like: `C1, C2,.... C5 (inflight), C6, C7, C8`, where `C1` is a commit at +instant `1`. In this case the latest completed instant without any hole is `C4`. +So, indexer will continue to index upto `C4`. Instants `C5-C8` will go through +the indexing catchup. If `C5` does not complete before the timeout, then indexer +will abort. The indexer will run through the same process again when +re-triggered. + +The above example contained only write commits however the indexer will consider +non-write commits (such as clean/restore/rollback) as well. Let's take such an +example: + +| DC | DC | DC | CLEAN | DC | DC | COMPACT | DC | INDEXING | DC | +| ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | ---- | +| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | +| C | C | C | I | C | C | R | C | R | I | + +Here, DC indicates a deltacommit, second row is the instant time, and the last +row is whether the action is completed (C), inflight (I) or requested (R). In +this case, the base instant upto which there are no holes in write timeline +is `DC6`. The indexer will also check the earliest pending instant in non-write +timeline before this base instant, which is `CLEAN4`. While the indexing is done +upto base instant, the remaining instants (CLEAN4, COMPACT7, DC8) are checked +during indexing catchup whether they logged updates to the corresponding filegroup +as per the index plan. Note that during catchup, indexer won't move beyond +unless the instants to catch up actually get into completed state. For instance, +if the CLEAN4 was inflight till the configured timeout, then indexer will abort. ## Summary of key proposals -- New INDEX action on data timeline. +- New INDEXING action on data timeline. - Async indexer to handle state change for the new action.
-- Concept of "indexing check" to reconcile instants that went inflight after +- Concept of "indexing catchup" to reconcile instants that went inflight after indexer started. -- Table config to be the source of truth for available MDT partitions. +- Table config to be the source of truth for inflight and completed MDT + partitions. - Indexer will error out if lock provider not configured. ## Rollout/Adoption Plan