Docs: add catalog and metadata files to metadata structure diagram #2622
|
Looks good to me @jasonhughes248 ! Any thoughts @rdblue @aokolnychyi ? |
|
Thanks, @jasonhughes248! This looks like an improvement. I can share the original drawing with you if you want to edit that instead. There are also a couple of things we may want to change. First, I just gave a talk with what I think might be a better series of diagrams than the one here. I'd be interested in getting your feedback on that and whether we should replace the doc based on the new ones instead. Second, assuming that we go with the older drawing, I think we also need to show that both snapshots, |
|
@rdblue yeah, I'd be happy to make these changes in the original drawing if you want to share that with me. if it's lucidchart, which it looks like, my username is [email protected]
I think the diagram in that linked deck has some good additional aspects:
but, I think the lucidchart diagram looks more polished/official, which is probably better for the docs site, so I think it'd be good to try to merge the two: use the current diagram as the base, and add the aspects of the one you linked (plus the change to show both snapshots in the metadata file). the only thing I worry about is the size of the boxes and the diagram with the file content/purpose in them, since they'll all be visible at the same time in this one-image usage vs the multi-slide build in the deck, but let's see. I can create a copy of the diagram and try it out.

on #3, I think a diagram like this can be helpful in two related but different situations: an introductory overview for new folks that gives a high-level understanding of the layout and types of files, and a deeper view of how the layout and underlying structure change as the dataset changes. to me, the point-in-time diagram works really well for the spot it's currently in the docs, since it's a good introduction for new users on that page. then there could be another section covering the changes-over-time content for users who want that more detailed view into how it changes as table changes are made.

I actually did a presentation internally where a subset has that structure ("layout overview and file types, then how it changes over time") and it worked well. I'm also doing it publicly via webinar on thursday and making a written version of it shortly thereafter, so that could help here too. the changes-over-time view in that content currently shows the files and their location in the filesystem with color-coding of file types, but I think a diagram view like you have here is a better way to tell that story (or maybe both). I can try to address point #3 with multiple versions of this new merged diagram in that content and share here to see if it works well.

what do you think? |
|
@jasonhughes248, sounds great. I added you to the google drawing as an editor. Feel free to make changes! |
|
Sounds good, thanks @rdblue! |
…hots in metadata file and lines for dividing metadata vs data layers
|
@rdblue I just updated the point-in-time spec diagram in the PR to make the original changes on the google drawing, show both snapshots in the later metadata file, and add dividing lines between the metadata and data layers.

I couldn't find a way to put the contents/purpose of each file type in this intro diagram without doing more harm than good (either the boxes and therefore the diagram get way bigger, some important detail gets left off so people come away thinking they know the full scope of the file type when they don't, or the text is short but redundant with the file type name).

on the over-time/examples walkthrough, I tried the sql + spec diagram illustrating the example change + fs contents and I think it worked well. you can see it in this webinar deck, in slides 14-19. the slides build, so it's best to view them in presentation mode for the full over-time effect. you may notice a couple familiar slides later in the deck there 🙂

I'm working on a written version of this content, which should be public in a couple weeks, that I'm hoping will be useful for folks who prefer reading this kind of content when looking for an architectural intro to something like iceberg. I'll post that here when it's done. it'd be awesome to get your and others' feedback on it too.

what do you think? |
|
The updated version looks great! |
|
thanks @rdblue! following up on the discussion above about a separate diagram for the over-time state of the table, I put together this content, which includes an attempt to explain what happens to an iceberg table's structure as different operations are performed on it. it'd be great to get your feedback on that and/or any other sections in the post. there are a few remaining tweaks I want to make to it, so I can certainly incorporate any edits you think would help too. |
|
I just deployed the spec with this update. Thanks, @jasonhughes248! |
|
This diagram was very useful for helping me to understand Iceberg. I'd like to use some variations of it for a presentation. Would it be possible to share a public (non-editable) link to the diagram here? Thanks in advance! |
while working on additional educational materials for iceberg (will be making them public shortly), I found making a few changes to the table format diagram on https://iceberg.apache.org/spec/#overview helped people better tie the higher level architecture diagram to what they were seeing in the filesystem as operations were made.
changes made to the diagram:
I used the png from the site which looks a little more compressed than how I saved it, so there's a little difference between the sharpness of the text and lines of the current content vs the additions. I'm happy to make the changes on the original if you want to share the original chart with me
cc @rymurr