{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":756620350,"defaultBranch":"main","name":"openhouse","ownerLogin":"linkedin","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2024-02-13T00:52:30.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/357098?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1721182738.0","currentOid":""},"activityList":{"items":[{"before":"27a8a07d39bab7412f418cbf89939d1c02ec78d6","after":"b04bb5c3c36366700922fccc940dab43dd79f7c7","ref":"refs/heads/main","pushedAt":"2024-07-17T02:01:48.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"sumedhsakdeo","name":"Sumedh Sakdeo","path":"/sumedhsakdeo","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/773250?s=80&v=4"},"commit":{"message":"[1/5] Set up basic Azure sandbox environment, providers, etc. (#136)","shortMessageHtmlLink":"[1/5] Set up basic Azure sandbox environment, providers, etc. (#136)"}},{"before":"7ce564546acf89a7aa97892a3300ed30bfb4c077","after":"27a8a07d39bab7412f418cbf89939d1c02ec78d6","ref":"refs/heads/kai/azure-sandbox","pushedAt":"2024-07-17T00:46:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"kmcclenn","name":"Kai McClennen","path":"/kmcclenn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/63942422?s=80&v=4"},"commit":{"message":"Fix github build failure (#137)\n\n## Summary\r\n\r\nThe build failed because of missing `khttp` library that is required by\r\n`springdoc-openapi` plugin. This library has been removed from maven2\r\nrepo from 2022, so not sure why the previous build succeeded (maybe\r\ngithub caches some gradle libraries). The `springdoc-openapi` plugin has\r\nremoved the usage of `khttp` too since `1.5.0`. I'm upgrading the\r\nversion to `1.6.0` which is the latest that gradle 6 supports.\r\n\r\nMore details in this issue:\r\nhttps://github.com/springdoc/springdoc-openapi-gradle-plugin/pull/92.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [x] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\n\r\n\r\n- [ ] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [ ] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [x] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. 
**2024-07-17 00:22 UTC | `main` | PR merge by Levi Jiang (jiang95-dev)**
Fix github build failure (#137): the same commit as above, merged to `main`.

**2024-07-16 21:21 UTC | `kai/azure-sandbox` | push of 4 commits by Kai McClennen (kmcclenn)**
Head commit: Fix tables service tests due to removal of getAllTables api (#135)

PR https://github.com/linkedin/openhouse/pull/127 did not clean up all of the tests tied to the getAllTables API (it is unclear why that build succeeded), so this change removes them to make the GitHub build pass again. Test-only change; existing tests were updated.
**2024-07-15 19:31 UTC | `dependabot/github_actions/github-actions-60c914ad0e` | force push by dependabot[bot]**
Bump anothrNick/github-tag-action in the github-actions group from 1.69.0 to 1.70.0 (direct production dependency, semver-minor update).

**2024-07-12 17:49 UTC | `main` | PR merge by Levi Jiang (jiang95-dev)**
Fix tables service tests due to removal of getAllTables api (#135): the same commit as above, merged to `main`.
**2024-07-10 15:08 UTC | `main` | PR merge by Levi Jiang (jiang95-dev)**
Deprecate getAllTables api (#127)

The getAllTables API is deprecated in favor of the searchTables API because it causes latency issues; it will come back later with paging support. Client-facing API change; existing tests were updated.
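For reference, this is the listing call being deprecated, as exercised in the docker testing later in this feed; the `curlArgs` array is assumed to carry the auth headers set up by the local recipe, and the replacement searchTables route is not shown here, so it is not guessed at:

```bash
# Deprecated by #127: fetch every table in a database in a single unpaged call.
curl "${curlArgs[@]}" -XGET http://localhost:8000/v1/databases/d3/tables/ | json_pp
```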
**2024-07-08 23:30 UTC | `main` | PR merge by Stas Pak (teamurko)**
Make snapshots expiration job leaner (#131)

The snapshot expiration job skips removing files, since file removal is localized to a single dedicated job, yet it still traversed the file tree, which is expensive and unnecessary. After this change it updates the snapshot list in the table metadata without traversing the file tree. Purely a performance optimization with no change in the job's effect expected; no tests were added or updated.
**2024-07-08 22:41 UTC | `main` | PR merge by Christian Bush (cbb330)**
Enabling access to swagger-ui using client without auth (#130)

Problem: swagger-ui returned 401 when requesting the api-docs from a browser. Solution: expose swagger-ui to unauthenticated access, just like the existing non-UI `/v3/api-docs` endpoint. swagger-ui is useful for browsing API configuration, for example which client headers a given endpoint requires.

Testing was manual on the local docker setup: `infra/recipes/docker-compose/oh-only/docker-compose.yml` was configured to also start the jobs REST service, then

```bash
➜ JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_282-msft.jdk/Contents/Home ./gradlew clean build -x test -x javadoc
➜ docker compose -f infra/recipes/docker-compose/oh-only/docker-compose.yml down --rmi all
➜ docker compose -f infra/recipes/docker-compose/oh-only/docker-compose.yml up
```

and the swagger endpoints of the tables, housetables and jobs services were each queried in a browser (screenshots in the PR). No unit tests were added: when attempting to test with e.g. MockMvcBuilder, the swagger endpoint returns 404, because the service unit tests use a mocked controller while swagger is configured at Spring application startup rather than on any controller; a local Tomcat/H2 server hit the same 404. The author is open to adding unit tests given pointers on testing a Spring Boot application server with swagger configured.
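A quick smoke test of the new behaviour, assuming the tables service is reachable on localhost:8000 as in the recipes used elsewhere in this feed; the swagger-ui path follows the springdoc default and is an assumption:

```bash
# Neither call carries an auth token; after #130 both should return 200 rather than 401.
curl -s -o /dev/null -w "api-docs:   %{http_code}\n" http://localhost:8000/v3/api-docs
curl -s -o /dev/null -w "swagger-ui: %{http_code}\n" http://localhost:8000/swagger-ui/index.html
```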
**2024-07-08 20:03 UTC | `dependabot/github_actions/github-actions-60c914ad0e` | branch created by dependabot[bot]**
Bump anothrNick/github-tag-action in the github-actions group from 1.69.0 to 1.70.0.

**2024-06-26 17:34 UTC | `kai/azure-sandbox` | push of 7 commits by Kai McClennen (kmcclenn)**
Head commit: Data layout optimization (strategy generation). Part 2: data source for statistics/query logs (#109)

Part 2 of the data layout optimization (DLO) strategy-generation feature: adds the data source interface and implementation for statistics/query logs, building on https://github.com/linkedin/openhouse/pull/108. Three components are planned overall: (1) a DLO library with primitives for generating data layout optimization strategies, (2) an app that generates strategies for all tables, and (3) scheduling of that app. New feature with new tests; part of a larger PR series.
**2024-06-20 03:21 UTC | `main` | PR merge by Stas Pak (teamurko)**
Data layout optimization (strategy generation). Part 2: data source for statistics/query logs (#109): the same commit as above, merged to `main`.
**2024-06-19 17:47 UTC | `main` | PR merge by Stas Pak (teamurko)**
Data layout optimization (strategy generation). Part 1: strategy class with persistence (#108)

Part 1 of the DLO strategy-generation feature: adds the strategy class and persistence utilities, and refactors the existing compaction app to use the library config. The same three planned components as above apply. New feature plus refactoring and documentation, with new tests; part of a larger PR series.

**2024-06-17 18:08 UTC | `main` | PR merge by Lavina Jain (jainlavina)**
[PR4/5]: Add S3FileIO (#125)

Fourth PR in the series adding S3 storage support. The OpenHouse catalog currently supports HDFS as the storage backend; the end goal is an S3 integration so the backend can be configured as S3 or HDFS based on storage type. The series: (1) add the S3 storage type and S3StorageClient, (2) add a base class for StorageClient and move common logic such as property validation there to avoid duplication, (3) add the S3Storage implementation that uses S3StorageClient, (4) add support for using S3FileIO for the S3 storage type, (5) add a recipe for end-to-end testing in docker. This PR addresses (4) by adding S3FileIO; Sushant has already done (5), so this completes the S3 integration. Testing was done by running the oh-s3-spark recipe in docker:
1. `docker compose up -d` brought up the full recipe (spark master, worker and livy, MySQL, MinIO and its client, OPA, Prometheus, and the OpenHouse housetables, jobs and tables services); the opa, spark-master, spark-worker-a and spark-livy images are linux/amd64 and only logged platform warnings on the arm64 host. A shell was then opened with `docker exec -it local.spark-master /bin/bash`.

2. Logged in to MinIO (screenshots in the PR).

3. Started the Spark shell against the S3-backed catalog:

```bash
export AWS_REGION=us-east-1
bin/spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.1_2.12:1.2.0,software.amazon.awssdk:bundle:2.20.18,software.amazon.awssdk:url-connection-client:2.20.18 \
  --jars openhouse-spark-runtime_2.12-*-all.jar \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.linkedin.openhouse.spark.extensions.OpenhouseSparkSessionExtensions \
  --conf spark.sql.catalog.openhouse=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.openhouse.catalog-impl=com.linkedin.openhouse.spark.OpenHouseCatalog \
  --conf spark.sql.catalog.openhouse.metrics-reporter-impl=com.linkedin.openhouse.javaclient.OpenHouseMetricsReporter \
  --conf spark.sql.catalog.openhouse.uri=http://openhouse-tables:8080 \
  --conf spark.sql.catalog.openhouse.auth-token=$(cat /var/config/$(whoami).token) \
  --conf spark.sql.catalog.openhouse.cluster=LocalS3Cluster \
  --conf spark.sql.catalog.openhouse.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.openhouse.s3.endpoint=http://minioS3:9000 \
  --conf spark.sql.catalog.openhouse.s3.access-key-id=admin \
  --conf spark.sql.catalog.openhouse.s3.secret-access-key=password \
  --conf spark.sql.catalog.openhouse.s3.path-style-access=true
```
Ivy resolved iceberg-spark-runtime-3.1_2.12:1.2.0, the AWS SDK bundle and url-connection-client 2.20.18, and their transitive dependencies (eventstream, utils, annotations, http-client-spi, metrics-spi, reactive-streams, slf4j-api) from Maven Central; 10 artifacts were downloaded and copied.
The Spark session came up (Spark 3.1.1, Scala 2.12.10, OpenJDK 1.8.0_232) with the target bucket still empty.

4. Created and inspected a partitioned table: `CREATE TABLE openhouse.db.tb (ts timestamp, col1 string, col2 string) PARTITIONED BY (days(ts))` succeeded, and `DESCRIBE TABLE openhouse.db.tb` shows the three columns plus the `days(ts)` partitioning (screenshots in the PR).
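At this point one could optionally confirm what the catalog wrote to S3 with a MinIO client session along these lines; the credentials match the spark configuration above, the bucket name appears in the table locations reported below, and the host port mapping and alias name are assumptions:

```bash
# Point the MinIO client at the local MinIO container and list the table's objects.
mc alias set localminio http://localhost:9000 admin password
mc ls --recursive localminio/openhouse-bucket/
```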
5. Inserted and read data: three `INSERT INTO TABLE openhouse.db.tb VALUES (...)` statements wrote rows at the current timestamp and at 30 and 60 days earlier; `SELECT * FROM openhouse.db.tb` returned all three rows and `SHOW TABLES IN openhouse.db` listed `db.tb`.

The table service was tested as well: a `curl "${curlArgs[@]}" -XPOST http://localhost:8000/v1/databases/d3/tables/` call created table `d3.t1` on clusterId LocalS3Cluster with a three-column schema (id, name, ts), HOUR time partitioning on `ts`, clustering on `name`, and a custom table property `key=value`. The create response and a follow-up `curl -XGET http://localhost:8000/v1/databases/d3/tables/t1` both return the table metadata, with `tableLocation` at `s3://openhouse-bucket/d3/t1-394d8186-143f-482a-b5e5-e6aa6e382556/00000-8e498f9d-153e-412f-bb0e-476cfcab926d.metadata.json`, tableType PRIMARY_TABLE, default write format orc, `write.metadata.delete-after-commit.enabled=true` and `write.metadata.previous-versions-max=28`. Listing the database (`curl -XGET http://localhost:8000/v1/databases/d3/tables/`) returns the same single table, and `curl -XDELETE http://localhost:8000/v1/databases/d3/tables/t1` deletes it; it was then validated that the table is gone (the final screenshot did not finish uploading in the PR). Only manual docker testing was done; the change is part of the larger PR series above.
**2024-06-16 04:32 UTC | `main` | PR merge by Abhishek Nath (abhisheknath2011)**
Exclude metrics-core lib pulled in by hadoop yarn node manager (#126)

Hadoop 2.10.0 (the `hadoop-yarn` libs) pulls in a very old version (3.0.1) of `com.codahale.metrics:metrics-core`, which ends up bundled in the tables.jar and jobs.jar fat jars. Newer methods such as [gauge](https://www.javadoc.io/doc/io.dropwizard.metrics/metrics-core/3.2.0/com/codahale/metrics/MetricRegistry.html#gauge-java.lang.String-com.codahale.metrics.MetricRegistry.MetricSupplier-) only exist in metrics-core 3.2.0 and later, so when those fat jars share a classpath with a newer metrics-core and some code uses the new MetricRegistry APIs, it fails with a method-not-found error:

```
com.codahale.metrics.MetricRegistry.gauge(Ljava/lang/String;Lcom/codahale/metrics/MetricRegistry$MetricSupplier;)Lcom/codahale/metrics/Gauge;
```

The relevant slice of the dependency tree:

```
+--- org.apache.hadoop:hadoop-yarn-server-nodemanager:2.10.0
     +--- org.apache.hadoop:hadoop-yarn-common:2.10.0 (*)
     +--- org.apache.hadoop:hadoop-yarn-api:2.10.0 (*)
     ...
     +--- com.codahale.metrics:metrics-core:3.0.1
```

Hence metrics-core is excluded: it is not used in the OSS codebase, and a higher version can always be pinned later if needed. Categorized as a lib exclusion. `./gradlew clean build` passed, and the change was exercised on the local docker setup: a `curl "${curlArgs[@]}" -XPOST http://localhost:8000/v1/databases/d3/tables/` call (same payload shape as above, but on clusterId LocalHadoopCluster) created table `d3.t1` with HOUR time partitioning on `ts` and clustering on `name`, with `tableLocation` at `hdfs://namenode:9000/data/openhouse/d3/t1-12b090ff-0dce-487f-8e74-5d18c55c68da/00000-9a3a852b-26d7-43f6-8340-c0687466c3f5.metadata.json`; listing the database afterwards returns that table with the same metadata.
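A hypothetical way to confirm the exclusion took effect; the Gradle module path and jar location below are illustrative guesses, not taken from this feed:

```bash
# The old 3.0.1 metrics-core should no longer appear on the runtime classpath...
./gradlew :services:tables:dependencies --configuration runtimeClasspath | grep -i "metrics-core" \
  || echo "metrics-core not on the runtime classpath"

# ...and its classes should no longer be bundled into the fat jar.
jar tf services/tables/build/libs/tables.jar | grep "com/codahale/metrics" \
  || echo "metrics-core classes not bundled"
```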
**2024-06-11 20:26 UTC | `main` | PR merge by Sushant Raikar (HotSushi)**
Add S3 docker setup for OpenHouse (#123)

**2024-06-11 00:13 UTC | `main` | PR merge by Lavina Jain (jainlavina)**
[PR 3/5]: Add S3 storage (#122)
The\r\nend goal of this effort is to add the integration with S3 so that the\r\nstorage backend can be configured to be S3 vs HDFS based on storage\r\ntype.\r\nThe entire work will be done via a series of PRs:\r\n\r\n1. Add S3 Storage type and S3StorageClient.\r\n2. Add base class for StorageClient and move common logic like\r\nvalidation of properties there to avoid code duplication.\r\n3. Add S3Storage implementation that uses S3StorageClient.\r\n4. Add support for using S3FileIO for S3 storage type.\r\n5. Add a recipe for end-to-end testing in docker.\r\n\r\nThis PR addresses 3 by adding S3Storage class. \r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [x] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\nTested oh-hadoop-spark recipe. New recipe for S3 testing will be added\r\nin successive PRs.\r\nAdded unit tests.\r\n\r\n- [x] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [ ] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [x] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"[PR 3/5]: Add S3 storage (#122)"}},{"before":"c699979cf5ce999348d9703e2ba99699daadb302","after":"82242fbe166843468d93abef966dcc724d4f4c7a","ref":"refs/heads/main","pushedAt":"2024-06-07T21:28:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HotSushi","name":"Sushant Raikar","path":"/HotSushi","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6441597?s=80&v=4"},"commit":{"message":"Introduce `AllocateTableStorage` in Storage interface (#119)\n\nIntroduce `AllocateTableStorage` in Storage interface (#119)","shortMessageHtmlLink":"Introduce AllocateTableStorage in Storage interface (#119)"}},{"before":null,"after":"c699979cf5ce999348d9703e2ba99699daadb302","ref":"refs/heads/kai/azure-sandbox","pushedAt":"2024-06-06T22:24:03.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"kmcclenn","name":"Kai McClennen","path":"/kmcclenn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/63942422?s=80&v=4"},"commit":{"message":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#113)\n\n## Summary\r\nThis is the second PR in a sequence of PRs to add support for S3\r\nstorage.\r\n\r\nOpenhouse catalog currently supports HDFS as the storage backend. The\r\nend goal of this effort is to add the integration with S3 so that the\r\nstorage backend can be configured to be S3 vs HDFS based on storage\r\ntype.\r\nThe entire work will be done via a series of PRs:\r\n\r\n1. Add S3 Storage type and S3StorageClient.\r\n2. 
Add base class for StorageClient and move common logic like\r\nvalidation of properties there to avoid code duplication.\r\n3. Add S3Storage implementation that uses S3StorageClient.\r\n4. Add support for using S3FileIO for S3 storage type.\r\n5. Add a recipe for end-to-end testing in docker.\r\n\r\nThis PR addresses 2 by adding BaseStorageClient and common logic to\r\nvalidate storage properties for a given storage type.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\nAdded more unot tests for S3 storage properties validation.\r\nManually tested oh-hadoop-spark recipe in docker.\r\n\r\n- [x] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [x] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\nDocker testing:\r\n\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XPOST\r\nhttp://localhost:8000/v1/databases/d3/tables/ \\\r\n> --data-raw '{\r\n> \"tableId\": \"t1\",\r\n> \"databaseId\": \"d3\",\r\n> \"baseTableVersion\": \"INITIAL_VERSION\",\r\n> \"clusterId\": \"LocalHadoopCluster\",\r\n> \"schema\": \"{\\\"type\\\": \\\"struct\\\", \\\"fields\\\": [{\\\"id\\\":\r\n1,\\\"required\\\": true,\\\"name\\\": \\\"id\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n2,\\\"required\\\": true,\\\"name\\\": \\\"name\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n3,\\\"required\\\": true,\\\"name\\\": \\\"ts\\\",\\\"type\\\": \\\"timestamp\\\"}]}\",\r\n> \"timePartitioning\": {\r\n> \"columnName\": \"ts\",\r\n> \"granularity\": \"HOUR\"\r\n> },\r\n> \"clustering\": [\r\n> {\r\n> \"columnName\": \"name\"\r\n> }\r\n> ],\r\n> \"tableProperties\": {\r\n> \"key\": \"value\"\r\n> }\r\n> }' | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 2174 0 1600 100 574 614 220 0:00:02 0:00:02 --:--:-- 835\r\n{\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : 
\"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n \"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\n\r\n{\"tableId\":\"t1\",\"databaseId\":\"d3\",\"clusterId\":\"LocalHadoopCluster\",\"tableUri\":\"LocalHadoopCluster.d3.t1\",\"tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"tableLocation\":\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"tableVersion\":\"INITIAL_VERSION\",\"tableCreator\":\"DUMMY_ANONYMOUS_USER\",\"schema\":\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\"lastModifiedTime\":1717110226250,\"creationTime\":1717110226250,\"tableProperties\":{\"policies\":\"\",\"write.metadata.delete-after-commit.enabled\":\"true\",\"openhouse.tableId\":\"t1\",\"openhouse.clusterId\":\"LocalHadoopCluster\",\"openhouse.lastModifiedTime\":\"1717110226250\",\"openhouse.tableVersion\":\"INITIAL_VERSION\",\"openhouse.creationTime\":\"1717110226250\",\"openhouse.tableUri\":\"LocalHadoopCluster.d3.t1\",\"write.format.default\":\"orc\",\"write.metadata.previous-versions-max\":\"28\",\"openhouse.databaseId\":\"d3\",\"openhouse.tableType\":\"PRIMARY_TABLE\",\"openhouse.tableLocation\":\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"openhouse.tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"key\":\"value\",\"openhouse.tableCreator\":\"DUMMY_ANONYMOUS_USER\"},\"timePartitioning\":{\"columnName\":\"ts\",\"granularity\":\"HOUR\"},\"clustering\":[{\"columnName\":\"name\",\"transform\":null}],\"policies\":null,\"tableType\":\"PRIMARY_TABLE\"}lajain-mn2:oh-hadoop-spark\r\nlajacurl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 6673 0 --:--:-- --:--:-- --:--:-- 6669\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" 
:\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 14629 0 --:--:-- --:--:-- --:--:-- 14672\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n 
\"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XDELETE\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 14 0 14 0 0 593 0 --:--:-- --:--:-- --:--:-- 608\r\n{\r\n \"results\" : []\r\n}\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [x] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#…"}},{"before":"c699979cf5ce999348d9703e2ba99699daadb302","after":null,"ref":"refs/heads/lejiang/tmp","pushedAt":"2024-06-06T22:13:18.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"jiang95-dev","name":"Levi Jiang","path":"/jiang95-dev","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19853208?s=80&v=4"}},{"before":null,"after":"c699979cf5ce999348d9703e2ba99699daadb302","ref":"refs/heads/lejiang/tmp","pushedAt":"2024-06-06T22:13:12.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jiang95-dev","name":"Levi Jiang","path":"/jiang95-dev","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19853208?s=80&v=4"},"commit":{"message":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#113)\n\n## Summary\r\nThis is the second PR in a sequence of PRs to add support for S3\r\nstorage.\r\n\r\nOpenhouse catalog currently supports HDFS as the storage backend. The\r\nend goal of this effort is to add the integration with S3 so that the\r\nstorage backend can be configured to be S3 vs HDFS based on storage\r\ntype.\r\nThe entire work will be done via a series of PRs:\r\n\r\n1. Add S3 Storage type and S3StorageClient.\r\n2. Add base class for StorageClient and move common logic like\r\nvalidation of properties there to avoid code duplication.\r\n3. Add S3Storage implementation that uses S3StorageClient.\r\n4. Add support for using S3FileIO for S3 storage type.\r\n5. 
Add a recipe for end-to-end testing in docker.\r\n\r\nThis PR addresses 2 by adding BaseStorageClient and common logic to\r\nvalidate storage properties for a given storage type.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\nAdded more unot tests for S3 storage properties validation.\r\nManually tested oh-hadoop-spark recipe in docker.\r\n\r\n- [x] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [x] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\nDocker testing:\r\n\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XPOST\r\nhttp://localhost:8000/v1/databases/d3/tables/ \\\r\n> --data-raw '{\r\n> \"tableId\": \"t1\",\r\n> \"databaseId\": \"d3\",\r\n> \"baseTableVersion\": \"INITIAL_VERSION\",\r\n> \"clusterId\": \"LocalHadoopCluster\",\r\n> \"schema\": \"{\\\"type\\\": \\\"struct\\\", \\\"fields\\\": [{\\\"id\\\":\r\n1,\\\"required\\\": true,\\\"name\\\": \\\"id\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n2,\\\"required\\\": true,\\\"name\\\": \\\"name\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n3,\\\"required\\\": true,\\\"name\\\": \\\"ts\\\",\\\"type\\\": \\\"timestamp\\\"}]}\",\r\n> \"timePartitioning\": {\r\n> \"columnName\": \"ts\",\r\n> \"granularity\": \"HOUR\"\r\n> },\r\n> \"clustering\": [\r\n> {\r\n> \"columnName\": \"name\"\r\n> }\r\n> ],\r\n> \"tableProperties\": {\r\n> \"key\": \"value\"\r\n> }\r\n> }' | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 2174 0 1600 100 574 614 220 0:00:02 0:00:02 --:--:-- 835\r\n{\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" 
:\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n \"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\n\r\n{\"tableId\":\"t1\",\"databaseId\":\"d3\",\"clusterId\":\"LocalHadoopCluster\",\"tableUri\":\"LocalHadoopCluster.d3.t1\",\"tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"tableLocation\":\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"tableVersion\":\"INITIAL_VERSION\",\"tableCreator\":\"DUMMY_ANONYMOUS_USER\",\"schema\":\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\"lastModifiedTime\":1717110226250,\"creationTime\":1717110226250,\"tableProperties\":{\"policies\":\"\",\"write.metadata.delete-after-commit.enabled\":\"true\",\"openhouse.tableId\":\"t1\",\"openhouse.clusterId\":\"LocalHadoopCluster\",\"openhouse.lastModifiedTime\":\"1717110226250\",\"openhouse.tableVersion\":\"INITIAL_VERSION\",\"openhouse.creationTime\":\"1717110226250\",\"openhouse.tableUri\":\"LocalHadoopCluster.d3.t1\",\"write.format.default\":\"orc\",\"write.metadata.previous-versions-max\":\"28\",\"openhouse.databaseId\":\"d3\",\"openhouse.tableType\":\"PRIMARY_TABLE\",\"openhouse.tableLocation\":\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"openhouse.tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"key\":\"value\",\"openhouse.tableCreator\":\"DUMMY_ANONYMOUS_USER\"},\"timePartitioning\":{\"columnName\":\"ts\",\"granularity\":\"HOUR\"},\"clustering\":[{\"columnName\":\"name\",\"transform\":null}],\"policies\":null,\"tableType\":\"PRIMARY_TABLE\"}lajain-mn2:oh-hadoop-spark\r\nlajacurl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 6673 0 --:--:-- --:--:-- --:--:-- 6669\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" 
:\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 14629 0 --:--:-- --:--:-- --:--:-- 14672\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n 
\"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XDELETE\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 14 0 14 0 0 593 0 --:--:-- --:--:-- --:--:-- 608\r\n{\r\n \"results\" : []\r\n}\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [x] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#…"}},{"before":"a291af7e8ed8f4413fc82bcd5d1cdedce0fc3401","after":"c699979cf5ce999348d9703e2ba99699daadb302","ref":"refs/heads/main","pushedAt":"2024-06-05T21:29:02.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"jainlavina","name":"Lavina Jain","path":"/jainlavina","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114708561?s=80&v=4"},"commit":{"message":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#113)\n\n## Summary\r\nThis is the second PR in a sequence of PRs to add support for S3\r\nstorage.\r\n\r\nOpenhouse catalog currently supports HDFS as the storage backend. The\r\nend goal of this effort is to add the integration with S3 so that the\r\nstorage backend can be configured to be S3 vs HDFS based on storage\r\ntype.\r\nThe entire work will be done via a series of PRs:\r\n\r\n1. Add S3 Storage type and S3StorageClient.\r\n2. Add base class for StorageClient and move common logic like\r\nvalidation of properties there to avoid code duplication.\r\n3. Add S3Storage implementation that uses S3StorageClient.\r\n4. Add support for using S3FileIO for S3 storage type.\r\n5. Add a recipe for end-to-end testing in docker.\r\n\r\nThis PR addresses 2 by adding BaseStorageClient and common logic to\r\nvalidate storage properties for a given storage type.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\nAdded more unot tests for S3 storage properties validation.\r\nManually tested oh-hadoop-spark recipe in docker.\r\n\r\n- [x] Manually Tested on local docker setup. 
Please include commands\r\nran, and their output.\r\n- [x] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\nDocker testing:\r\n\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XPOST\r\nhttp://localhost:8000/v1/databases/d3/tables/ \\\r\n> --data-raw '{\r\n> \"tableId\": \"t1\",\r\n> \"databaseId\": \"d3\",\r\n> \"baseTableVersion\": \"INITIAL_VERSION\",\r\n> \"clusterId\": \"LocalHadoopCluster\",\r\n> \"schema\": \"{\\\"type\\\": \\\"struct\\\", \\\"fields\\\": [{\\\"id\\\":\r\n1,\\\"required\\\": true,\\\"name\\\": \\\"id\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n2,\\\"required\\\": true,\\\"name\\\": \\\"name\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n3,\\\"required\\\": true,\\\"name\\\": \\\"ts\\\",\\\"type\\\": \\\"timestamp\\\"}]}\",\r\n> \"timePartitioning\": {\r\n> \"columnName\": \"ts\",\r\n> \"granularity\": \"HOUR\"\r\n> },\r\n> \"clustering\": [\r\n> {\r\n> \"columnName\": \"name\"\r\n> }\r\n> ],\r\n> \"tableProperties\": {\r\n> \"key\": \"value\"\r\n> }\r\n> }' | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 2174 0 1600 100 574 614 220 0:00:02 0:00:02 --:--:-- 835\r\n{\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n \"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : 
\"HOUR\"\r\n }\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\n\r\n{\"tableId\":\"t1\",\"databaseId\":\"d3\",\"clusterId\":\"LocalHadoopCluster\",\"tableUri\":\"LocalHadoopCluster.d3.t1\",\"tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"tableLocation\":\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"tableVersion\":\"INITIAL_VERSION\",\"tableCreator\":\"DUMMY_ANONYMOUS_USER\",\"schema\":\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\"lastModifiedTime\":1717110226250,\"creationTime\":1717110226250,\"tableProperties\":{\"policies\":\"\",\"write.metadata.delete-after-commit.enabled\":\"true\",\"openhouse.tableId\":\"t1\",\"openhouse.clusterId\":\"LocalHadoopCluster\",\"openhouse.lastModifiedTime\":\"1717110226250\",\"openhouse.tableVersion\":\"INITIAL_VERSION\",\"openhouse.creationTime\":\"1717110226250\",\"openhouse.tableUri\":\"LocalHadoopCluster.d3.t1\",\"write.format.default\":\"orc\",\"write.metadata.previous-versions-max\":\"28\",\"openhouse.databaseId\":\"d3\",\"openhouse.tableType\":\"PRIMARY_TABLE\",\"openhouse.tableLocation\":\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\"openhouse.tableUUID\":\"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\"key\":\"value\",\"openhouse.tableCreator\":\"DUMMY_ANONYMOUS_USER\"},\"timePartitioning\":{\"columnName\":\"ts\",\"granularity\":\"HOUR\"},\"clustering\":[{\"columnName\":\"name\",\"transform\":null}],\"policies\":null,\"tableType\":\"PRIMARY_TABLE\"}lajain-mn2:oh-hadoop-spark\r\nlajacurl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 6673 0 --:--:-- --:--:-- --:--:-- 6669\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" 
:\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 14629 0 --:--:-- --:--:-- --:--:-- 14672\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717110226250,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717110226250,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717110226250\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717110226250\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-3b602ebe-562b-4dcf-8f5e-0713ded56a2b/00000-e605bcfc-0aea-4066-a608-7d45c88ecfbb.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"3b602ebe-562b-4dcf-8f5e-0713ded56a2b\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XDELETE\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% 
Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 14 0 14 0 0 593 0 --:--:-- --:--:-- --:--:-- 608\r\n{\r\n \"results\" : []\r\n}\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [x] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"[PR 2/5] Add validation for storage properties to BaseStorageClient (#…"}},{"before":"7aa5e259b4744d60ae9c8ffef208144ad5032fc3","after":"9b229e270ea5e0cc7e4c49f2b5852f647d3e5cf2","ref":"refs/heads/gh-pages","pushedAt":"2024-06-05T01:51:50.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"deploy: 78fb27c5f07136ffc87f54bdae06f8cb4fbbd9a0","shortMessageHtmlLink":"deploy: 78fb27c"}},{"before":"f2e6319518eb7f3841d991545ce29c04198d9318","after":"78fb27c5f07136ffc87f54bdae06f8cb4fbbd9a0","ref":"refs/heads/docsite","pushedAt":"2024-06-05T01:50:50.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HotSushi","name":"Sushant Raikar","path":"/HotSushi","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6441597?s=80&v=4"},"commit":{"message":"Update Tables spec and Jobs spec to v0.5.74 (#118)\n\n## Summary\r\nFollowed instructions in\r\nhttps://github.com/linkedin/openhouse/blob/docsite/README.md\r\nand\r\nhttps://github.com/linkedin/openhouse/blob/docsite/specs/README.md\r\n\r\n## Changes\r\n- [X] Documentation\r\n\r\n## Testing Done\r\n`npm install`\r\n& `npm run start` looks good ✅ \r\n\r\n\r\n![image](https://github.com/linkedin/openhouse/assets/6441597/b511355d-f38d-453f-a9bf-991547d99aa4)","shortMessageHtmlLink":"Update Tables spec and Jobs spec to v0.5.74 (#118)"}},{"before":"79fd05ba0472fab33b93e10917d9fa26620b06e8","after":"a291af7e8ed8f4413fc82bcd5d1cdedce0fc3401","ref":"refs/heads/main","pushedAt":"2024-06-04T21:34:23.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"teamurko","name":"Stas Pak","path":"/teamurko","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/270580?s=80&v=4"},"commit":{"message":"Make mapTableResponseToTableMetadata protected in TablesClient (#117)\n\n## Summary\r\nThis change allows to override mapTableResponseToTableMetadata method in\r\na subclass.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [x] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\n\r\n\r\n- [ ] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [ ] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [x] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. 
Please explain.\r\n\r\nNo tests added/changed since the change relaxes method visibility only.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [ ] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"Make mapTableResponseToTableMetadata protected in TablesClient (#117)"}},{"before":"226b378bd449e3ff8d537224eb0b816398cd4660","after":"79fd05ba0472fab33b93e10917d9fa26620b06e8","ref":"refs/heads/main","pushedAt":"2024-06-04T17:58:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"teamurko","name":"Stas Pak","path":"/teamurko","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/270580?s=80&v=4"},"commit":{"message":"Handle unset execution conf in jobs service (#114)\n\n## Summary\r\nFixed code to handle null execution conf field in JobConf\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [x] Bug Fixes\r\n- [ ] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [x] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\n\r\n\r\n- [x] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [ ] Added new tests for the changes made.\r\n- [x] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\n```\r\ndocker compose --profile with_jobs_scheduler run openhouse-jobs-scheduler - --type SNAPSHOTS_EXPIRATION --cluster local --tablesURL http://openhouse-tables:8080 --jobsURL http://openhouse-jobs:8080\r\n...\r\n2024-06-03 17:52:53 2024-06-04 00:52:53 INFO OtelConfig:70 - initializing open-telemetry sdk\r\n2024-06-03 17:52:54 2024-06-04 00:52:54 INFO Reflections:219 - Reflections took 153 ms to scan 1 urls, producing 4 keys and 10 values\r\n2024-06-03 17:52:54 2024-06-04 00:52:54 INFO JobsScheduler:108 - Starting scheduler\r\n2024-06-03 17:52:54 2024-06-04 00:52:54 INFO JobsScheduler:130 - Fetching task list based on the job type: SNAPSHOTS_EXPIRATION\r\n2024-06-03 17:53:01 2024-06-04 00:53:01 INFO OperationTasksBuilder:29 - metadata: dbName: db, tableName: test, creator: openhouse\r\n2024-06-03 17:53:01 2024-06-04 00:53:01 INFO JobsScheduler:140 - Submitting and running 1 jobs based on the job type: SNAPSHOTS_EXPIRATION\r\n2024-06-03 17:53:01 2024-06-04 00:53:01 INFO OperationTask:71 - Launching job for dbName: db, tableName: test, creator: openhouse\r\n2024-06-03 17:53:06 2024-06-04 00:53:06 INFO OperationTask:97 - Launched a job with id SNAPSHOTS_EXPIRATION_db_test_15c49365-2bb3-4ba6-89ae-5ce396ad67c0 for dbName: db, tableName: test, creator: openhouse\r\n...\r\n2024-06-03 17:58:42 2024-06-04 00:58:42 INFO OperationTask:143 - Finished job for entity dbName: db, tableName: test, creator: openhouse: JobId SNAPSHOTS_EXPIRATION_db_test_15c49365-2bb3-4ba6-89ae-5ce396ad67c0, executionId 0, runTime 47116, queuedTime 17212, state SUCCEEDED\r\n2024-06-03 17:58:42 2024-06-04 00:58:42 INFO JobsScheduler:182 - Finishing scheduler for job type SNAPSHOTS_EXPIRATION, tasks stats: 1 created, 1 succeeded, 0 cancelled (timeout), 0 failed, 0 skipped (no 
state)\r\n```\r\n\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [ ] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"Handle unset execution conf in jobs service (#114)"}},{"before":"6390661e8f66447e6b7d3020357a5a9b5201ea58","after":"226b378bd449e3ff8d537224eb0b816398cd4660","ref":"refs/heads/main","pushedAt":"2024-05-30T23:32:49.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"jiang95-dev","name":"Levi Jiang","path":"/jiang95-dev","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/19853208?s=80&v=4"},"commit":{"message":"JobScheduler switch using getAllTables to searchTables api (#95)","shortMessageHtmlLink":"JobScheduler switch using getAllTables to searchTables api (#95)"}},{"before":"d0c6583c7dd5b73c4140ef74d5672a71592ee9c9","after":"6390661e8f66447e6b7d3020357a5a9b5201ea58","ref":"refs/heads/main","pushedAt":"2024-05-30T20:09:18.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"jainlavina","name":"Lavina Jain","path":"/jainlavina","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/114708561?s=80&v=4"},"commit":{"message":"[PR1/5] Add S3 Storage type and S3StorageClient (#107)\n\n## Summary\r\nOpenhouse catalog currently supports HDFS as the storage backend. The\r\nend goal of this effort is to add the integration with S3 so that the\r\nstorage backend can be configured to be S3 vs HDFS based on storage\r\ntype.\r\nThe entire work will be done via a series of PRs:\r\n1. Add S3 Storage type and S3StorageClient.\r\n2. Add base class for StorageClient and move common logic like\r\nvalidation of properties there to avoid code duplication.\r\n3. Add S3Storage implementation that uses S3StorageClient.\r\n4. Add support for using S3FileIO for S3 storage type.\r\n5. Add a recipe for end-to-end testing in docker.\r\n\r\nThis PR addresses 1 by adding S3 storage type and S3StorageClient.\r\n\r\n## Changes\r\n\r\n- [ ] Client-facing API Changes\r\n- [ ] Internal API Changes\r\n- [ ] Bug Fixes\r\n- [x] New Features\r\n- [ ] Performance Improvements\r\n- [ ] Code Style\r\n- [ ] Refactoring\r\n- [ ] Documentation\r\n- [ ] Tests\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\nAdded unit tests. This is one of the first few PRs to complete plugging\r\nin S3 storage. More test recipes to test with S3 storage will be added\r\nwhen S3 storage is completely plugged in. Currently tested\r\noh-hadoop-spark recipe to ensure that the HDFS integration is not broken\r\nby this change.\r\n\r\n- [x] Manually Tested on local docker setup. Please include commands\r\nran, and their output.\r\n- [ ] Added new tests for the changes made.\r\n- [ ] Updated existing tests to reflect the changes made.\r\n- [ ] No tests added or updated. Please explain why. If unsure, please\r\nfeel free to ask for help.\r\n- [ ] Some other form of testing like staging or soak time in\r\nproduction. Please explain.\r\n\r\nFor all the boxes checked, include a detailed description of the testing\r\ndone for the changes made in this pull request.\r\n\r\nDocker testing oh-hadoop-spark recipe:\r\n\r\n1. 
Create table:\r\n\r\n$ curl \"${curlArgs[@]}\" -XPOST\r\nhttp://localhost:8000/v1/databases/d3/tables/ \\\r\n> --data-raw '{\r\n> \"tableId\": \"t1\",\r\n> \"databaseId\": \"d3\",\r\n> \"baseTableVersion\": \"INITIAL_VERSION\",\r\n> \"clusterId\": \"LocalHadoopCluster\",\r\n> \"schema\": \"{\\\"type\\\": \\\"struct\\\", \\\"fields\\\": [{\\\"id\\\":\r\n1,\\\"required\\\": true,\\\"name\\\": \\\"id\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n2,\\\"required\\\": true,\\\"name\\\": \\\"name\\\",\\\"type\\\": \\\"string\\\"},{\\\"id\\\":\r\n3,\\\"required\\\": true,\\\"name\\\": \\\"ts\\\",\\\"type\\\": \\\"timestamp\\\"}]}\",\r\n> \"timePartitioning\": {\r\n> \"columnName\": \"ts\",\r\n> \"granularity\": \"HOUR\"\r\n> },\r\n> \"clustering\": [\r\n> {\r\n> \"columnName\": \"name\"\r\n> }\r\n> ],\r\n> \"tableProperties\": {\r\n> \"key\": \"value\"\r\n> }\r\n> }' | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 2174 0 1600 100 574 548 196 0:00:02 0:00:02 --:--:-- 745\r\n{\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717098450027,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717098450027,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717098450027\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717098450027\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n \"openhouse.tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n}\r\n\r\n2. 
Read table:\r\n$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/t1 | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1600 0 1600 0 0 22736 0 --:--:-- --:--:-- --:--:-- 22857\r\n{\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717098450027,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717098450027,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717098450027\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717098450027\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n \"openhouse.tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n}\r\n\r\n3. 
List tables:\r\n$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 1614 0 1614 0 0 30447 0 --:--:-- --:--:-- --:--:-- 31038\r\n{\r\n \"results\" : [\r\n {\r\n \"clusterId\" : \"LocalHadoopCluster\",\r\n \"clustering\" : [\r\n {\r\n \"columnName\" : \"name\",\r\n \"transform\" : null\r\n }\r\n ],\r\n \"creationTime\" : 1717098450027,\r\n \"databaseId\" : \"d3\",\r\n \"lastModifiedTime\" : 1717098450027,\r\n \"policies\" : null,\r\n\"schema\" :\r\n\"{\\\"type\\\":\\\"struct\\\",\\\"schema-id\\\":0,\\\"fields\\\":[{\\\"id\\\":1,\\\"name\\\":\\\"id\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":2,\\\"name\\\":\\\"name\\\",\\\"required\\\":true,\\\"type\\\":\\\"string\\\"},{\\\"id\\\":3,\\\"name\\\":\\\"ts\\\",\\\"required\\\":true,\\\"type\\\":\\\"timestamp\\\"}]}\",\r\n \"tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"tableId\" : \"t1\",\r\n\"tableLocation\" :\r\n\"hdfs://namenode:9000/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"tableProperties\" : {\r\n \"key\" : \"value\",\r\n \"openhouse.clusterId\" : \"LocalHadoopCluster\",\r\n \"openhouse.creationTime\" : \"1717098450027\",\r\n \"openhouse.databaseId\" : \"d3\",\r\n \"openhouse.lastModifiedTime\" : \"1717098450027\",\r\n \"openhouse.tableCreator\" : \"DUMMY_ANONYMOUS_USER\",\r\n \"openhouse.tableId\" : \"t1\",\r\n\"openhouse.tableLocation\" :\r\n\"/data/openhouse/d3/t1-30f90df6-4c45-47ee-916c-7cf0bdea5d4d/00000-5fec4fc6-4bf9-4bab-b6a9-112967a57497.metadata.json\",\r\n \"openhouse.tableType\" : \"PRIMARY_TABLE\",\r\n\"openhouse.tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"openhouse.tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"openhouse.tableVersion\" : \"INITIAL_VERSION\",\r\n \"policies\" : \"\",\r\n \"write.format.default\" : \"orc\",\r\n \"write.metadata.delete-after-commit.enabled\" : \"true\",\r\n \"write.metadata.previous-versions-max\" : \"28\"\r\n },\r\n \"tableType\" : \"PRIMARY_TABLE\",\r\n \"tableUUID\" : \"30f90df6-4c45-47ee-916c-7cf0bdea5d4d\",\r\n \"tableUri\" : \"LocalHadoopCluster.d3.t1\",\r\n \"tableVersion\" : \"INITIAL_VERSION\",\r\n \"timePartitioning\" : {\r\n \"columnName\" : \"ts\",\r\n \"granularity\" : \"HOUR\"\r\n }\r\n }\r\n ]\r\n}\r\n\r\n4. 
Delete table:\r\n$ curl \"${curlArgs[@]}\" -XDELETE\r\nhttp://localhost:8000/v1/databases/d3/tables/t1\r\nlajain-mn2:oh-hadoop-spark lajain$ curl \"${curlArgs[@]}\" -XGET\r\nhttp://localhost:8000/v1/databases/d3/tables/ | json_pp\r\n% Total % Received % Xferd Average Speed Time Time Time Current\r\nDload Upload Total Spent Left Speed\r\n100 14 0 14 0 0 331 0 --:--:-- --:--:-- --:--:-- 333\r\n{\r\n \"results\" : []\r\n}\r\n\r\nTesting via spark shell:\r\nscala> spark.sql(\"CREATE TABLE openhouse.db.tb (ts timestamp, col1\r\nstring, col2 string) PARTITIONED BY (days(ts))\").show()\r\n++\r\n||\r\n++\r\n++\r\n\r\n\r\nscala> spark.sql(\"DESCRIBE TABLE openhouse.db.tb\").show()\r\n+--------------+---------+-------+\r\n| col_name|data_type|comment|\r\n+--------------+---------+-------+\r\n| ts|timestamp| |\r\n| col1| string| |\r\n| col2| string| |\r\n| | | |\r\n|# Partitioning| | |\r\n| Part 0| days(ts)| |\r\n+--------------+---------+-------+\r\n\r\n\r\nscala> spark.sql(\"INSERT INTO TABLE openhouse.db.tb VALUES\r\n(current_timestamp(), 'val1', 'val2')\")\r\nres2: org.apache.spark.sql.DataFrame = []\r\n\r\nscala> spark.sql(\"INSERT INTO TABLE openhouse.db.tb VALUES\r\n(date_sub(CAST(current_timestamp() as DATE), 30), 'val1', 'val2')\")\r\nres3: org.apache.spark.sql.DataFrame = []\r\n\r\nscala> spark.sql(\"INSERT INTO TABLE openhouse.db.tb VALUES\r\n(date_sub(CAST(current_timestamp() as DATE), 60), 'val1', 'val2')\")\r\nres4: org.apache.spark.sql.DataFrame = []\r\n\r\nscala> spark.sql(\"SELECT * FROM openhouse.db.tb\").show()\r\n+--------------------+----+----+\r\n| ts|col1|col2|\r\n+--------------------+----+----+\r\n|2024-05-30 19:52:...|val1|val2|\r\n| 2024-03-31 00:00:00|val1|val2|\r\n| 2024-04-30 00:00:00|val1|val2|\r\n+--------------------+----+----+\r\n\r\n\r\nscala> spark.sql(\"SHOW TABLES IN openhouse.db\").show()\r\n+---------+---------+\r\n|namespace|tableName|\r\n+---------+---------+\r\n| db| tb|\r\n+---------+---------+\r\n\r\n\r\n# Additional Information\r\n\r\n- [ ] Breaking Changes\r\n- [ ] Deprecations\r\n- [x] Large PR broken into smaller PRs, and PR plan linked in the\r\ndescription.\r\n\r\nFor all the boxes checked, include additional details of the changes\r\nmade in this pull request.","shortMessageHtmlLink":"[PR1/5] Add S3 Storage type and S3StorageClient (#107)"}},{"before":"f84712e446a7e5332be4099147b3635a6bc4f05b","after":"d0c6583c7dd5b73c4140ef74d5672a71592ee9c9","ref":"refs/heads/main","pushedAt":"2024-05-30T00:04:22.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HotSushi","name":"Sushant Raikar","path":"/HotSushi","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6441597?s=80&v=4"},"commit":{"message":"Move CatalogOperationsTest to `:openhouse-spark-itest` module (#112)\n\nMove CatalogOperationsTest to `:openhouse-spark-itest` module (#112)","shortMessageHtmlLink":"Move CatalogOperationsTest to :openhouse-spark-itest module (#112)"}},{"before":"b02fa9da7daae4e573efe8e61b841d1e1d3f2106","after":"f84712e446a7e5332be4099147b3635a6bc4f05b","ref":"refs/heads/main","pushedAt":"2024-05-29T21:39:42.000Z","pushType":"pr_merge","commitsCount":1,"pusher":{"login":"HotSushi","name":"Sushant Raikar","path":"/HotSushi","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6441597?s=80&v=4"},"commit":{"message":"Fix 'io-impl' relocation in `openhouse-java-runtime` and `openhouse-spark-runtime` (#111)\n\n## Summary\r\nThis bug was discovered in the PR:\r\nhttps://github.com/linkedin/openhouse/pull/106\r\n\r\nAs a result of this bug, we need to 
specify the spark config as:\r\n```\r\n--conf spark.sql.catalog.openhouse.com.linkedin.openhouse.relocated.io-impl=XYZ\r\n```\r\ninstead of the right way:\r\n```\r\n--conf spark.sql.catalog.openhouse.io-impl=XYZ\r\n```\r\nThis bug was introduced because of incorrect relocation.\r\nThe jar before the change contains the following code:\r\n\r\n![image](https://github.com/linkedin/openhouse/assets/6441597/9b95ab75-3174-4d1c-bd15-1ba06795f0ec)\r\n\r\n## Changes\r\n\r\n- [X] Bug Fixes\r\n\r\nFor all the boxes checked, please include additional details of the\r\nchanges made in this pull request.\r\n\r\n## Testing Done\r\n✅ build succeeds\r\n✅ new jar and code look good:\r\n\r\n![image](https://github.com/linkedin/openhouse/assets/6441597/25257b3d-5fd1-4efc-9746-89205b1f0902)\r\n✅ older relocation looks good:\r\n\r\n![image](https://github.com/linkedin/openhouse/assets/6441597/166263c1-8890-447a-8b1c-a0215c90ce9c)\r\n\r\n- [X] Some other form of testing.","shortMessageHtmlLink":"Fix 'io-impl' relocation in openhouse-java-runtime and `openhouse-s…"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEgVmJawA","startCursor":null,"endCursor":null}},"title":"Activity · linkedin/openhouse"}
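A note on the `io-impl` relocation fix above: once the shading is corrected, the FileIO override for the `openhouse` catalog is supplied under the plain Iceberg key rather than the shaded `com.linkedin.openhouse.relocated` prefix. A minimal spark-shell sketch, assuming the `openhouse` catalog itself is already registered by the runtime jar as in the spark-shell tests earlier in this activity log; the `HadoopFileIO` value is a stock Iceberg class used purely for illustration and is not taken from the PR:

```scala
// Illustrative sketch only: sets the un-relocated FileIO key addressed by PR #111.
// Assumes the `openhouse` catalog is configured elsewhere (e.g. by the
// openhouse-spark-runtime jar); HadoopFileIO is a stock Iceberg FileIO used as a placeholder.
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("openhouse-io-impl-config")
  .config("spark.sql.catalog.openhouse.io-impl", "org.apache.iceberg.hadoop.HadoopFileIO")
  .getOrCreate()
```

The command-line equivalent is the corrected `--conf spark.sql.catalog.openhouse.io-impl=XYZ` flag shown in the PR summary.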