-
Hi SDLF community, I'm really new to SDLF and have some questions regarding the framework.

Data Ingestion
I work with Airbyte and would like to use Airbyte OSS for ingestion. As Airbyte will populate the raw catalogs, is it possible to bypass the crawlers?

Git & multi-environments
We use GitLab as our git hosting provider and would prefer to use it instead of CodeCommit. In all our projects we're using gitflow with several branches, and we have 4 environments. Can I update the CI/CD configuration to match our gitflow, or do you think it may break the framework too much?

IaC & multi-accounts deployment
Almost all my stacks are already migrated. I've seen that SDLF helps deploy a data lake into dev, test and prod accounts, but in my case I'd really like to follow the "Designing a data lake for growth and scale on the AWS Cloud" reference architecture as a base (in the FAQ they say it's better to start with this in order to anticipate future needs). Is it possible to achieve something similar to that architecture with SDLF?

CI/CD
CI/CD in this framework looks really good, but isn't it much simpler with CDK Pipelines?

Thanks,
-
You can remove the crawler from sdlf-dataset and also from sdlf-stageB - so yes it is entirely possible!
In several places the mapping between branches and environments is defined as follows:
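As an illustration only, such a mapping is essentially a lookup from git branch names to environment names. Here is a minimal Python sketch; the branch and environment names are assumptions for the example, not SDLF's actual values:

```python
# Hypothetical branch-to-environment mapping, similar in spirit to what
# a CI/CD setup defines. Names below are assumptions, not SDLF's values.
BRANCH_TO_ENV = {
    "dev": "dev",
    "test": "test",
    "master": "prod",
}

def environment_for(branch: str) -> str:
    """Return the target environment for a git branch, or raise if unmapped."""
    try:
        return BRANCH_TO_ENV[branch]
    except KeyError:
        raise ValueError(f"No environment mapped for branch {branch!r}")

print(environment_for("master"))  # prod
```

Renaming a branch or an environment then comes down to editing the entries of this mapping wherever it is defined.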
It's easy to change a branch name or an environment name by modifying these mappings. What you're asking is a bit harder though, I must admit. We want to eventually be more flexible; this is a feature I started working on, but it's not ready yet.
This reference architecture follows a pattern called data mesh. It is possible to modify SDLF to fit such a pattern; there are several ways to achieve that. We are also working on a new major version of the framework that would support this pattern out of the box.
CDK Pipelines makes things quite a bit easier! SDLF is written in CloudFormation, not CDK, which is why we don't use it. However, you might be interested in SDLF DDK Lightweight: it's a version of SDLF written with CDK, and it is very close in terms of constructs. Our plan is to have it be part of the main SDLF repository eventually, hopefully this month or the next.
-
Hi cnfait, Thank you so much for your really detailed answer. I will look at the SDLF 2.0 version; is there an example project using it?

Also, you're right about data mesh. I read a lot about it last week and found that it's what I need: data mesh is about decentralizing the data lake into multiple ones, decoupling responsibilities, and having central governance for data exploration, publication, and access requests. aws-analytics-reference-architecture is an example architecture I found. As I understand it, it's totally possible to have something like aws-analytics-reference-architecture for the data mesh and an SDLF implementation for each data domain? Will it be easier to implement that kind of architecture with SDLF 2.0?

When reading about data mesh I also read about AWS DataZone, a managed service to simplify data mesh implementation with an interface for all teams to publish, explore, and request access to data. Could SDLF also work with this new service? I would love to have some examples or further reading about data mesh and/or DataZone with SDLF (maybe with SDLF 2.0).

Thanks,
-
Hi @cnfait, I don't know if you saw my previous comment because I didn't tag you. Do you have a quick example of a data mesh with SDLF as the data domain (1 producer and 1 consumer, for example), and of DataZone? Thanks,