-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AMBARI-23130 Persist mpack information instead of the full provision request #603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMBARI-23130 Persist mpack information instead of the full provision request #603
Conversation
…ids on server restart (benyoka)
…eature-AMBARI-14714
…eature-AMBARI-14714
| public void setProvisionAction(ProvisionAction provisionAction) { | ||
| this.provisionAction = provisionAction; | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved here from PersistedStateImpl to take advantage of polymorphism instead of doing instanceof checks.
| return entity; | ||
| } | ||
|
|
||
| private TopologyHostGroupEntity toHostGroupEntity(HostGroupInfo groupInfo, TopologyRequestEntity topologyRequestEntity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc.
| for (String hostName : hosts) { | ||
| TopologyHostInfoEntity hostInfoEntity = new TopologyHostInfoEntity(); | ||
| hostInfoEntity.setTopologyHostGroupEntity(entity); | ||
| if (groupInfo.getPredicate() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the if-check necessary? Can you just set it to the value of the predicate, even if the predicate is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually didn't write this method but copied it over from PersistedStateImpl. I can change it nevertheless.
| } | ||
| } | ||
|
|
||
| public static <T> T fromJson(String json, Class<?> valueType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc.
| cluster_id BIGINT NOT NULL, | ||
| bp_name VARCHAR(100) NOT NULL, | ||
| raw_request_body CLOB NOT NULL, | ||
| mpack_instances CLOB NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give an example of what mpack_instances looks like? Does it make sense to try to normalize this data into its own table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has the same structure as for blueprints. For blueprints it is modelled by the BlueprintMpackInstanceEntity, BlueprintServiceEntity, BlueprintMpackConfigEntities.
Currently we are interested in the stack id's only, later also in configurations and service descriptors associated with mpacks (I saw configs are already dumped as json too).
Do you think it is worth normalizing? (would mean 3 more tables, later maybe more) This information wouldn't be used outside of the scope of the replayed request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern is always that when de-serializing this, if the object it's being mapped to changed in any way, the deserialization will fail? We've actually tried to move away from this model. Repositories used to be a solid chunk of JSON but we recently broke this out into 3 new tables to better manage it as entities.
I hate giant blob of JSON, especially if we need to worry about deserializing it ourselves and pulling data out of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on @jonathan-hurley 's feedback, and also based on looking at the BlueprintMPackInstanceEntity, it seems like this data should be normalized.
Since the MPack entities can be defined in either the Blueprint or the cluster creation template, it seems like it might make sense to use the same tables to store them.
Is there some technical reason for storing this data in the topology_request table? It seems to me that we'd be better off persisting all MPack instance data in the same table, regardless of whether it is defined in the Blueprint or the Cluster Creation template.
Since Service instance definitions and config can be specified in either document, I'm not sure why these should be modelled separately based on the location of the definition. It looks like we should be using the existing tables for both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, this shouldn't be a problem for blueprints, since backward compatibility is required. If users should be able to successfully submit "old" blueprint / cluster creation requests even if code is changed, so should Ambari internally.
jonathan-hurley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor changes and a question.
| private static final ObjectMapper JSON_SERIALIZER = new ObjectMapper(); | ||
| static { | ||
| JSON_SERIALIZER.setSerializationInclusion(JsonInclude.Include.NON_NULL); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setSerializationInclusion and other similar methods return the mapper, so it can be chained like:
private static final ObjectMapper JSON_SERIALIZER = new ObjectMapper()
.setSerializationInclusion(JsonInclude.Include.NON_NULL);
rnettleton
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code changes in the patch look fine to me, but it looks like it may be worth considering using the same database tables to store the mpack instance information, regardless of whether this information is stored in a Blueprint or a Cluster Creation template.
| cluster_id BIGINT NOT NULL, | ||
| bp_name VARCHAR(100) NOT NULL, | ||
| raw_request_body CLOB NOT NULL, | ||
| mpack_instances CLOB NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on @jonathan-hurley 's feedback, and also based on looking at the BlueprintMPackInstanceEntity, it seems like this data should be normalized.
Since the MPack entities can be defined in either the Blueprint or the cluster creation template, it seems like it might make sense to use the same tables to store them.
Is there some technical reason for storing this data in the topology_request table? It seems to me that we'd be better off persisting all MPack instance data in the same table, regardless of whether it is defined in the Blueprint or the Cluster Creation template.
Since Service instance definitions and config can be specified in either document, I'm not sure why these should be modelled separately based on the location of the definition. It looks like we should be using the existing tables for both cases.
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
Refer to this link for build results (access rights to CI server needed): |
|
I implemented normalized persistence for topology requests mpack information. Tables are shared with blueprint mpack information using JPA's single table inheritance. Tables and entities have been renamed to reflect the decoupling from blueprints (mpack instance entities can belong to topology requests too). |
|
Refer to this link for build results (access rights to CI server needed): |
|
I think the patch looks fine to me, but we should ask @jonathan-hurley to re-review, to make sure the table design and persistence code is correct. |
| private String mpackUri; | ||
|
|
||
| @OneToMany(cascade = CascadeType.ALL, mappedBy = "mpackInstance") | ||
| private Collection<MpackInstanceServiceEntity> serviceInstances = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make these a Collection instead of a List?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer using the most general suitable interface.
| return mpackInstanceEntity; | ||
| } | ||
|
|
||
| private void setCommonProperties(MpackInstanceEntity mpackInstanceEntity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation of which properties are being set for the bi-directional relationship.
| CONSTRAINT PK_hosts PRIMARY KEY (host_id), | ||
| CONSTRAINT UQ_hosts_host_name UNIQUE (host_name)); | ||
|
|
||
| CREATE TABLE mpack_host_state ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you moved this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have the tables in the same order as in Ambari-DDL-Postgres-CREATE.sql for easier comparision.
jonathan-hurley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some doc/minor changes.
Any reason you didn't use orphanRemoval on the collections?
|
Hey @jonathan-hurley Re orphanRemoval: these entities are saved once and are not expected to be updated later. I can add orphanRemoval though. I noticed that orphanRemoval is missing in other places where updates are not expected. |
|
Refer to this link for build results (access rights to CI server needed): |
What changes were proposed in this pull request?
This a followup for #559.
In a previous PR I implemented persisting the raw request in the topology_request entity. After a discussion with @rnettleton it turned out this is not the ideal way. A much desirable way is to persist the raw provision request as a cluster artifact so that it will be retrievable via REST API (with passwords filtered out). This effort will be tracked in a separate JIRA.
This PR modifies the original PR by persisting only the mpack related information in the TopologyRequestEntity. This information is still needed to replay persisted topology requests.
How was this patch tested?
Tests run: 4960, Failures: 31, Errors: 93, Skipped: 43