David,

 

Sarath was working on tag-propagation, but had to take up tasks related to JanusGraph and others. He will be resuming tag-propagation work next week; this feature would be part of Atlas-1.0.0 release.

 

- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
Agree.

 

- should honour the classification entityTypes - so that we do not get classifications applied to inappropriate entityTypes
Perhaps we should stop the propagation at the entity where the classification is not applicable? I don't think it would be correct to block associating a classification with an entity just because the classification is not applicable to a downstream entity.
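To make the "stop at the first inapplicable entity" idea concrete, here is a rough sketch. It is only an illustration under assumptions: the class, the method and the map-based lineage model are hypothetical and are not Atlas code, and the set of applicable types is treated as an explicit list. The association on the source entity itself is never blocked; only the downstream walk is cut short.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PropagationStopSketch {

    // Returns the guids of downstream entities that would receive the classification.
    // All names here are hypothetical; this is not the Atlas API.
    static Set<String> propagate(String sourceGuid,
                                 Set<String> applicableTypes,
                                 Map<String, String> typeByGuid,
                                 Map<String, List<String>> downstream) {
        Set<String> receivers = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>(Collections.singleton(sourceGuid));
        Deque<String> queue = new ArrayDeque<>(Collections.singleton(sourceGuid));

        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : downstream.getOrDefault(current, Collections.<String>emptyList())) {
                if (!visited.add(next)) {
                    continue; // already handled via another path
                }
                if (!applicableTypes.contains(typeByGuid.get(next))) {
                    continue; // not applicable: do not tag it, and do not walk past it
                }
                receivers.add(next);
                queue.add(next);
            }
        }
        return receivers;
    }

    public static void main(String[] args) {
        Map<String, String> typeByGuid = new HashMap<>();
        typeByGuid.put("col1", "hive_column");
        typeByGuid.put("col2", "hive_column");
        typeByGuid.put("view1", "hive_table");
        typeByGuid.put("col3", "hive_column");

        // Lineage: col1 -> col2 -> view1 -> col3
        Map<String, List<String>> downstream = new HashMap<>();
        downstream.put("col1", Arrays.asList("col2"));
        downstream.put("col2", Arrays.asList("view1"));
        downstream.put("view1", Arrays.asList("col3"));

        Set<String> applicable = Collections.singleton("hive_column");
        System.out.println(propagate("col1", applicable, typeByGuid, downstream)); // prints [col2]
    }
}
```

Note that in this sketch col3 does not receive the classification even though its type is applicable, because the walk stops at view1. Whether that is the behaviour we want is part of the question.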

 

- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).
I was thinking about a separate attribute, AtlasEntity.propagatedClassifications, for this. However, I think your suggestion of adding a field to AtlasClassification is a better one; with this approach no changes would be needed in applications that process classifications on an entity. How about we capture the guid of the source entity on which the classification is associated, AtlasClassification.sourceEntityGuid? If this value is null, then the classification is associated with the current entity directly.
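As a rough illustration of the sourceEntityGuid idea, a minimal sketch follows. The Classification class is a hypothetical stand-in for AtlasClassification, and the helper names are made up for this example. Only the sourceEntityGuid field and its null-means-direct convention come from the proposal above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SourceEntityGuidSketch {

    // Hypothetical stand-in for AtlasClassification.
    static final class Classification {
        final String typeName;
        final String sourceEntityGuid; // null => associated directly with the current entity

        Classification(String typeName, String sourceEntityGuid) {
            this.typeName = typeName;
            this.sourceEntityGuid = sourceEntityGuid;
        }

        boolean isPropagated() {
            return sourceEntityGuid != null;
        }
    }

    // Type names of the classifications an entity update may remove:
    // only the ones associated directly, never the propagated ones.
    static List<String> removableTypeNames(List<Classification> classifications) {
        List<String> removable = new ArrayList<>();
        for (Classification c : classifications) {
            if (!c.isPropagated()) {
                removable.add(c.typeName);
            }
        }
        return removable;
    }

    public static void main(String[] args) {
        List<Classification> onColumn = Arrays.asList(
                new Classification("PII", null),           // set directly on this entity
                new Classification("PSI", "term-guid-1")); // propagated from another entity
        System.out.println(removableTypeNames(onColumn)); // prints [PII]
    }
}
```

Applications that only read typeName would keep working unchanged, which is the main attraction of putting the field on the classification rather than on the entity.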

 

- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
Yes. With the approach detailed above, no changes would be needed in Ranger.

 

- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge "propagated classification" edges to the real classification. I like the edge idea.
To enable queries like ‘get list of entities that are classified as PII’, it will perform better if each entity vertex also has data about the propagated classifications, similar to how entities currently have data on the classifications directly associated with them. However, all the entities should directly reference a single instance of a classification, so that it will be easier to manage changes to classification attribute values. Sarath will send an update on the design choices later next week.
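Pending Sarath's update, one possible reading of the two requirements above can be sketched as follows. Everything here is hypothetical (class names, fields, the in-memory model) and is not the planned design: each vertex keeps a local set of classification names for fast lookup, while all vertices reference the same classification instance so that an attribute change is made in one place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SharedClassificationSketch {

    // Hypothetical classification instance, shared by every entity that carries it.
    static final class Classification {
        final String typeName;
        final Map<String, Object> attributes = new HashMap<>();

        Classification(String typeName) {
            this.typeName = typeName;
        }
    }

    // Hypothetical entity vertex.
    static final class EntityVertex {
        final String guid;
        final Set<String> classificationNames = new HashSet<>();  // local copy of names, for fast queries
        final List<Classification> classifications = new ArrayList<>(); // references to shared instances

        EntityVertex(String guid) {
            this.guid = guid;
        }

        void attach(Classification c) {
            classificationNames.add(c.typeName);
            classifications.add(c);
        }
    }

    // 'Get list of entities that are classified as X' using only per-vertex data.
    static List<String> guidsClassifiedAs(Collection<EntityVertex> vertices, String typeName) {
        List<String> guids = new ArrayList<>();
        for (EntityVertex v : vertices) {
            if (v.classificationNames.contains(typeName)) {
                guids.add(v.guid);
            }
        }
        return guids;
    }

    public static void main(String[] args) {
        Classification pii = new Classification("PII");
        EntityVertex source = new EntityVertex("table-1");
        EntityVertex derived = new EntityVertex("table-2");
        source.attach(pii);  // direct association
        derived.attach(pii); // propagated: same instance, not a copy

        pii.attributes.put("level", "high"); // one change, visible from both entities
        System.out.println(guidsClassifiedAs(Arrays.asList(source, derived), "PII"));
        System.out.println(derived.classifications.get(0).attributes.get("level"));
    }
}
```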

 

If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".  
Yes. This use case should be covered by the design discussed above.

 

Thanks,

Madhan

 

From: David Radley <[EMAIL PROTECTED]>
Date: Thursday, January 11, 2018 at 8:52 AM
To: Madhan Neethiraj <[EMAIL PROTECTED]>
Cc: atlas <[EMAIL PROTECTED]>
Subject: Tag propagation

 

Hi Madhan,
I had a look in the code - I was surprised that the tag propagation was not in. Is this something you are looking at in the near future? If not I may need to look into it. I suggest that phase 1 of the tag propagation implementation should:
- lose BOTH - this is still in the code - I think we agreed we wanted to get rid of this.
- should honour the classification entityTypes - so that we do not get classifications applied to inappropriate entityTypes
- There is the question about how the propagated classifications would look in the get entity REST API - I suggest that they appear in the entity's classifications with a field indicating that they are derived (and hence not able to be removed by an entity update).
- I would hope that Ranger would pick up these new propagated tags using the existing tag sync.
- I think you wanted the derived classifications to be picked up at query time. I also remember suggesting that we store the derived classifications in a derivedClassification property in the entity which would contain the list of derived classifications. Or we could store them as a new type of edge "propagated classification" edges to the real classification. I like the edge idea.

If we had the above, we could classify a Term as PSI, and use the semantic mapping to propagate the classifications to the hive column. The hive column would not pick up classifications defined in the area 3 model like "SpineObject", which is defined as only applying to "GlossaryTerm".  

What do you think? All the best, David.
