Modifying Tags on Datasets
Why Would You Use Tags on Datasets?
Tags are informal, loosely controlled labels that help in search and discovery. They can be added to datasets, dataset schemas, or containers as an easy way to label or categorize entities, without having to associate them with a broader business glossary or vocabulary. For more information about tags, refer to About DataHub Tags.
Goal Of This Guide
This guide will show you how to:
- Create: create a tag named Deprecated.
- Read: read tags attached to a dataset named SampleHiveDataset.
- Add: add a CustomerAccount tag to the user_name column of a dataset called fct_users_created.
- Remove: remove a Legacy tag from the shipment_info column of a dataset called SampleHdfsDataset.
Prerequisites
For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed information, please refer to DataHub Quickstart Guide.
Before modifying tags, you need to ensure the target dataset is already present in your DataHub instance. If you attempt to manipulate entities that do not exist, your operation will fail. In this guide, we will be using data from sample ingestion.
For more information on how to set up for GraphQL, please refer to How To Set Up GraphQL.
Create Tags
The following code creates a tag named Deprecated.
- GraphQL
- Curl
- Python
mutation createTag {
  createTag(
    input: {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    }
  )
}
If you see the following response, the operation was successful:
{
  "data": {
    "createTag": "urn:li:tag:deprecated"
  },
  "extensions": {}
}
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation createTag { createTag(input: { name: \"Deprecated\", id: \"deprecated\",description: \"Having this tag means this column or table is deprecated.\" }) }", "variables":{}}'
Expected Response:
{ "data": { "createTag": "urn:li:tag:deprecated" }, "extensions": {} }
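The curl call above can also be issued from Python using only the standard library. The following is a minimal sketch, not an official client: the helper names build_request and run are hypothetical, and the endpoint and token placeholder are taken from the curl example.

```python
import json
import urllib.request

# Endpoint used by the curl example in this guide.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

CREATE_TAG_MUTATION = """
mutation createTag {
  createTag(input: {
    name: "Deprecated",
    id: "deprecated",
    description: "Having this tag means this column or table is deprecated."
  })
}
"""


def build_request(query: str, token: str, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"query": query, "variables": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run(query: str, token: str) -> dict:
    """Send the request and decode the JSON response (needs a running DataHub)."""
    with urllib.request.urlopen(build_request(query, token)) as resp:
        return json.load(resp)
```

Calling run(CREATE_TAG_MUTATION, "<my-access-token>") against a running quickstart instance should return the same JSON document as the curl example.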
# Inlined from /metadata-ingestion/examples/library/create_tag.py
import logging

from datahub.emitter.mce_builder import make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Imports for metadata model classes
from datahub.metadata.schema_classes import TagPropertiesClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

tag_urn = make_tag_urn("deprecated")
tag_properties_aspect = TagPropertiesClass(
    name="Deprecated",
    description="Having this tag means this column or table is deprecated.",
)

event: MetadataChangeProposalWrapper = MetadataChangeProposalWrapper(
    entityUrn=tag_urn,
    aspect=tag_properties_aspect,
)

# Create rest emitter
rest_emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
rest_emitter.emit(event)
log.info(f"Created tag {tag_urn}")
Expected Outcome of Creating Tags
You can now see that the new tag Deprecated has been created.
We can also verify this operation programmatically by fetching the Deprecated tag with the datahub CLI after running this code.
datahub get --urn "urn:li:tag:deprecated" --aspect tagProperties
{
  "tagProperties": {
    "description": "Having this tag means this column or table is deprecated.",
    "name": "Deprecated"
  }
}
Read Tags
- GraphQL
- Curl
- Python
query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
    tags {
      tags {
        tag {
          name
          urn
          properties {
            description
            colorHex
          }
        }
      }
    }
  }
}
If you see the following response, the operation was successful:
{
  "data": {
    "dataset": {
      "tags": {
        "tags": [
          {
            "tag": {
              "name": "Legacy",
              "urn": "urn:li:tag:Legacy",
              "properties": {
                "description": "Indicates the dataset is no longer supported",
                "colorHex": null
              }
            }
          }
        ]
      }
    }
  },
  "extensions": {}
}
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "{dataset(urn: \"urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)\") {tags {tags {tag {name urn properties { description colorHex } } } } } }", "variables":{}}'
Expected Response:
{
  "data": {
    "dataset": {
      "tags": {
        "tags": [
          {
            "tag": {
              "name": "Legacy",
              "urn": "urn:li:tag:Legacy",
              "properties": {
                "description": "Indicates the dataset is no longer supported",
                "colorHex": null
              }
            }
          }
        ]
      }
    }
  },
  "extensions": {}
}
A Python example for reading tags is coming soon.
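Until an official Python example is published, the query in this section can be built and its response unpacked with plain Python. This is a sketch under stated assumptions: build_tags_query and extract_tags are hypothetical helper names, and the response shape is the one shown in the expected response above.

```python
import json


def build_tags_query(dataset_urn: str) -> str:
    """Return the tags query from this section for an arbitrary dataset URN."""
    # json.dumps yields a double-quoted, escaped string literal for the URN.
    return (
        "query { dataset(urn: %s) { tags { tags { tag "
        "{ name urn properties { description colorHex } } } } } }"
        % json.dumps(dataset_urn)
    )


def extract_tags(response: dict) -> list:
    """Collect (name, urn) pairs from a response shaped like the one above."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    # Tolerate a missing or null "tags" field, e.g. for an untagged dataset.
    associations = (dataset.get("tags") or {}).get("tags") or []
    return [(a["tag"]["name"], a["tag"]["urn"]) for a in associations]


# The expected response from this section, as a Python dictionary.
sample_response = {
    "data": {
        "dataset": {
            "tags": {
                "tags": [
                    {
                        "tag": {
                            "name": "Legacy",
                            "urn": "urn:li:tag:Legacy",
                            "properties": {
                                "description": "Indicates the dataset is no longer supported",
                                "colorHex": None,
                            },
                        }
                    }
                ]
            }
        }
    },
    "extensions": {},
}

print(extract_tags(sample_response))
```

The query string can be posted to the /api/graphql endpoint in the same way as the curl example.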
Add Tags
The following code shows how to add tags to a dataset.
In the following code, we add the tag Deprecated to a dataset named fct_users_created.
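The examples in this section refer to the tag and the dataset by URN rather than by name. As an illustration of how those URN strings are composed, here is a small sketch; the functions tag_urn and dataset_urn are hypothetical stand-ins written for this guide (the library's own make_tag_urn, used in the Python example for creating tags, plays the same role for tags).

```python
def tag_urn(tag_id: str) -> str:
    """Compose a tag URN from the tag id, e.g. 'deprecated'."""
    return f"urn:li:tag:{tag_id}"


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Compose a dataset URN from platform, dataset name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(tag_urn("deprecated"))
print(dataset_urn("hive", "fct_users_created"))
```

The two printed values match the tagUrns and resourceUrn arguments used in the examples for adding tags.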
- GraphQL
- Curl
- Python
mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
    }
  )
}
Note that you can also add a tag to a column of a dataset by specifying subResourceType and subResource.
mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}
If you see the following response, the operation was successful:
{
  "data": {
    "addTags": true
  },
  "extensions": {}
}
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation addTags { addTags(input: { tagUrns: [\"urn:li:tag:deprecated\"], resourceUrn: \"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)\" }) }", "variables":{}}'
Expected Response:
{ "data": { "addTags": true }, "extensions": {} }
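# Adds the Deprecated tag to the fct_users_created dataset (read-modify-write on globalTags)
import logging
from typing import Optional

from datahub.emitter.mce_builder import make_dataset_urn, make_tag_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper

# Read-modify-write needs DataHubGraph, which can read aspects as well as emit them
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Imports for metadata model classes
from datahub.metadata.schema_classes import GlobalTagsClass, TagAssociationClass

log = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))
dataset_urn = make_dataset_urn(platform="hive", name="fct_users_created", env="PROD")
tag_urn = make_tag_urn("deprecated")

# Fetch the tags currently attached to the dataset, if any
current_tags: Optional[GlobalTagsClass] = graph.get_aspect(
    entity_urn=dataset_urn,
    aspect_type=GlobalTagsClass,
)
if current_tags is None:
    current_tags = GlobalTagsClass(tags=[])

if tag_urn in [association.tag for association in current_tags.tags]:
    log.info(f"Tag {tag_urn} already present on {dataset_urn}, skipping write")
else:
    current_tags.tags.append(TagAssociationClass(tag=tag_urn))
    graph.emit(
        MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=current_tags)
    )
    log.info(f"Added tag {tag_urn} to dataset {dataset_urn}")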
Expected Outcome of Adding Tags
You can now see that the Deprecated tag has been added to the user_name column.
We can also verify this operation programmatically by checking the globalTags aspect using the datahub CLI.
datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags
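The JSON printed by the command above can be checked in a script rather than by eye. The sketch below assumes the output has the shape shown at the end of this guide for an empty tag list, and additionally assumes that each attached tag appears as an object with a "tag" key holding the tag URN; that populated shape and the helper name tag_urns are assumptions, not taken from this guide.

```python
import json


def tag_urns(cli_output: str) -> list:
    """Return the tag URNs found in the JSON output of the globalTags aspect."""
    aspect = json.loads(cli_output).get("globalTags") or {}
    return [entry["tag"] for entry in aspect.get("tags", [])]


# Assumed output after the tag has been added (shape is an assumption).
after_add = '{"globalTags": {"tags": [{"tag": "urn:li:tag:deprecated"}]}}'
# Output shown in this guide once no tags remain.
after_remove = '{"globalTags": {"tags": []}}'

print("urn:li:tag:deprecated" in tag_urns(after_add))
print("urn:li:tag:deprecated" in tag_urns(after_remove))
```

In practice the cli_output string would be captured from the datahub get command, for example by redirecting its output to a file.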
Remove Tags
The following code removes a tag from a dataset.
After running this code, the Deprecated tag will be removed from the user_name column.
- GraphQL
- Curl
- Python
mutation removeTag {
  removeTag(
    input: {
      tagUrn: "urn:li:tag:deprecated",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}
curl --location --request POST 'http://localhost:8080/api/graphql' \
--header 'Authorization: Bearer <my-access-token>' \
--header 'Content-Type: application/json' \
--data-raw '{ "query": "mutation removeTag { removeTag(input: { tagUrn: \"urn:li:tag:deprecated\", resourceUrn: \"urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)\", subResourceType: DATASET_FIELD, subResource: \"user_name\" }) }", "variables":{}}'
A Python example for removing tags is coming soon.
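Until an official Python example is published, the removeTag mutation can be assembled in Python and sent to the GraphQL endpoint like the curl call. This is a minimal sketch: build_remove_tag_mutation and build_payload are hypothetical helper names, and the mutation text mirrors the GraphQL example in this section.

```python
import json
from typing import Optional


def build_remove_tag_mutation(
    tag_urn: str, resource_urn: str, column: Optional[str] = None
) -> str:
    """Build the removeTag mutation; pass a column name to target a single field."""
    fields = [
        "tagUrn: " + json.dumps(tag_urn),
        "resourceUrn: " + json.dumps(resource_urn),
    ]
    if column is not None:
        # DATASET_FIELD is an enum value, so it is left unquoted.
        fields.append("subResourceType: DATASET_FIELD")
        fields.append("subResource: " + json.dumps(column))
    return "mutation removeTag { removeTag(input: { %s }) }" % ", ".join(fields)


def build_payload(mutation: str) -> str:
    """Wrap the mutation in the JSON body expected by the /api/graphql endpoint."""
    return json.dumps({"query": mutation, "variables": {}})


mutation = build_remove_tag_mutation(
    "urn:li:tag:deprecated",
    "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
    column="user_name",
)
print(build_payload(mutation))
```

Omitting the column argument produces a mutation that removes the tag from the dataset itself rather than from one of its columns.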
Expected Outcome of Removing Tags
You can now see that the Deprecated tag has been removed from the user_name column.
We can also verify this operation programmatically by checking the globalTags aspect using the datahub CLI.
datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags
{
  "globalTags": {
    "tags": []
  }
}