Skip to content

Python library for working with ThoughtSpot Modeling Language (TML) files programmatically

License

Notifications You must be signed in to change notification settings

saurabhsingh1608/thoughtspot_tml

 
 

Repository files navigation

ThoughtSpot TML

a Python package for working with ThoughtSpot Modeling Language (TML) files programmatically

Badge: Lint Badge: Test

🚨 If your examples or scripts are built on thoughtspot_tml==1.3.0, see our Migration to v2.0.0 guide. 🚨

Features

This package will not perform validation of the constructed TML files or interact with your ThoughtSpot cluster!

Please leverage the ThoughtSpot REST API for this purpose.

Installation

thoughtspot_tml requires at least Python 3.7, preferably Python 3.9 and above.

Installation is as simple as:

pip install thoughtspot-tml

A Basic Example

This example creates a command-line tool for changing the prefix in the names of the Table objects that a Worksheet object connects to.

# worksheet_remapping.py
from thoughtspot_tml import Worksheet
import argparse
import pathlib


def filepath(fp: str) -> pathlib.Path:
    """
    Converts a string to a pathlib.Path.
    """
    path = pathlib.Path(fp)

    if not path.exists():
        raise argparse.ArgumentTypeError(f"path '{fp!r}' does not exist")

    if not path.is_file():
        raise argparse.ArgumentValueError(f"path must be a file, got '{fp!r}'")

    return path


def main():
    # Create a command line application
    #   - argument for a WORKSHEET.worksheet.tml
    #   - options for the "before" and "after" tabling naming conventions
    parser = argparse.ArgumentParser()
    parser.add_argument("worksheet_tml", help="a worksheet.tml to remap", type=filepath)
    parser.add_argument("-s", "--src-prefix", metavar="SRC", default="DEV_", type=str, help="(default: %(default)s)")
    parser.add_argument("-d", "--dst-prefix", metavar="DST", default="TEST_", type=str, help="(default: %(default)s)")

    # Parse CLI input
    args = parser.parse_args()

    # Read from file
    tml = Worksheet.load(args.worksheet_tml)
    
    # Replace instances of DEV_ with TEST_
    for table in tml.worksheet.tables:
        table.name = table.name.replace(args.src_prefix, args.dst_prefix)

    # Save to file
    tml.dump(args.worksheet_tml)


if __name__ == '__main__':
    raise SystemExit(main())
>>> python worksheet_remapping.py -h

usage: [-h] [-s SRC] [-d DST] worksheet_tml

positional arguments:
  worksheet_tml         a worksheet.tml to remap

options:
  -h, --help                show this help message and exit
  -s SRC, --src-prefix SRC  (default: DEV_)
  -d DST, --dst-prefix DST  (default: TEST_)

A more complex version of this example, as well as more examples can be found in the /examples directory in this repository.

thoughtspot_tml Reference

TML Objects

from thoughtspot_tml import Table, View, SQLView, Worksheet
from thoughtspot_tml import Answer, Liveboard, Pinboard

# aliases
from thoughtspot_tml import ThoughtSpotView    # View
from thoughtspot_tml import SavedAnswer        # Answer
from thoughtspot_tml import SystemTable        # Table

Each TML object has a top-level attribute for the globally unique identifier, or GUID, as well as the document form of the object it represents. This identically mirrors the TML specification you can find in the ThoughtSpot documentation. In addition, the name attribute of the TML document itself has been pulled into the top-level namespace.

@dataclass
class Worksheet(TML):
    """
    Representation of a ThoughtSpot Worksheet TML.
    """

    guid: GUID
    worksheet: WorksheetEDocProto

    @property
    def name(self) -> str:
        return self.worksheet.name

The full, composable TML specification can found in _scriptability.py. Each piece of the spec is a python dataclasses.dataclass field. The internal _scriptability.py module is generated code from the ThoughtSpot's internal architecture and allows for thoughtspot_tml to offer the deep attribute access experience in python.

@dataclass
class Table(TML):
    """
    Representation of a ThoughtSpot Table TML.
    """

    guid: GUID
    table: LogicalTableEDocProto

    @property
    def name(self) -> str:
        return self.table.name

For example, interesting attributes about the Table TML spec are exposed via attributes which can, in turn expose their own attributes themselves. This functionality offers common pattersn to be expressed natively in Python, such as remapping a Table's connection details.

tml = Table.load("tests/data/DUMMY.table.tml")

# get the Table document object
tml.table           # => LogicalTableEdocProto(...)

# get the Table's underlying connected details
tml.table.db        # => 'PMMDB'
tml.table.schema    # => 'RETAILAPPAREL'
tml.table.db_table  # => 'dim_retapp_products'

# get the Table's columns
tml.table.columns   # => [LogicalTableEDocProtoLogicalColumnEDocProto(...), ...]

# repoint this ThoughtSpot Table to a new external table
tml.table.schema = "RETAILAPPAREL_V2"
tml.table.db_table = "DIM_RETAPP_PRODUCTS"

The Connection is a special type of TML object. Connections (also known as "Embrace" Connections) were implemented prior to the TML spec being officially released. The remapping file (connection.yaml), obtained from your platform at Data > Connections > (...) in the top right > Remapping > Download defines how ThoughtSpot table objects relate to their external counterparts.

from thoughtspot_tml import Connection

# aliases
from thoughtspot_tml import EmbraceConnection  # Connection

Even though the resulting file is different, we've implemented the Connection with an identical form.

The Connection GUID, while optional in thoughtspot_tml, is required when modifying or removing an existing connection via the REST API. A Connection's GUID can be obtained by calling the connection/list endpoint.

When loading from a file, if thoughtspot_tml identifies the filename is a GUID, then the property will be set on the resulting object.

The connection/update REST API endpoint requires connections to formatted in a different way. For this, we provide a method to generate the metadata parameter data, which is a mapping of configuration attributes, as well as database, schema, and table objects.

*ThoughtSpot plans to release Connection TML in a future release.

@dataclass
class Connection(TML):
    """
    Representation of a ThoughtSpot Connection YAML.
    """

    guid: Optional[GUID]
    connection: ConnectionDoc

    def to_rest_api_v1_metadata(self) -> ConnectionMetadata:
        ...

Each object contains multiple methods for serialization and deserialization.

Deserialization

For deserialization of a TML document into a python object.

ws = Worksheet.load(path: PathLike = "tests/data/DUMMY.worksheet.tml")
ws = Worksheet.loads(tml_document: str = ...)  # can be obtained from the ThoughtSpot REST API

ws.guid == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"

.load a worksheet from a .worksheet.tml file, or as a string directly from the metadata/tml/export API with .loads.


Serialization

For serialization of a TML python object back into data.

data = ws.to_dict()
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"

ws.dump(path="tests/data/DUMMY.worksheet.tml")
# DUMMY.worksheet.tml
#
# guid: 2ea7add9-0ccb-4ac1-90bb-231794ebb377
# worksheet:
#   ...

data_s = ws.dumps(format_type="YAML")
data = yaml.load(data_s)
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"

# -or-

data = ws.dumps(format_type="JSON")
data_s = json.loads(data_s)
data["guid"] == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"

.to_dict to convert the entire object tree into python native types, or write back to a file with .dump as a TML-formatted string. The formatting can be overriden to JSON if the JSON file type is used (.worksheet.json). .dumps allows access to the formatted string directly, typically used as input for the metadata/tml/import API.

SpotApp

from thoughtspot_tml import SpotApp

SpotApps are bundles of TML which can be obtained directly from the ThoughtSpot user interace as a zip file archive or from the /metadata/tml/export API endpoint using the export_associated = true query parameter.

export_response = ...  # /metadata/tml/export
s = SpotApp.from_api(export_response)
print(s.tml)  # => [Worksheet(...), Table(...), Table(...)]
print(s.manifest)  # => Manifest(...)

# -or-

s = SpotApp.read("tests/data/DUMMY_spot_app.zip")
print(s.tml)  # => [Worksheet(...), Table(...), Table(...)]
print(s.manifest)  # => Manifest(...)

SpotApps can also be saved to a new zipfile archive through the .save method.

s = SpotApp.read("tests/data/DUMMY_spot_app.zip")
s.save("tests/data/NEW_DUMMY_spot_app.zip")

Utilities

thoughtspot_tml.utils are additional methods which can help or speed up working with TML documents.

determine_tml_type

TML is both a data structure and file format, and these formats vary slightly across each document. determine_tml_type will return the appropriate TML class so that you can call deserialization methods directly. Pass either the path keyword with a filepath, or the file info directly from one of the objects returned in the /metadata/tml/export response data.

    signature

def determine_tml_type(*, info: TMLDocInfo = None, path: PathLike = None) -> Union[Connection, TMLObject]:
    """
    Get the appropriate TML class based on input data.

    Parameters
    ----------
    info : TMLDocInfo
      API edoc info response

    path : PathLike
      filepath to parse

    Raises
    ------
    TMLError, when a valid TML type could not be found based on input
    """

    usage

from thoughtspot_tml.utils import determine_tml_type

tml_cls = determine_tml_type(path="/tests/data/DUMMY.worksheet.tml")
tml = tml_cls.load(path="/tests/data/DUMMY.worksheet.tml")
type(tml) is Worksheet  # => True

# -or-

export_response = ...  # /metadata/tml/export
tml_cls = determine_tml_type(info=export_response["object"][0]["info"])
tml = tml_cls.loads(tml_document=export_response["object"][0]["edoc"])
type(tml) is Worksheet  # => True

EnvironmentGUIDMapper

The EnvironmentGUIDMapper is a dictionary-like data structure which can help you maintain references to objects across your ThoughtSpot environments. The underlying data structure is intended to clearly show the relationship of a given object between any number of environments. An "environment" can be any scope you consider separate from each other, be it 2 ThoughtSpot servers, 2 Connections on the same server, or even "Copy of" the same object within a single Connection.

    signature

class EnvironmentGUIDMapper:
    """
    Attributes
    ----------
    environment_transformer : Callable(str) -> str
      a function which transforms the ENV name before adding it to the mapping
    """

    def __init__(self, environment_transformer: Callable[[str], str] = str.upper):

    usage

from thoughtspot_tml.utils import EnvironmentGUIDMapper

# create a new mapper
mapper = EnvironmentGUIDMapper()  # or EnvironmentGUIDMapper.read(path=...)

# map 3 guids to represent the same ThoughtSpot object across environments
mapper["guid1"] = ("PROD", "guid1")  # 1. add a new guid into the mapper
mapper["guid1"] = ("TEST", "guid2")  # 2. map guid1 to a guid in another environment
mapper["guid2"] = ("DEV", "guid3")   # 3. map a new guid3 to any of existing guid

# persist the mapping file to disk
mapper.save(path="marketing_thoughtspot_guid_mapping.json")

# what's the JSON data structure look like?
print(mapper)
{
    "guid1__guid2__guid3": {
        "PROD": "guid1",
        "TEST": "guid2",
        "DEV": "guid3"
    }
}

# create a new mapper from a file
new_mapper = EnvironmentGUIDMapper.read(path="marketing_thoughtspot_guid_mapping.json")

# add another object mapping
new_mapper.set("guid10", environment="PROD", guid="guid10")   # equivalent to new_mapper["guid10"] = ("PROD", "guid10")
new_mapper.set("guid10", environment="TEST", guid="guid11")
new_mapper.set("guid10", environment="DEV", guid="guid12")

# get all the environments that would map to "guid10"
print(new_mapper["guid10"])  # or  new_mapper.get("guid10")
{
    "PROD": "guid10",
    "TEST": "guid11",
    "DEV": "guid12"
}

# get a mapping of all DEV -> PROD related ThoughtSpot objects
print(new_mapper.generate_mapping(from_environment="DEV", to_environment="PROD"))
{
    "guid3": "guid1",
    "guid12": "guid10"
}

disambiguate

In ThoughtSpot, the uniqueness constraint exists on the underlying object's guid. This means that there can be multiple objects of the same type with the same name. An example of this is maintaining both a DEV and PROD Connection. All the development work happens on one set of objects (that are not shared with any of the End User community), while the production connection contains objects with identical names that are shared with the End User community.

To reduce ambiguity, you may need to add the fqn key to your TML document when you reference source tables or connections. If you do not add the fqn key, and the connection or table you reference does not have a unique name, the import will fail.

NOTE: Prior to ThoughtSpot V8.7.0, TML does not export with the fqn automatically.

    signature

def disambiguate(
    tml: TMLObject,
    *,
    guid_mapping: Dict[str, GUID],
    remap_object_guid: bool = True,
    delete_unmapped_guids: bool = False,
) -> TMLObject:
    """
    Deep scan the TML looking for fields to add FQNs to.

    This will explore the top-level guid and all nested objects looking on
    Tables, Worksheets, etc to disambiguate.

    Parameters
    ----------
    tml : TMLObject
      the tml to scan

    guid_mapping : {str: GUID}
      a mapping of names or guids, to the FQN to add to the object

    remap_object_guid : bool = True
      whether or not to remap the tml.guid

    delete_unmapped_guids : bool = False
      if a match could not be found, set the FQN and object guid to None
    """

    usage

from thoughtspot_tml.utils import disambiguate
from thoughtspot_tml import Worksheet

# Load a Worksheet and check its data
ws = Worksheet.load("tests/data/DUMMY.worksheet.tml")
ws.guid == "2ea7add9-0ccb-4ac1-90bb-231794ebb377"     # => True
ws.worksheet.tables[0].name == "dim_retapp_products"  # => True
ws.worksheet.tables[0].fqn is None  # => True

# Assign a Table an FQN. This information can be retrieved from ThoughtSpot REST API metadata/list.
ws = disambiguate(ws, guid_mapping={"dim_retapp_products": "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0"})
ws.worksheet.tables[0].fqn is None  # => False
ws.worksheet.tables[0].fqn == "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0"  # => True

# Re-assign the GUID to a new environment.
ws = disambiguate(ws, guid_mapping={"7fd39fdb-9dfe-4954-b5dd-9a5d846085b0": "99999999-9999-4999-9999-999999999999"})
ws.worksheet.tables[0].fqn == "7fd39fdb-9dfe-4954-b5dd-9a5d846085b0"  # => True
ws.worksheet.tables[0].fqn == "99999999-9999-4999-9999-999999999999"  # => False

# Remove GUIDs which aren't found in the mapping, including the top-level GUID.
ws = disambiguate(ws, guid_mapping={}, delete_unmapped_guids=True)
ws.worksheet.tables[0].name == "dim_retapp_products"  # => True
ws.worksheet.tables[0].fqn is None  # => True
ws.guid is None   # => True

The disambiguate function will walk through the thoughtspot_tml TML object specifying the .fqn based on keys in the guid_mapping dictionary.

The guid_mapping will typically be a mapping of GUIDs between 2 environments, but the "before" environment can be any string. This can be helpful to quickly add fqn to any object which has yet to define it.

The remap_object_guid (default: True) will consider the top-level TML.guid as a candidate for re-mapping.

The delete_unmapped_guids (default: False) will remove any .fqns which are not found in the guid_mapping.


Migration to v2.0.0

With V2.0.0, we now programmatically build the TML spec from the underlying microservice's data structure. The largest benefit of this move is that we can now

Round-tripping to File

The utility class YAMLTML has been replaced with utils.determine_tml_type and a private base class TML, which all public metadata objects inherit from. The TML type which is returned has the appropriate [de]serialization methods.

Both of the following patterns represent round-tripping.

import pathlib

worksheet_fp = "tests/data/DUMMY.worksheet.tml"
worksheet_tml_str = pathlib.Path(worksheet_fp).read_text()

# V1.3.0
from thoughtspot_tml import YAMLTML

tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml_document_str = YAMLTML.dump_tml_object(tml)


# V2.0.0
from thoughtspot_tml.utils import determine_tml_type
from thoughtspot_tml import Worksheet

tml_cls = determine_tml_type(path=worksheet_fp)
tml = tml_cls.loads(worksheet_tml_str)
# any one of these methods..
# tml = tml_cls.load(worksheet_fp)
# tml = Worksheet.loads(worksheet_tml_str)
# tml = Worksheet.load(worksheet_fp)
tml_document_str = tml.dumps(worksheet_fp)

Identifying the TML Object Type

To identify the type of TML object you are working with in V1.3.0 you would use .content_type, with V2.0.0 you can now use .tml_type_name.

GUID & FQN Handling

In V1.3.0, GUIDs were deleted from the underlying data structure with .remove_guid() in order to ensure the REST API created new objects. With V2.0.0, you simply set the .guid attribute (on the object itself) to None.

# V1.3.0
tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml.remove_guid()


# V2.0.0
tml = Worksheet.loads(worksheet_tml_str)
tml.guid = None

In V1.3.0, each TML object had their own methods for finding and replacing GUIDs. These took the form of .remap_<object_type>_to_new_fqn() and .change_<object_type>_by_fqn(), replacing <object_type> for the underlying data source which maps into the object you're operating on. These methods modify the underlying object.

In V2.0.0, we supply a single method to help add the fqn key to your TML document when referencing source tables or connections that share a name. See disambiguation for additional information.

For example, the below example shows adding the Table FQN references in a Worksheet.

# V1.3.0
name_guid_map = {"Table 1": "0f814ce1-dba1-496a-b3de-38c4b9a288ed", "Table 2": "2e7a0676-2acf-4700-965c-efebf8c0b594"}
tml = YAMLTML.get_tml_object(worksheet_tml_str)
tml.remap_table_to_new_fqn(name_to_fqn_map=name_guid_map)
# - or -
tml.change_table_by_fqn(original_table_name="Table 1", new_table_guid="0f814ce1-dba1-496a-b3de-38c4b9a288ed")


# V2.0.0
from thoughtspot_tml.utils import disambiguate

tml = Worksheet.loads(worksheet_tml_str)
tml = disambiguate(tml, guid_mapping=name_guid_map)

Notes on ThoughtSpot Modeling Language

  • TML is implemented in the YAML 1.1 spec.
  • When importing a TML file, if the guid matches to an existing object, then that object will be updated. If the guid is missing or does not match an object, a new object is created with a new GUID.

Want to contribute?

We welcome all help! ❤️ For guidance on setting up a development environment, see our Contributing Guide.

About

Python library for working with ThoughtSpot Modeling Language (TML) files programmatically

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%