Before ingesting Databricks metadata into Dawiso, prepare your account for authentication using either:

  • Client ID and Client Secret, or
  • A generated Databricks Personal Access Token (Bearer Token)

Dawiso gathers Databricks metadata through the Databricks Unity Catalog REST API.

Connection prerequisites

  • Unity Catalog must be enabled for the workspace
  • You have sufficient permissions to generate a Databricks Personal Token
  • Sufficient READ, EXECUTE, USE_SCHEMA, and USE_CATALOG privileges

Data lineage requirements

To scan the data lineage of your Databricks objects, fulfill the following requirements:

  1. Provide access to the following tables: system.access.table_lineage and system.access.column_lineage. For more information, see the official Databricks documentation. You can also provide views over these tables instead.
  2. Make sure you have Account Admin permissions, then grant access to the system catalog and the access schema via the console or the API.

To enable access to the system.access schema via the API:

  1. Generate a token at:
    •   {databricksUrl}/settings/user/developer/access-tokens
    • Replace {databricksUrl} with your actual Databricks workspace URL.
  2. List all system schemas using:
    •   GET {databricksUrl}/api/2.0/unity-catalog/metastores/{metastoreId}/systemschemas
    • Replace {metastoreId} with the ID of your Unity Catalog metastore.
  3. Enable access to the access schema using:
    •   PUT {databricksUrl}/api/2.0/unity-catalog/metastores/{metastoreId}/systemschemas/access
  4. Grant Data Reader permissions on the system catalog to the user:
    1. In the left-side menu of your Databricks environment, select Catalog.
    2. Under the My organization section, click system.
    3. On the system page, switch to the Permissions tab and click Grant.
    4. Select Data Reader.
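The API steps above can be sketched in Python with the standard library. The workspace URL and metastore ID below are placeholders you must replace, and the personal access token is assumed to live in a DATABRICKS_TOKEN environment variable; the endpoint paths mirror steps 2 and 3.

```python
# Sketch of steps 2-3 above: list system schemas, then enable the "access" schema.
# WORKSPACE_URL and METASTORE_ID are placeholders; the token is read from the
# DATABRICKS_TOKEN environment variable, so nothing is called without one.
import json
import os
import urllib.request

WORKSPACE_URL = "https://adb-000XXX000XXX.azuredatabricks.net"
METASTORE_ID = "<metastoreId>"


def system_schemas_url(workspace_url, metastore_id, schema=""):
    """Build the Unity Catalog system-schemas endpoint URL."""
    base = f"{workspace_url}/api/2.0/unity-catalog/metastores/{metastore_id}/systemschemas"
    return f"{base}/{schema}" if schema else base


def bearer_request(url, token, method="GET"):
    """Build a request authenticated with the personal access token."""
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )


if "DATABRICKS_TOKEN" in os.environ:  # only call the API when a token is configured
    token = os.environ["DATABRICKS_TOKEN"]
    # Step 2: list the system schemas and their states.
    with urllib.request.urlopen(
        bearer_request(system_schemas_url(WORKSPACE_URL, METASTORE_ID), token)
    ) as resp:
        print(json.load(resp))
    # Step 3: enable access to the "access" schema.
    urllib.request.urlopen(
        bearer_request(
            system_schemas_url(WORKSPACE_URL, METASTORE_ID, "access"), token, "PUT"
        )
    )
```

This is a sketch under the assumptions above, not part of Dawiso; adapt the URL-building helpers to your environment.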
Tip

You can also create views over the following tables:

  • system.access.table_lineage
  • system.access.column_lineage

Then, provide a SELECT query in the configuration. This avoids the security risks of granting additional access to the system schema.

Connection requirements

To connect Dawiso to your Databricks instance, you will need your:

  1. Personal Access Token or Client ID and Client Secret
  2. Databricks Workspace URL
  3. Warehouse ID
  4. Catalog name

Get your Databricks Workspace URL

Find your workspace URL in your browser’s address bar. The URL will have the following format:

  • https://adb-000XXX000XXX.azuredatabricks.net for Databricks on Azure
  • https://dbc-000XXX000XXX.cloud.databricks.com for Databricks on AWS
  • https://000XXX000XXX.gcp.databricks.com for Databricks on GCP
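As an illustration (not part of Dawiso), the hostname suffixes listed above are distinctive enough to tell the clouds apart programmatically:

```python
# Illustrative helper: infer the cloud provider from a Databricks workspace URL,
# based on the hostname suffixes listed above.
from urllib.parse import urlparse


def detect_cloud(workspace_url: str) -> str:
    host = urlparse(workspace_url).netloc
    if host.endswith(".azuredatabricks.net"):
        return "Azure"
    if host.endswith(".cloud.databricks.com"):
        return "AWS"
    if host.endswith(".gcp.databricks.com"):
        return "GCP"
    return "unknown"


print(detect_cloud("https://adb-000XXX000XXX.azuredatabricks.net"))  # Azure
```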

Generate a new Personal Access Token

  1. In your Databricks account, click your Databricks username in the top bar, then select Settings from the drop-down menu.
  2. Click Developer.
  3. Next to Access tokens, click Manage.
  4. Click Generate new token.
  5. [Optional] Name the token and change the default lifetime of 90 days. Leave the lifetime box empty for a token with no expiry date, so you won’t need to generate a new token for each Dawiso metadata scan and refresh.
  6. Click Generate.
  7. Copy the displayed token, and click Done.
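Before configuring Dawiso, you can sanity-check the new token. The sketch below (an illustration, not part of Dawiso) calls the Unity Catalog list-catalogs endpoint; WORKSPACE_URL is a placeholder, and the token is assumed to be in the DATABRICKS_TOKEN environment variable.

```python
# Quick token check: list Unity Catalog catalogs with the new token.
# WORKSPACE_URL is a placeholder; a successful response means the token works.
import json
import os
import urllib.request

WORKSPACE_URL = "https://adb-000XXX000XXX.azuredatabricks.net"


def catalogs_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build an authenticated request for the list-catalogs endpoint."""
    return urllib.request.Request(
        f"{workspace_url}/api/2.1/unity-catalog/catalogs",
        headers={"Authorization": f"Bearer {token}"},
    )


if "DATABRICKS_TOKEN" in os.environ:  # only call out when a token is configured
    req = catalogs_request(WORKSPACE_URL, os.environ["DATABRICKS_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        catalogs = json.load(resp).get("catalogs", [])
    print([c["name"] for c in catalogs])
```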

Databricks provider options

{"settings": {	"warehouseId": "<warehouseId>", -- optional, needed for Lineage Load via SQL statements	"catalogNames": ["<catalogName>"], -- optional, if not provided, loaded all catalogs	//"schemaNames": ["poc_dbx_dwh"], -- optional, if not provided, loaded all schemas	//"loadTableLineageViaRequests": false, -- optional, load Table Linage via Request - send request for each table fullname	"loadTableLineageViaStatements": true, -- optional, load Table Linage via SQL Statement - send request for schema - faster	//"lableLineageStatement": "select * from system.access.table_lineage where source_table_schema = :schema", -- optional, to set custom view for table linage	//"loadColumnLineage": false, -- optional, load Column Linage via Request - send request for each column fullname	"loadColumnLineageViaStatements": true, -- optional, load Column Linage via SQL Statement - send request for schema - faster	//"lolumnLineageStatement": "select * from system.access.column_lineage where source_table_schema = :schema", -- optional, to set custom view for column linage	"loadVolumeLineage": true -- optional, Load Volume Lineage		}
}

Configuration File for Integration Runtime

{"general": {	"workingFolder": "D:\\Serverdata\\ingestion-files\\IngestionRuntime",	"extractOnly": true},"dataSource": {	"providerKey": "core_databricks",	"uuid": "<datasource UUID>",	"format": "full",	"connection": {		"apiUrl": "https://<prefix>.azuredatabricks.net",		"clientId": "<clientId>",		"secret": "<password>"	},	"settings": {		"warehouseId": "<warehouseId>", -- optional, needed for Lineage Load via SQL statements		"catalogNames": ["<catalogName>"], -- optional, if not provided, loaded all catalogs		//"schemaNames": ["poc_dbx_dwh"], -- optional, if not provided, loaded all schemas		//"loadTableLineageViaRequests": false, -- optional, load Table Linage via Request - send request for each table fullname		"loadTableLineageViaStatements": true, -- optional, load Table Linage via SQL Statement - send request for schema - faster		//"tableLineageStatement": "select * from system.access.table_lineage where source_table_schema = :schema", -- optional, to set custom view for table linage		//"toadColumnLineage": false, -- optional, load Column Linage via Request - send request for each column fullname		"toadColumnLineageViaStatements": true, -- optional, load Column Linage via SQL Statement - send request for schema - faster		//"columnLineageStatement": "select * from system.access.column_lineage where source_table_schema = :schema", -- optional, to set custom view for column linage		"loadVolumeLineage": true -- optional, Load Volume Lineage			}},	"messageTypes": [	{		"key": "aaaa",		"query": ""	}],"ingestionCloud": {	"urlAddress": "https://aaa.com/",	"apiToken": "test"}
}
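Note that the -- annotations in the examples are documentation only and must be removed from the real file, since JSON does not allow comments. As a hypothetical pre-flight check (not part of DIR), a few lines of Python can catch missing required keys before a run; the required-key list below is an assumption based on the example configuration above.

```python
# Hypothetical pre-flight check for a DIR configuration file.
# REQUIRED is an assumption based on the example configuration, not a DIR schema.
import json

REQUIRED = {
    "general": ["workingFolder"],
    "dataSource": ["providerKey", "uuid", "connection"],
}


def missing_keys(cfg: dict) -> list:
    """Return dotted paths of required keys that are absent from the config."""
    missing = []
    for section, keys in REQUIRED.items():
        block = cfg.get(section, {})
        missing += [f"{section}.{key}" for key in keys if key not in block]
    return missing


def check_file(path: str) -> list:
    """Load a configuration file and report missing required keys."""
    with open(path, encoding="utf-8") as fh:
        return missing_keys(json.load(fh))
```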

For more information, see Dawiso Integration Runtime (DIR).