Databricks is a unified analytics platform based on Apache Spark, streamlining data science and engineering workflows for scalable data processing, advanced analytics, and machine learning. Ideal for collaborative environments, it supports real-time data processing and integrates seamlessly with diverse data sources.
Supported versions
Dawiso reads Databricks metadata through the Databricks REST API. For this to work, ensure that:
- Unity Catalog is enabled in your Databricks workspace.
- Your Databricks environment runs Databricks Runtime version 11.3 LTS or above.
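One way to confirm that Unity Catalog metadata is reachable over REST is to list the catalogs in the workspace. The sketch below builds a request against the Unity Catalog `catalogs` list endpoint and extracts catalog names from a response body. The workspace URL, the token, and the sample payload are placeholders, and the helper names are hypothetical; this illustrates the kind of call involved and is not a description of the requests Dawiso itself issues.

```python
import json
import urllib.request


def build_catalogs_request(host: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Unity Catalog 'list catalogs' endpoint."""
    url = f"{host.rstrip('/')}/api/2.1/unity-catalog/catalogs"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def catalog_names(payload: dict) -> list[str]:
    """Return the catalog names found in a 'list catalogs' response body."""
    return [catalog["name"] for catalog in payload.get("catalogs", [])]


# Hypothetical response body, trimmed to the single field used here.
sample = json.loads('{"catalogs": [{"name": "main"}, {"name": "sandbox"}]}')
print(catalog_names(sample))

# To call a real workspace (placeholders, not executed here):
# request = build_catalogs_request("https://<workspace-url>", "<access-token>")
# with urllib.request.urlopen(request) as response:
#     print(catalog_names(json.load(response)))
```

An empty or error response at this step usually points to a missing Unity Catalog setup or insufficient token permissions rather than to a Dawiso configuration issue.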
Default Hierarchy
Dawiso Adjusted Hierarchy
In the chart below, you will find the organizational structure of Databricks objects. Objects are represented in green and folders in yellow.
Data Lineage
Data lineage in Databricks can be tracked at either the table level or the column level using the Data Lineage API.
Table Lineage
| Field | Values | Data Type |
|---|---|---|
| Flow Direction | Upstream / Downstream | List of objects |
| Table Info | Name, Catalog Name, Schema Name, Table Type | String (name of object) |
| Notebook Info | Workspace ID, Notebook ID | INT (ID of object) |
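The fields in this table can be read from a table lineage response roughly as follows. This is a minimal sketch: the response keys (`upstreams`, `downstreams`, `tableInfo`, `notebookInfos`) are assumptions about the Data Lineage API response shape and should be checked against the Databricks reference, the sample payload is invented, and `TableEdge` and `parse_table_lineage` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEdge:
    direction: str       # "Upstream" or "Downstream"
    full_name: str       # catalog.schema.table
    table_type: str
    notebook_ids: tuple  # notebook IDs reported alongside this table, if any


def parse_table_lineage(payload: dict) -> list[TableEdge]:
    """Flatten a table lineage response into one edge per related table."""
    edges = []
    for direction, key in (("Upstream", "upstreams"), ("Downstream", "downstreams")):
        for entry in payload.get(key, []):
            info = entry.get("tableInfo")
            if info is None:
                continue  # Entry has no table side (for example, notebook only).
            full_name = ".".join((info["catalog_name"], info["schema_name"], info["name"]))
            notebooks = tuple(n["notebook_id"] for n in entry.get("notebookInfos", []))
            edges.append(TableEdge(direction, full_name, info.get("table_type", ""), notebooks))
    return edges


# Invented payload using the assumed key names.
sample = {
    "upstreams": [
        {
            "tableInfo": {"name": "orders_raw", "catalog_name": "main",
                          "schema_name": "sales", "table_type": "TABLE"},
            "notebookInfos": [{"workspace_id": 1, "notebook_id": 42}],
        }
    ],
    "downstreams": [
        {"tableInfo": {"name": "orders_daily", "catalog_name": "main",
                       "schema_name": "sales", "table_type": "TABLE"}},
        {"notebookInfos": [{"workspace_id": 1, "notebook_id": 43}]},
    ],
}

for edge in parse_table_lineage(sample):
    print(edge)
```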
Column Lineage
| Field | Values | Data Type |
|---|---|---|
| Flow Direction | Upstream columns / Downstream columns | List of objects |
| Column Info | Name, Catalog Name, Schema Name, Table Name, Table Type | String (name of object) |
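Column lineage can be flattened in a similar way. In this sketch the response keys `upstream_cols` and `downstream_cols` are assumptions about the Data Lineage API response shape, the payload is invented, and `parse_column_lineage` is a hypothetical helper.

```python
def parse_column_lineage(payload: dict) -> list[tuple[str, str]]:
    """Return (direction, fully qualified column name) pairs."""
    pairs = []
    for direction, key in (("Upstream", "upstream_cols"), ("Downstream", "downstream_cols")):
        for col in payload.get(key, []):
            qualified = ".".join(
                (col["catalog_name"], col["schema_name"], col["table_name"], col["name"])
            )
            pairs.append((direction, qualified))
    return pairs


# Invented payload using the assumed key names.
sample = {
    "upstream_cols": [
        {"name": "amount", "catalog_name": "main", "schema_name": "sales",
         "table_name": "orders_raw", "table_type": "TABLE"}
    ],
    "downstream_cols": [
        {"name": "total_amount", "catalog_name": "main", "schema_name": "sales",
         "table_name": "orders_daily", "table_type": "TABLE"}
    ],
}

for direction, column in parse_column_lineage(sample):
    print(direction, column)
```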
Object Types
Dawiso ingests the following objects from Databricks:
Catalog
Maps to Databricks Catalog object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of catalog. |
| owner | Owner | Username of current owner of catalog. |
| comment | Comment | User-provided free-form text description. |
| storage_root | Storage Root | Storage root URL for managed tables within catalog. |
| metastore_id | Metastore ID | Unique identifier of parent metastore. |
| created_at | Created At | Time at which this catalog was created. |
| created_by | Created By | Username of catalog creator. |
| updated_at | Updated At | Time at which this catalog was last modified. |
| updated_by | Updated By | Username of user who last modified catalog. |
| catalog_type | Catalog Type | The type of the catalog. |
| storage_location | Storage Location | Storage Location URL (full path) for managed tables within catalog. |
| isolation_mode | Isolation Mode | Whether the current securable is accessible from all workspaces or a specific set of workspaces. |
| full_name | Full Name | The full name of the catalog. Corresponds to the name field. |
| securable_kind | Securable Kind | Kind of catalog securable. |
| securable_type | Securable Type | Default “CATALOG”. |
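To illustrate how a mapping table like the one above can be applied, the sketch below renames a subset of source attributes to their Dawiso target names. How Dawiso performs this internally is not described here; the function name, the chosen subset, and the sample record are hypothetical.

```python
# Subset of the catalog mapping above: source attribute -> target attribute in Dawiso.
CATALOG_ATTRIBUTE_MAP = {
    "name": "Name",
    "owner": "Owner",
    "comment": "Comment",
    "storage_root": "Storage Root",
    "metastore_id": "Metastore ID",
    "catalog_type": "Catalog Type",
    "isolation_mode": "Isolation Mode",
}


def to_dawiso_attributes(source: dict, mapping: dict) -> dict:
    """Keep only mapped attributes that are present, renamed to their target names."""
    return {target: source[key] for key, target in mapping.items() if key in source}


# Invented catalog record; "browse_only" is not in the mapping and is dropped.
catalog = {
    "name": "main",
    "owner": "data-platform",
    "catalog_type": "MANAGED_CATALOG",
    "browse_only": False,
}
print(to_dawiso_attributes(catalog, CATALOG_ATTRIBUTE_MAP))
```

Attributes missing from the source record are skipped rather than filled with empty values, which keeps optional fields such as `comment` out of the result when they were never set.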
Schema
Maps to Databricks Schema object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of schema, relative to parent catalog. |
| catalog_name | Catalog Name | Name of parent catalog. |
| owner | Owner | Username of current owner of schema. |
| comment | Comment | User-provided free-form text description. |
| storage_root | Storage Root | Storage root URL for managed tables within schema. |
| metastore_id | Metastore ID | Unique identifier of parent metastore. |
| full_name | Full Name | Full name of schema, in form of catalog_name.schema_name. |
| storage_location | Storage Location | Storage location for managed tables within schema. |
| created_at | Created At | Time at which this schema was created. |
| created_by | Created By | Username of schema creator. |
| updated_at | Updated At | Time at which this schema was last modified. |
| updated_by | Updated By | Username of user who last modified schema. |
| catalog_type | Catalog Type | The type of the parent catalog. |
Table
Maps to Databricks Table object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of table, relative to parent schema. |
| catalog_name | Catalog Name | Name of parent catalog. |
| schema_name | Schema Name | Name of parent schema relative to its parent catalog. |
| table_type | Table Type | Type of table (MANAGED, EXTERNAL). |
| data_source_format | Data Source Format | Data source format. |
| storage_location | Storage Location | Storage root URL for table. |
| sql_path | SQL Path | List of schemas whose objects can be referenced without qualification. |
| owner | Owner | Username of current owner of table. |
| comment | Comment | User-provided free-form text description. |
| storage_credential_name | Storage Credential Name | Name of the storage credential, when a storage credential is configured for use with this table. |
| metastore_id | Metastore ID | Unique identifier of parent metastore. |
| full_name | Full Name | Full name of table, in form of catalog_name.schema_name.table_name. |
| created_at | Created At | Time at which this table was created. |
| created_by | Created By | Username of table creator. |
| updated_at | Updated At | Time at which this table was last modified. |
| updated_by | Updated By | Username of user who last modified the table. |
| deleted_at | Deleted At | Time at which this table was deleted. Field is omitted if table is not deleted. |
| table_id | Table ID | The unique identifier of the table. |
| pipeline_id | Pipeline ID | The pipeline ID of the table. Applicable for tables created by pipelines (Materialized View, Streaming Table, etc.). |
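The timestamp attributes and the optional `deleted_at` field can be handled as in the sketch below. It assumes the API reports timestamps as epoch milliseconds, which is worth verifying against the Databricks reference; the helper names and the sample record are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def epoch_ms_to_iso(value: Optional[int]) -> Optional[str]:
    """Convert epoch milliseconds to an ISO 8601 UTC string; pass None through."""
    if value is None:
        return None
    return datetime.fromtimestamp(value / 1000, tz=timezone.utc).isoformat()


def table_audit_fields(table: dict) -> dict:
    """Read the audit timestamps; deleted_at is absent unless the table was deleted."""
    return {
        "Created At": epoch_ms_to_iso(table.get("created_at")),
        "Updated At": epoch_ms_to_iso(table.get("updated_at")),
        "Deleted At": epoch_ms_to_iso(table.get("deleted_at")),
    }


# Invented table record without deleted_at, as for a table that still exists.
table = {
    "full_name": "main.sales.orders",
    "created_at": 1700000000000,
    "updated_at": 1700000060000,
}
print(table_audit_fields(table))
```

Using `dict.get` for `deleted_at` avoids a `KeyError` for tables that have not been deleted, since the field is omitted in that case.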
Table Column
Maps to Databricks Table Column object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of Column. |
| type_text | Full Data Type | Full data type specification as SQL/catalogString text. |
| type_json | JSON’d Data Type | Full data type specification, JSON-serialized. |
| type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.). |
| type_precision | Type Precision | Digits of precision; required for DecimalTypes. |
| type_scale | Type Scale | Digits to right of decimal; required for DecimalTypes. |
| type_interval_type | Type Interval Type | Format of IntervalType. |
| position | Position | Ordinal position of column (starting at position 0). |
| comment | Comment | User-provided free-form text description. |
| nullable | Nullable | Whether field may be Null (default: true). |
| partition_index | Partition Index | Partition index for column. |
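As an illustration of how the column attributes fit together, the sketch below orders columns by `position`, renders decimal types from `type_precision` and `type_scale`, and applies the documented default of `true` for `nullable`. The function name, the output format, and the sample columns are hypothetical.

```python
def describe_columns(columns: list[dict]) -> list[str]:
    """Render one line per column, ordered by zero-based ordinal position."""
    lines = []
    for col in sorted(columns, key=lambda c: c["position"]):
        type_label = col["type_name"]
        if type_label == "DECIMAL":
            # Precision and scale are only meaningful for decimal types.
            type_label = f"DECIMAL({col['type_precision']},{col['type_scale']})"
        null_label = "NULL" if col.get("nullable", True) else "NOT NULL"
        lines.append(f"{col['position']}: {col['name']} {type_label} {null_label}")
    return lines


# Invented columns, deliberately listed out of order.
columns = [
    {"name": "amount", "type_name": "DECIMAL", "type_precision": 10,
     "type_scale": 2, "position": 1, "nullable": False},
    {"name": "id", "type_name": "LONG", "position": 0},
]
for line in describe_columns(columns):
    print(line)
```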
View
Maps to Databricks View object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of view, relative to parent schema. |
| catalog_name | Catalog Name | Name of parent catalog. |
| schema_name | Schema Name | Name of parent schema relative to its parent catalog. |
| table_type | Table Type | Type of table (VIEW, MATERIALIZED_VIEW, STREAMING_TABLE). |
| data_source_format | Data Source Format | Data source format. |
| view_definition | View Definition | View definition SQL (when table_type is VIEW, MATERIALIZED_VIEW, or STREAMING_TABLE). |
| view_dependencies | View Dependencies | View dependencies (when table_type is VIEW, MATERIALIZED_VIEW, or STREAMING_TABLE). |
| storage_location | Storage Location | Storage root URL for view. |
| sql_path | SQL Path | List of schemas whose objects can be referenced without qualification. |
| owner | Owner | Username of current owner of view. |
| comment | Comment | User-provided free-form text description. |
| storage_credential_name | Storage Credential Name | Name of the storage credential, when a storage credential is configured for use with this view. |
| metastore_id | Metastore ID | Unique identifier of parent metastore. |
| full_name | Full Name | Full name of view, in form of catalog_name.schema_name.view_name. |
| created_at | Created At | Time at which this view was created. |
| created_by | Created By | Username of view creator. |
| updated_at | Updated At | Time at which this view was last modified. |
| updated_by | Updated By | Username of user who last modified the view. |
| deleted_at | Deleted At | Time at which this view was deleted. Field is omitted if view is not deleted. |
| table_id | Table ID | The unique identifier of the view. |
| pipeline_id | Pipeline ID | The pipeline ID of the view. Applicable for tables created by pipelines (Materialized View, Streaming Table, etc.). |
View Column
Maps to Databricks View Column object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of Column. |
| type_text | Full Data Type | Full data type specification as SQL/catalogString text. |
| type_json | JSON’d Data Type | Full data type specification, JSON-serialized. |
| type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.). |
| type_precision | Type Precision | Digits of precision; required for DecimalTypes. |
| type_scale | Type Scale | Digits to right of decimal; required for DecimalTypes. |
| type_interval_type | Type Interval Type | Format of IntervalType. |
| position | Position | Ordinal position of column (starting at position 0). |
| comment | Comment | User-provided free-form text description. |
| nullable | Nullable | Whether field may be Null (default: true). |
| partition_index | Partition Index | Partition index for column. |
Function
Maps to Databricks Function object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of function, relative to parent schema. |
| catalog_name | Catalog Name | Name of parent catalog. |
| schema_name | Schema Name | Name of parent schema relative to its parent catalog. |
| data_type | Data Type | Scalar function return data type. |
| full_data_type | Full Data Type | Pretty printed function data type. |
| routine_body | Routine Body | Function language (SQL, EXTERNAL). |
| routine_definition | Routine Definition | Function body. |
| parameter_style | Parameter Style | Function parameter style. S is the value for SQL. |
| is_deterministic | Is Deterministic | Whether the function is deterministic. |
| sql_data_access | SQL Data Access | Function SQL data access. |
| is_null_call | Is NULL Call | Function null call. |
| security_type | Security Type | Function security type. |
| specific_name | Specific Name | Specific name of the function. |
| external_name | External Name | External function name. |
| external_language | External Language | External function language. |
| sql_path | SQL Path | List of schemas whose objects can be referenced without qualification. |
| owner | Owner | Username of current owner of function. |
| comment | Comment | User-provided free-form text description. |
| metastore_id | Metastore ID | Unique identifier of parent metastore. |
| full_name | Full Name | Full name of function, in form of catalog_name.schema_name.function_name. |
| created_at | Created At | Time at which this function was created. |
| created_by | Created By | Username of function creator. |
| updated_at | Updated At | Time at which this function was last modified. |
| updated_by | Updated By | Username of user who last modified function. |
| function_id | Function ID | ID of function, relative to parent schema. |
Function Parameter
Maps to Databricks Function Parameter object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | Name of parameter. |
| type_text | Type Text | Full data type spec, SQL/catalogString text. |
| type_json | Type JSON | Full data type spec, JSON-serialized. |
| type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.). |
| type_precision | Type Precision | Digits of precision; required on Create for DecimalTypes. |
| type_scale | Type Scale | Digits to right of decimal; required on Create for DecimalTypes. |
| type_interval_type | Type Interval Type | Format of IntervalType. |
| position | Position | Ordinal position of parameter (starting at position 0). |
| parameter_mode | Parameter Mode | The mode of the function parameter. |
| parameter_type | Parameter Type | The type of function parameter. |
| parameter_default | Parameter Default | Default value of the parameter. |
| comment | Comment | User-provided free-form text description. |
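The parameter attributes above are enough to reconstruct a readable function signature. The sketch below orders parameters by `position` and appends `parameter_default` when one is present; the function name, the output format, and the sample data are hypothetical and not a Databricks or Dawiso convention.

```python
def render_signature(function_name: str, parameters: list[dict]) -> str:
    """Render 'name(param type, ...)' with parameters in ordinal order."""
    parts = []
    for param in sorted(parameters, key=lambda p: p["position"]):
        text = f"{param['name']} {param['type_text']}"
        if param.get("parameter_default") is not None:
            text += f" DEFAULT {param['parameter_default']}"
        parts.append(text)
    return f"{function_name}({', '.join(parts)})"


# Invented parameters, deliberately listed out of order.
parameters = [
    {"name": "rate", "type_text": "double", "position": 1, "parameter_default": "0.1"},
    {"name": "price", "type_text": "double", "position": 0},
]
print(render_signature("main.sales.apply_discount", parameters))
```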
Volume
Maps to Databricks Volume object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | The name of the volume. |
| catalog_name | Catalog Name | The name of the catalog where the schema and the volume are. |
| schema_name | Schema Name | The name of the schema where the volume is. |
| full_name | Full Name | The three-level (fully qualified) name of the volume. |
| volume_type | Volume Type | Type of volume (EXTERNAL, MANAGED). |
| owner | Owner | The identifier of the user who owns the volume. |
| volume_id | Volume ID | The unique identifier of the volume. |
| metastore_id | Metastore ID | The unique identifier of the metastore. |
| created_at | Created At | Time at which this volume was created. |
| created_by | Created By | The identifier of the user who created the volume. |
| updated_at | Updated At | Time at which this volume was last modified. |
| updated_by | Updated By | The identifier of the user who updated the volume last time. |
| storage_location | Storage Location | The storage location on the cloud. |
| comment | Comment | The comment attached to the volume. |
Model
Maps to Databricks Model object type.
| Source Attribute | Target Attribute in Dawiso | Description |
|---|---|---|
| name | Name | The name of the registered model. |
| catalog_name | Catalog Name | The name of the catalog where the schema and the registered model reside. |
| schema_name | Schema Name | The name of the schema where the registered model resides. |
| full_name | Full Name | The three-level (fully qualified) name of the registered model. |
| owner | Owner | The identifier of the user who owns the registered model. |
| model_id | Model ID | The unique identifier of the model. |
| metastore_id | Metastore ID | The unique identifier of the metastore. |
| created_at | Created At | Creation timestamp of the registered model. |
| created_by | Created By | The identifier of the user who created the registered model. |
| updated_at | Updated At | Last-update timestamp of the registered model. |
| updated_by | Updated By | The identifier of the user who updated the registered model last time. |
| storage_location | Storage Location | The storage location on the cloud under which model version data files are stored. |
| securable_kind | Securable Kind | Kind of catalog securable. |
| comment | Comment | The comment attached to the registered model. |