Databricks is a unified analytics platform based on Apache Spark, streamlining data science and engineering workflows for scalable data processing, advanced analytics, and machine learning. Ideal for collaborative environments, it supports real-time data processing and integrates seamlessly with diverse data sources.

Supported versions

Dawiso supports reading Databricks metadata through their multi-purpose REST API. To support this, ensure that:

Default Hierarchy

Dawiso Adjusted Hierarchy

The chart below shows the organizational structure of Databricks objects. Objects are shown in green and folders in yellow.

[Chart: organizational structure of Databricks objects]

Data Lineage

Data lineage in Databricks can be tracked at either the table level or the column level using the Data Lineage API.

Table Lineage

Field | Values | Data Type
Flow Direction | Upstream / Downstream | List of objects
Table Info | Name, Catalog Name, Schema Name, Table Type | String (name of object)
Notebook Info | Workspace ID, Notebook ID | INT (ID of object)
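
The table lineage fields above can be flattened into one row per related object. The following is a minimal sketch, not Dawiso's implementation: the payload keys (`upstreams`, `downstreams`, `tableInfo`, `notebookInfos`) and the sample values are assumptions modeled on the fields listed above and should be checked against the Data Lineage API reference before use.

```python
from typing import Any


def flatten_table_lineage(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per table or notebook found in a table-lineage payload."""
    rows: list[dict[str, Any]] = []
    # Assumed payload keys: "upstreams" / "downstreams" hold lists of entries.
    for direction, key in (("upstream", "upstreams"), ("downstream", "downstreams")):
        for entry in response.get(key, []):
            table = entry.get("tableInfo")  # assumed key name
            if table:
                rows.append({
                    "direction": direction,
                    "kind": "table",
                    "full_name": ".".join(
                        [table["catalog_name"], table["schema_name"], table["name"]]
                    ),
                    "table_type": table.get("table_type"),
                })
            for notebook in entry.get("notebookInfos", []):  # assumed key name
                rows.append({
                    "direction": direction,
                    "kind": "notebook",
                    "workspace_id": notebook["workspace_id"],
                    "notebook_id": notebook["notebook_id"],
                })
    return rows


# Hypothetical sample payload, not real lineage data.
sample_table_lineage = {
    "upstreams": [
        {
            "tableInfo": {
                "name": "orders_raw",
                "catalog_name": "main",
                "schema_name": "sales",
                "table_type": "TABLE",
            },
            "notebookInfos": [{"workspace_id": 1234, "notebook_id": 5678}],
        }
    ],
    "downstreams": [
        {
            "tableInfo": {
                "name": "orders_daily",
                "catalog_name": "main",
                "schema_name": "reporting",
                "table_type": "TABLE",
            }
        }
    ],
}

table_lineage_rows = flatten_table_lineage(sample_table_lineage)
```

Each row keeps the flow direction next to the object it describes, so upstream and downstream objects can be stored in a single list.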

Column Lineage

Field | Values | Data Type
Flow Direction | Upstream columns / Downstream columns | List of objects
Column Info | Name, Catalog Name, Schema Name, Table Name, Table Type | String (name of object)
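
Column lineage can be handled the same way. This sketch assumes the payload exposes `upstream_cols` and `downstream_cols` lists whose entries carry the Column Info fields listed above; those key names and the sample values are assumptions for illustration only.

```python
from typing import Any


def flatten_column_lineage(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per upstream or downstream column in a column-lineage payload."""
    rows: list[dict[str, Any]] = []
    # Assumed payload keys: "upstream_cols" / "downstream_cols".
    for direction, key in (("upstream", "upstream_cols"), ("downstream", "downstream_cols")):
        for col in response.get(key, []):
            rows.append({
                "direction": direction,
                # Four-part name: catalog.schema.table.column
                "column": ".".join(
                    [col["catalog_name"], col["schema_name"], col["table_name"], col["name"]]
                ),
                "table_type": col.get("table_type"),
            })
    return rows


# Hypothetical sample payload, not real lineage data.
sample_column_lineage = {
    "upstream_cols": [
        {
            "name": "amount",
            "catalog_name": "main",
            "schema_name": "sales",
            "table_name": "orders_raw",
            "table_type": "TABLE",
        }
    ],
    "downstream_cols": [],
}

column_lineage_rows = flatten_column_lineage(sample_column_lineage)
```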

Object Types

Dawiso ingests the following objects from Databricks:

Catalog

Maps to Databricks Catalog object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of catalog.
owner | Owner | Username of current owner of catalog.
comment | Comment | User-provided free-form text description.
storage_root | Storage Root | Storage root URL for managed tables within catalog.
metastore_id | Metastore ID | Unique identifier of parent metastore.
created_at | Created At | Time at which this catalog was created.
created_by | Created By | Username of catalog creator.
updated_at | Updated At | Time at which this catalog was last modified.
updated_by | Updated By | Username of user who last modified catalog.
catalog_type | Catalog Type | The type of the catalog.
storage_location | Storage Location | Storage location URL (full path) for managed tables within catalog.
isolation_mode | Isolation Mode | Whether the current securable is accessible from all workspaces or a specific set of workspaces.
full_name | Full Name | The full name of the catalog. Corresponds with the name field.
securable_kind | Securable Kind | Kind of catalog securable.
securable_type | Securable Type | Default “CATALOG”.
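
A source-to-target mapping like the one above is essentially a key rename. The sketch below illustrates the idea for the Catalog object; the function `to_dawiso_attributes` and the constant `CATALOG_ATTRIBUTE_MAP` are hypothetical names introduced here and do not describe how Dawiso performs the mapping internally.

```python
from typing import Any

# Source attribute -> target attribute, copied from the Catalog table above.
CATALOG_ATTRIBUTE_MAP = {
    "name": "Name",
    "owner": "Owner",
    "comment": "Comment",
    "storage_root": "Storage Root",
    "metastore_id": "Metastore ID",
    "created_at": "Created At",
    "created_by": "Created By",
    "updated_at": "Updated At",
    "updated_by": "Updated By",
    "catalog_type": "Catalog Type",
    "storage_location": "Storage Location",
    "isolation_mode": "Isolation Mode",
    "full_name": "Full Name",
    "securable_kind": "Securable Kind",
    "securable_type": "Securable Type",
}


def to_dawiso_attributes(source: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename known source keys to target names; unknown or missing keys are skipped."""
    return {target: source[key] for key, target in mapping.items() if key in source}


# Hypothetical catalog record for illustration.
catalog_record = {"name": "main", "owner": "alice@example.com", "unmapped_field": 1}
mapped_catalog = to_dawiso_attributes(catalog_record, CATALOG_ATTRIBUTE_MAP)
```

The same helper works for the other object types if it is given a mapping built from their tables.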

Schema

Maps to Databricks Schema object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of schema, relative to parent catalog.
catalog_name | Catalog Name | Name of parent catalog.
owner | Owner | Username of current owner of schema.
comment | Comment | User-provided free-form text description.
storage_root | Storage Root | Storage root URL for managed tables within schema.
metastore_id | Metastore ID | Unique identifier of parent metastore.
full_name | Full Name | Full name of schema, in form of catalog_name.schema_name.
storage_location | Storage Location | Storage location for managed tables within schema.
created_at | Created At | Time at which this schema was created.
created_by | Created By | Username of schema creator.
updated_at | Updated At | Time at which this schema was last modified.
updated_by | Updated By | Username of user who last modified schema.
catalog_type | Catalog Type | The type of the parent catalog.

Table

Maps to Databricks Table object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of table, relative to parent schema.
catalog_name | Catalog Name | Name of parent catalog.
schema_name | Schema Name | Name of parent schema relative to its parent catalog.
table_type | Table Type | Type of table (MANAGED, EXTERNAL).
data_source_format | Data Source Format | Data source format.
storage_location | Storage Location | Storage root URL for table.
sql_path | SQL Path | List of schemas whose objects can be referenced without qualification.
owner | Owner | Username of current owner of table.
comment | Comment | User-provided free-form text description.
storage_credential_name | Storage Credential Name | Name of the storage credential, when a storage credential is configured for use with this table.
metastore_id | Metastore ID | Unique identifier of parent metastore.
full_name | Full Name | Full name of table, in form of catalog_name.schema_name.table_name.
created_at | Created At | Time at which this table was created.
created_by | Created By | Username of table creator.
updated_at | Updated At | Time at which this table was last modified.
updated_by | Updated By | Username of user who last modified the table.
deleted_at | Deleted At | Time at which this table was deleted. Field is omitted if table is not deleted.
table_id | Table ID | The unique identifier of the table.
pipeline_id | Pipeline ID | The pipeline ID of the table. Applicable for tables created by pipelines (Materialized View, Streaming Table, etc.).
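
The full_name attribute follows the three-level form catalog_name.schema_name.table_name. As a small illustration, the sketch below builds and parses such names; `TableName` and `parse_full_name` are hypothetical helpers, and the parser assumes that none of the three parts contains a period.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    """The three parts of a table's full name."""
    catalog_name: str
    schema_name: str
    name: str

    @property
    def full_name(self) -> str:
        # catalog_name.schema_name.table_name
        return ".".join(self)


def parse_full_name(full_name: str) -> TableName:
    """Split a three-level name; assumes no part itself contains a period."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return TableName(*parts)


# Hypothetical table name for illustration.
orders = parse_full_name("main.sales.orders")
```

Rejecting names with the wrong number of parts keeps a schema-level name such as catalog_name.schema_name from being mistaken for a table.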

Table Column

Maps to Databricks Table Column object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of column.
type_text | Full Data Type | Full data type specification as SQL/catalogString text.
type_json | JSON’d Data Type | Full data type specification, JSON-serialized.
type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.).
type_precision | Type Precision | Digits of precision; required for DecimalTypes.
type_scale | Type Scale | Digits to right of decimal; required for DecimalTypes.
type_interval_type | Type Interval Type | Format of IntervalType.
position | Position | Ordinal position of column (starting at position 0).
comment | Comment | User-provided free-form text description.
nullable | Nullable | Whether field may be Null (default: true).
partition_index | Partition Index | Partition index for column.
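
Several of the column attributes above interact: position is zero-based, nullable defaults to true, and type_precision and type_scale only matter for decimal types. The sketch below shows one way to combine them into a readable column listing; `describe_columns` is a hypothetical helper, and the `DECIMAL` and `LONG` type names in the sample are assumed values.

```python
from typing import Any


def describe_columns(columns: list[dict[str, Any]]) -> list[str]:
    """Render columns in ordinal order as 'position: name TYPE NULL|NOT NULL'."""
    lines = []
    for col in sorted(columns, key=lambda c: c["position"]):
        type_label = col["type_name"]
        # Precision and scale are only meaningful for decimal types.
        if type_label == "DECIMAL":
            type_label = f"DECIMAL({col['type_precision']},{col['type_scale']})"
        # nullable defaults to true when the field is absent.
        null_label = "NULL" if col.get("nullable", True) else "NOT NULL"
        lines.append(f"{col['position']}: {col['name']} {type_label} {null_label}")
    return lines


# Hypothetical column metadata for illustration.
sample_columns = [
    {"name": "amount", "type_name": "DECIMAL", "type_precision": 10,
     "type_scale": 2, "position": 1, "nullable": False},
    {"name": "id", "type_name": "LONG", "position": 0},
]

column_lines = describe_columns(sample_columns)
```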

View

Maps to Databricks View object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of view, relative to parent schema.
catalog_name | Catalog Name | Name of parent catalog.
schema_name | Schema Name | Name of parent schema relative to its parent catalog.
table_type | Table Type | Type of table (VIEW, MATERIALIZED_VIEW, STREAMING_TABLE).
data_source_format | Data Source Format | Data source format.
view_definition | View Definition | View definition SQL (when table_type is VIEW, MATERIALIZED_VIEW, or STREAMING_TABLE).
view_dependencies | View Dependencies | View dependencies (when table_type is VIEW, MATERIALIZED_VIEW, or STREAMING_TABLE).
storage_location | Storage Location | Storage root URL for view.
sql_path | SQL Path | List of schemas whose objects can be referenced without qualification.
owner | Owner | Username of current owner of view.
comment | Comment | User-provided free-form text description.
storage_credential_name | Storage Credential Name | Name of the storage credential, when a storage credential is configured for use with this view.
metastore_id | Metastore ID | Unique identifier of parent metastore.
full_name | Full Name | Full name of view, in form of catalog_name.schema_name.view_name.
created_at | Created At | Time at which this view was created.
created_by | Created By | Username of view creator.
updated_at | Updated At | Time at which this view was last modified.
updated_by | Updated By | Username of user who last modified the view.
deleted_at | Deleted At | Time at which this view was deleted. Field is omitted if view is not deleted.
table_id | Table ID | The unique identifier of the view.
pipeline_id | Pipeline ID | The pipeline ID of the view. Applicable for tables created by pipelines (Materialized View, Streaming Table, etc.).

View Column

Maps to Databricks View Column object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of column.
type_text | Full Data Type | Full data type specification as SQL/catalogString text.
type_json | JSON’d Data Type | Full data type specification, JSON-serialized.
type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.).
type_precision | Type Precision | Digits of precision; required for DecimalTypes.
type_scale | Type Scale | Digits to right of decimal; required for DecimalTypes.
type_interval_type | Type Interval Type | Format of IntervalType.
position | Position | Ordinal position of column (starting at position 0).
comment | Comment | User-provided free-form text description.
nullable | Nullable | Whether field may be Null (default: true).
partition_index | Partition Index | Partition index for column.

Function

Maps to Databricks Function object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of function, relative to parent schema.
catalog_name | Catalog Name | Name of parent catalog.
schema_name | Schema Name | Name of parent schema relative to its parent catalog.
data_type | Data Type | Scalar function return data type.
full_data_type | Full Data Type | Pretty printed function data type.
routine_body | Routine Body | Function language (SQL, EXTERNAL).
routine_definition | Routine Definition | Function body.
parameter_style | Parameter Style | Function parameter style. S is the value for SQL.
is_deterministic | Is Deterministic | Whether the function is deterministic.
sql_data_access | SQL Data Access | Function SQL data access.
is_null_call | Is NULL Call | Function null call.
security_type | Security Type | Function security type.
specific_name | Specific Name | Specific name of the function.
external_name | External Name | External function name.
external_language | External Language | External function language.
sql_path | SQL Path | List of schemas whose objects can be referenced without qualification.
owner | Owner | Username of current owner of function.
comment | Comment | User-provided free-form text description.
metastore_id | Metastore ID | Unique identifier of parent metastore.
full_name | Full Name | Full name of function, in form of catalog_name.schema_name.function_name.
created_at | Created At | Time at which this function was created.
created_by | Created By | Username of function creator.
updated_at | Updated At | Time at which this function was last modified.
updated_by | Updated By | Username of user who last modified function.
function_id | Function ID | ID of function, relative to parent schema.

Function Parameter

Maps to Databricks Function Parameter object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | Name of parameter.
type_text | Type Text | Full data type spec, SQL/catalogString text.
type_json | Type JSON | Full data type spec, JSON-serialized.
type_name | Type Name | Name of type (INT, STRUCT, MAP, etc.).
type_precision | Type Precision | Digits of precision; required on Create for DecimalTypes.
type_scale | Type Scale | Digits to right of decimal; required on Create for DecimalTypes.
type_interval_type | Type Interval Type | Format of IntervalType.
position | Position | Ordinal position of column (starting at position 0).
parameter_mode | Parameter Mode | The mode of the function parameter.
parameter_type | Parameter Type | The type of function parameter.
parameter_default | Parameter Default | Default value of the parameter.
comment | Comment | User-provided free-form text description.

Volume

Maps to Databricks Volume object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | The name of the volume.
catalog_name | Catalog Name | The name of the catalog where the schema and the volume are.
schema_name | Schema Name | The name of the schema where the volume is.
full_name | Full Name | The three-level (fully qualified) name of the volume.
volume_type | Volume Type | Type of volume (EXTERNAL, MANAGED).
owner | Owner | The identifier of the user who owns the volume.
volume_id | Volume ID | The unique identifier of the volume.
metastore_id | Metastore ID | The unique identifier of the metastore.
created_at | Created At | Time at which this volume was created.
created_by | Created By | The identifier of the user who created the volume.
updated_at | Updated At | Time at which this volume was last modified.
updated_by | Updated By | The identifier of the user who last updated the volume.
storage_location | Storage Location | The storage location on the cloud.
comment | Comment | The comment attached to the volume.

Model

Maps to Databricks Registered Model object type.

Source Attribute | Target Attribute in Dawiso | Description
name | Name | The name of the registered model.
catalog_name | Catalog Name | The name of the catalog where the schema and the registered model reside.
schema_name | Schema Name | The name of the schema where the registered model resides.
full_name | Full Name | The three-level (fully qualified) name of the registered model.
owner | Owner | The identifier of the user who owns the registered model.
model_id | Model ID | The unique identifier of the model.
metastore_id | Metastore ID | The unique identifier of the metastore.
created_at | Created At | Creation timestamp of the registered model.
created_by | Created By | The identifier of the user who created the registered model.
updated_at | Updated At | Last-update timestamp of the registered model.
updated_by | Updated By | The identifier of the user who last updated the registered model.
storage_location | Storage Location | The storage location on the cloud under which model version data files are stored.
securable_kind | Securable Kind | Kind of catalog securable.
comment | Comment | The comment attached to the registered model.