Data integration connects external data sources to Dawiso applications through a chain of five asset types. Each link must be configured correctly — a missing provider, an incomplete mapping, or a wrong version key causes the scan button to disappear without error.
## The integration chain
Five asset types form a pipeline from connection metadata to attribute-level field mapping:
providerCategories → providers → dataSourceDefinitions → dataIntegrations → application
| Asset type | Defined in | Purpose |
|---|---|---|
| `providerCategories` | core.json only | Groups providers into UI categories (Storage, Analytics & BI, etc.) |
| `providers` | core.json + 4 connector packages | Defines the connector type, connection form UI, and ingestion format |
| `dataSourceDefinitions` | Each connector package (26 total) | Declares queries, versions, and query definitions for a data source |
| `dataIntegrations` | Each connector package (26 total) | Maps query fields to object types and attribute types |
| `externalDataSources` | None (legacy) | Replaced by the chain above. Schema exists but no package uses it. |
26 of 49 standard packages define a complete integration chain. The remaining packages are feature packages (business glossary, code lists), relationship bridges, or dashboards that do not ingest external data.
## Provider categories

Provider categories group providers in the connection setup UI. The core package defines five categories.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `key` | string | Yes | Unique identifier |
| `name` | string | Yes | Display name |
| `description` | string | No | Description text |
| `translations` | array | No | Localized name and description per language |
```json
{
  "providerCategories": [
    {
      "key": "data_platform",
      "name": "Data Platforms",
      "translations": [
        { "languageKey": "cs-CZ", "name": "Datové Platformy" }
      ]
    }
  ]
}
```
The five standard categories are storage, data_platform, di_dwa (Data Integrations & Warehousing), analytics_bi (Analytics & BI Platforms), and file_types. Custom packages reference these categories by key — they do not need to redefine them.
## Providers

A provider represents a connector type (Snowflake, Power BI, PostgreSQL). It defines the connection form UI and the ingestion message format.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `key` | string | Yes | Unique identifier. Must start with the `core` prefix. |
| `name` | string | Yes | Display name |
| `ingestionTypeKey` | enum | Yes | `"message_type_json"` or `"raw_json"` |
| `connectionTemplate` | object | Yes | UI form definition for connection setup |
| `description` | string | No | Provider description |
| `iconKey` | string | No | Reference to an icon asset |
| `providerCategoryKey` | string | No | Reference to a provider category key |
| `isInstalled` | boolean | No | Whether the provider’s backend connector is deployed. Default: `false`. |
| `documentationUrl` | string | No | Link to external documentation |
| `translations` | array | No | Localized name and description |
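A quick way to internalize the table above is a pre-flight check. The following validator is a hypothetical sketch (not part of Dawiso); it only encodes the required properties, the allowed `ingestionTypeKey` values, and the `core` key prefix described in this section.

```python
# Hypothetical provider pre-flight check. Property names come from the table
# above; the validator itself is illustrative, not Dawiso code.
REQUIRED = ("key", "name", "ingestionTypeKey", "connectionTemplate")
INGESTION_TYPES = {"message_type_json", "raw_json"}

def validate_provider(provider: dict) -> list[str]:
    """Return a list of problems; an empty list means the provider looks loadable."""
    errors = [f"missing required property: {p}" for p in REQUIRED if p not in provider]
    key = provider.get("key", "")
    if key and not key.startswith("core"):
        errors.append(f'key "{key}" must start with the core prefix')
    itk = provider.get("ingestionTypeKey")
    if itk is not None and itk not in INGESTION_TYPES:
        errors.append(f'unknown ingestionTypeKey "{itk}"')
    return errors

print(validate_provider({"key": "snowflake", "name": "Snowflake",
                         "ingestionTypeKey": "message_type_json",
                         "connectionTemplate": {}}))
# ['key "snowflake" must start with the core prefix']
```

Running this against a package before installation catches the two failure modes that otherwise surface only at loader time.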
### Connection template

The `connectionTemplate` defines a multi-step form for configuring the data source connection. Each step renders a form page in the UI.
```json
{
  "connectionTemplate": {
    "$schema": "https://schema.dawiso.com/provider-schema.json",
    "providerName": "core_snowflake",
    "steps": [
      {
        "centerArea": {
          "data": [
            {
              "type": "title",
              "titleKey": "#di.provider.template.connection.core_snowflake.title",
              "subtitleKey": "#di.provider.template.connection.core_snowflake.subtitle",
              "showAppIcon": true
            },
            {
              "type": "section",
              "data": [
                {
                  "type": "input",
                  "key": "host",
                  "labelKey": "#di.provider.template.connection.host.label",
                  "required": true,
                  "encrypted": false,
                  "multiLine": 0,
                  "onEditDeletePassword": false
                },
                {
                  "type": "input",
                  "key": "password",
                  "labelKey": "#di.provider.template.connection.password.label",
                  "required": true,
                  "encrypted": true,
                  "multiLine": 0,
                  "onEditDeletePassword": true
                }
              ]
            }
          ]
        },
        "buttons": {
          "test": {
            "labelKey": "#di.provider.template.connection.test_connection"
          }
        }
      }
    ]
  }
}
```
### Form field types

The `connectionTemplate` supports ten form field types inside `centerArea.data[]`:

| Type | Required properties | Purpose |
|---|---|---|
| `input` | `key`, `required`, `encrypted`, `multiLine`, `onEditDeletePassword` | Text input (plain or password) |
| `json_input` | `key`, `required`, `encrypted`, `multiLine`, `onEditDeletePassword` | JSON editor input |
| `select` | `key`, `required`, `multiselect`, `onEditDeletePassword` | Dropdown with `data[]` items |
| `schedule` | `key`, `required` | Cron schedule picker |
| `checkbox` | `key`, `default`, `onEditDeletePassword` | Boolean toggle |
| `text` | — | Static text label |
| `title` | `showAppIcon` | Step header with optional app icon |
| `section` | — | Groups nested form fields |
| `workflow_select` | `key`, `required`, `onEditDeletePassword` | Workflow state picker |
| `space_select` | `key`, `required`, `onEditDeletePassword` | Space/hierarchy picker |
All text-bearing fields support labelKey, tooltipKey, and placeholderKey properties for i18n translation references. The labelVariant property controls typography: "h1" through "h5", "paragraph", "body1", "captionText1", and their underline/caps variants.
Set encrypted: true on password fields. Set onEditDeletePassword: true to clear the stored value when the user edits the connection.
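Because `section` entries nest further fields, tooling that inspects a template has to walk the tree. A minimal sketch (the helper and the sample data are hypothetical, but the node shapes follow the examples in this section):

```python
def iter_fields(nodes: list[dict]):
    """Yield every leaf form field in a centerArea.data[] tree, descending
    into nested "section" groups. Illustrative helper, not Dawiso code."""
    for node in nodes:
        if node.get("type") == "section":
            yield from iter_fields(node.get("data", []))
        elif "key" in node:  # title/text entries carry no key and are skipped
            yield node

# Sample step data in the shape shown above (keys are hypothetical).
step_data = [
    {"type": "title", "titleKey": "#di.example.title", "showAppIcon": True},
    {"type": "section", "data": [
        {"type": "input", "key": "host", "required": True, "encrypted": False},
        {"type": "input", "key": "password", "required": True, "encrypted": True},
    ]},
]

encrypted_keys = {f["key"] for f in iter_fields(step_data) if f.get("encrypted")}
print(encrypted_keys)  # {'password'}
```

This is handy for auditing that every credential field actually carries `encrypted: true`.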
### Minimal vs full provider

Minimal (placeholder without connection UI):

```json
{
  "key": "core_amazon_s3",
  "name": "Amazon S3",
  "ingestionTypeKey": "message_type_json",
  "iconKey": "core_amazon_s3",
  "providerCategoryKey": "storage",
  "isInstalled": false,
  "connectionTemplate": {}
}
```
Full (deployed connector with form):

```json
{
  "key": "core_amazon_redshift",
  "name": "Amazon Redshift",
  "ingestionTypeKey": "message_type_json",
  "iconKey": "core_amazon_redshift",
  "providerCategoryKey": "data_platform",
  "isInstalled": true,
  "documentationUrl": "https://docs.dawiso.com/connectors/redshift",
  "connectionTemplate": {
    "$schema": "https://schema.dawiso.com/provider-schema.json",
    "providerName": "core_amazon_redshift",
    "steps": [{ "centerArea": { "data": [...] }, "buttons": { "test": { "labelKey": "..." } } }]
  }
}
```
## Data source definitions

A data source definition declares what queries a connector can run and what versions of the query set exist. Each connector package defines exactly one data source definition.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `key` | string | Yes | Unique identifier (typically `"data_source_definition"`) |
| `name` | string | Yes | Display name |
| `provider` | string | Yes | Key of the provider this definition uses. Provider must be installed. |
| `description` | string | No | Description text |
| `state` | enum | No | `"active"`, `"inactive"`, `"in-validation"`, `"hidden"`. Default: `"active"`. |
| `queries` | array | No | Query (message type) definitions |
| `versions` | array | No | Versioned query sets with options and definitions |
| `translations` | array | No | Localized name and description |
### Queries

Each query represents one type of data the connector retrieves — one object type or one relation type. Queries define what the scanner extracts.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `key` | string | Yes | Unique query identifier (e.g., `"table"`, `"view"`, `"dependency"`) |
| `type` | enum | Yes | `"object"` — entity data, `"relation"` — relationship data, `"generic"` — preprocessing/configuration |
| `abbreviation` | string | No | Short display label. Maximum 100 characters. |
| `name` | string | No | Display name |
| `description` | string | No | Description |
| `state` | enum | No | Same values as data source definition state |
| `translations` | array | No | Localized name and description |
Query type determines how the ingestion pipeline processes the data:
"object"— creates or updates objects in the target object type"relation"— creates relationships between existing objects"generic"— runs setup or preprocessing steps (e.g.,"configure","prepare","scan_result")
A simple JDBC connector like PostgreSQL defines 9 queries. A cloud data warehouse like Snowflake defines 22. BI platforms like Power BI use "generic" queries for scan orchestration alongside "object" queries for each asset type.
### Versions

Versions group query definitions and configure ingestion behavior. Most connectors define a single version named `"default"`.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `key` | string | Yes | Version identifier (typically `"default"`) |
| `name` | string | Yes | Display name (e.g., "Version 1.0") |
| `description` | string | No | Description |
| `state` | enum | No | Same values as data source definition state |
| `options` | object | No | Ingestion behavior settings |
| `queryDefinitions` | array | No | SQL or API query templates per query key |
| `template` | object | No | Ingestion template (same schema as provider `connectionTemplate`) |
| `translations` | array | No | Localized name and description |
### Version options

| Property | Type | Required | Purpose |
|---|---|---|---|
| `batchSize` | integer | Yes | Number of records per ingestion batch (typical: 1000–5000) |
| `convertBooleanToNumeric` | boolean | Yes | Convert boolean values to 0/1 |
| `createChangeLogs` | boolean | Yes | Track attribute value changes over time |
| `createVersions` | boolean | Yes | Create versioned snapshots of ingested data |
| `ingestionFormat` | string | No | `"diff"` (incremental) or `"full"` (complete replacement) |
```json
{
  "options": {
    "batchSize": 5000,
    "ingestionFormat": "diff",
    "convertBooleanToNumeric": true,
    "createChangeLogs": false,
    "createVersions": false
  }
}
```
Most JDBC connectors use "diff" format with batchSize: 5000. Data platforms like Keboola use "full" format with batchSize: 1000.
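The practical difference between the two formats can be sketched as a planning step. This is an illustrative model under the assumption that both formats reconcile against the previous scan and drop records that disappeared from the source; it is not the actual pipeline code.

```python
def plan_ingestion(previous: dict, incoming: dict, ingestion_format: str) -> dict:
    """Contrast "full" (resend every record) with "diff" (send only records
    that changed since the last scan). Keys identify records; values are the
    record payloads. Hypothetical sketch."""
    deletes = sorted(k for k in previous if k not in incoming)
    if ingestion_format == "full":
        return {"upsert": sorted(incoming), "delete": deletes}
    # "diff": only records whose payload differs from the previous scan
    changed = sorted(k for k, rec in incoming.items() if previous.get(k) != rec)
    return {"upsert": changed, "delete": deletes}

prev = {"t1": {"rows": 10}, "t2": {"rows": 5}}
curr = {"t1": {"rows": 10}, "t3": {"rows": 7}}
print(plan_ingestion(prev, curr, "diff"))
# {'upsert': ['t3'], 'delete': ['t2']}
```

With thousands of mostly-unchanged tables, `"diff"` keeps each scan's payload proportional to what actually changed, which is why it is the common choice for JDBC connectors.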
### Query definitions

Each query definition contains the actual SQL or API query text, field declarations, and processing options.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `queryKey` | string | Yes | References a query key from the `queries` array |
| `format` | string | Yes | Query format identifier |
| `order` | integer | Yes | Execution order within the version |
| `definition` | string | No | The query text (SQL, API path, or JSON template) |
| `isExecutable` | boolean | No | Whether this query runs during scan |
| `isOrdered` | boolean | No | Whether result ordering matters |
| `isCompressed` | boolean | No | Whether the response is compressed |
| `state` | enum | No | Same values as data source definition state |
| `fields` | array | No | Field declarations for the query result |
| `options` | object | No | Additional options (`relationTypeKey`, `scanMasterOnly`) |
### Query definition fields

Fields describe the columns or properties returned by a query. They drive the scan pipeline’s understanding of the result structure.

| Property | Type | Purpose |
|---|---|---|
| `key` | string | Field identifier matching the source system column name |
| `isKey` | boolean | Primary identifier for the object |
| `isName` | boolean | Display name field |
| `isParentKey` | boolean | Reference to parent object (builds hierarchy) |
| `isFromKey` | boolean | Source object in a relation (for `"relation"` queries) |
| `isToKey` | boolean | Target object in a relation (for `"relation"` queries) |
| `isRelationTypeKey` | boolean | Relation type identifier (for `"relation"` queries) |
These field flags tell the ingestion pipeline how to interpret each column. An "object" query needs at least isKey and isName. A "relation" query needs isFromKey, isToKey, and typically isRelationTypeKey.
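One way to picture what the flags do is to apply them to a single result row. The sketch below is a hypothetical model of what the pipeline derives (flag names come from the table; everything else is illustrative):

```python
# Map each field flag to the structural slot it fills. Flag names are from
# the schema; the slot names and this function are illustrative only.
FLAG_SLOTS = {"isKey": "key", "isName": "name", "isParentKey": "parent_key",
              "isFromKey": "from_key", "isToKey": "to_key",
              "isRelationTypeKey": "relation_type_key"}

def interpret_row(fields: list[dict], row: dict) -> dict:
    """Pull structural columns out of one result row via the field flags;
    unflagged fields land in the attribute payload."""
    out = {"attributes": {}}
    for field in fields:
        value = row.get(field["key"])
        slot = next((s for flag, s in FLAG_SLOTS.items() if field.get(flag)), None)
        if slot:
            out[slot] = value
        else:
            out["attributes"][field["key"]] = value
    return out

fields = [{"key": "KEY", "isKey": True}, {"key": "NAME", "isName": True},
          {"key": "DESCRIPTION"}]
print(interpret_row(fields, {"KEY": "db.t1", "NAME": "t1", "DESCRIPTION": "orders"}))
# {'attributes': {'DESCRIPTION': 'orders'}, 'key': 'db.t1', 'name': 't1'}
```

The unflagged `DESCRIPTION` column is exactly the kind of field that a data integration mapping later routes to an attribute type.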
## Data integrations

Data integrations are the mapping layer. They connect a data source definition’s queries to an application’s object types and attribute types. This is where external fields become Dawiso attributes.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `applicationKey` | string | Yes | Target application |
| `dataSourceDefinitionKey` | string | Yes | Data source definition to use |
| `versionKey` | string | Yes | Data source version to use |
| `mappings` | array | Yes | Query-to-object-type mappings |
### Mappings

Each mapping connects one query to one object type and maps individual fields to attribute types.

| Property | Type | Required | Purpose |
|---|---|---|---|
| `queryKey` | string | Yes | References a query from the data source definition |
| `objectTypeKey` | string | Yes | Target object type in the application |
| `mapping` | array | Yes | Field-to-attribute mappings |
### Field mappings

| Property | Type | Required | Purpose |
|---|---|---|---|
| `fieldKey` | string | Yes | Source field name (case-sensitive, matches the source system output) |
| `attributeTypeKey` | string | Yes | Target attribute type on the object type |
```json
{
  "applicationKey": "app",
  "dataSourceDefinitionKey": "data_source_definition",
  "versionKey": "default",
  "mappings": [
    {
      "queryKey": "database",
      "objectTypeKey": "database",
      "mapping": [
        { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
        { "fieldKey": "DATABASE_OWNER", "attributeTypeKey": "owner" },
        { "fieldKey": "CREATED", "attributeTypeKey": "created" },
        { "fieldKey": "LAST_ALTERED", "attributeTypeKey": "last_altered" }
      ]
    },
    {
      "queryKey": "table",
      "objectTypeKey": "table",
      "mapping": [
        { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
        { "fieldKey": "TABLE_OWNER", "attributeTypeKey": "owner" },
        { "fieldKey": "TABLE_TYPE", "attributeTypeKey": "type" },
        { "fieldKey": "ROW_COUNT", "attributeTypeKey": "row_count" },
        { "fieldKey": "BYTES", "attributeTypeKey": "bytes" }
      ]
    }
  ]
}
```
The fieldKey value must match exactly what the source system returns — column names from SQL connectors are typically uppercase (DESCRIPTION, ROW_COUNT), while API connectors may use camelCase (DatasetId, ConfiguredBy).
Use core_description_scanned as the attributeTypeKey for description fields across all connectors. This core attribute type has search features pre-configured.
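The exact-match behavior is worth seeing in miniature. The helper below is an illustrative sketch of the mapping step (not the real scan code): a `fieldKey` that differs from the source column only in case simply produces no attribute value.

```python
def apply_mapping(mapping: list[dict], row: dict) -> dict:
    """Reduce one result row to attribute values. The fieldKey lookup is an
    exact, case-sensitive match; a mismatch yields no value and no error,
    mirroring the silent-skip behavior described above. Hypothetical sketch."""
    return {m["attributeTypeKey"]: row[m["fieldKey"]]
            for m in mapping if m["fieldKey"] in row}

row = {"DESCRIPTION": "orders", "ROW_COUNT": 42}
good = [{"fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned"}]
bad = [{"fieldKey": "description", "attributeTypeKey": "core_description_scanned"}]
print(apply_mapping(good, row))  # {'core_description_scanned': 'orders'}
print(apply_mapping(bad, row))   # {} (case mismatch, silently ignored)
```

When a scanned attribute stays mysteriously empty, comparing the mapping's `fieldKey` casing against the raw scan output is the first thing to check.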
### Mapping cardinality

The number of field mappings varies by connector complexity:

| Connector type | Queries | Fields per query | Example |
|---|---|---|---|
| Simple JDBC | 9 | 4–8 | PostgreSQL |
| Cloud data warehouse | 22 | 6–15 | Snowflake |
| BI platform | 20+ | 8–30+ | Power BI |
## Loader execution order

The backend processes data integration assets through 17 loaders in strict numeric order. Each loader depends on assets created by earlier loaders.

| Position | Loader | Asset | Validates |
|---|---|---|---|
| 109 | DI_C_PROVIDER_CATEGORY | Provider categories | Nothing |
| 111 | DI_C_PROVIDER | Providers | ingestionTypeKey exists, iconKey exists (optional), key starts with core |
| 113 | R_PROVIDER_CATEGORY_PROVIDER | Provider ↔ category link | Both provider and category exist |
| 118 | C_DATA_SOURCE_TYPE | Data source definitions | provider exists and has isInstalled: true |
| 120 | DI_DATA_SOURCE_VERSION | Versions | Parent data source type exists |
| 122 | DI_MESSAGE_TYPE | Queries (message types) | abbreviation max 100 chars |
| 124 | DI_MESSAGE_VERSION | Message versions | — |
| 126 | DI_R_DATA_SOURCE_VERSION_MESSAGE_VERSION | Version ↔ query links | — |
| 220 | R_APPLICATION_DATA_SOURCE_VERSION | Data integrations | See below |
The most complex loader is at position 220. It validates the complete mapping chain:
- `dataSourceDefinitionKey` must reference an existing data source type
- `versionKey` must reference an existing data source version
- Each `queryKey` in mappings must reference an existing query (message type)
- Each `objectTypeKey` must reference an existing object type
- The object type must be assigned to the referenced application
- Each `attributeTypeKey` must reference an existing attribute type
- The attribute type must be assigned to the referenced object type
If any check fails, the loader collects all errors and throws a combined exception.
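The collect-then-throw shape of that loader can be sketched as follows. Everything here is hypothetical (names, signature, and the `object_types` shape, which maps each object type key to its set of assigned attribute type keys); only the mapping-side checks it performs come from the list above.

```python
def validate_integration(di: dict, queries: set, object_types: dict,
                         attribute_types: set, applications: dict) -> None:
    """Collect every broken cross-reference in a data integration's mappings,
    then fail once with a combined message. Illustrative sketch of the
    position-220 loader's behavior, not its actual code."""
    errors = []
    app = applications.get(di["applicationKey"])
    if app is None:
        errors.append(f"unknown application {di['applicationKey']!r}")
    for m in di["mappings"]:
        if m["queryKey"] not in queries:
            errors.append(f"unknown query {m['queryKey']!r}")
        ot = m["objectTypeKey"]
        if ot not in object_types:
            errors.append(f"unknown object type {ot!r}")
        elif app is not None and ot not in app.get("objectTypeKeys", []):
            errors.append(f"object type {ot!r} not assigned to the application")
        for fm in m["mapping"]:
            atk = fm["attributeTypeKey"]
            if atk not in attribute_types:
                errors.append(f"unknown attribute type {atk!r}")
            elif ot in object_types and atk not in object_types[ot]:
                errors.append(f"attribute type {atk!r} not assigned to {ot!r}")
    if errors:
        raise ValueError("; ".join(errors))

validate_integration(
    {"applicationKey": "app",
     "mappings": [{"queryKey": "table", "objectTypeKey": "table",
                   "mapping": [{"fieldKey": "ROW_COUNT",
                                "attributeTypeKey": "row_count"}]}]},
    queries={"table"}, object_types={"table": {"row_count"}},
    attribute_types={"row_count"}, applications={"app": {"objectTypeKeys": ["table"]}})
print("chain OK")  # chain OK
```

The key property is that one pass reports every broken link at once, instead of failing on the first.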
## Configuration patterns

### Pattern A: Simple JDBC connector

PostgreSQL, MySQL, Oracle, SQL Server. These connectors scan database metadata through SQL system catalogs.

- Queries: 8–10 object queries (database, schema, table, table_column, view, view_column, function, procedure) + 1 relation query (dependency)
- Version options: `batchSize: 5000`, `ingestionFormat: "diff"`, `convertBooleanToNumeric: true`
- Field mappings: 4–8 fields per object type (owner, description, OID, type)
### Pattern B: Cloud data warehouse

Snowflake, Amazon Redshift, Google BigQuery, Teradata. These add platform-specific object types beyond standard SQL metadata.

- Queries: 15–22 object queries (standard SQL types + dynamic tables, streams, stages, pipes, tasks, file formats)
- Generic queries: `"configure"` and `"prepare"` run setup before scanning
- Version options: same as JDBC, sometimes with `createChangeLogs: true`
- Field mappings: 6–15 fields per object type (more metadata from information schema)
### Pattern C: BI platform

Power BI, Tableau, Qlik, Metabase, SSRS. These scan report catalogs and dataset metadata through REST APIs.

- Queries: 15–20+ object queries (workspaces, datasets, tables, columns, reports, dashboards, dataflows)
- Generic queries: `"scan_result"` for API response preprocessing
- Version options: `batchSize` varies (1000–5000), `ingestionFormat: "diff"` or `"full"`
- Field mappings: 8–30+ fields per object type (API returns rich metadata)
## Gotchas
**Provider `isInstalled` check is not enforced.** The `C_DATA_SOURCE_TYPE` loader (position 118) validates that the referenced provider has `isInstalled: true`. If the check fails, the loader logs an error but does not throw an exception. The data source definition installs and appears in the UI, but the scan button does not work. Set `isInstalled: true` on every provider that has a deployed backend connector.

**Provider key must start with `core`.** The `DI_C_PROVIDER` loader (position 111) enforces that provider keys begin with the `core` prefix. A key like `"snowflake"` or `"custom_snowflake"` throws a `PkgLoaderException` and the entire package installation fails. Use `"core_snowflake"` or `"core_my_connector"`.

**Application-to-data-source mapping validates seven cross-references.** The `R_APPLICATION_DATA_SOURCE_VERSION` loader (position 220) checks the complete chain: data source type, version, query, object type, application membership, attribute type, and attribute assignment. A missing `objectTypeKey` in the application’s `objectTypeKeys[]` array causes the mapping to fail. Verify that every mapped object type appears in the application definition.

**Query abbreviation truncation.** The `DI_MESSAGE_TYPE` loader validates that `abbreviation` does not exceed 100 characters. Longer values cause a validation error. Keep abbreviations short — they serve as display labels in the scan status UI.

**`externalDataSources` is legacy.** The schema defines `externalDataSources` with 8 properties including `uuid`, `type`, `name`, `applicationKey`, `querySetKey`, `queryMappingKey`, `hierarchyDefinitionKey`, and `spacePath`. No standard package uses this asset type. New connectors should use the `dataSourceDefinitions` + `dataIntegrations` chain instead.

**Field key case sensitivity.** The `fieldKey` in data integration mappings must match exactly what the source system returns. SQL connectors typically return uppercase column names (`DESCRIPTION`, `ROW_COUNT`). API connectors return the casing defined by the API (`DatasetId`, `ConfiguredBy`). A case mismatch causes the field to be silently ignored during scan.
## Complete example
A minimal but functional connector package with all required assets:
```json
{
  "$schema": "https://schema.dawiso.com/package-schema.json",
  "package": {
    "key": "core_my_connector",
    "name": "My Connector",
    "dependsOn": [{ "packageKey": "core" }],
    "assets": {
      "providers": [
        {
          "key": "core_my_connector",
          "name": "My Connector",
          "ingestionTypeKey": "message_type_json",
          "providerCategoryKey": "data_platform",
          "iconKey": "core_my_connector",
          "isInstalled": true,
          "connectionTemplate": {
            "$schema": "https://schema.dawiso.com/provider-schema.json",
            "providerName": "core_my_connector",
            "steps": [
              {
                "centerArea": {
                  "data": [
                    {
                      "type": "title",
                      "titleKey": "#di.provider.template.connection.core_my_connector.title",
                      "showAppIcon": true
                    },
                    {
                      "type": "section",
                      "data": [
                        {
                          "type": "input",
                          "key": "host",
                          "labelKey": "#di.provider.template.connection.host.label",
                          "required": true,
                          "encrypted": false,
                          "multiLine": 0,
                          "onEditDeletePassword": false
                        },
                        {
                          "type": "input",
                          "key": "password",
                          "labelKey": "#di.provider.template.connection.password.label",
                          "required": true,
                          "encrypted": true,
                          "multiLine": 0,
                          "onEditDeletePassword": true
                        }
                      ]
                    }
                  ]
                },
                "buttons": {
                  "test": {
                    "labelKey": "#di.provider.template.connection.test_connection"
                  }
                }
              }
            ]
          }
        }
      ],
      "dataSourceDefinitions": [
        {
          "key": "data_source_definition",
          "name": "My Connector Data Source",
          "provider": "core_my_connector",
          "queries": [
            { "key": "schema", "type": "object", "abbreviation": "schema" },
            { "key": "table", "type": "object", "abbreviation": "table" },
            { "key": "column", "type": "object", "abbreviation": "column" },
            { "key": "dependency", "type": "relation", "abbreviation": "dependency" }
          ],
          "versions": [
            {
              "key": "default",
              "name": "Version 1.0",
              "options": {
                "batchSize": 5000,
                "ingestionFormat": "diff",
                "convertBooleanToNumeric": true,
                "createChangeLogs": false,
                "createVersions": false
              },
              "queryDefinitions": [
                {
                  "queryKey": "schema",
                  "format": "json",
                  "order": 1,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "DESCRIPTION" }
                  ]
                },
                {
                  "queryKey": "table",
                  "format": "json",
                  "order": 2,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "PARENT_KEY", "isParentKey": true },
                    { "key": "DESCRIPTION" },
                    { "key": "ROW_COUNT" }
                  ]
                },
                {
                  "queryKey": "column",
                  "format": "json",
                  "order": 3,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "PARENT_KEY", "isParentKey": true },
                    { "key": "DATA_TYPE" }
                  ]
                },
                {
                  "queryKey": "dependency",
                  "format": "json",
                  "order": 4,
                  "isExecutable": true,
                  "fields": [
                    { "key": "FROM_KEY", "isFromKey": true },
                    { "key": "TO_KEY", "isToKey": true },
                    { "key": "RELATION_TYPE", "isRelationTypeKey": true }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "dataIntegrations": [
        {
          "applicationKey": "app",
          "dataSourceDefinitionKey": "data_source_definition",
          "versionKey": "default",
          "mappings": [
            {
              "queryKey": "schema",
              "objectTypeKey": "schema",
              "mapping": [
                { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" }
              ]
            },
            {
              "queryKey": "table",
              "objectTypeKey": "table",
              "mapping": [
                { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
                { "fieldKey": "ROW_COUNT", "attributeTypeKey": "row_count" }
              ]
            },
            {
              "queryKey": "column",
              "objectTypeKey": "column",
              "mapping": [
                { "fieldKey": "DATA_TYPE", "attributeTypeKey": "data_type" }
              ]
            }
          ]
        }
      ],
      "applications": [
        {
          "key": "app",
          "name": "My Connector",
          "state": "active",
          "colorKey": "core_my_connector",
          "iconKey": "core_my_connector",
          "objectTypeKeys": ["schema", "table", "column"],
          "hierarchyDefinitions": [
            { "hierarchyDefinitionKey": "default", "isDefault": true }
          ],
          "settings": [
            { "key": "read_only_hierarchy", "value": true }
          ]
        }
      ]
    }
  }
}
```
This example defines a connector with three object types (schema → table → column) and one relation query for dependencies. The data integration maps three object queries to their corresponding object types, with field-level attribute mappings. The application declares the object types and enables read-only hierarchy navigation.
To make this work in production, the package also needs: object type definitions with templates, attribute type definitions, hierarchy definitions, relation types for parent-child relationships, and a graph metamodel for lineage visualization. See Package Patterns for the complete Connector Entity pattern.