Data integration connects external data sources to Dawiso applications through a chain of five asset types. Each link must be configured correctly — a missing provider, an incomplete mapping, or a wrong version key causes the scan button to disappear without error.

The integration chain

Five asset types form a pipeline from connection metadata to attribute-level field mapping:

providerCategories → providers → dataSourceDefinitions → dataIntegrations → application

| Asset type | Defined in | Purpose |
| --- | --- | --- |
| providerCategories | core.json only | Groups providers into UI categories (Storage, Analytics & BI, etc.) |
| providers | core.json + 4 connector packages | Defines the connector type, connection form UI, and ingestion format |
| dataSourceDefinitions | Each connector package (26 total) | Declares queries, versions, and query definitions for a data source |
| dataIntegrations | Each connector package (26 total) | Maps query fields to object types and attribute types |
| externalDataSources | None (legacy) | Replaced by the chain above. Schema exists but no package uses it. |

26 of 49 standard packages define a complete integration chain. The remaining packages are feature packages (business glossary, code lists), relationship bridges, or dashboards that do not ingest external data.
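
The chain can be walked backwards from a data integration to check that each link resolves. The following is a minimal sketch, not part of Dawiso: `resolve_chain` is a hypothetical helper, and it assumes the package's `assets` object has already been parsed into a Python dict. It only looks inside one package, so a provider defined in core.json shows up as unresolved unless it is merged in first.

```python
from typing import Any, Optional


def resolve_chain(assets: dict[str, Any], integration: dict[str, Any]) -> dict[str, Optional[str]]:
    """Follow one dataIntegrations entry back to its provider category.

    Hypothetical helper: a value of None marks a link that could not be
    resolved inside `assets`.
    """
    definitions = {d["key"]: d for d in assets.get("dataSourceDefinitions", [])}
    providers = {p["key"]: p for p in assets.get("providers", [])}

    definition = definitions.get(integration["dataSourceDefinitionKey"])
    provider = providers.get(definition["provider"]) if definition is not None else None
    return {
        "application": integration["applicationKey"],
        "dataSourceDefinition": definition["key"] if definition is not None else None,
        "provider": provider["key"] if provider is not None else None,
        # providerCategoryKey is optional on a provider, so this may be None.
        "providerCategory": provider.get("providerCategoryKey") if provider is not None else None,
    }
```

A `None` in the result points at the first link to inspect when a scan cannot be started.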

Provider categories

Provider categories group providers in the connection setup UI. The core package defines five categories.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| key | string | Yes | Unique identifier |
| name | string | Yes | Display name |
| description | string | No | Description text |
| translations | array | No | Localized name and description per language |

{
  "providerCategories": [
    {
      "key": "data_platform",
      "name": "Data Platforms",
      "translations": [
        { "languageKey": "cs-CZ", "name": "Datové Platformy" }
      ]
    }
  ]
}

The five standard categories are storage, data_platform, di_dwa (Data Integrations & Warehousing), analytics_bi (Analytics & BI Platforms), and file_types. Custom packages reference these categories by key — they do not need to redefine them.
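
Because custom packages reference categories by key, a typo in `providerCategoryKey` is easy to miss. A minimal sketch of a pre-install check, assuming the providers have been parsed into Python dicts; `unknown_category_refs` is a hypothetical helper, not a Dawiso API.

```python
# The five standard category keys listed above.
STANDARD_CATEGORY_KEYS = {"storage", "data_platform", "di_dwa", "analytics_bi", "file_types"}


def unknown_category_refs(providers, extra_category_keys=()):
    """Return (provider key, category key) pairs that reference no known category.

    `extra_category_keys` is an assumption of this sketch: it lets a package
    that defines its own categories extend the standard set.
    """
    known = STANDARD_CATEGORY_KEYS | set(extra_category_keys)
    return [
        (p["key"], p["providerCategoryKey"])
        for p in providers
        # providerCategoryKey is optional, so providers without it are skipped.
        if "providerCategoryKey" in p and p["providerCategoryKey"] not in known
    ]
```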

Providers

A provider represents a connector type (Snowflake, Power BI, PostgreSQL). It defines the connection form UI and the ingestion message format.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| key | string | Yes | Unique identifier. Must start with core prefix. |
| name | string | Yes | Display name |
| ingestionTypeKey | enum | Yes | "message_type_json" or "raw_json" |
| connectionTemplate | object | Yes | UI form definition for connection setup |
| description | string | No | Provider description |
| iconKey | string | No | Reference to an icon asset |
| providerCategoryKey | string | No | Reference to a provider category key |
| isInstalled | boolean | No | Whether the provider’s backend connector is deployed. Default: false. |
| documentationUrl | string | No | Link to external documentation |
| translations | array | No | Localized name and description |
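
The required properties and the key prefix rule can be checked before a package is installed. This is a hedged sketch rather than the loader's actual logic: `check_provider` is a hypothetical helper that works on a provider parsed into a Python dict and applies only the rules stated in the table above.

```python
REQUIRED_PROVIDER_PROPS = ("key", "name", "ingestionTypeKey", "connectionTemplate")
INGESTION_TYPE_KEYS = {"message_type_json", "raw_json"}


def check_provider(provider: dict) -> list[str]:
    """Return human-readable problems for one provider (empty list means OK)."""
    problems = [
        f"missing required property: {prop}"
        for prop in REQUIRED_PROVIDER_PROPS
        if prop not in provider
    ]
    ingestion = provider.get("ingestionTypeKey")
    if ingestion is not None and ingestion not in INGESTION_TYPE_KEYS:
        problems.append(f"unknown ingestionTypeKey: {ingestion!r}")
    key = provider.get("key", "")
    # The document states the key must start with the core prefix.
    if key and not key.startswith("core"):
        problems.append(f"key {key!r} does not start with 'core'")
    return problems
```

An empty `connectionTemplate` object, as used by placeholder providers, still counts as present in this sketch.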

Connection template

The connectionTemplate defines a multi-step form for configuring the data source connection. Each step renders a form page in the UI.

{
  "connectionTemplate": {
    "$schema": "https://schema.dawiso.com/provider-schema.json",
    "providerName": "core_snowflake",
    "steps": [
      {
        "centerArea": {
          "data": [
            {
              "type": "title",
              "titleKey": "#di.provider.template.connection.core_snowflake.title",
              "subtitleKey": "#di.provider.template.connection.core_snowflake.subtitle",
              "showAppIcon": true
            },
            {
              "type": "section",
              "data": [
                {
                  "type": "input",
                  "key": "host",
                  "labelKey": "#di.provider.template.connection.host.label",
                  "required": true,
                  "encrypted": false,
                  "multiLine": 0,
                  "onEditDeletePassword": false
                },
                {
                  "type": "input",
                  "key": "password",
                  "labelKey": "#di.provider.template.connection.password.label",
                  "required": true,
                  "encrypted": true,
                  "multiLine": 0,
                  "onEditDeletePassword": true
                }
              ]
            }
          ]
        },
        "buttons": {
          "test": {
            "labelKey": "#di.provider.template.connection.test_connection"
          }
        }
      }
    ]
  }
}

Form field types

The connectionTemplate supports ten form field types inside centerArea.data[]:

| Type | Required properties | Purpose |
| --- | --- | --- |
| input | key, required, encrypted, multiLine, onEditDeletePassword | Text input (plain or password) |
| json_input | key, required, encrypted, multiLine, onEditDeletePassword | JSON editor input |
| select | key, required, multiselect, onEditDeletePassword | Dropdown with data[] items |
| schedule | key, required | Cron schedule picker |
| checkbox | key, default, onEditDeletePassword | Boolean toggle |
| text | | Static text label |
| title | showAppIcon | Step header with optional app icon |
| section | | Groups nested form fields |
| workflow_select | key, required, onEditDeletePassword | Workflow state picker |
| space_select | key, required, onEditDeletePassword | Space/hierarchy picker |

All text-bearing fields support labelKey, tooltipKey, and placeholderKey properties for i18n translation references. The labelVariant property controls typography: "h1" through "h5", "paragraph", "body1", "captionText1", and their underline/caps variants.

Set encrypted: true on password fields. Set onEditDeletePassword: true to clear the stored value when the user edits the connection.
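
The required properties per field type lend themselves to a mechanical check. The sketch below is an illustration under assumptions, not the schema validator Dawiso uses: `check_connection_template` and `check_fields` are hypothetical helpers that walk `steps[].centerArea.data[]` and descend into `section` items.

```python
_TEXT_INPUT = ("key", "required", "encrypted", "multiLine", "onEditDeletePassword")
_PICKER = ("key", "required", "onEditDeletePassword")

# Required properties per form field type, copied from the table above.
REQUIRED_FIELD_PROPS = {
    "input": _TEXT_INPUT,
    "json_input": _TEXT_INPUT,
    "select": ("key", "required", "multiselect", "onEditDeletePassword"),
    "schedule": ("key", "required"),
    "checkbox": ("key", "default", "onEditDeletePassword"),
    "text": (),
    "title": ("showAppIcon",),
    "section": (),
    "workflow_select": _PICKER,
    "space_select": _PICKER,
}


def check_fields(items, path):
    """Check a list of form items, recursing into nested sections."""
    problems = []
    for index, item in enumerate(items):
        where = f"{path}[{index}]"
        field_type = item.get("type")
        if field_type not in REQUIRED_FIELD_PROPS:
            problems.append(f"{where}: unknown type {field_type!r}")
            continue
        for prop in REQUIRED_FIELD_PROPS[field_type]:
            if prop not in item:
                problems.append(f"{where} ({field_type}): missing {prop}")
        # Only sections nest further form fields; a select's data[] holds options.
        if field_type == "section":
            problems.extend(check_fields(item.get("data", []), f"{where}.data"))
    return problems


def check_connection_template(template):
    """Check every step of a connectionTemplate parsed into a dict."""
    problems = []
    for step_index, step in enumerate(template.get("steps", [])):
        items = step.get("centerArea", {}).get("data", [])
        problems.extend(check_fields(items, f"steps[{step_index}].centerArea.data"))
    return problems
```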

Minimal vs full provider

Minimal (placeholder without connection UI):

{
  "key": "core_amazon_s3",
  "name": "Amazon S3",
  "ingestionTypeKey": "message_type_json",
  "iconKey": "core_amazon_s3",
  "providerCategoryKey": "storage",
  "isInstalled": false,
  "connectionTemplate": {}
}

Full (deployed connector with form):

{
  "key": "core_amazon_redshift",
  "name": "Amazon Redshift",
  "ingestionTypeKey": "message_type_json",
  "iconKey": "core_amazon_redshift",
  "providerCategoryKey": "data_platform",
  "isInstalled": true,
  "documentationUrl": "https://docs.dawiso.com/connectors/redshift",
  "connectionTemplate": {
    "$schema": "https://schema.dawiso.com/provider-schema.json",
    "providerName": "core_amazon_redshift",
    "steps": [{ "centerArea": { "data": [...] }, "buttons": { "test": { "labelKey": "..." } } }]
  }
}

Data source definitions

A data source definition declares what queries a connector can run and what versions of the query set exist. Each connector package defines exactly one data source definition.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| key | string | Yes | Unique identifier (typically "data_source_definition") |
| name | string | Yes | Display name |
| provider | string | Yes | Key of the provider this definition uses. Provider must be installed. |
| description | string | No | Description text |
| state | enum | No | "active", "inactive", "in-validation", "hidden". Default: "active". |
| queries | array | No | Query (message type) definitions |
| versions | array | No | Versioned query sets with options and definitions |
| translations | array | No | Localized name and description |

Queries

Each query represents one type of data the connector retrieves — one object type or one relation type. Queries define what the scanner extracts.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| key | string | Yes | Unique query identifier (e.g., "table", "view", "dependency") |
| type | enum | Yes | "object" (entity data), "relation" (relationship data), "generic" (preprocessing/configuration) |
| abbreviation | string | No | Short display label. Maximum 100 characters. |
| name | string | No | Display name |
| description | string | No | Description |
| state | enum | No | Same values as data source definition state |
| translations | array | No | Localized name and description |

Query type determines how the ingestion pipeline processes the data:

  • "object" — creates or updates objects in the target object type
  • "relation" — creates relationships between existing objects
  • "generic" — runs setup or preprocessing steps (e.g., "configure", "prepare", "scan_result")

A simple JDBC connector like PostgreSQL defines 9 queries. A cloud data warehouse like Snowflake defines 22. BI platforms like Power BI use "generic" queries for scan orchestration alongside "object" queries for each asset type.
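
The rules for a `queries` array (unique keys, a known type, an abbreviation of at most 100 characters) can be verified with a few lines. A minimal sketch with a hypothetical `check_queries` helper, assuming the array has been parsed into Python dicts:

```python
QUERY_TYPES = {"object", "relation", "generic"}
MAX_ABBREVIATION_LENGTH = 100


def check_queries(queries):
    """Return problems found in a dataSourceDefinition's queries array."""
    problems = []
    seen = set()
    for query in queries:
        key = query.get("key")
        if key in seen:
            problems.append(f"duplicate query key: {key!r}")
        seen.add(key)
        if query.get("type") not in QUERY_TYPES:
            problems.append(f"{key!r}: unknown type {query.get('type')!r}")
        # abbreviation is optional; only its length is limited.
        if len(query.get("abbreviation", "")) > MAX_ABBREVIATION_LENGTH:
            problems.append(f"{key!r}: abbreviation exceeds {MAX_ABBREVIATION_LENGTH} characters")
    return problems
```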

Versions

Versions group query definitions and configure ingestion behavior. Most connectors define a single version with the key "default".

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| key | string | Yes | Version identifier (typically "default") |
| name | string | Yes | Display name (e.g., "Version 1.0") |
| description | string | No | Description |
| state | enum | No | Same values as data source definition state |
| options | object | No | Ingestion behavior settings |
| queryDefinitions | array | No | SQL or API query templates per query key |
| template | object | No | Ingestion template (same schema as provider connectionTemplate) |
| translations | array | No | Localized name and description |

Version options

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| batchSize | integer | Yes | Number of records per ingestion batch (typical: 1000–5000) |
| convertBooleanToNumeric | boolean | Yes | Convert boolean values to 0/1 |
| createChangeLogs | boolean | Yes | Track attribute value changes over time |
| createVersions | boolean | Yes | Create versioned snapshots of ingested data |
| ingestionFormat | string | No | "diff" (incremental) or "full" (complete replacement) |

{
  "options": {
    "batchSize": 5000,
    "ingestionFormat": "diff",
    "convertBooleanToNumeric": true,
    "createChangeLogs": false,
    "createVersions": false
  }
}

Most JDBC connectors use "diff" format with batchSize: 5000. Data platforms like Keboola use "full" format with batchSize: 1000.
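
Since four of the five options are required, a version's `options` object is a natural place for a quick type check. The following sketch assumes the object has been parsed into a Python dict; `check_version_options` is a hypothetical helper and covers only the rules listed in the table above.

```python
REQUIRED_OPTIONS = {
    "batchSize": int,
    "convertBooleanToNumeric": bool,
    "createChangeLogs": bool,
    "createVersions": bool,
}
INGESTION_FORMATS = {"diff", "full"}


def check_version_options(options):
    """Return problems found in a version's options object."""
    problems = []
    for name, expected_type in REQUIRED_OPTIONS.items():
        if name not in options:
            problems.append(f"missing required option: {name}")
            continue
        value = options[name]
        # bool is a subclass of int in Python, so reject True/False for integers.
        is_bool_as_int = expected_type is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected_type):
            problems.append(f"{name} must be of type {expected_type.__name__}")
    ingestion_format = options.get("ingestionFormat")
    if ingestion_format is not None and ingestion_format not in INGESTION_FORMATS:
        problems.append(f"unknown ingestionFormat: {ingestion_format!r}")
    return problems
```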

Query definitions

Each query definition contains the actual SQL or API query text, field declarations, and processing options.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| queryKey | string | Yes | References a query key from the queries array |
| format | string | Yes | Query format identifier |
| order | integer | Yes | Execution order within the version |
| definition | string | No | The query text (SQL, API path, or JSON template) |
| isExecutable | boolean | No | Whether this query runs during scan |
| isOrdered | boolean | No | Whether result ordering matters |
| isCompressed | boolean | No | Whether the response is compressed |
| state | enum | No | Same values as data source definition state |
| fields | array | No | Field declarations for the query result |
| options | object | No | Additional options (relationTypeKey, scanMasterOnly) |

Query definition fields

Fields describe the columns or properties returned by a query. They drive the scan pipeline’s understanding of the result structure.

| Property | Type | Purpose |
| --- | --- | --- |
| key | string | Field identifier matching the source system column name |
| isKey | boolean | Primary identifier for the object |
| isName | boolean | Display name field |
| isParentKey | boolean | Reference to parent object (builds hierarchy) |
| isFromKey | boolean | Source object in a relation (for "relation" queries) |
| isToKey | boolean | Target object in a relation (for "relation" queries) |
| isRelationTypeKey | boolean | Relation type identifier (for "relation" queries) |

These field flags tell the ingestion pipeline how to interpret each column. An "object" query needs at least isKey and isName. A "relation" query needs isFromKey, isToKey, and typically isRelationTypeKey.
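
The flag requirements per query type can be checked against the query definitions of a version. A minimal sketch, assuming `queries` and `queryDefinitions` have been parsed into Python dicts; `missing_field_flags` is a hypothetical helper and treats `isRelationTypeKey` as optional, in line with "typically" above.

```python
# Flags that at least one field must carry, per query type.
REQUIRED_FLAGS = {
    "object": ("isKey", "isName"),
    "relation": ("isFromKey", "isToKey"),
}


def missing_field_flags(queries, query_definitions):
    """Map each query key to the required flags none of its fields sets."""
    query_types = {q["key"]: q["type"] for q in queries}
    missing = {}
    for definition in query_definitions:
        query_key = definition["queryKey"]
        # Generic queries and unknown query keys have no flag requirements here.
        required = REQUIRED_FLAGS.get(query_types.get(query_key), ())
        fields = definition.get("fields", [])
        absent = [flag for flag in required if not any(f.get(flag) for f in fields)]
        if absent:
            missing[query_key] = absent
    return missing
```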

Data integrations

Data integrations are the mapping layer. They connect a data source definition’s queries to an application’s object types and attribute types. This is where external fields become Dawiso attributes.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| applicationKey | string | Yes | Target application |
| dataSourceDefinitionKey | string | Yes | Data source definition to use |
| versionKey | string | Yes | Data source version to use |
| mappings | array | Yes | Query-to-object-type mappings |

Mappings

Each mapping connects one query to one object type and maps individual fields to attribute types.

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| queryKey | string | Yes | References a query from the data source definition |
| objectTypeKey | string | Yes | Target object type in the application |
| mapping | array | Yes | Field-to-attribute mappings |

Field mappings

| Property | Type | Required | Purpose |
| --- | --- | --- | --- |
| fieldKey | string | Yes | Source field name (case-sensitive, matches the source system output) |
| attributeTypeKey | string | Yes | Target attribute type on the object type |

{
  "applicationKey": "app",
  "dataSourceDefinitionKey": "data_source_definition",
  "versionKey": "default",
  "mappings": [
    {
      "queryKey": "database",
      "objectTypeKey": "database",
      "mapping": [
        { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
        { "fieldKey": "DATABASE_OWNER", "attributeTypeKey": "owner" },
        { "fieldKey": "CREATED", "attributeTypeKey": "created" },
        { "fieldKey": "LAST_ALTERED", "attributeTypeKey": "last_altered" }
      ]
    },
    {
      "queryKey": "table",
      "objectTypeKey": "table",
      "mapping": [
        { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
        { "fieldKey": "TABLE_OWNER", "attributeTypeKey": "owner" },
        { "fieldKey": "TABLE_TYPE", "attributeTypeKey": "type" },
        { "fieldKey": "ROW_COUNT", "attributeTypeKey": "row_count" },
        { "fieldKey": "BYTES", "attributeTypeKey": "bytes" }
      ]
    }
  ]
}

The fieldKey value must match exactly what the source system returns — column names from SQL connectors are typically uppercase (DESCRIPTION, ROW_COUNT), while API connectors may use camelCase (DatasetId, ConfiguredBy).
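
A case mismatch is hard to spot by eye. One way to catch it is to compare the mapped field keys with the column names the source actually returns, for example from running the query by hand. The sketch below assumes that list is available; `case_mismatches` is a hypothetical helper, not a Dawiso feature.

```python
def case_mismatches(mapping, returned_columns):
    """Return (fieldKey, returned column) pairs that differ only by letter case.

    `mapping` is the field mapping list of one query; `returned_columns` is an
    assumed input holding the column names as the source system returns them.
    """
    exact = set(returned_columns)
    by_lower = {column.lower(): column for column in returned_columns}
    mismatches = []
    for entry in mapping:
        field_key = entry["fieldKey"]
        if field_key in exact:
            continue  # exact match, nothing to report
        candidate = by_lower.get(field_key.lower())
        if candidate is not None:
            mismatches.append((field_key, candidate))
    return mismatches
```

Field keys with no counterpart at all are not reported here, because that is a missing column rather than a casing problem.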

Tip

Use core_description_scanned as the attributeTypeKey for description fields across all connectors. This core attribute type has search features pre-configured.

Mapping cardinality

The number of field mappings varies by connector complexity:

| Connector type | Queries | Fields per query | Example |
| --- | --- | --- | --- |
| Simple JDBC | 9 | 4–8 | PostgreSQL |
| Cloud data warehouse | 22 | 6–15 | Snowflake |
| BI platform | 20+ | 8–30+ | Power BI |

Loader execution order

The backend processes data integration assets through 17 loaders in strict numeric order. Each loader depends on assets created by earlier loaders.

| Position | Loader | Asset | Validates |
| --- | --- | --- | --- |
| 109 | DI_C_PROVIDER_CATEGORY | Provider categories | Nothing |
| 111 | DI_C_PROVIDER | Providers | ingestionTypeKey exists, iconKey exists (optional), key starts with core |
| 113 | R_PROVIDER_CATEGORY_PROVIDER | Provider ↔ category link | Both provider and category exist |
| 118 | C_DATA_SOURCE_TYPE | Data source definitions | provider exists and has isInstalled: true |
| 120 | DI_DATA_SOURCE_VERSION | Versions | Parent data source type exists |
| 122 | DI_MESSAGE_TYPE | Queries (message types) | abbreviation max 100 chars |
| 124 | DI_MESSAGE_VERSION | Message versions | |
| 126 | DI_R_DATA_SOURCE_VERSION_MESSAGE_VERSION | Version ↔ query links | |
| 220 | R_APPLICATION_DATA_SOURCE_VERSION | Data integrations | See below |

The most complex loader is at position 220. It validates the complete mapping chain:

  1. dataSourceDefinitionKey must reference an existing data source type
  2. versionKey must reference an existing data source version
  3. Each queryKey in mappings must reference an existing query (message type)
  4. Each objectTypeKey must reference an existing object type
  5. The object type must be assigned to the referenced application
  6. Each attributeTypeKey must reference an existing attribute type
  7. The attribute type must be assigned to the referenced object type

If any check fails, the loader collects all errors and throws a combined exception.
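
The seven checks and the collect-then-throw behaviour can be approximated offline. This is a sketch under stated assumptions, not the backend loader: `MappingValidationError` and `validate_integration` are hypothetical names, `object_types` is assumed to map each object type key to the set of attribute type keys assigned to it, and `attribute_types` is assumed to be the set of all existing attribute type keys.

```python
class MappingValidationError(Exception):
    """Hypothetical combined error; not the backend's exception type."""

    def __init__(self, errors):
        super().__init__("; ".join(errors))
        self.errors = errors


def validate_integration(integration, definitions, applications, object_types, attribute_types):
    """Run the seven cross-reference checks and raise once with all errors."""
    errors = []
    definition = definitions.get(integration["dataSourceDefinitionKey"])
    if definition is None:  # check 1
        errors.append(f"unknown data source definition {integration['dataSourceDefinitionKey']!r}")
        definition = {}
    elif integration["versionKey"] not in {v["key"] for v in definition.get("versions", [])}:
        errors.append(f"unknown version {integration['versionKey']!r}")  # check 2
    query_keys = {q["key"] for q in definition.get("queries", [])}
    application = applications.get(integration["applicationKey"], {})
    app_object_types = set(application.get("objectTypeKeys", []))

    for mapping in integration["mappings"]:
        if definition and mapping["queryKey"] not in query_keys:  # check 3
            errors.append(f"unknown query {mapping['queryKey']!r}")
        object_type_key = mapping["objectTypeKey"]
        if object_type_key not in object_types:  # check 4
            errors.append(f"unknown object type {object_type_key!r}")
        elif object_type_key not in app_object_types:  # check 5
            errors.append(f"object type {object_type_key!r} not assigned to application")
        for field in mapping["mapping"]:
            attribute = field["attributeTypeKey"]
            if attribute not in attribute_types:  # check 6
                errors.append(f"unknown attribute type {attribute!r}")
            elif attribute not in object_types.get(object_type_key, set()):  # check 7
                errors.append(f"attribute type {attribute!r} not assigned to {object_type_key!r}")
    if errors:
        raise MappingValidationError(errors)
```

Collecting every error before raising means one run surfaces all broken references, instead of one per attempt.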

Configuration patterns

Pattern A: Simple JDBC connector

PostgreSQL, MySQL, Oracle, SQL Server. These connectors scan database metadata through SQL system catalogs.

  • Queries: 8–10 object queries (database, schema, table, table_column, view, view_column, function, procedure) + 1 relation query (dependency)
  • Version options: batchSize: 5000, ingestionFormat: "diff", convertBooleanToNumeric: true
  • Field mappings: 4–8 fields per object type (owner, description, OID, type)

Pattern B: Cloud data warehouse

Snowflake, Amazon Redshift, Google BigQuery, Teradata. These add platform-specific object types beyond standard SQL metadata.

  • Queries: 15–22 object queries (standard SQL types + dynamic tables, streams, stages, pipes, tasks, file formats)
  • Generic queries: "configure" and "prepare" run setup before scanning
  • Version options: Same as JDBC, sometimes with createChangeLogs: true
  • Field mappings: 6–15 fields per object type (more metadata from information schema)

Pattern C: BI platform

Power BI, Tableau, Qlik, Metabase, SSRS. These scan report catalogs and dataset metadata through REST APIs.

  • Queries: 15–20+ object queries (workspaces, datasets, tables, columns, reports, dashboards, dataflows)
  • Generic queries: "scan_result" for API response preprocessing
  • Version options: batchSize varies (1000–5000), ingestionFormat: "diff" or "full"
  • Field mappings: 8–30+ fields per object type (API returns rich metadata)

Gotchas

Warning

Provider isInstalled check is not enforced. The C_DATA_SOURCE_TYPE loader (position 118) validates that the referenced provider has isInstalled: true. If the check fails, the loader logs an error but does not throw an exception. The data source definition installs and appears in the UI, but the scan button does not work. Set isInstalled: true on every provider that has a deployed backend connector.
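
Because this failure is only logged, it is worth checking before installation. A minimal sketch, assuming the data source definitions and all known providers (including those from core.json) have been parsed into Python dicts; `definitions_with_uninstalled_provider` is a hypothetical helper.

```python
def definitions_with_uninstalled_provider(definitions, providers):
    """Return (definition key, provider key) pairs whose provider is not installed.

    A provider that is missing from `providers` is reported as well, because
    isInstalled defaults to false and an unknown provider cannot be installed.
    """
    installed = {p["key"] for p in providers if p.get("isInstalled", False)}
    return [
        (d["key"], d["provider"])
        for d in definitions
        if d["provider"] not in installed
    ]
```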

Danger

Provider key must start with “core”. The DI_C_PROVIDER loader (position 111) enforces that provider keys begin with the core prefix. A key like "snowflake" or "custom_snowflake" throws a PkgLoaderException and the entire package installation fails. Use "core_snowflake" or "core_my_connector".

Warning

Application-to-data-source mapping validates seven cross-references. The R_APPLICATION_DATA_SOURCE_VERSION loader (position 220) checks the complete chain: data source type, version, query, object type, application membership, attribute type, and attribute assignment. A missing objectTypeKey in the application’s objectTypeKeys[] array causes the mapping to fail. Verify that every mapped object type appears in the application definition.

Warning

Query abbreviation length limit. The DI_MESSAGE_TYPE loader validates that abbreviation does not exceed 100 characters. Longer values are not truncated; they cause a validation error. Keep abbreviations short, because they serve as display labels in the scan status UI.

Warning

externalDataSources is legacy. The schema defines externalDataSources with 8 properties: uuid, type, name, applicationKey, querySetKey, queryMappingKey, hierarchyDefinitionKey, and spacePath. No standard package uses this asset type. New connectors should use the dataSourceDefinitions + dataIntegrations chain instead.

Warning

Field key case sensitivity. The fieldKey in data integration mappings must match exactly what the source system returns. SQL connectors typically return uppercase column names (DESCRIPTION, ROW_COUNT). API connectors return the casing defined by the API (DatasetId, ConfiguredBy). A case mismatch causes the field to be silently ignored during scan.

Complete example

A minimal connector package that defines every asset in the integration chain:

{
  "$schema": "https://schema.dawiso.com/package-schema.json",
  "package": {
    "key": "core_my_connector",
    "name": "My Connector",
    "dependsOn": [{ "packageKey": "core" }],
    "assets": {
      "providers": [
        {
          "key": "core_my_connector",
          "name": "My Connector",
          "ingestionTypeKey": "message_type_json",
          "providerCategoryKey": "data_platform",
          "iconKey": "core_my_connector",
          "isInstalled": true,
          "connectionTemplate": {
            "$schema": "https://schema.dawiso.com/provider-schema.json",
            "providerName": "core_my_connector",
            "steps": [
              {
                "centerArea": {
                  "data": [
                    {
                      "type": "title",
                      "titleKey": "#di.provider.template.connection.core_my_connector.title",
                      "showAppIcon": true
                    },
                    {
                      "type": "section",
                      "data": [
                        {
                          "type": "input",
                          "key": "host",
                          "labelKey": "#di.provider.template.connection.host.label",
                          "required": true,
                          "encrypted": false,
                          "multiLine": 0,
                          "onEditDeletePassword": false
                        },
                        {
                          "type": "input",
                          "key": "password",
                          "labelKey": "#di.provider.template.connection.password.label",
                          "required": true,
                          "encrypted": true,
                          "multiLine": 0,
                          "onEditDeletePassword": true
                        }
                      ]
                    }
                  ]
                },
                "buttons": {
                  "test": {
                    "labelKey": "#di.provider.template.connection.test_connection"
                  }
                }
              }
            ]
          }
        }
      ],
      "dataSourceDefinitions": [
        {
          "key": "data_source_definition",
          "name": "My Connector Data Source",
          "provider": "core_my_connector",
          "queries": [
            { "key": "schema", "type": "object", "abbreviation": "schema" },
            { "key": "table", "type": "object", "abbreviation": "table" },
            { "key": "column", "type": "object", "abbreviation": "column" },
            { "key": "dependency", "type": "relation", "abbreviation": "dependency" }
          ],
          "versions": [
            {
              "key": "default",
              "name": "Version 1.0",
              "options": {
                "batchSize": 5000,
                "ingestionFormat": "diff",
                "convertBooleanToNumeric": true,
                "createChangeLogs": false,
                "createVersions": false
              },
              "queryDefinitions": [
                {
                  "queryKey": "schema",
                  "format": "json",
                  "order": 1,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "DESCRIPTION" }
                  ]
                },
                {
                  "queryKey": "table",
                  "format": "json",
                  "order": 2,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "PARENT_KEY", "isParentKey": true },
                    { "key": "DESCRIPTION" },
                    { "key": "ROW_COUNT" }
                  ]
                },
                {
                  "queryKey": "column",
                  "format": "json",
                  "order": 3,
                  "isExecutable": true,
                  "fields": [
                    { "key": "KEY", "isKey": true },
                    { "key": "NAME", "isName": true },
                    { "key": "PARENT_KEY", "isParentKey": true },
                    { "key": "DATA_TYPE" }
                  ]
                },
                {
                  "queryKey": "dependency",
                  "format": "json",
                  "order": 4,
                  "isExecutable": true,
                  "fields": [
                    { "key": "FROM_KEY", "isFromKey": true },
                    { "key": "TO_KEY", "isToKey": true },
                    { "key": "RELATION_TYPE", "isRelationTypeKey": true }
                  ]
                }
              ]
            }
          ]
        }
      ],
      "dataIntegrations": [
        {
          "applicationKey": "app",
          "dataSourceDefinitionKey": "data_source_definition",
          "versionKey": "default",
          "mappings": [
            {
              "queryKey": "schema",
              "objectTypeKey": "schema",
              "mapping": [
                { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" }
              ]
            },
            {
              "queryKey": "table",
              "objectTypeKey": "table",
              "mapping": [
                { "fieldKey": "DESCRIPTION", "attributeTypeKey": "core_description_scanned" },
                { "fieldKey": "ROW_COUNT", "attributeTypeKey": "row_count" }
              ]
            },
            {
              "queryKey": "column",
              "objectTypeKey": "column",
              "mapping": [
                { "fieldKey": "DATA_TYPE", "attributeTypeKey": "data_type" }
              ]
            }
          ]
        }
      ],
      "applications": [
        {
          "key": "app",
          "name": "My Connector",
          "state": "active",
          "colorKey": "core_my_connector",
          "iconKey": "core_my_connector",
          "objectTypeKeys": ["schema", "table", "column"],
          "hierarchyDefinitions": [
            { "hierarchyDefinitionKey": "default", "isDefault": true }
          ],
          "settings": [
            { "key": "read_only_hierarchy", "value": true }
          ]
        }
      ]
    }
  }
}

This example defines a connector with three object types (schema → table → column) and one relation query for dependencies. The data integration maps three object queries to their corresponding object types, with field-level attribute mappings. The application declares the object types and enables read-only hierarchy navigation.

To make this work in production, the package also needs: object type definitions with templates, attribute type definitions, hierarchy definitions, relation types for parent-child relationships, and a graph metamodel for lineage visualization. See Package Patterns for the complete Connector Entity pattern.