Apache Airflow is an open-source workflow orchestration platform for programmatically authoring, scheduling, and monitoring data pipelines. It orchestrates data movement between databases, warehouses, BI tools, and other systems. The Dawiso integration connects to the Airflow REST API to discover DAGs, tasks, connections, variables, pools, and datasets — providing visibility into the pipeline landscape and cross-system lineage.

Supported versions

  • Apache Airflow 2.0 and later (REST API v1)
  • Apache Airflow 2.10+ and 3.x (REST API v2, JWT authentication)

The scanner auto-detects the API version by probing the v2 endpoint first and falling back to v1.
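
The probe-and-fall-back order described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: the function name `detect_api_version`, the injected `probe` callable, and the choice of the `/version` endpoints as probe targets are all assumptions made for the example.

```python
from typing import Callable

# Probe order: v2 first, then v1. The exact paths the scanner probes are
# not documented here; the /version endpoints are an assumption.
PROBE_PATHS = [("v2", "/api/v2/version"), ("v1", "/api/v1/version")]


def detect_api_version(base_url: str, probe: Callable[[str], int]) -> str:
    """Return "v2" or "v1", trying v2 first and falling back to v1.

    `probe` is a hypothetical helper that takes a full URL and returns
    the HTTP status code of a GET request to it.
    """
    base = base_url.rstrip("/")
    for version, path in PROBE_PATHS:
        if probe(base + path) == 200:
            return version
    raise RuntimeError("No supported Airflow REST API version found")


def fake_probe_v1_only(url: str) -> int:
    """Stand-in for a real HTTP call: only the v1 endpoint answers."""
    return 200 if "/api/v1/" in url else 404


detected = detect_api_version("http://airflow.example:8080/", fake_probe_v1_only)
```

Passing the probe in as a callable keeps the selection logic testable without a running Airflow instance; a real implementation would wrap an HTTP client call instead of `fake_probe_v1_only`.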

Default hierarchy

The chart below shows how Apache Airflow objects are organized. Objects are shown in green and folders in yellow.

graph LR
    Instance(Airflow Instance)
    DAGs(DAGs)
    DAG(DAG)
    Task(Task)
    Connections(Connections)
    Connection(Connection)
    Variables(Variables)
    Variable(Variable)
    Pools(Pools)
    Pool(Pool)
    Datasets(Datasets)
    Dataset(Dataset)

    Instance --> DAGs
    Instance --> Connections
    Instance --> Variables
    Instance --> Pools
    Instance --> Datasets

    DAGs --> DAG
    DAG --> Task
    Connections --> Connection
    Variables --> Variable
    Pools --> Pool
    Datasets --> Dataset

    classDef entity fill:#d4edda,stroke:#5cb85c,color:#000
    classDef group fill:#fff3cd,stroke:#f0ad4e,color:#000

    class Instance,DAG,Task,Connection,Variable,Pool,Dataset entity
    class DAGs,Connections,Variables,Pools,Datasets group

Object types

Dawiso ingests the following object types from Apache Airflow:

Airflow Instance

Maps to the Airflow Instance object type. One instance is created per configured connection and represents the Airflow deployment itself.

Source Attribute    Target Attribute in Dawiso
hostname            Name
hostname            Hostname
airflow_version     Airflow Version

DAG

Maps to the DAG object type. A DAG (Directed Acyclic Graph) is the core workflow definition in Airflow: each DAG contains one or more tasks connected by dependencies.

Source Attribute     Target Attribute in Dawiso
dag_id               DAG ID
dag_display_name     Name
description          Scanned Description
fileloc              File Location
owners               Owners
tags                 Tags
schedule_interval    Schedule Interval
is_paused            Is Paused
is_active            Is Active
is_subdag            Is Sub-DAG
max_active_runs      Max Active Runs
next_dagrun          Next DAG Run
last_parsed_time     Last Parsed Time
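
As an illustration of how such an attribute mapping might be applied, the sketch below renames the fields of one DAG record to the Dawiso attribute names listed in the table. The names `DAG_ATTRIBUTE_MAP` and `map_dag` are hypothetical, and skipping fields that are absent from the payload is an assumption about handling older Airflow versions, not documented scanner behavior.

```python
# Source field -> Dawiso attribute, taken from the DAG mapping table.
DAG_ATTRIBUTE_MAP = {
    "dag_id": "DAG ID",
    "dag_display_name": "Name",
    "description": "Scanned Description",
    "fileloc": "File Location",
    "owners": "Owners",
    "tags": "Tags",
    "schedule_interval": "Schedule Interval",
    "is_paused": "Is Paused",
    "is_active": "Is Active",
    "is_subdag": "Is Sub-DAG",
    "max_active_runs": "Max Active Runs",
    "next_dagrun": "Next DAG Run",
    "last_parsed_time": "Last Parsed Time",
}


def map_dag(payload: dict) -> dict:
    """Rename known source fields; ignore unknown or missing ones (assumption)."""
    return {
        target: payload[source]
        for source, target in DAG_ATTRIBUTE_MAP.items()
        if source in payload
    }


# Example payload shaped like one DAG entry; the values are made up.
mapped = map_dag({"dag_id": "daily_etl", "is_paused": False, "unrelated": 1})
```

The same pattern would apply to the other object types by swapping in their own mapping tables.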

Task

Maps to the Task object type. A task is an individual unit of work within a DAG, defined by an operator (PythonOperator, BashOperator, etc.).

Source Attribute idle

Connection

Maps to the Connection object type. Connections define how Airflow communicates with external systems (databases, cloud services, APIs).

Source Attribute    Target Attribute in Dawiso
connection_id       Connection ID
connection_id       Name
conn_type           Connection Type
description         Scanned Description
host                Host
port                Port
schema              Schema
login               Login

Info: Connection passwords and sensitive credential fields are never extracted by the scanner. The Airflow API redacts passwords by default, and additional extra fields containing password, secret, key, token, credential, or api_key are filtered out before ingestion.
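
A filter of this kind could look like the sketch below. It is only an approximation under stated assumptions: the marker list comes from the note above, but matching on field names (rather than values), matching case-insensitively, and matching by substring are guesses, and `filter_extra` is a hypothetical name.

```python
# Markers listed in the note above.
SENSITIVE_MARKERS = ("password", "secret", "key", "token", "credential", "api_key")


def filter_extra(extra: dict) -> dict:
    """Drop extra fields whose name contains a sensitive marker.

    Assumption: the match is a case-insensitive substring test on the
    field name; the real scanner may use a different rule.
    """
    return {
        name: value
        for name, value in extra.items()
        if not any(marker in name.lower() for marker in SENSITIVE_MARKERS)
    }


# Made-up extras for an illustrative connection.
safe_extra = filter_extra(
    {
        "region_name": "eu-west-1",
        "aws_secret_access_key": "hidden",
        "API_KEY": "hidden",
        "timeout": 30,
    }
)
```

Because the test is a substring match, a harmless field such as `keyspace` would also be dropped; this errs on the side of excluding too much rather than leaking a credential.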

Variable

Maps to the Variable object type. Variables are key-value configuration entries used by DAGs to store environment-specific settings, feature flags, and shared configuration.

Source Attribute    Target Attribute in Dawiso
key                 Key
key                 Name
description         Scanned Description

Info: Variable values are intentionally excluded for security. Only the key and description are ingested.

Pool

Maps to the Pool object type. Pools control concurrency by limiting how many tasks can run simultaneously.

Source Attribute    Target Attribute in Dawiso
name                Pool Name
name                Name
description         Scanned Description
slots               Total Slots
occupied_slots      Occupied Slots
running_slots       Running Slots
queued_slots        Queued Slots
open_slots          Open Slots

Dataset

Maps to the Dataset object type. Datasets (available in Airflow 2.4+, renamed to Assets in Airflow 3) are data-aware scheduling targets that enable event-driven DAG runs.

Source Attribute    Target Attribute in Dawiso
uri                 URI
uri                 Name
extra               Extra
created_at          Created At
updated_at          Updated At

Relations

The Apache Airflow integration extracts three categories of lineage edges, all rendered as Data Source relations in the Dawiso lineage diagram:

From Object Type    Relation Type    To Object Type    Description
Task                Data Source      Task              Upstream task must complete before the downstream task runs (within a DAG)
Task                Data Source      Dataset           Task produces or updates this dataset (Airflow 2.4+)
Dataset             Data Source      DAG               DAG is triggered when this dataset is updated (Airflow 2.4+)

Together, these provide the execution dependency graph within each DAG plus the dataset-driven data flow between DAGs.
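
The three edge categories can be derived from fields the Airflow REST API exposes: `downstream_task_ids` on each task, and `producing_tasks` and `consuming_dags` on each dataset. The sketch below shows one way to do that. Whether the scanner reads exactly these fields is an assumption, and the `Edge` type, the function names, and the `dag_id.task_id` naming of task nodes are hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One Data Source relation, pointing from source to target."""
    source: str
    target: str


def task_edges(dag_id: str, tasks: list) -> list:
    """Task -> Task edges from each task's downstream_task_ids."""
    edges = []
    for task in tasks:
        for downstream in task.get("downstream_task_ids", []):
            edges.append(Edge(f"{dag_id}.{task['task_id']}", f"{dag_id}.{downstream}"))
    return edges


def dataset_edges(dataset: dict) -> list:
    """Task -> Dataset and Dataset -> DAG edges for one dataset record."""
    uri = dataset["uri"]
    edges = [
        Edge(f"{ref['dag_id']}.{ref['task_id']}", uri)
        for ref in dataset.get("producing_tasks", [])
    ]
    edges += [Edge(uri, ref["dag_id"]) for ref in dataset.get("consuming_dags", [])]
    return edges


# Made-up records shaped like API responses.
tasks = [
    {"task_id": "extract", "downstream_task_ids": ["load"]},
    {"task_id": "load", "downstream_task_ids": []},
]
dataset = {
    "uri": "s3://bucket/orders",
    "producing_tasks": [{"dag_id": "etl", "task_id": "load"}],
    "consuming_dags": [{"dag_id": "report"}],
}
all_edges = task_edges("etl", tasks) + dataset_edges(dataset)
```

In this example the resulting chain runs from `etl.extract` to `etl.load`, then to the dataset, then to the `report` DAG, which is the cross-DAG data flow the relations table describes.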