Apache Airflow is an open-source workflow orchestration platform for programmatically authoring, scheduling, and monitoring data pipelines. It orchestrates data movement between databases, warehouses, BI tools, and other systems. The Dawiso integration connects to the Airflow REST API to discover DAGs, tasks, connections, variables, pools, and datasets — providing visibility into the pipeline landscape and cross-system lineage.
Supported versions
- Apache Airflow 2.0 and later (REST API v1)
- Apache Airflow 2.10+ and 3.x (REST API v2, JWT authentication)
The scanner auto-detects the API version by probing the v2 endpoint first and falling back to v1.
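The probe-and-fall-back order described above can be sketched as follows. This is a minimal illustration, not the scanner's actual code: it assumes the `/api/v2/version` and `/api/v1/version` endpoints are used as the probe targets, and `probe` is a hypothetical callable standing in for an authenticated HTTP GET.

```python
from typing import Callable, Optional

# Probe order described above: v2 first, then v1.
API_VERSIONS = ("v2", "v1")


def detect_api_version(base_url: str, probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first API version whose version endpoint responds, else None.

    `probe` is a hypothetical callable that returns True when a GET on the
    given URL succeeds (for example, with HTTP 200).
    """
    root = base_url.rstrip("/")
    for version in API_VERSIONS:
        if probe(f"{root}/api/{version}/version"):
            return version
    return None


# Stand-in probe that simulates a deployment exposing only the v1 API.
def fake_probe(url: str) -> bool:
    return url.endswith("/api/v1/version")


detected = detect_api_version("https://airflow.example.com/", fake_probe)
print(detected)
```

Injecting the probe keeps the detection logic independent of any HTTP client, so it can be exercised without a running Airflow instance.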
Default hierarchy
The chart below shows how Apache Airflow objects are organized. Objects are shown in green and folders in yellow.
graph LR
Instance(Airflow Instance)
DAGs(DAGs)
DAG(DAG)
Task(Task)
Connections(Connections)
Connection(Connection)
Variables(Variables)
Variable(Variable)
Pools(Pools)
Pool(Pool)
Datasets(Datasets)
Dataset(Dataset)
Instance --> DAGs
Instance --> Connections
Instance --> Variables
Instance --> Pools
Instance --> Datasets
DAGs --> DAG
DAG --> Task
Connections --> Connection
Variables --> Variable
Pools --> Pool
Datasets --> Dataset
classDef entity fill:#d4edda,stroke:#5cb85c,color:#000
classDef group fill:#fff3cd,stroke:#f0ad4e,color:#000
class Instance,DAG,Task,Connection,Variable,Pool,Dataset entity
class DAGs,Connections,Variables,Pools,Datasets group
Object types
Dawiso ingests the following object types from Apache Airflow:
Airflow Instance
Maps to the Airflow Instance object type. One instance is created per configured connection and represents the Airflow deployment itself.
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| hostname | Name |
| hostname | Hostname |
| airflow_version | Airflow Version |
DAG
Maps to the DAG object type. A DAG (Directed Acyclic Graph) is the core workflow definition in Airflow; each DAG contains one or more tasks connected by dependencies.
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| dag_id | DAG ID |
| dag_display_name | Name |
| description | Scanned Description |
| fileloc | File Location |
| owners | Owners |
| tags | Tags |
| schedule_interval | Schedule Interval |
| is_paused | Is Paused |
| is_active | Is Active |
| is_subdag | Is Sub-DAG |
| max_active_runs | Max Active Runs |
| next_dagrun | Next DAG Run |
| last_parsed_time | Last Parsed Time |
Task
Maps to the Task object type. A task is an individual unit of work within a DAG, defined by an operator (PythonOperator, BashOperator, etc.).
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| task_id | Task ID |
| task_display_name | Name |
| operator_name | Operator Name |
| class_ref.class_name | Operator Class |
| pool | Pool |
| pool_slots | Pool Slots |
| queue | Queue |
| priority_weight | Priority Weight |
| retries | Retries |
| trigger_rule | Trigger Rule |
| depends_on_past | Depends on Past |
| owner | Owner |
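Most rows in the table above are flat attributes, but `class_ref.class_name` is a nested path. One way to apply such a mapping is sketched below. The helper names (`get_path`, `map_task`) and the sample payload are hypothetical, and only a subset of the table is included for brevity.

```python
from typing import Any, Dict, Optional

# Subset of the Task mapping table above: source attribute -> Dawiso attribute.
TASK_ATTRIBUTE_MAP = {
    "task_id": "Task ID",
    "task_display_name": "Name",
    "operator_name": "Operator Name",
    "class_ref.class_name": "Operator Class",
    "retries": "Retries",
}


def get_path(payload: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path through nested dicts; return None if any step is missing."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def map_task(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build the target attribute dict, skipping attributes absent from the payload."""
    mapped = {}
    for source, target in TASK_ATTRIBUTE_MAP.items():
        value = get_path(payload, source)
        if value is not None:
            mapped[target] = value
    return mapped


# Illustrative payload with made-up values, shaped like a task entry.
sample_task = {
    "task_id": "load_orders",
    "operator_name": "PythonOperator",
    "class_ref": {"module_path": "airflow.operators.python", "class_name": "PythonOperator"},
    "retries": 2,
}
result = map_task(sample_task)
print(result)
```

Attributes missing from a payload (here `task_display_name`) are simply omitted rather than mapped to an empty value.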
Connection
Maps to the Connection object type. Connections define how Airflow communicates with external systems (databases, cloud services, APIs).
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| connection_id | Connection ID |
| connection_id | Name |
| conn_type | Connection Type |
| description | Scanned Description |
| host | Host |
| port | Port |
| schema | Schema |
| login | Login |
Connection passwords and other sensitive credential fields are never extracted by the scanner. The Airflow API redacts passwords by default, and any entry in the connection's extra field whose name contains password, secret, key, token, credential, or api_key is filtered out before ingestion.
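A name-based filter of this kind might look like the sketch below. It assumes case-insensitive substring matching on the key names. The function name `filter_extra` and the sample values are hypothetical, and the scanner's real matching rules may differ.

```python
from typing import Any, Dict

# Markers listed in the text above; matching is assumed to be case-insensitive.
SENSITIVE_MARKERS = ("password", "secret", "key", "token", "credential", "api_key")


def filter_extra(extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `extra` without entries whose key contains a sensitive marker."""
    return {
        name: value
        for name, value in extra.items()
        if not any(marker in name.lower() for marker in SENSITIVE_MARKERS)
    }


# Made-up example values; none of these are real credentials.
sample_extra = {
    "region_name": "eu-west-1",
    "aws_secret_access_key": "not-a-real-secret",
    "API_TOKEN": "not-a-real-token",
    "timeout": 30,
}
safe_extra = filter_extra(sample_extra)
print(safe_extra)
```

Substring matching is deliberately broad: a harmless key such as `keyfile_path` would also be dropped, which errs on the side of not ingesting anything that could be a secret.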
Variable
Maps to the Variable object type. Variables are key-value configuration entries used by DAGs to store environment-specific settings, feature flags, and shared configuration.
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| key | Key |
| key | Name |
| description | Scanned Description |
Variable values are intentionally excluded for security. Only the key and description are ingested.
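Excluding values can be expressed as an allowlist projection, shown here as a minimal sketch. The function name `project_variable` and the sample record are hypothetical. The point is that only explicitly named fields pass through, so a value never needs to be recognized as sensitive in order to be dropped.

```python
from typing import Any, Dict

# Only these fields are carried over; everything else, including `value`, is dropped.
ALLOWED_VARIABLE_FIELDS = ("key", "description")


def project_variable(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only allowlisted fields from a variable record."""
    return {field: record.get(field) for field in ALLOWED_VARIABLE_FIELDS}


# Made-up record shaped like a variable entry.
sample_variable = {
    "key": "warehouse_env",
    "description": "Target warehouse environment",
    "value": "prod",
}
projected = project_variable(sample_variable)
print(projected)
```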
Pool
Maps to the Pool object type. Pools control concurrency by limiting how many tasks can run simultaneously.
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| name | Pool Name |
| name | Name |
| description | Scanned Description |
| slots | Total Slots |
| occupied_slots | Occupied Slots |
| running_slots | Running Slots |
| queued_slots | Queued Slots |
| open_slots | Open Slots |
Dataset
Maps to the Dataset object type. Datasets (available in Airflow 2.4+, renamed to Assets in Airflow 3) are data-aware scheduling targets that enable event-driven DAG runs.
| Source Attribute | Target Attribute in Dawiso |
|---|---|
| uri | URI |
| uri | Name |
| extra | Extra |
| created_at | Created At |
| updated_at | Updated At |
Relations
The Apache Airflow integration extracts three categories of lineage edges, all rendered as Data Source relations in the Dawiso lineage diagram:
| From Object Type | Relation Type | To Object Type | Description |
|---|---|---|---|
| Task | Data Source | Task | Upstream task must complete before the downstream task runs (within a DAG) |
| Task | Data Source | Dataset | Task produces or updates this dataset (Airflow 2.4+) |
| Dataset | Data Source | DAG | DAG is triggered when this dataset is updated (Airflow 2.4+) |
Together, these provide the execution dependency graph within each DAG plus the dataset-driven data flow between DAGs.
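The three edge categories in the table can be derived roughly as in the sketch below. It assumes task records carry `downstream_task_ids` and dataset records carry `producing_tasks` and `consuming_dags`, modeled on the Airflow REST API v1 responses. The `dag_id.task_id` naming of task nodes and all sample values are illustrative assumptions, not the scanner's actual identifiers.

```python
from typing import Any, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from object, relation type, to object)
RELATION = "Data Source"


def task_edges(dag_id: str, tasks: List[Dict[str, Any]]) -> List[Edge]:
    """Task -> Task edges built from each task's downstream_task_ids."""
    edges = []
    for task in tasks:
        for downstream in task.get("downstream_task_ids", []):
            edges.append((f"{dag_id}.{task['task_id']}", RELATION, f"{dag_id}.{downstream}"))
    return edges


def dataset_edges(dataset: Dict[str, Any]) -> List[Edge]:
    """Task -> Dataset and Dataset -> DAG edges for one dataset record."""
    uri = dataset["uri"]
    edges = []
    for producer in dataset.get("producing_tasks", []):
        edges.append((f"{producer['dag_id']}.{producer['task_id']}", RELATION, uri))
    for consumer in dataset.get("consuming_dags", []):
        edges.append((uri, RELATION, consumer["dag_id"]))
    return edges


# Made-up sample: one DAG writes a dataset that triggers a second DAG.
sample_tasks = [
    {"task_id": "extract", "downstream_task_ids": ["load"]},
    {"task_id": "load", "downstream_task_ids": []},
]
sample_dataset = {
    "uri": "s3://bucket/orders.parquet",
    "producing_tasks": [{"dag_id": "orders_etl", "task_id": "load"}],
    "consuming_dags": [{"dag_id": "orders_report"}],
}
all_edges = task_edges("orders_etl", sample_tasks) + dataset_edges(sample_dataset)
for edge in all_edges:
    print(edge)
```

Chaining the edges in this sample gives a path from `orders_etl.extract` through the dataset to `orders_report`, which is the cross-DAG data flow described above.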