DBridge
DB Bridge
An ETL batch engine for large-scale data migration between disparate DBMS.
Architecture
The Problem We Solve
Whether for one-off migrations or periodic loads, moving data between heterogeneous DBMSs tends to bottleneck on a single core, held back by single-connection round-trips, client-side memory limits, and long-running transactions. DBridge logically partitions each table into N pieces, keeps memory usage stable with vendor-specific server-side cursors, and uses multi-processing to push throughput up to the hardware limits.
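To illustrate what "logically partitions each table into N pieces" can look like, here is a minimal Python sketch that turns one table into N WHERE predicates for the RANGE and MOD strategies. `Partition`, `range_partitions`, and `mod_partitions` are hypothetical names, not DBridge's API. The sketch assumes a non-negative integer partition key whose minimum and maximum are already known, for example from a `SELECT MIN(id), MAX(id)` query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    index: int
    where: str  # predicate appended to the task's SELECT (hypothetical structure)


def range_partitions(key: str, lo: int, hi: int, n: int) -> list[Partition]:
    """Split the inclusive key range [lo, hi] into n contiguous, non-overlapping slices."""
    if n < 1 or hi < lo:
        raise ValueError("need n >= 1 and hi >= lo")
    span = hi - lo + 1
    parts = []
    for i in range(n):
        start = lo + span * i // n
        end = lo + span * (i + 1) // n  # exclusive upper bound
        parts.append(Partition(i, f"{key} >= {start} AND {key} < {end}"))
    return parts


def mod_partitions(key: str, n: int) -> list[Partition]:
    """Assign each row to bucket MOD(key, n); tolerates gaps in the key sequence."""
    if n < 1:
        raise ValueError("need n >= 1")
    return [Partition(i, f"MOD({key}, {n}) = {i}") for i in range(n)]


if __name__ == "__main__":
    for p in range_partitions("id", 1, 100, 4):
        print(p.where)
```

Each predicate would be handed to a separate worker process. RANGE keeps slices contiguous, which suits an indexed key; MOD balances rows when keys have gaps but typically makes every worker scan the whole table or index. On MSSQL the modulo operator is `%` rather than `MOD()`.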
Supported DBMS & Datastores (6 Types)
| DBMS | Role | Notes |
|---|---|---|
| PostgreSQL | Source · Target | Server-Side Cursor (default) |
| Oracle | Source · Target | Server-Side Cursor · CLOB processing |
| MySQL · MariaDB | Source · Target | Server-Side Cursor |
| MSSQL | Source · Target | Client-side (partitioning recommended for large volumes) |
| Memgraph | Primarily Target | Graph DB: node/relationship load (`batch_sep=G/R`) |
| MeiliSearch | Target | Search indexing (write-only) |
Four relational DBs, plus a graph DB (Memgraph) and a search engine (MeiliSearch). Source and target adapters are implemented separately to match each vendor's characteristics: "we don't treat every database the same."
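Reading through a server-side cursor comes down to fetching one bounded batch at a time instead of materializing the whole result set on the client. The generator below is a minimal, vendor-neutral sketch that relies only on the DB-API `fetchmany()` call; `stream_rows` is a hypothetical name. With psycopg2, passing `name=` to `connection.cursor()` makes the cursor server-side, PyMySQL provides `pymysql.cursors.SSCursor`, and python-oracledb fetches in rounds sized by `cursor.arraysize`. The demo uses sqlite3 only so that it runs without a database server.

```python
import sqlite3
from typing import Any, Iterator


def stream_rows(cursor: Any, query: str, batch_size: int = 10_000) -> Iterator[list[Any]]:
    """Run `query` and yield its rows in lists of at most `batch_size` rows."""
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)  # memory is bounded by one batch
        if not batch:
            break
        yield batch


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(25)])
    for batch in stream_rows(conn.cursor(), "SELECT id FROM t ORDER BY id", batch_size=10):
        print(len(batch))  # 10, 10, 5
```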
Components
- Multi-process Worker Pool: Multiple worker pools, with workers dynamically allocated between the Stage 0 background pool and the main pools.
- Logical Partitioning: RANGE / MOD / ROWID / HASH. Divides a single table into N segments for parallel processing.
- Stage Dependency: Stages run sequentially; tasks within a stage run in parallel. Dependent tasks are automatically canceled if a predecessor fails.
- Server-Side Cursor: Memory stability plus network efficiency. Vendor-specific implementations for Oracle, PostgreSQL, and MySQL.
- Double Lock: PID lock plus DB lock, with automatic cleanup of stale locks. Safe for multiple instances.
- ErrorDebugger: On failure, automatically traces the problematic rows using binary search.
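The ErrorDebugger's binary search can be sketched as recursive bisection: try the whole batch, and on failure split it in half and retry each half until only single failing rows remain. This is a minimal sketch with hypothetical names. It assumes `try_load` commits a batch atomically on success and rolls it back completely on failure, so rows from successful halves are loaded exactly once.

```python
from typing import Callable, Sequence, TypeVar

Row = TypeVar("Row")


def find_bad_rows(rows: Sequence[Row], try_load: Callable[[Sequence[Row]], bool]) -> list[Row]:
    """Load `rows`, bisecting failed batches; return the individual rows that still fail."""
    if not rows:
        return []
    if try_load(rows):  # whole (sub)batch committed
        return []
    if len(rows) == 1:  # cannot split further: this row is the culprit
        return [rows[0]]
    mid = len(rows) // 2
    return find_bad_rows(rows[:mid], try_load) + find_bad_rows(rows[mid:], try_load)


if __name__ == "__main__":
    loaded: list[int] = []

    def try_load(batch: Sequence[int]) -> bool:
        # Stand-in for an atomic INSERT: negative values "violate a constraint".
        if any(r < 0 for r in batch):
            return False
        loaded.extend(batch)
        return True

    print(find_bad_rows([1, 2, -3, 4, 5, -6, 7, 8], try_load))  # [-3, -6]
```

For k bad rows in a batch of n, this takes on the order of k·log2(n) load attempts instead of n row-by-row inserts.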
ETL/ELT Model
DBridge delegates the Transform step to the Target DBMS's SQL. Column mappings, filters, aggregations, and joins are defined in the SELECT queries within the Meta Query Definitions, while the engine handles only data type conversion, NULL normalization, and LOB processing.
This follows the ELT pattern, using raw SQL instead of a visual mapping GUI. It preserves the target DB optimizer's execution plan and pushes frequently changing business rules down to the SQL layer, so the engine itself stays untouched. This keeps the core true to the "no project-specific hardcoding" principle.
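The engine-side work that remains (type conversion, NULL normalization, LOB processing) can be pictured as a per-value function applied to each fetched row before it is written. The rules below are illustrative assumptions, not DBridge's actual conversion table: empty strings become NULL (mirroring Oracle, which stores `''` in a VARCHAR2 as NULL), `memoryview` values such as psycopg2's `bytea` results become `bytes`, and anything with a `read()` method, such as a python-oracledb LOB, is read into memory.

```python
from typing import Any, Sequence


def normalize_value(value: Any) -> Any:
    """Illustrative per-value rules; a real adapter would choose them per vendor pair."""
    if isinstance(value, str):
        # Assumption: treat '' as NULL, matching Oracle's VARCHAR2 semantics.
        return None if value == "" else value
    if isinstance(value, memoryview):
        # psycopg2 returns bytea as memoryview; bytes is the more portable type to pass on.
        return value.tobytes()
    if callable(getattr(value, "read", None)):
        # LOB-like or file-like objects: read the full content, then apply the same rules.
        # Fine for moderate sizes; very large LOBs would need chunked reads instead.
        return normalize_value(value.read())
    return value


def normalize_row(row: Sequence[Any]) -> tuple:
    """Apply normalize_value to every column of a fetched row."""
    return tuple(normalize_value(v) for v in row)
```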
Measured Throughput
| Scenario | Result |
|---|---|
| Single process (INSERT) | ~10,000 rows/sec (network bottleneck) |
| 10 parallel processes (split INSERT) | ~80,000 rows/sec |
| 100M-row migration (10 parallel) | Approx. 16 min |
| 100M-row CSV export (10 parallel) | Approx. 2 min (manual import not included) |
Source: internal measurement (Oracle 19c / PostgreSQL 17, single-box environment, standard INSERT). Results vary with the target, indexes, and trigger configuration.
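Converting between rows/sec and wall-clock time helps size a migration window from figures like these. It is plain arithmetic, assuming a sustained, uniform rate, which real runs rarely have; the function names are illustrative.

```python
def rows_per_sec(rows: int, minutes: float) -> float:
    """Average throughput implied by moving `rows` in `minutes` of wall time."""
    return rows / (minutes * 60)


def minutes_needed(rows: int, rate: float) -> float:
    """Wall time in minutes to move `rows` at a sustained `rate` (rows/sec)."""
    return rows / rate / 60
```

At the single-process rate, 100M rows would need about 167 minutes (`minutes_needed(100_000_000, 10_000)`). The measured 16-minute 10-process migration corresponds to an average of roughly 104,000 rows/sec (`rows_per_sec(100_000_000, 16)`).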
Operations / Reliability
- Daily log folders (`app/log/YYYYMMDD/batch_main.log`)
- Progress reported every minute: Success / Failure / Running / Queued counts
- Dependency failure propagation: auto-cancels dependent tasks when a predecessor fails
- TRUNCATE → INSERT idempotent design (safe to re-run)
- Auto-cleanup of stale locks (PID + DB double lock)
- Closed-network assumption: no external network or CDN dependencies
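The stage rules (sequential between stages, parallel within a stage, dependents canceled after a failure) can be sketched as follows. `Task` and `run_stages` are hypothetical names. Tasks inside a stage run one after another here to keep the example short, whereas the engine dispatches them to its worker pool. Dependencies are assumed to point only at tasks in earlier stages.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    task_id: str
    stage: int
    run: Callable[[], None]
    depends_on: list[str] = field(default_factory=list)


def run_stages(tasks: list[Task]) -> dict[str, str]:
    """Run stages in ascending order; cancel any task whose predecessor failed or was canceled."""
    status: dict[str, str] = {}
    for stage in sorted({t.stage for t in tasks}):
        for task in (t for t in tasks if t.stage == stage):
            # Sequential here for brevity; a real engine would run this stage's tasks in parallel.
            if any(status.get(dep) in ("failed", "canceled") for dep in task.depends_on):
                status[task.task_id] = "canceled"  # failure propagates down the chain
                continue
            try:
                task.run()
                status[task.task_id] = "success"
            except Exception:
                status[task.task_id] = "failed"
    return status
```

Because each load is TRUNCATE → INSERT, re-running the whole set after fixing the failed task is safe: previously successful tasks reload cleanly and canceled tasks simply run on the next attempt.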
Operations Console Example
A mockup of an operations UI layout that brings batch progress, failed rows, and throughput trends onto a single screen. We tailor it to how your project actually runs.
| Task ID | Status | Stage | Progress | Elapsed |
|---|---|---|---|---|
| BT_USER_HIST_2024 | running | 2 / 4 | 6.4M / 12.0M | 04:21 |
| BT_TXN_LOG_RANGE_03 | running | 3 / 4 | 1.2M / 3.0M | 01:08 |
| BT_DOC_INDEX_MEILI | queued | 0 / 2 | — | — |
| BT_GRAPH_REL_LOAD | success | 4 / 4 | 8.9M / 8.9M | 12:47 |
| BT_LEGACY_HWPX_BLOB | failed | 2 / 3 | 0.8M / 4.5M | 03:12 |
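The one-minute progress line in the daily log summarizes task statuses like the ones in this mockup. Below is a minimal sketch applied to the five mockup rows; the function name and the exact output format are assumptions, not DBridge's actual log format.

```python
from collections import Counter
from typing import Iterable

# Hypothetical mapping from task status to the label used in the progress line.
LABELS = {"success": "Success", "failed": "Failure", "running": "Running", "queued": "Queued"}


def progress_line(statuses: Iterable[str]) -> str:
    """Count tasks per status; statuses with no tasks are reported as 0."""
    counts = Counter(statuses)
    return " / ".join(f"{label} {counts[status]}" for status, label in LABELS.items())


if __name__ == "__main__":
    mockup = ["running", "running", "queued", "success", "failed"]
    print(progress_line(mockup))  # Success 1 / Failure 1 / Running 2 / Queued 1
```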
Specifications
| Item | Detail |
|---|---|
| Version | v1.0.0 |
| License | Private (internal project) |
| Runtime | Multi-process ETL engine · 6 DB adapters |
| Meta Storage | PostgreSQL 18 (meta tables for batch rules, execution history, etc.) |
| Deployment | Docker Compose · Single box · Closed-network assumption |
| Entry Point | CLI batch (`app/main.py`); integrates with external cron via exit codes 0/1 |
| Ops UI | Planned (mockup; see Operations Console Example above) |
| Last Updated | 2026-04 |
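Because the entry point reports success or failure only through its exit code, the outer shape of a CLI like `app/main.py` can be as small as the sketch below. `run_batches` is a hypothetical stand-in for the engine, not a function from the DBridge codebase. Any uncaught error is logged and mapped to exit code 1, so a cron job or wrapper script can branch on the exit status.

```python
import logging
import sys
from typing import Callable

log = logging.getLogger("dbridge")


def run_batches() -> bool:
    """Hypothetical stand-in for the engine: True when every task succeeded."""
    return True


def main(run: Callable[[], bool] = run_batches) -> int:
    """Map the batch outcome to a process exit code: 0 = success, 1 = failure."""
    try:
        ok = run()
    except Exception:
        log.exception("batch run aborted")
        return 1
    return 0 if ok else 1


if __name__ == "__main__":
    sys.exit(main())
```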
Throughput figures are given above in Measured Throughput, together with the measurement environment.
Security & Compliance
| Item | Detail |
|---|---|
| License | Private (README: "Internal use only") |
| Operating Premise | Closed network: no external network dependencies, single-box deployment |
| Data Safety | Source DB is read-only, target DB is INSERT-only; modifying the source is prohibited |
| Locking & Recovery | PID + DB double lock, auto-cleanup of stale locks, multi-tier timeouts |
| Failure Isolation | Stage-level retry; failed rows are loaded separately |
| Tech Support | A dedicated technical team is assigned to each project, providing incident monitoring and a support channel. |
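The PID half of the double lock can be sketched as a lock file that records the owner's PID and is treated as stale once that process no longer exists. This is a minimal, POSIX-only sketch with hypothetical helper names. The DB half is not shown; one option, assumed here rather than taken from DBridge, is PostgreSQL's `pg_try_advisory_lock` on the meta database. This sketch also leaves a small race window between detecting a stale file and removing it.

```python
import os
from pathlib import Path


def _pid_alive(pid: int) -> bool:
    """POSIX check: signal 0 probes for the process without affecting it."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: the lock is stale
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_pid_lock(path: Path) -> bool:
    """Hypothetical helper: take the lock file, or return False if a live owner holds it."""
    if path.exists():
        try:
            owner = int(path.read_text().strip())
        except ValueError:
            owner = 0  # unreadable content is treated as stale
        if _pid_alive(owner):
            return False
        path.unlink(missing_ok=True)  # auto-cleanup of the stale lock
    try:
        # O_EXCL makes creation atomic: only one contender can create the file.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True


def release_pid_lock(path: Path) -> None:
    """Remove the lock file if it is still present."""
    path.unlink(missing_ok=True)
```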
Getting Started
- Requirement Review: source/target DBMS types, table sizes, migration windows, network
- Mapping Definition: register stages, tables, and partitioning policies in the meta tables (batch rules, Meta Query definitions)
- Rehearsal → Main Migration: progress from partial to full runs, then move to production after verification
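One simple check during rehearsal is to compare per-table row counts between source and target after each run. The sketch below assumes DB-API connections and table names taken from the trusted meta tables. Those names are interpolated into SQL, so they must never come from user input. `count_mismatches` is a hypothetical helper, and row counts are only a first-pass check, not a full data comparison. For partial rehearsal runs, compare against the same filtered subset rather than the whole source table.

```python
from typing import Any, Iterable


def _count(conn: Any, table: str) -> int:
    cur = conn.cursor()
    # Table names come from trusted meta tables; never interpolate user input here.
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    (n,) = cur.fetchone()
    cur.close()
    return n


def count_mismatches(src_conn: Any, tgt_conn: Any, tables: Iterable[str]) -> dict[str, tuple[int, int]]:
    """Return {table: (source_rows, target_rows)} for every table whose counts differ."""
    result: dict[str, tuple[int, int]] = {}
    for table in tables:
        src_n, tgt_n = _count(src_conn, table), _count(tgt_conn, table)
        if src_n != tgt_n:
            result[table] = (src_n, tgt_n)
    return result
```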
Considering Cubiware for your organization?
We will guide you through setup and rollout tailored to your requirements and operating environment. Reach out for a demo or a proposal.