Five key factors stand out in simplifying the often complex process of orchestrating a data pipeline. Each has value in itself, but together they provide a framework around which to build operations-ready data pipeline workflow applications.
Essential 1: Understand, Agree and Document Your Data Pipeline
The first may seem obvious: understand, agree and document the pipeline’s intended workflow, clearly indicating dependencies and decision-tree branching. For example, if data ingestion fails then proceed down Path B, otherwise proceed down Path A. Make this documentation visible to all teams.
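As a minimal sketch of how such documented branching might be expressed (the step names and functions below are purely illustrative and not tied to any specific orchestration tool):

```python
# Purely illustrative sketch of documented decision-tree branching:
# if ingestion fails, take Path B; otherwise take Path A.

def ingest_data() -> bool:
    """Attempt data ingestion; return True on success, False on failure."""
    print("ingesting source files...")
    return True  # stand-in for a real ingestion result

def path_a():
    """Normal downstream processing when ingestion succeeds."""
    print("Path A: transform and publish")

def path_b():
    """Agreed fallback when ingestion fails, e.g. remediate and alert."""
    print("Path B: remediate and notify")

def run_pipeline():
    if ingest_data():
        path_a()
    else:
        path_b()

if __name__ == "__main__":
    run_pipeline()
```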
Essential 2: A Standard Interface
With several teams participating in the creation and running of the data pipeline workflow, it’s important that they share a standard interface through which to collaboratively define, orchestrate and merge their own elements of the workflow into the main pipeline.
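One way to picture such an interface (a hypothetical sketch only; the Workflow class below is not a real product API) is each team defining its own fragment through the same mechanism and merging it into the main pipeline:

```python
# Hypothetical sketch: each team defines its fragment of the workflow
# through the same simple interface, then the fragments are merged
# into one main pipeline.

class Workflow:
    def __init__(self, name):
        self.name = name
        self.steps = []          # ordered list of (step_name, callable)

    def add_step(self, name, func):
        self.steps.append((name, func))
        return self              # allow chaining

    def merge(self, other):
        """Append another team's fragment onto this workflow."""
        self.steps.extend(other.steps)
        return self

    def run(self):
        for name, func in self.steps:
            print(f"[{self.name}] running step: {name}")
            func()

# Team A owns ingestion, Team B owns transformation.
ingestion = Workflow("ingestion").add_step("INGEST_SALES_FEED", lambda: print("ingesting"))
transform = Workflow("transform").add_step("TRANSFORM_SALES_FEED", lambda: print("transforming"))

main_pipeline = Workflow("daily_sales").merge(ingestion).merge(transform)
main_pipeline.run()
```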
Essential 3: Standards, Standards, Standards
It’s critical that all of the teams adhere to agreed standards, such as naming conventions for steps in the workflow. Chaos reigns when teams don’t stick to the agreed and documented standards. Making sure each step in the workflow has a documented, meaningful description allows for faster resolution when a failure occurs.
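For instance, a team might agree a pattern such as TEAM_SYSTEM_ACTION together with a mandatory plain-language description for every step (the convention and names below are hypothetical examples, not a prescription):

```python
# Hypothetical example of an agreed naming convention plus mandatory
# descriptions: <TEAM>_<SYSTEM>_<ACTION>, documented alongside each step.
workflow_steps = {
    "ING_SFTP_PULL_SALES": "Ingestion team: pull overnight sales files from partner SFTP",
    "DWH_STAGE_LOAD_SALES": "Warehouse team: load sales files into the staging schema",
    "BI_REFRESH_SALES_DASH": "BI team: refresh the sales dashboard extracts",
}

for step, description in workflow_steps.items():
    print(f"{step}: {description}")
```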
Essential 4: Visualise the Pipeline Workflow
Use a single tool to interact with and visualise the data pipeline workflow and all its dependencies. If you can’t see it, you can’t manage it, fix it or change it. Visualisation is critically important when you are defining the workflow and doubly so when it’s running live.
Essential 5: Don’t Just Visualise
Visualisation is only half the battle; the data pipeline orchestration engine must have all the ancillary capabilities that make it work successfully in operation, not just in the lab.
Built-in error handling and impact analysis are mandatory. If an external file transfer is taking longer than usual, you must have the capability to immediately analyse and visualise the impact of this hold-up on the downstream steps in the data pipeline workflow, and on any Service Level Agreements (SLAs) hanging off the end of the process.
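As a rough illustration of what that impact analysis involves (the graph, durations and SLA below are invented for the example; a real orchestration engine would derive them from the live workflow):

```python
# Minimal sketch of downstream impact analysis for a delayed step.
# The pipeline graph, durations and SLA are illustrative assumptions.
from datetime import datetime, timedelta

# Expected run times (minutes) and downstream dependencies.
durations = {"file_transfer": 30, "load": 20, "transform": 40, "publish": 10}
downstream = {"file_transfer": ["load"], "load": ["transform"],
              "transform": ["publish"], "publish": []}

def affected_steps(delayed_step):
    """Return every step downstream of the delayed one."""
    impacted, queue = [], list(downstream[delayed_step])
    while queue:
        step = queue.pop(0)
        impacted.append(step)
        queue.extend(downstream[step])
    return impacted

def sla_at_risk(delayed_step, delay_minutes, sla_deadline, start_time):
    """Estimate whether the end-of-pipeline SLA is now at risk."""
    remaining = durations[delayed_step] + sum(durations[s] for s in affected_steps(delayed_step))
    finish = start_time + timedelta(minutes=delay_minutes + remaining)
    return finish > sla_deadline, finish

start = datetime(2024, 1, 1, 2, 0)       # pipeline starts at 02:00
deadline = datetime(2024, 1, 1, 4, 0)    # SLA: results delivered by 04:00
at_risk, eta = sla_at_risk("file_transfer", delay_minutes=45,
                           sla_deadline=deadline, start_time=start)
print("Impacted steps:", affected_steps("file_transfer"))
print("Estimated finish:", eta, "- SLA at risk:", at_risk)
```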
Likewise, a job failure shouldn’t automatically stop the pipeline workflow in its tracks and require intervention by the support team. You should look to build in a level of resilience that allows the orchestration solution to restart a failed step if, for example, a transient network failure is the cause. Documented human intervention can then be reserved for cases where a step fails more than a set number of times.
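A retry-then-escalate pattern along these lines is one common way to express that resilience (again a hypothetical sketch, not any particular product’s feature):

```python
# Hypothetical retry-then-escalate pattern: transient failures (e.g. a
# network blip) trigger an automatic restart of the step; only after a
# set number of attempts is the support team asked to intervene.
import time

MAX_ATTEMPTS = 3
RETRY_DELAY_SECONDS = 60

def run_step_with_resilience(step_name, step_func):
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            step_func()
            return True
        except ConnectionError as exc:           # treated as transient
            print(f"{step_name}: attempt {attempt} failed ({exc}), retrying...")
            time.sleep(RETRY_DELAY_SECONDS)
    # Documented human interaction only after repeated failure.
    print(f"{step_name}: failed {MAX_ATTEMPTS} times, escalating to support")
    return False
```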
Mastering Data Pipeline Orchestration: Key Essentials and Expert Solutions from MDB
Clearly there’s a lot more to data pipeline orchestration, but these five essentials are worth remembering if you want to build and orchestrate an effective, operationally viable data pipeline workflow.
At MDB we are specialists in Workflow Orchestration: we sell Control-M on-premise, Helix Control-M and our own Ortom8 SaaS-based Managed Service.
For more information on how we can help you, feel free to contact us or visit www.mdbsc.co.uk.