Using Apache Airflow to orchestrate hybrid workflows

8/15/2023

In some recent discussions with customers, a recurring topic has been how open source is increasingly used as a common mechanism to build re-usable solutions - solutions that protect investments in engineering and development time and skills, and that work across on-premises and Cloud environments.

In 2021 my post talked about how you can build and deploy containerised applications anywhere (AWS, your data centre, other Clouds) and on anything (Intel and Arm). I wanted to combine the learnings from that post (and the code) and apply them to another topic I have been diving deeper into, Apache Airflow. I wanted to explore how you can combine the two and start to build data pipelines that work seamlessly across hybrid architectures.

So why would we want to do this? I can see a number of real-world applications, but the two that stand out for me are:

- where you want to leverage and use existing legacy/heritage systems within your data analytics pipelines
- where local regulation and compliance place additional controls on where data can be processed

In this post I will show how you can address both of these use cases, combining open source technology with a number of AWS products and services that enable you to orchestrate workflows across heterogeneous environments using Apache Airflow. In this demo I want to show you how you might approach orchestrating a data pipeline to build a centralised data lake across all your Cloud and non-Cloud data silos, while respecting the local processing and controls you might have. As always, you can find the code for this walkthrough in this GitHub repo, blogpost-airflow-hybrid.

If you want to follow along, then you will need the following:

First up, we need to create our demo customer data. I used Mockaroo, which I found super intuitive for generating sample data, and used it to set up a sample customer database running on MySQL. I am not going to cover setting that up, but I have included the data scripts in the repo, and there is a section at the end of this blog where I share my setup if you want to reproduce it. The demo will have two MySQL databases running, with the same database schema but with different data. One will be running on an Amazon RDS MySQL instance in AWS, and the other I have running on my local Ubuntu machine here at Home HQ.

The goal of this demo/walkthrough is to orchestrate a workflow that does batch extracts from these databases, based on some specific criteria (to simulate some regulatory controls, for example), and uploads the results into our data lake. We will be using an Amazon S3 bucket for this purpose, as it is a common scenario.

The approach I am going to take is to create an Apache Airflow workflow (DAG) and leverage an Apache Airflow operator, ECSOperator, which allows us to launch container-based images. The container images we launch will contain our ETL code, parameterised so that we can re-use the same image multiple times, changing its behaviour by providing parameters during launch (for example, different SQL queries). Finally, we will use ECS Anywhere, which uses the open source amazon-ecs-agent to simplify how we can run our containers anywhere - in AWS, on premises, or on other Clouds.

To make this blog post easier to follow, I will break it down into the different tasks. First, we will create the ETL container that we can use to extract our data. Following that, I will show how we can run this via Amazon ECS, and then walk through the process of deploying ECS Anywhere so we can run it on Amazon EC2 instances (Cloud) and on my humble (very old) Ubuntu 18.04 box (on-prem). With that all working, we will then switch to Apache Airflow and create our DAG that puts these pieces together into a workflow that defines and launches these containers to run our ETL scripts and upload our data into our data lake.
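To make the idea of a parameterised ETL container concrete, here is a minimal sketch of the kind of script such a container might run. This is my own illustrative sketch, not the code from the repo: the argument names, the CSV helper, and the use of `pymysql` and `boto3` are all assumptions on my part.

```python
import argparse
import sys


def rows_to_csv(rows):
    """Render query results as simple comma-separated text
    (demo only - no quoting or escaping)."""
    return "\n".join(",".join(str(col) for col in row) for row in rows)


def main():
    # Every behaviour-changing value arrives as a parameter at container
    # launch time, so one image can serve many different extract jobs.
    parser = argparse.ArgumentParser(description="Toy MySQL -> S3 extract")
    parser.add_argument("--host", required=True)
    parser.add_argument("--user", required=True)
    parser.add_argument("--password", required=True)
    parser.add_argument("--database", required=True)
    parser.add_argument("--query", required=True, help="SQL SELECT to run")
    parser.add_argument("--bucket", required=True, help="target S3 bucket")
    parser.add_argument("--key", required=True, help="target S3 object key")
    args = parser.parse_args()

    # Imported here so the helper above stays usable without these installed.
    import boto3
    import pymysql

    conn = pymysql.connect(host=args.host, user=args.user,
                           password=args.password, database=args.database)
    try:
        with conn.cursor() as cur:
            cur.execute(args.query)
            rows = cur.fetchall()
    finally:
        conn.close()

    # Land the extract in the data lake (an S3 bucket in this demo).
    boto3.client("s3").put_object(
        Bucket=args.bucket, Key=args.key,
        Body=rows_to_csv(rows).encode("utf-8"))


# Only run the extract when parameters were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Because everything variable comes in via `argparse`, the same image can be launched once against the RDS instance and once against the on-prem MySQL box, with only the command-line parameters changing.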
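And to preview the Airflow side, a DAG using ECSOperator to launch that container could look roughly like the sketch below. Treat this as a shape, not a working pipeline: the DAG id, cluster, task definition, and container names are all hypothetical, and the exact import path for the operator depends on which version of the Amazon provider package you have installed (newer releases rename it EcsRunTaskOperator).

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.ecs import ECSOperator

with DAG(
    dag_id="hybrid_etl_demo",          # illustrative name
    start_date=datetime(2023, 8, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Same task definition, different launch types: "EC2" runs in AWS,
    # "EXTERNAL" targets ECS Anywhere instances (e.g. the on-prem box).
    extract_cloud = ECSOperator(
        task_id="extract_from_rds",
        cluster="demo-cluster",
        task_definition="etl-task",
        launch_type="EC2",
        overrides={
            "containerOverrides": [{
                "name": "etl",
                # Re-using one image; only the parameters change per task.
                "command": [
                    "--query", "SELECT * FROM customers WHERE country='UK'",
                    "--bucket", "my-data-lake",
                    "--key", "raw/rds/customers.csv",
                ],
            }]
        },
    )
```

The interesting part is the `overrides` block: the workflow stays the same, and each task simply hands different parameters to the same container image.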