Azure Data Factory
Azure Data Factory is a managed, cloud-based data integration service for building data-driven workflows that orchestrate and automate data movement and data transformation. It is built for complex hybrid extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. Azure Data Factory does not store any data itself.
Using Azure Data Factory, you can create and schedule data-driven workflows (called pipelines) that ingest data from disparate data stores and transform it at scale. You can build complex ETL processes that transform data visually with data flows or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
The Data Factory service lets you create pipelines that move and transform data, and then run those pipelines on a specified schedule, for example hourly, daily, or weekly. The data consumed and produced by workflows is time-sliced, and you can set the pipeline mode to scheduled or one-time.
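To make the idea of time-sliced data concrete, here is a minimal sketch in plain Python. It does not call any Data Factory API; the function name `time_slices` and the `SLICE_LENGTHS` mapping are assumptions made up for this illustration. It only shows how a schedule divides a time range into the windows that each scheduled run would process.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from schedule names to slice lengths.
# These keys are illustrative, not Data Factory property values.
SLICE_LENGTHS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def time_slices(start, end, frequency):
    """Return (slice_start, slice_end) pairs that fit inside [start, end)."""
    step = SLICE_LENGTHS[frequency]
    slices = []
    current = start
    while current + step <= end:
        slices.append((current, current + step))
        current += step
    return slices


# Three daily slices between 1 January and 4 January 2024.
daily_slices = time_slices(datetime(2024, 1, 1), datetime(2024, 1, 4), "daily")
for slice_start, slice_end in daily_slices:
    print(slice_start.isoformat(), "->", slice_end.isoformat())
```

Each pair stands for one unit of work: a scheduled run would read the data that falls inside its own window and nothing else.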
Azure Data Factory allows you to monitor and manage workflows using both programmatic and UI mechanisms. You need an Azure subscription to use Azure Data Factory. If you don't have one, you can create a free account.
Four Key Components in Data Factory
The four key components are datasets, pipelines, activities, and linked services.
A dataset is a named view of data that points to the data you want to use in your activities as inputs and outputs. Datasets identify data within different data stores, such as tables, files, folders, and documents.
A dataset represents a data structure within a data store. An input dataset represents the input for an activity in the pipeline. An output dataset represents the output of the activity.
For example, an Azure Blob dataset specifies the blob container and folder in Blob storage from which the activity should read the data.
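As a rough sketch of what such a dataset definition can look like, the Python snippet below builds a JSON document in the general shape Data Factory uses for datasets. The names `InputBlobDataset`, `MyBlobStorageLinkedService`, `input-container`, and `logs/2024` are placeholders invented for this example, and the exact properties required depend on the dataset type, so treat this as an illustration rather than a complete definition.

```python
import json

# All names below are placeholders chosen for illustration only.
blob_dataset = {
    "name": "InputBlobDataset",
    "properties": {
        # Assumed file format; other formats use other type values.
        "type": "DelimitedText",
        # Reference to the linked service that holds the connection details.
        "linkedServiceName": {
            "referenceName": "MyBlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            # The container and folder the activity should read from.
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input-container",
                "folderPath": "logs/2024",
            }
        },
    },
}

dataset_json = json.dumps(blob_dataset, indent=2)
print(dataset_json)
```

Note that the dataset carries no credentials. It only names the container and folder, and defers the connection itself to the referenced linked service.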
A pipeline is a logical grouping of activities that together perform a task. A data factory may have one or more pipelines.
For example, a pipeline could contain a set of activities that ingest and clean log data, and then kick off a mapping data flow to analyze the log data.
The pipeline allows you to manage the activities as a set instead of each one individually. You deploy and schedule the pipeline instead of the activities independently.
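The log-processing example can be sketched as a stripped-down pipeline definition. The snippet below is an assumption-laden illustration: the pipeline and activity names are invented, the per-activity `typeProperties` are omitted for brevity, and `execution_order` is a hypothetical helper (not part of any Azure SDK) that resolves the `dependsOn` entries to show the order in which the grouped activities would run.

```python
# Invented names; real activities also need typeProperties, inputs, and outputs.
pipeline = {
    "name": "LogProcessingPipeline",
    "properties": {
        "activities": [
            {"name": "IngestLogs", "type": "Copy", "dependsOn": []},
            {
                "name": "CleanLogs",
                "type": "DatabricksNotebook",
                "dependsOn": [
                    {"activity": "IngestLogs", "dependencyConditions": ["Succeeded"]}
                ],
            },
            {
                "name": "AnalyzeLogs",
                "type": "ExecuteDataFlow",
                "dependsOn": [
                    {"activity": "CleanLogs", "dependencyConditions": ["Succeeded"]}
                ],
            },
        ]
    },
}


def execution_order(pipeline_definition):
    """Hypothetical helper: order activity names so dependencies come first."""
    activities = pipeline_definition["properties"]["activities"]
    pending = {
        activity["name"]: {dep["activity"] for dep in activity.get("dependsOn", [])}
        for activity in activities
    }
    order = []
    while pending:
        done = set(order)
        ready = [name for name, deps in pending.items() if deps <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown activity in dependsOn")
        for name in ready:
            order.append(name)
            del pending[name]
    return order


run_order = execution_order(pipeline)
print(" -> ".join(run_order))
```

Because the three activities live in one pipeline, a single schedule or deployment covers all of them, and the dependency entries decide what runs after what.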
Activities define the actions to perform on your data. Data Factory supports three types of activities: data movement, data transformation, and control activities.
For example, you may use a copy activity to copy data from SQL Server to Azure Blob Storage. Then, use a data flow activity or a Databricks Notebook activity to process and transform the data from Blob storage and load it into an Azure Synapse Analytics pool, on top of which business intelligence reporting solutions are built.
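The two-step flow in this example can be modeled in a few lines of Python. This is a simplified sketch, not a Data Factory definition: the `reads` and `writes` fields, the step names, and the `ACTIVITY_CATEGORIES` table are all assumptions introduced here to show how a movement step hands its output to a transformation step.

```python
# Hypothetical lookup from activity type to the kind of work it does.
ACTIVITY_CATEGORIES = {
    "Copy": "data movement",
    "ExecuteDataFlow": "data transformation",
    "DatabricksNotebook": "data transformation",
}

# "reads" and "writes" are illustrative fields, not Data Factory properties.
steps = [
    {
        "name": "CopySqlServerToBlob",
        "type": "Copy",
        "reads": "SQL Server",
        "writes": "Azure Blob Storage",
    },
    {
        "name": "TransformBlobToSynapse",
        "type": "DatabricksNotebook",
        "reads": "Azure Blob Storage",
        "writes": "Azure Synapse Analytics",
    },
]


def is_chained(step_list):
    """True when every step reads from the store the previous step wrote to."""
    return all(
        earlier["writes"] == later["reads"]
        for earlier, later in zip(step_list, step_list[1:])
    )


for step in steps:
    category = ACTIVITY_CATEGORIES[step["type"]]
    print(f'{step["name"]}: {category} ({step["reads"]} to {step["writes"]})')

print("chained:", is_chained(steps))
```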
Linked services are much like connection strings: they define the connection information that Data Factory needs to connect to external resources. Before you create a dataset, you must create a linked service to link your data store to the data factory.
Think of it this way: the dataset represents the structure of the data within the linked data stores, and the linked service defines the connection to the data source.
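The relationship between the two can be sketched as follows. The linked service carries the connection information, and each dataset refers to a linked service by name. All names here are invented for illustration, the connection string is a deliberate placeholder, and `unresolved_datasets` is a hypothetical helper rather than anything Data Factory provides. It simply checks that every dataset points at a linked service that exists.

```python
# Placeholder connection string; never hard-code real secrets like this.
linked_services = [
    {
        "name": "MyBlobStorageLinkedService",
        "properties": {
            "type": "AzureBlobStorage",
            "typeProperties": {"connectionString": "<storage-connection-string>"},
        },
    }
]


def make_dataset(name, linked_service_name):
    """Build a minimal dataset stub that references a linked service by name."""
    return {
        "name": name,
        "properties": {
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            }
        },
    }


datasets = [
    make_dataset("InputBlobDataset", "MyBlobStorageLinkedService"),
    make_dataset("OrphanDataset", "MissingLinkedService"),
]


def unresolved_datasets(dataset_list, linked_service_list):
    """Hypothetical check: datasets whose linked service is not defined."""
    known = {service["name"] for service in linked_service_list}
    return [
        dataset["name"]
        for dataset in dataset_list
        if dataset["properties"]["linkedServiceName"]["referenceName"] not in known
    ]


missing = unresolved_datasets(datasets, linked_services)
print("datasets without a linked service:", missing)
```

This mirrors the ordering rule in the text: a dataset is only usable once the linked service it names has been created.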
Azure Data Factory
In this tutorial, you will learn what Azure Data Factory is and why it is needed.
After completing this Azure Data Factory tutorial, you will be able to use Data Factory to automate the movement and transformation of data by creating linked services, datasets, and pipelines, and by scheduling those pipelines.
The Step2c education Microsoft Azure Data Factory training gives learners the opportunity to practice implementing Azure data solutions. The training helps learners improve their skills with Microsoft Azure SQL, Azure Data Lake, and Azure Data Factory. By the end of the course, learners will be able to design Azure data solutions, data processing, and data security.
This course is made up of 125+ comprehensive lectures including an overview, demonstrations, and a conclusion.
If you want to learn and master Azure Data Factory from zero to hero, join our Azure Data Factory tutorial on Udemy.