Install Data Collector on Azure HDInsight
You can install the full Data Collector on a Microsoft Azure HDInsight cluster.
Data Collector is installed as an RPM package on a Linux machine in the cluster. After the deployment completes, Data Collector is available as a service on that instance.
- Log in to the Microsoft Azure portal.
- In the Navigation panel, click Create a resource.
- Search the Marketplace for StreamSets Data Collector for HDInsight Cloud, and then click Create.
- On the HDInsight page, click Custom (size, settings, apps).
- On the Basics page, enter a cluster name, choose a cluster type, and enter a cluster login user name and password.
You can create the cluster in a new or existing resource group.
For example, you might create a cluster named sdctrial that runs Hadoop 2.7 (HDI 3.6), in a new resource group that is also named sdctrial.
- Click Next.
- On the Security + networking page, accept the defaults or configure the security options and then click Next.
- On the Storage page, configure the storage options and then click Next.
- On the Applications page, click StreamSets Data Collector for HDInsight.
- Review and accept the legal terms, click Create, and then click Next.
- On the Cluster size page, select a cluster size that meets the minimum Data Collector requirements, and then click Next.
- On the Script actions page, click Next.
- Verify the details on the Summary page, and then click Create.
It can take up to 20 minutes to deploy the cluster.
- After the cluster is successfully deployed, view the HDInsight cluster in the Azure portal, and then click Applications.
- Locate the StreamSets Data Collector for HDInsight Cloud application, and then click Portal in the URI column to access the Data Collector UI.
- Use the following default credentials to log in: admin/admin.
For information on administering Data Collector, such as viewing logs and restarting Data Collector, see Administration.
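Once the login with the default credentials works in the browser, the same check can be scripted. The following is a minimal sketch, not an official procedure. It assumes that Data Collector exposes a system information endpoint at /rest/v1/system/info, that it accepts HTTP basic authentication with the X-Requested-By header, and that the HDInsight application endpoint passes such requests through. The base URL shown in the usage note is a hypothetical placeholder; use the Portal link from the URI column of the Applications page instead.

```python
import base64
import json
import urllib.request


def build_info_request(base_url, user="admin", password="admin"):
    """Build a GET request for Data Collector system info.

    Assumption: the /rest/v1/system/info path and basic authentication
    are available on this deployment; verify both for your cluster.
    """
    # Encode "user:password" for an HTTP basic authentication header.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base_url.rstrip("/") + "/rest/v1/system/info")
    request.add_header("Authorization", "Basic " + token)
    # Assumption: Data Collector expects this header on REST requests.
    request.add_header("X-Requested-By", "sdc")
    return request


def fetch_info(base_url, user="admin", password="admin", timeout=10):
    """Send the request and return the decoded JSON response body."""
    request = build_info_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, call fetch_info with your own application URL, for example fetch_info("https://sdc-example.apps.azurehdinsight.net"), where that host name is only an illustration. A 401 or 403 response would suggest that the default credentials have been changed or that the gateway requires a different form of authentication.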