Install Data Collector on Azure HDInsight

You can install the full Data Collector on a Microsoft Azure HDInsight cluster.

Data Collector is installed as an RPM package on a Linux machine in the cluster. After the deployment completes, Data Collector is available as a service on that machine.

  1. Log in to the Microsoft Azure portal.
  2. In the Navigation panel, click Create a resource.
  3. Search the Marketplace for StreamSets Data Collector for HDInsight Cloud, and then click Create.
  4. On the HDInsight page, click Custom (size, settings, apps).
  5. On the Basics page, enter a cluster name, choose a cluster type, and enter a cluster login user name and password.

    You can create the cluster in a new or existing resource group.

    For example, you might create a Hadoop 2.7 (HDI 3.6) cluster named sdctrial in a new resource group that is also named sdctrial.

  6. Click Next.
  7. On the Security + networking page, accept the defaults or configure the security options, and then click Next.
  8. On the Storage page, configure the storage options, and then click Next.
  9. On the Applications page, click StreamSets Data Collector for HDInsight.
  10. Review and accept the legal terms, click Create, and then click Next.
  11. On the Cluster size page, select a cluster size that meets the minimum Data Collector requirements, and then click Next.
  12. On the Script actions page, click Next.
  13. Verify the details on the Summary page, and then click Create.

    Deploying the cluster can take up to 20 minutes.
  14. After the cluster is successfully deployed, view the HDInsight cluster in the Azure portal, and then click Applications.
  15. Locate the StreamSets Data Collector for HDInsight Cloud application, and then click Portal in the URI column to access the Data Collector UI.
  16. Log in with the default credentials: user name admin, password admin.

    For information on administering Data Collector, such as viewing logs and restarting Data Collector, see Administration.

    Tip: If you are new to Data Collector, consider starting with the Tutorial.
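
If you would rather not refresh the portal while the cluster deploys, the wait in step 13 and the first access in step 15 can be scripted. The following is a minimal Python sketch, not part of the StreamSets or Azure tooling. The function names `http_probe` and `wait_until_reachable` are hypothetical, and the URL you pass in is assumed to be the Portal link that the Applications page shows for your cluster. The default maximum wait mirrors the 20-minute deployment estimate above.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout_s=10):
    """Return True if the URL answers with any HTTP response at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s):
            return True
    except urllib.error.HTTPError:
        # The server answered (for example with 401 or 403), so it is up.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: not reachable yet.
        return False


def wait_until_reachable(url, max_wait_s=20 * 60, interval_s=30,
                         probe=http_probe, sleep=time.sleep):
    """Poll url until probe(url) succeeds or the wait budget is used up.

    Returns the number of attempts made on success, or None on timeout.
    The budget counts only the sleep intervals, not the time spent probing.
    """
    attempts = 0
    waited = 0
    while True:
        attempts += 1
        if probe(url):
            return attempts
        if waited + interval_s > max_wait_s:
            return None
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_until_reachable(url)` with the Portal link returns once the Data Collector UI responds, or returns `None` if nothing answers within the wait budget. The `probe` and `sleep` parameters are injectable so that the loop can be exercised without network access or real delays.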