Install Data Collector on Azure
You can install the full Data Collector on Microsoft Azure.
Data Collector is installed as an RPM package on a Linux virtual machine hosted on Microsoft Azure. After the deployment is complete, Data Collector runs as a service on the virtual machine.
- Log in to the Microsoft Azure portal.
- In the Navigation panel, click Create a resource.
- Search the Marketplace for StreamSets Data Collector for Azure, and then click Create.
- On the Basics page, enter the name of the new virtual machine, the user name to log in to that virtual machine, and the authentication method to use for logins.
Important: Do not use sdc as the user name to log in to the virtual machine. The sdc user account is reserved as the system user account that runs Data Collector as a service.
You can create the virtual machine in a new or existing resource group.
You can optionally change the virtual machine size, but the default size is sufficient in most cases. If you change the default, select a size that meets the minimum Data Collector requirements.
For example, you might create a virtual machine named sdctrial in a new resource group also named sdctrial, with a user named sdcuser who logs in to the virtual machine using password authentication.
- Click Next.
- On the Disks page under Advanced, verify that Use managed disks is enabled.
- On the Networking page, select an existing network security group for the virtual machine or create a new one.
- On the remaining pages, accept the defaults or configure the optional features.
- Verify the details on the Review and Create page, and then click Create.
It can take several minutes for the resource to deploy and for Data Collector to start as a service.
- On the Overview page for the deployment, click the name of the network security group.
- In the Inbound security rules section for the security group, click the name of each of the following rules, and then configure the range of IP addresses allowed for each port.
Important: By default, the rules give all IP addresses access to Data Collector. Be sure to modify the default values to restrict access to known IP addresses only.
Inbound Security Rule | Description
--------------------- | -----------
sdcport | Range of IP addresses that can access the Data Collector web-based UI on port 18630.
default-allow-ssh | Range of IP addresses that can use SSH to access the Data Collector virtual machine on port 22 to run the Data Collector command line interface.
- To access the Data Collector UI, enter the following URL in the address bar of your browser:
http://<virtual machine IP address>:18630
- Use the following default credentials to log in: admin/admin.
For information on administering Data Collector, such as viewing logs and restarting Data Collector, see Administration.
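Before you click Next on the first page of the virtual machine wizard, you may want to double-check the values you entered. The following Python sketch is a local, hypothetical helper (VmBasics, validate_basics, and the auth_method values are illustrative names, not part of Azure or StreamSets) that encodes the rule from the procedure: the sdc user name is reserved for the Data Collector service account. As a conservative choice, it also rejects case variants such as SDC.

```python
from dataclasses import dataclass

# Hypothetical helper: the fields mirror the values entered when creating the
# virtual machine; they are not an Azure or StreamSets API.
RESERVED_USERNAMES = {"sdc"}  # reserved for the account that runs Data Collector

@dataclass
class VmBasics:
    vm_name: str
    admin_username: str
    auth_method: str  # illustrative values: "password" or "ssh-key"
    resource_group: str

def validate_basics(basics: VmBasics) -> list[str]:
    """Return a list of problems; an empty list means the values look usable."""
    problems = []
    if not basics.vm_name:
        problems.append("virtual machine name is required")
    # Compare case-insensitively to be conservative about the reserved name.
    if basics.admin_username.lower() in RESERVED_USERNAMES:
        problems.append(f"user name {basics.admin_username!r} is reserved for Data Collector")
    if basics.auth_method not in {"password", "ssh-key"}:
        problems.append(f"unknown authentication method {basics.auth_method!r}")
    if not basics.resource_group:
        problems.append("resource group is required")
    return problems

if __name__ == "__main__":
    # The example configuration described in the procedure.
    example = VmBasics("sdctrial", "sdcuser", "password", "sdctrial")
    print(validate_basics(example))
```

For the sdctrial example, the function returns an empty list; changing the user name to sdc produces one "reserved" problem.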
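The Important note about the sdcport and default-allow-ssh rules comes down to one question per rule: does the source range match every address? The following Python sketch answers that question for a single source value using the standard ipaddress module. It assumes the rule's source is either one address or CIDR prefix, or one of the "any address" values "*", "Any", or the Internet service tag; comma-separated lists of ranges are not handled. The rule values shown are hypothetical documentation addresses.

```python
import ipaddress

# Assumption: source values that mean "any address" in a network security
# group rule. Adjust this set to match what your rules actually show.
OPEN_SOURCES = {"*", "any", "internet"}

def is_restricted(source: str) -> bool:
    """Return True if a rule's source range limits access to specific addresses."""
    value = source.strip().lower()
    if value in OPEN_SOURCES:
        return False
    # A bare address such as 198.51.100.7 becomes a /32 (or /128) network.
    network = ipaddress.ip_network(value, strict=False)
    # A /0 prefix (0.0.0.0/0 or ::/0) matches every address.
    return network.prefixlen > 0

if __name__ == "__main__":
    rules = {
        "sdcport": "203.0.113.0/24",     # hypothetical office range
        "default-allow-ssh": "0.0.0.0/0",  # still open to all addresses
    }
    for name, source in rules.items():
        status = "restricted" if is_restricted(source) else "OPEN TO ALL"
        print(f"{name}: {source} -> {status}")
```

Invalid input such as a mistyped prefix raises ValueError from ipaddress.ip_network, which is deliberate: an unreadable range should not be reported as restricted.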
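Because it can take several minutes for Data Collector to start as a service, the UI may not respond immediately after the deployment finishes. The following Python sketch builds the UI URL from the virtual machine IP address and port 18630, and checks whether a TCP connection to that port succeeds. The function names and the example IP address are hypothetical, the URL format assumes an IPv4 address, and a successful connection only shows that the port is reachable from your machine, not that login works.

```python
import socket

SDC_UI_PORT = 18630  # Data Collector UI port used in the steps above

def sdc_ui_url(vm_ip: str, port: int = SDC_UI_PORT) -> str:
    """Build the Data Collector UI URL (assumes an IPv4 address)."""
    return f"http://{vm_ip}:{port}"

def port_is_open(host: str, port: int = SDC_UI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out (socket timeouts are OSError subclasses).
        return False

if __name__ == "__main__":
    vm_ip = "203.0.113.10"  # hypothetical: replace with your virtual machine's IP address
    print(sdc_ui_url(vm_ip))
    print("reachable" if port_is_open(vm_ip) else "not reachable yet")
```

If the port is not reachable, check that your IP address falls within the range allowed by the sdcport rule, and allow a few more minutes for the service to start.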