Webpage (HTTP/HTTPS) Data Collection

Last updated on 30 November, 2020

The webpage collector can be used to query data from any system via HTTP or HTTPS.

Note:  The webpage collector supports circular redirects, up to a maximum of 3 redirects.

Token substitutions can be used to make the collector generic. Typically, ##WILDVALUE## is used in Active Discovery datasources and is replaced with the port or ports on which the webpage was discovered. ##WILDVALUE## can also be used in the Request section so that the same datasource can collect different pages. Any property can be used at any place in the string, and a literal string can be used as well.

Parameters

  • SSL: whether to use SSL for the request.
  • port: the TCP port to use for the HTTP or HTTPS traffic.
  • connectTimeout: time the collector will wait for a connection to be established
  • readTimeout: time the collector will wait for data to be returned in response to a request
  • request: the content to send in the HTTP or HTTPS request.  This must be a valid HTTP 1.0 or HTTP 1.1 request.

Use ‘\n’ to send a newline.
Do not include content-length.  This is automatically configured.

If sending an HTTP 1.1 request, you must include the Host header. You can use a token for this header, e.g. ##HOSTNAME##, which is replaced with the system the datasource is currently collecting against.
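The rules above for the request string can be checked before saving a datasource. The following is a minimal sketch, not the collector's own validation: check_request and the sample request strings are hypothetical names introduced here for illustration. It assumes the Request field uses the two-character sequence ‘\n’ as the line separator, as described above.

```python
from typing import List


def check_request(request: str) -> List[str]:
    """Return a list of problems found in a Request string (hypothetical helper)."""
    # The field uses backslash + n (two characters) as the line separator.
    lines = [line.strip() for line in request.split("\\n")]
    request_line = lines[0]
    # Header names are case-insensitive, so compare them in lower case.
    header_names = [
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if ":" in line
    ]

    problems = []
    if not request_line.endswith(("HTTP/1.0", "HTTP/1.1")):
        problems.append("request line must end with HTTP/1.0 or HTTP/1.1")
    if request_line.endswith("HTTP/1.1") and "host" not in header_names:
        problems.append("HTTP/1.1 request is missing the Host header")
    if "content-length" in header_names:
        problems.append("Content-Length is set automatically and should be omitted")
    return problems


# Example request strings as they would be typed into the Request field.
good = "GET /status HTTP/1.1\\nHost: ##HOSTNAME##"
bad = "GET /status HTTP/1.1\\nContent-Length: 0"

print(check_request(good))  # no problems reported
print(check_request(bad))   # reports the missing Host header and the Content-Length header
```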


If you have a multi-instance datasource with Active Discovery enabled, you may use ##WILDVALUE## in the Request instead of as the Port.
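To illustrate how one Request template can serve several instances, here is a rough sketch of token substitution. It is not the collector's implementation: substitute_tokens, the /pages/ path, the host name web01.example.com, and the instance names are all assumptions made for the example.

```python
import re
from typing import Dict


def substitute_tokens(template: str, properties: Dict[str, str]) -> str:
    """Replace each ##NAME## token with its property value; leave unknown tokens as-is."""

    def replace(match: "re.Match") -> str:
        # group(1) is the token name, group(0) the whole ##NAME## token.
        return properties.get(match.group(1), match.group(0))

    return re.sub(r"##([A-Za-z0-9_.]+)##", replace, template)


# Hypothetical template: ##WILDVALUE## selects the page instead of the port.
template = "GET /pages/##WILDVALUE## HTTP/1.1\\nHost: ##HOSTNAME##"

for instance in ["index.html", "health"]:
    props = {"HOSTNAME": "web01.example.com", "WILDVALUE": instance}
    print(substitute_tokens(template, props))
```

Each discovered instance yields its own request from the same template, which is what allows a single datasource to collect different pages.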

Authentication

The webpage collector supports basic, MD5, and NTLM authentication for accessing protected web pages. Authentication is configured by defining properties on the host that serves the web site. The properties http.user and http.pass supply the username and password and, for NTLM authentication, the property ntlm.domain supplies the NTLM domain. No other configuration is needed: the collector first attempts to access the page without authentication and, if the server responds that authentication is required, uses the appropriate authentication method with the relevant properties.

Note: NTLM authentication requires HTTP 1.1 requests, and that there is no Connection: close header.

It also requires that the collector be running with sufficient privileges – running as Local System is not sufficient.
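The "try without authentication first" behavior described above can be pictured with the following sketch. It is an illustration under stated assumptions, not the collector's actual logic: choose_auth is a hypothetical helper, "MD5" is taken to mean the HTTP Digest scheme, the server is assumed to signal the requirement with a 401 status and WWW-Authenticate headers, and the preference order among schemes is invented for the example.

```python
from typing import Dict, List, Optional


def choose_auth(
    status: int, www_authenticate: List[str], props: Dict[str, str]
) -> Optional[str]:
    """Pick a scheme for the second attempt after an unauthenticated first request."""
    if status != 401:
        return None  # the server did not ask for credentials
    if "http.user" not in props or "http.pass" not in props:
        return None  # no credentials are defined on the host
    # Each WWW-Authenticate value starts with the scheme name, e.g. 'Basic realm="x"'.
    offered = {
        value.split(None, 1)[0].lower() for value in www_authenticate if value.strip()
    }
    # Assumed preference order for this sketch only.
    for scheme in ("ntlm", "digest", "basic"):
        if scheme in offered:
            return scheme
    return None


props = {"http.user": "monitor", "http.pass": "secret", "ntlm.domain": "CORP"}
print(choose_auth(401, ["NTLM", 'Basic realm="intranet"'], props))
print(choose_auth(200, [], props))
```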

Datapoints

  • Use Value of Status: if Status is selected as the Use Value Of setting for a datapoint, select a post processor of ‘None’. The datapoint will then contain a code describing the HTTP response:
    • 1 = HTTP returned code 200/OK.
    • 2 = HTTP returned code in 300 range
    • 3 = HTTP returned code in 400 range
    • 4 = HTTP returned code in 500 range
    • 5 = Network connection refused
    • 6 = invalid SSL certificate
  • Use Value of responseTime: a post processor of ‘None’ should be selected. The datapoint will contain the time, in milliseconds, taken for the HTTP response.
  • Use Value of Output: this selection interprets the output of the web page response. Select a post processor method appropriate to the page being interpreted. The most common post processors are:
    • NameValue: expects the post processor parameter to contain a string to look for on the web page. The datapoint is assigned the value found after an equals sign on the web page.
      For example, if the web page returns content in this form:

      sales = 2501
      queries = 12022

      A datapoint with a post processor method of NameValue and a post processor parameter of “sales” would return the value 2501.

    • Regex: returns the value captured by the backreference in the regular expression given as the post processor parameter.
      For example, if the web page returned:

      Apache status: accesses: 8988

      A datapoint with a regex post processor and a post processor parameter of “.*accesses: (\d+)” would return 8988.

    • textmatch: looks for the text specified in the post processor parameter in the returned web page, and returns 1 if the text is present and 0 if it is not.
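The Status codes and the three post processors above can be approximated in a few lines. This is a sketch for experimenting with sample page content, not the collector's implementation: the function names are hypothetical, Python's re module stands in for whatever regular expression engine the collector uses, and status codes the article does not list (for example 201) are left unmapped.

```python
import re
from typing import Optional


def status_value(http_code: int) -> Optional[int]:
    """Map an HTTP status code to the Status datapoint value listed above."""
    if http_code == 200:
        return 1
    if 300 <= http_code < 400:
        return 2
    if 400 <= http_code < 500:
        return 3
    if 500 <= http_code < 600:
        return 4
    # 5 (connection refused) and 6 (invalid SSL certificate) are not HTTP codes.
    return None


def name_value(page: str, name: str) -> Optional[str]:
    """NameValue: return the value after the equals sign on the line starting with name."""
    match = re.search(rf"^\s*{re.escape(name)}\s*=\s*(\S+)", page, re.MULTILINE)
    return match.group(1) if match else None


def regex_value(page: str, pattern: str) -> Optional[str]:
    """Regex: return the first capture group of pattern, if the pattern matches."""
    match = re.search(pattern, page)
    return match.group(1) if match else None


def text_match(page: str, text: str) -> int:
    """textmatch: 1 if text is present in the page, 0 otherwise."""
    return 1 if text in page else 0


page = "sales = 2501\nqueries = 12022"
print(name_value(page, "sales"))                                         # 2501
print(regex_value("Apache status: accesses: 8988", r".*accesses: (\d+)"))  # 8988
print(text_match(page, "queries"))                                       # 1
print(status_value(404))                                                 # 3
```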