Why name resolution for ping may disagree with DNS

Most people know their hosts by DNS name (e.g. server1.lax.company.com) rather than by IP address (e.g. 192.168.3.45), and so enter them into their monitoring systems as DNS names. Name resolution therefore has to work as expected, or the monitoring system may not be monitoring what the user expects it to be.

Sometimes we get support requests saying that the LogicMonitor collector is resolving a DNS name to the wrong IP address even though DNS is set up as it should be, so something must be wrong with the collector. However, the issue is simply that the way hosts resolve names is not always the same as the way DNS resolves names.

The confusion arises because the tools people often use to validate their name resolution setup, host and nslookup, use only the Domain Name System. They talk to the name servers listed in /etc/resolv.conf (or supplied by the Active Directory configuration) and ask them what a particular name resolves to.
However, Windows and Linux do not rely on DNS alone. They have other sources for resolving names: the /etc/hosts file on Linux, \Windows\System32\drivers\etc\hosts on Windows, NIS, NetBIOS name resolution, and caching systems like nscd. None of these are consulted by host or nslookup, but any of them may return conflicting information that the operating system may use.
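
To see what the operating system itself returns for a name, you can ask through the same C library resolver that ordinary applications use. The following is a minimal Python sketch; the function name os_resolve is made up for illustration, and it relies on Python's socket.getaddrinfo(), which wraps the system's getaddrinfo() and so, on a typical setup, may be answered from the hosts file, a cache, or DNS.

```python
import socket


def os_resolve(name):
    """Resolve a name through the operating system's resolver.

    Unlike host or nslookup, this does not talk to the name servers
    directly; it asks the OS, which decides which sources to consult.
    """
    # Restrict to one protocol so each address is not repeated per socket type.
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        address = sockaddr[0]
        if address not in addresses:  # keep first-seen order, drop duplicates
            addresses.append(address)
    return addresses
```

If os_resolve("some.host.name") disagrees with what host or nslookup report for the same name, one of the non-DNS sources listed above is the likely reason.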

As a simple example, the /etc/hosts file below contains a local entry defining the address of foo.com as 10.1.1.1:

$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.1.1.1 foo.com

The ping program uses the locally configured address:

$ ping foo.com
PING foo.com (10.1.1.1) 56(84) bytes of data.
^C
--- foo.com ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1725ms

The host and nslookup programs do not:

$ host foo.com
foo.com has address 23.21.224.150
foo.com has address 23.21.179.138
foo.com mail is handled by 1000 0.0.0.0.
$ nslookup foo.com
Server: 216.52.126.1
Address: 216.52.126.1#53

Non-authoritative answer:
Name: foo.com
Address: 23.21.224.150
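
The comparison above can also be done mechanically. The sketch below is only an illustration under stated assumptions: the three function names are hypothetical, the hosts-file parsing handles just the simple "address name [name...]" layout with optional # comments, and only the "has address" (IPv4) lines of host output are read. The sample data is modelled on the transcripts above.

```python
def hosts_file_addresses(hosts_text, name):
    """Return the addresses a hosts-file text assigns to `name`."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # strip comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        address, names = fields[0], fields[1:]
        if name.lower() in (n.lower() for n in names):
            found.append(address)
    return found


def host_command_addresses(host_output):
    """Extract addresses from `host` output lines of the form 'x has address y'."""
    marker = " has address "
    return [
        line.split(marker, 1)[1].strip()
        for line in host_output.splitlines()
        if marker in line
    ]


def local_override(hosts_text, host_output, name):
    """Return hosts-file addresses for `name` that DNS did not report."""
    dns = set(host_command_addresses(host_output))
    return [a for a in hosts_file_addresses(hosts_text, name) if a not in dns]


# Sample data modelled on the example above.
HOSTS_TEXT = """\
# local overrides
10.1.1.1 foo.com   # not what DNS says
"""

HOST_OUTPUT = """\
foo.com has address 23.21.224.150
foo.com has address 23.21.179.138
foo.com mail is handled by 1000 0.0.0.0.
"""

print(local_override(HOSTS_TEXT, HOST_OUTPUT, "foo.com"))  # prints ['10.1.1.1']
```

A non-empty result means the hosts file is supplying an address that the name servers know nothing about, which is exactly the situation in which ping and nslookup disagree.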

A less obvious (but more common) case involves nscd, the name service cache daemon. Nscd may cache the results returned to the operating system and keep returning them for the same name for a certain period of time. Thus if a DNS record changes, and there is no entry in /etc/hosts, the operating system may continue to use the stale record cached by nscd. Nslookup will show the correct address, but the operating system (and hence the LogicMonitor collector) will not.
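
The effect of such a cache can be shown with a toy model. This is not how nscd is implemented; the class name CachingResolver, the 600-second lifetime, and the simulated clock and record table are all assumptions chosen to keep the example deterministic. It only demonstrates why a cached answer can outlive a change to the DNS record.

```python
class CachingResolver:
    """Toy cache sitting in front of an upstream lookup function."""

    def __init__(self, upstream, ttl_seconds, clock):
        self._upstream = upstream  # callable: name -> address
        self._ttl = ttl_seconds
        self._clock = clock        # callable returning the current time
        self._cache = {}           # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]       # still fresh: upstream is not asked
        address = self._upstream(name)
        self._cache[name] = (address, now + self._ttl)
        return address


# Simulated DNS data and a simulated clock.
records = {"foo.com": "23.21.224.150"}
current_time = [0]

resolver = CachingResolver(records.__getitem__, ttl_seconds=600,
                           clock=lambda: current_time[0])

first = resolver.resolve("foo.com")    # cache miss, asks upstream
records["foo.com"] = "23.21.179.138"   # the DNS record changes
current_time[0] = 300
stale = resolver.resolve("foo.com")    # within the lifetime: old address
current_time[0] = 601
fresh = resolver.resolve("foo.com")    # lifetime expired: new address

print(first, stale, fresh)
```

In this model a direct query to the record table (the nslookup role) would show the new address immediately after the change, while the caching resolver (the operating system's role) keeps returning the old one until the entry expires.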

So the moral of the story? Know where the tool you are using gets its information from. If it is nslookup or host, it is querying only the Domain Name System. The operating system (and anything that relies on it: ping, telnet, etc.) may well be using other sources of information.
[Mea Culpa: the debug tool for collectors that resolves hosts is called “!nslookup” – but it just passes the name resolution request through to the underlying operating system. You now know this is clearly a misnomer – it should really be !osnamelookup. We’ll fix that…]