I'm interested in whether anyone has found a robust way to make log collection fault-tolerant in a NetWitness Logs architecture. The usual recommendation is to configure a VLC to fail over to a second Log Decoder (Local Log Collector) in case of a failure, but that doesn't solve the underlying issue: whenever there is a problem with the VLC itself, or when I want to upgrade it, nothing is accepting the incoming logs.
We have tried to work around this by putting an F5 load balancer in front of the VLCs, but when we use TCP for syslog forwarding, which we prefer where possible, we lose the actual device.ip because it is replaced by the F5's SNAT IP. As you might imagine, losing the real device.ip then leads to all sorts of problems with ESM, etc.
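To see why SNAT breaks this, it helps to look at what a TCP syslog receiver actually knows about the sender. The sketch below is a minimal Python illustration, not NetWitness code, and it assumes (as described above) that the collector fills device.ip from the connection's source address. On a direct connection the receiver sees the sender's own IP (127.0.0.1 in this local demo); behind an F5 doing SNAT, the same `accept()` call would return the F5's SNAT address instead. The function names and the sample message are made up for illustration.

```python
import socket
import threading


def receive_one(server: socket.socket) -> tuple[str, bytes]:
    """Accept one TCP syslog connection and return (peer IP, raw bytes).

    The peer IP comes from the TCP connection itself. If a load balancer
    rewrites the source address (SNAT), this is the balancer's address.
    """
    conn, addr = server.accept()
    chunks = []
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # sender closed the connection
                break
            chunks.append(chunk)
    return addr[0], b"".join(chunks)


def send_syslog(host: str, port: int, message: bytes) -> None:
    """Hypothetical log source: open a TCP connection, send, close."""
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(message)


def demo() -> tuple[str, bytes]:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        sender = threading.Thread(
            target=send_syslog,
            args=("127.0.0.1", port, b"<134>Jan  1 00:00:00 fw01 test message\n"),
        )
        sender.start()
        result = receive_one(server)
        sender.join()
    return result


if __name__ == "__main__":
    peer, data = demo()
    print(f"source address seen by the receiver: {peer}")
    print(f"message: {data!r}")
```

The hostname inside the syslog header (`fw01` in the sample) travels with the message, but the source address is a property of the TCP connection, which is exactly what SNAT rewrites.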
Has anyone found a decent solution (besides using UDP and an external load balancer) for this problem?
I've tested using an F5 VIP for UDP syslog and, as you mentioned, it works great; for TCP, however, we have the same SNAT problem.
Instead, we're looking at creating a round-robin Infoblox record that keeps all the destination collector IPs in one A record and simply cycles through them as they are requested. Here are a few benefits and drawbacks of this approach:
The source IP is always maintained for the log source regardless of protocol.
A single destination FQDN for all configurations regardless of source.
Load balancing is achieved, albeit less elegantly than it would be with an LTM VIP.
No additional infrastructure we're dependent on that may have problems handling our throughput.
If we want to remove an IP, we can update the A record. The change will take time to propagate, but we'll keep the record's TTL lower than usual to speed up propagation across the environment.
There is no health monitoring, which a good failover design would have, so if a collector goes down or has problems, sources will still attempt to send to it, unless Infoblox has a solution for that as well.
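The health-monitoring drawback can be made concrete with a small simulation. This is a hypothetical sketch, assuming the DNS server rotates the order of the A records on each query and each log source uses the first address it receives; real resolvers, caching, and client behavior may differ. The collector addresses are placeholders from the 192.0.2.0/24 documentation range.

```python
from collections import Counter

# Hypothetical collector addresses (RFC 5737 documentation range), not real hosts.
COLLECTORS = ["192.0.2.11", "192.0.2.12", "192.0.2.13"]


def rr_answer(records: list[str], query_number: int) -> list[str]:
    """Return the A records rotated by one position per query, the way a
    round-robin DNS server typically reorders its answers."""
    shift = query_number % len(records)
    return records[shift:] + records[:shift]


def assign_senders(num_senders: int, records: list[str]) -> Counter:
    """Assume each log source resolves the name once and uses the first
    address in the answer; count how many sources land on each collector."""
    counts: Counter = Counter()
    for query in range(num_senders):
        counts[rr_answer(records, query)[0]] += 1
    return counts


if __name__ == "__main__":
    down = {"192.0.2.12"}  # pretend this collector has failed
    counts = assign_senders(9, COLLECTORS)
    for ip, n in sorted(counts.items()):
        status = "DOWN" if ip in down else "up"
        print(f"{ip} ({status}): {n} sources")
```

With three records and nine sources, each collector, including the failed one, ends up with three sources: nothing in plain round-robin DNS steers traffic away from the dead host.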
Keep in mind I haven't tested the round-robin DNS method yet; it just looks like it will provide enough benefit with little impact that we're going to explore it further.
This could work and would be a no/low-cost solution. However, what measures are in place to ensure that the hosts in that "DNS pool" are available? What happens (how does Infoblox handle it) when one of the pool members is down?
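Plain round-robin DNS has no notion of member health, so availability has to be checked by something else. One possibility, sketched below under stated assumptions, is an external probe that attempts a TCP connection to each collector's syslog port and reports which members are accepting connections; that list could then be used to decide which addresses stay in the A record. The syslog port (514), the member addresses, and the idea of feeding the result back into Infoblox are all assumptions, and whether Infoblox can do this natively is not established here. Note that a successful connect only shows the port is open, not that the collector is processing logs correctly.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a shallow check: it proves something accepts connections on
    the port, not that the collector is healthy end to end.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def healthy_members(members: list[str], port: int, timeout: float = 2.0) -> list[str]:
    """Filter a (hypothetical) DNS pool down to members accepting TCP on port."""
    return [ip for ip in members if is_listening(ip, port, timeout)]
```

For example, `healthy_members(["192.0.2.11", "192.0.2.12"], 514)` returns only the placeholder addresses whose port 514 accepted a connection; anything that fails the probe would be a candidate for removal from the record, keeping in mind the TTL delay discussed above.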