Sunday, June 22, 2014

Enterprise log managers: An unusual but vital tool

Ultimately, the goal of enterprise log management (ELM) is to escalate your most critical events to your operations staff so they can respond with the appropriate actions. In today’s enterprise, you would be sifting through millions of events if you were not relying on ELM to correlate that information and point to what is most critical.
You may be asking, “Isn’t this security information and event management (SIEM)?” It’s not. Well, not entirely. ELM and SIEM are interrelated. SIEM is more concerned with the larger view of your overall security landscape, whereas ELM is focused on a specific element of security: “What is happening where?”
SIEM correlates data across varying data sources and environments for a more holistic view. ELM is a subset and critical component of a SIEM system. Not all companies require a SIEM system. However, most companies would benefit from an ELM solution. For the purposes of this article, we’ll stick to ELM. For more information on SIEM, I encourage you to download ISACA’s free SIEM white paper.

Corporate policies, and the controls that support them, exist to deter or prevent undesirable activities. Building an ELM starts with two foundational steps: translating those policies into the solution, and configuring the relationships among the policies, the controls and the data feeds from the systems and applications that need to be monitored.

One measure of the quality of an ELM technology is how easily it interfaces with your critical systems. Ask “How many different components does it understand?” and “How much technical expertise is required to make it deliver value?”

Use cases and setup

Privileged access monitoring is a classic example in which ELM gathers logs from various systems and creates a direct workflow to the operations staff, enabling them to take an action against items considered inappropriate.
For example, suppose a domain admin account logs in outside an allowed change window after failing to authenticate several times in a row, a pattern consistent with a brute-force attack. The system must correlate those events and initiate the appropriate workflow, whatever that may be.
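To make the correlation step concrete, here is a minimal Python sketch. It is not taken from any particular ELM product: the `AuthEvent` shape, the account name, the change window hours and the thresholds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass(frozen=True)
class AuthEvent:
    timestamp: datetime
    account: str
    success: bool

# Hypothetical policy values; a real deployment would derive these from corporate policy.
PRIVILEGED_ACCOUNTS = {"domain-admin"}
WINDOW_START = time(20, 0)   # assumed change window: 20:00 to 22:00
WINDOW_END = time(22, 0)
FAILURE_THRESHOLD = 3
LOOKBACK = timedelta(minutes=10)

def in_change_window(ts: datetime) -> bool:
    return WINDOW_START <= ts.time() <= WINDOW_END

def correlate_privileged_logins(events):
    """Return (login_event, failure_count) pairs for privileged logins that
    succeed outside the change window after repeated recent failures."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    alerts = []
    for i, ev in enumerate(ordered):
        if not ev.success or ev.account not in PRIVILEGED_ACCOUNTS:
            continue
        if in_change_window(ev.timestamp):
            continue
        # Count failed attempts for the same account within the lookback period.
        failures = sum(
            1
            for prior in ordered[:i]
            if prior.account == ev.account
            and not prior.success
            and ev.timestamp - prior.timestamp <= LOOKBACK
        )
        if failures >= FAILURE_THRESHOLD:
            alerts.append((ev, failures))
    return alerts
```

Each returned pair is what would be handed to the operations workflow; which team receives it, and what they do with it, remains a process decision.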
The processes established around the solution are just as important. The log management solution is only as good as the processes and teams that support it. Typically, this requires an engineering staff and an operations staff. The engineers build and configure the ELM so the right alerts are coming through.
The operations staff is then able to take the alerts and, ideally, do the “right thing”. Of course, the less mature your existing processes and workflows, the more iterations will be required. The events you consider taggable – the events you are interested in – must tie back to corporate policy. The basic premise that “thou shalt not access that which you are not allowed to access” will guide the rules you develop.
Activity will fall into one of three categories: transactions you don’t care about; transactions you want to know about; and transactions you want to take immediate action on.
For example, you might have miskeyed your password while attempting to log in. That type of transaction is not necessarily one to be concerned about.
However, if a thousand more attempts follow in the next 60 seconds, you should know something is suspicious. This pattern likely indicates an attacker attempting to brute-force access to your valuable data. Flag it and determine which part of the organisation should receive the system workflow.
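The three categories can be sketched as a sliding-window counter over failed logins. This is an illustrative Python sketch only; the class name, the category labels and the default thresholds (5 to notify, 1,000 to act, within 60 seconds) are assumptions, not values from any product.

```python
from collections import deque
from datetime import datetime, timedelta

# The three categories: don't care, want to know, act immediately.
IGNORE, NOTIFY, ACT = "ignore", "notify", "act"

class FailedLoginTriage:
    """Classify each failed login by how many failures the same account
    has accumulated inside a sliding time window."""

    def __init__(self, window=timedelta(seconds=60), notify_at=5, act_at=1000):
        self.window = window
        self.notify_at = notify_at
        self.act_at = act_at
        self._failures = {}  # account -> deque of recent failure timestamps

    def record_failure(self, account: str, ts: datetime) -> str:
        recent = self._failures.setdefault(account, deque())
        recent.append(ts)
        # Drop failures that have aged out of the window.
        while recent and ts - recent[0] > self.window:
            recent.popleft()
        count = len(recent)
        if count >= self.act_at:
            return ACT
        if count >= self.notify_at:
            return NOTIFY
        return IGNORE
```

A single miskeyed password returns `IGNORE`; a burst that crosses the upper threshold returns `ACT`, which is the point at which the workflow would be routed to the responsible team.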
ELM can provide value through non-security use cases as well. There could be transactional activity that indicates a problem, such as multiple acknowledgement requests being generated as a result of a system glitch.
The sheer volume could saturate the network, with the same effect as a denial-of-service attack. The ELM could flag this type of activity when it occurs so that remediation can begin early, potentially averting an outage of a critical service.
A virus on the network provides an opportunity for a good ELM to demonstrate intelligence. As the tool logs virus-induced events and correlates them as a single outbreak, operations will be able to target the affected population proactively.
This approach can save hundreds or thousands of hours by solving the underlying problem instead of addressing each incident reactively. It is a compelling value statement, and one that ITIL has put forth for decades: multiple incidents occurring for similar reasons typically represent a problem that needs a solution.
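As a rough illustration of rolling individual incidents up into one problem, the Python sketch below groups detections by signature and reports a signature as an outbreak once enough distinct hosts are affected. The function name, the `(host, signature)` input shape and the `min_hosts` threshold are hypothetical.

```python
from collections import defaultdict

def group_into_outbreaks(detections, min_hosts=3):
    """detections: iterable of (host, signature) pairs.
    Returns {signature: sorted list of affected hosts} for every signature
    seen on at least `min_hosts` distinct hosts."""
    hosts_by_signature = defaultdict(set)
    for host, signature in detections:
        hosts_by_signature[signature].add(host)
    return {
        signature: sorted(hosts)
        for signature, hosts in hosts_by_signature.items()
        if len(hosts) >= min_hosts
    }
```

The returned host list is the affected population that operations can target in one pass, rather than handling each detection as a separate incident.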

Signs your company needs help 

While many of us assume what we're doing is "good enough" when it comes to log management, below are a few telltale signs that an organization's security log management process is in trouble. If more than one of these applies to your organization and the issues can't be rectified, it's time to call for outside help.
  • There is limited or no way to automate alerts on logs within the company.
    • If there is no way to alert on particular log events, the company won't know when it has had an incident.
  • Administrators don't understand what is being logged.
    • Not knowing what logs are coming from the systems is an issue.
    • Not knowing what level of auditing is enabled on the system is also a problem. This might lead to a false sense of security.
    • Not being able to log custom applications. Many systems and applications have custom logs that need to be parsed and stored. Verify that this is possible and happening.
    • Forgetting to log third-party, cloud, mobile and virtualization systems.
  • Performance issues are observed.
    • Slow database that doesn't allow for flexible reporting or searching.
    • No drill-down capabilities when searching for logs; speed is an issue.
    • Use of out-of-date equipment or single points of failure in hardware that would allow logs to be lost.
    • Not calculating the proper events per second (EPS) and losing logs due to saturation.
  • Correlating logs for search purposes is an issue.
    • Without correlation, log managers can create a handful of alerts or reports at best; they can't look deeper into the logs to search for security incidents.
  • There's no process in place for monitoring and analyzing logs.
    • No process for adding new systems to the log manager.
    • Unaware of what logs are missing or which systems are not sending logs.
    • No audits of systems to verify that logs are being collected from all systems.
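The last group of signs (missing logs, systems that have stopped sending) lends itself to a simple periodic audit. The following Python sketch assumes you can obtain two inputs that are hypothetical here: an asset inventory of hostnames and a mapping of each host to the timestamp of the last log received from it.

```python
from datetime import datetime, timedelta

def audit_log_coverage(inventory, last_seen, now, max_silence=timedelta(hours=24)):
    """Compare the asset inventory against what the log manager has received.

    inventory: iterable of hostnames that are expected to send logs.
    last_seen: dict mapping hostname -> datetime of the most recent log.
    Returns hosts that never sent logs and hosts that have gone silent."""
    expected = set(inventory)
    never_seen = sorted(expected - set(last_seen))
    silent = sorted(
        host
        for host in expected
        if host in last_seen and now - last_seen[host] > max_silence
    )
    return {"never_seen": never_seen, "silent": silent}
```

The 24-hour `max_silence` default is an arbitrary assumption; a quiet system may legitimately log less often, so the threshold would need tuning per source.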

Requisite Skills

The primary skill associated with successfully deploying an ELM is being able to translate business use cases into the ELM tool’s language.
If your environment deals with personally identifiable information, for example, privacy concerns are going to be one of the highest priorities. An understanding must exist of the systems generating the data and how those data relate to the company’s use cases.
For example, we don’t want people logging on as a local administrator in an Active Directory domain environment; therefore, the ELM would need to alert on the appropriate event ID.
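As a sketch of what such a rule might look like, the Python function below checks parsed Windows Security events for a successful logon (event ID 4624) by a local administrator account. It assumes the events have already been parsed into dictionaries; the field names follow the 4624 event's `TargetUserName` and `TargetDomainName` data and the system `Computer` value, but the dictionary shape, the account-name list and the "domain equals computer name means local account" heuristic are assumptions to verify against your own collector's output.

```python
LOGON_SUCCESS_EVENT_ID = 4624  # Windows Security log: an account was successfully logged on

def is_local_admin_logon(event, local_admin_names=frozenset({"administrator"})):
    """Return True if a parsed event looks like a local administrator logon.

    Heuristic (an assumption): for a local account, the logon's domain field
    carries the machine's own name rather than the Active Directory domain."""
    if event.get("EventID") != LOGON_SUCCESS_EVENT_ID:
        return False
    user = event.get("TargetUserName", "").lower()
    domain = event.get("TargetDomainName", "").lower()
    # The Computer field is often a fully qualified name; compare the short name.
    computer = event.get("Computer", "").split(".")[0].lower()
    return user in local_admin_names and bool(computer) and domain == computer
```

Renamed administrator accounts would not match the default name list, so a production rule would more likely key on the account's identifier than on its name.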
As IT professionals, we know there will always be a technology that is not commonly known and that will require additional work to develop the proper interface. The resources you assign as your solution delivery leads or engineers for an ELM deployment must understand how to translate your business logic into the technical language of your IT landscape.

Challenges

Scalability is the first challenge and the biggest concern in architecting the solution. Most likely, significant amounts of data will be logged. Data retention policies and growth must also be considered.
Depending on your use cases, large portions of data may need to be held for very long periods of time. Therefore, consideration should be given to balancing your company’s tolerance for risk with its appetite for capital investment.
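A back-of-the-envelope sizing calculation helps frame that trade-off. The Python sketch below multiplies events per second by average event size and retention period; the function name and the optional `compression_ratio` parameter are illustrative assumptions, and the formula ignores indexing overhead, replication and growth.

```python
SECONDS_PER_DAY = 86_400

def estimate_storage_gb(eps, avg_event_bytes, retention_days, compression_ratio=1.0):
    """Rough raw storage estimate in gigabytes (1 GB = 10**9 bytes).

    eps: sustained events per second.
    avg_event_bytes: average size of one stored event.
    compression_ratio: e.g. 5.0 if stored data is five times smaller than raw."""
    raw_bytes = eps * SECONDS_PER_DAY * avg_event_bytes * retention_days
    return raw_bytes / compression_ratio / 10**9
```

With purely illustrative inputs of 5,000 events per second at 500 bytes each, this works out to 216 GB per day, or 78,840 GB for a one-year retention period before compression.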
ELM systems typically work in one of two ways: data intensive, which gathers all data to be analysed later and thus needs to scale accordingly; and limited collection, in which agents gather only the information considered “interesting.”
In the case of the former, storage will be a greater concern; for the latter, processing capabilities will need to be stronger to reduce the chances of introducing latency into transaction processing time.
Many ELM solutions do not use a communications protocol that guarantees delivery and instead use protocols such as UDP, which can result in some of the data getting lost. Technology and process verifications could be additional requirements to be factored into the design.
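One possible verification, sketched in Python below, is to look for gaps in per-source sequence numbers. This assumes the forwarding agent stamps each message with an increasing counter, which is a hypothetical arrangement: plain syslog over UDP carries no such counter, so it would have to be added by the agent or the application.

```python
def find_missing_sequences(received):
    """received: sequence numbers seen from one source, in any order.
    Returns the numbers missing between the lowest and highest seen."""
    seen = set(received)
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

A check like this can only detect loss between the first and last message that arrived; messages dropped at the very start or end of a stream need a separate reconciliation, such as comparing counts reported by the sender.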
Of course, well-defined expectations will determine the perceived success of the implementation. Implementing such a solution in a company that has limited policies and procedures will have little success, as there will be few rules to correlate the activity against.
Define your solution delivery success criteria early and make sure what you choose is measurable. Consider using a governance and management framework such as COBIT 5 to guide the initiative.

Conclusion

Some ELMs come with standard rule sets that can accelerate implementation. Refining those rule sets to reflect your organisation’s corporate policies will drive the migration from focused manual intervention to true problem management.
In this manner, not only will ELM implementers see a reduction in time spent resolving incidents, but their responsiveness will be seen as more proactive than reactive. As a result, these shops should see a reduction in incident management costs.
And of course, when an ELM is implemented correctly, security issues will decrease overall and compliance capabilities will improve.


 
