Listening on multiple interfaces with Suricata

December 24th, 2010

A question I see quite often is: can I listen on multiple interfaces with a single Suricata instance? Until now the answer was always “no”. I’d suggest trying the “any” pseudo-interface (suricata -i any) with a BPF filter to limit the traffic, or running multiple instances of Suricata. That last suggestion was especially painful, as one of the goals of Suricata is to let a single process handle all packets using all available resources.

Last week I found some time to look at how hard adding support for acquiring packets from multiple interfaces would be. It turned out to be not so hard! Thanks to Suricata’s highly modular threading design, it was actually quite easy. I decided to keep it simple: to listen on multiple interfaces, just pass each one separately on the command line, like so: suricata -i eth0 -i eth1 -i ppp0. This creates a so-called “receive thread” for each of those interfaces.

I’ve added no internal limits, so in theory it should be possible to add dozens. I only tested with two though, so be careful. Normally the name of the pcap receive thread in logs and “top” is “ReceivePcap”. This is still true if a single interface is passed to Suricata. If more are passed, the thread names change to “RecvPcap-<int>”, e.g. RecvPcap-eth0 and RecvPcap-eth1. It’s untested, but monitoring multiple interfaces of different types should work fine, as Suricata sets the data link type in each interface-specific receive thread.
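To make the naming rule concrete, here is a small C sketch. The helper recv_thread_name and its signature are hypothetical; this is not Suricata’s code, only the rule described above written out as a function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Suricata's actual code): build the name of the
 * receive thread for one interface. With a single interface the name stays
 * "ReceivePcap"; with more, each thread gets "RecvPcap-<interface>".
 * Returns 0 on success, -1 if the buffer is too small. */
static int recv_thread_name(const char *iface, int n_ifaces,
                            char *buf, size_t buflen)
{
    int n;

    if (n_ifaces <= 1)
        n = snprintf(buf, buflen, "ReceivePcap");
    else
        n = snprintf(buf, buflen, "RecvPcap-%s", iface);

    /* snprintf reports the length it wanted; detect truncation */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

For suricata -i eth0 -i eth1, calling this once per interface with n_ifaces set to 2 would yield RecvPcap-eth0 and RecvPcap-eth1.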

If you’re interested in trying out this new feature, there are a few limitations to consider. First, there is no Windows support yet. I hope this can be addressed later. Second, the case where two or more interfaces (partly) see the same traffic is untested. The problem here is that identical packets will go into the engine. This may (or may not; like I said, it’s untested) screw up the defrag and stream engines, and it might cause duplicate alerts, etc. Addressing this would probably require keeping a hash of packets so we can detect duplicates. That is probably quite computationally intensive, so it may not be worth it. I’m very much open to other solutions. Patches are even more welcome :)
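The duplicate-detection idea could look roughly like the following C sketch. Everything in it is an assumption for illustration: the names (dedup_table, dedup_init, dedup_check), the direct-mapped table of DEDUP_SLOTS entries, and the choice of 64-bit FNV-1a as the hash. It hashes every byte of every packet, which is exactly the per-packet cost worried about above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEDUP_SLOTS 4096  /* hypothetical table size; must be a power of two */

/* Hypothetical duplicate-packet filter: a direct-mapped table holding the
 * 64-bit hash of recently seen packets. Collisions can produce a false
 * "duplicate" verdict, and older entries are simply overwritten, so this
 * is a best-effort filter only. */
struct dedup_table {
    uint64_t slot[DEDUP_SLOTS];   /* 0 means "empty" */
};

static void dedup_init(struct dedup_table *t)
{
    memset(t->slot, 0, sizeof(t->slot));
}

/* 64-bit FNV-1a over the packet bytes. */
static uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;              /* FNV prime */
    }
    return h;
}

/* Returns 1 if this packet was seen recently; otherwise records it and
 * returns 0. */
static int dedup_check(struct dedup_table *t, const uint8_t *pkt, size_t len)
{
    uint64_t h = fnv1a64(pkt, len);
    if (h == 0)
        h = 1;                        /* keep 0 reserved for empty slots */

    size_t idx = (size_t)(h & (DEDUP_SLOTS - 1));
    if (t->slot[idx] == h)
        return 1;
    t->slot[idx] = h;
    return 0;
}
```

Each receive thread would call dedup_check before passing a packet on. With several receive threads the shared table would also need locking, which this sketch leaves out. Packets captured on links of different types can also differ in their link-layer headers, so which bytes to hash is an open question.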

So, for now, use it only if the interfaces see completely separate traffic. Unless you’re interested in seeing what happens if you ignore my warnings; in that case I’d like to know! The code is available right now in our current git master and will be part of 1.1beta2.

Merry xmas everyone!

Suricata 1.1 beta 1 released

December 21st, 2010

Today we’ve released Suricata 1.1 beta 1, the first beta of the upcoming Suricata 1.1 release. The official release announcement is here on the OISF website.

The main focus of the new release has been to improve performance and to add support for the features the new ET/ETpro ruleset needs. ET and ETpro have rulesets specially tuned and geared for Suricata. We’re still missing some new rule keywords that are used by VRT; we’ll address that in the 1.1 beta 2 release.

Other than that, I have quite a few patches waiting. We’ll be improving stream reassembly, inline mode, prelude output, and numerous other things.

Like always, please give this a try and let us know how it works for you!

Suricata development update

December 18th, 2010

Over the last few months we’ve been working hard on improving Suricata. So hard actually, that we’ve drifted a bit from our original goal of doing a 1.0.3 “maintenance” release. Instead, the new release will be 1.1beta1. The change to 1.1 is to indicate the large number of changes, the beta1 is to … indicate the large number of changes :)

As you may know, Will Metcalf moved on to join Qualys. That is a significant loss to our project, as Will was one of our founding members and is hard to replace in his role as QA lead. Not having a full-time QA person on the team right now is why we decided we need a beta cycle for the next release.

So… what kind of improvements are we talking about?

  • Improved parsers, especially the DCERPC parser.
  • New keyword support: http_raw_header, http_stat_msg, http_stat_code.
  • Much improved fast_pattern support, including for http_uri, http_client_body, http_header, http_raw_header.
  • A new default pattern matcher, Aho-Corasick based, that uses much less memory.
  • Lots of small performance updates, including SSE3, SSE4.1 and SSE4.2 optimizations.
  • The signature bitmask prefiltering I wrote about before.
  • We support the reference.config supplied by ET(pro) and VRT now.

So… performance?!

Performance is mentioned a lot in this list. Did it improve? Yes! As some of you may have read, Npulse has demonstrated 10 Gbps IDS support for Suricata using Napatech (PDF) hardware support. This was on fast hardware, but nothing outrageous. To be honest, I didn’t expect to get there yet. But they did it, based on a slightly modified Suricata 1.0.1 and about 7k signatures. Our own testing has shown that the code has improved quite a bit since then: from 25% to 67% more packets per second throughput. Btw, native Napatech support is expected to go into our code base sometime in the next few weeks.

What’s left?

We have two major areas where we want more improvement. The first is the inline mode. Because Suricata’s HTTP and other protocol parsers work statefully on top of the stream reassembly engine, currently all work is done on ack’d data. This means dropping attacks based on keywords such as http_uri is hard. We’re planning a number of changes to the stream engine to address this. More on that in a future post. The second area is the rule language. At this point we still miss a number of keywords, such as file_data, that are needed to properly support signatures, mostly VRT’s.

What’s next?

The current git master is pretty much what Suricata 1.1beta1 is going to be. The actual release is planned for next week, probably Tuesday or Wednesday. If you can, help us out by trying it and reporting any issues to us!

Speeding up Suricata with tcmalloc

October 21st, 2010

‘tcmalloc’ is a library Google created as part of the google-perftools suite to speed up memory allocation in multithreaded programs. It’s very simple to use and works fine with Suricata. Don’t expect magic from it, but it should give you a few percent more speed.

On Ubuntu, install the libtcmalloc-minimal0 package:

apt-get install libtcmalloc-minimal0

Then run Suricata as follows (on a single line):

LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.0" ./src/suricata -c suricata.yaml -i eth0

That is all there is to it. :)

Improving Suricata performance with bitmask based signature prefiltering

October 1st, 2010

Over the last few weeks I’ve been spending quite a bit of time improving Suricata’s performance, and I’m making good progress. I did a lot of optimizations all over the code, but the most significant is a new way of prefiltering signatures for inspection. I’ll briefly explain the concept here.

But first a quick explanation of how Suricata selects signatures for inspection. When Suricata starts, it organizes signatures into groups, called SigGroupHead in the code. To reduce the number of signatures that need inspection for each packet, the grouping is done on quite a few properties: flow direction, protocol, src ip, dst ip, src port, dst port. Even though this grouping is quite aggressive, a single SigGroupHead can still contain many thousands of signatures. For example Emerging Threats web-client sigs will almost all end up in the same SigGroupHead.

To reduce the overhead of checking these signatures, a more efficient prefiltering mechanism was added.

The bitmask prefilter

The basic concept is simple. Each signature creates a bitmask at engine initialization time, setting a bit for each “feature” it requires to match. Examples of such features are: needs payload, needs flowbit set, needs flow, needs http state.

Then at runtime, we create a mask for each packet, setting flags when the packet has a payload, when it has a flow associated with it, when that flow has flowbits, etc. This operation is quite cheap, as it needs to be done only once per packet and requires only relatively simple checks.

The final step of this process is to compare the mask of each signature in a SigGroupHead against the mask of the packet.

/* skip the signature if the packet lacks any feature it requires */
if ((packetmask & sigmask) != sigmask)
    skip_this_signature();
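A minimal, self-contained C sketch of the whole prefilter follows, assuming four hypothetical feature bits (FEAT_PAYLOAD, FEAT_FLOW, FEAT_FLOWBITS, FEAT_HTTP) and simplified sig and pkt structs. The names and layout are illustrative only and are not Suricata’s actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the names are illustrative, not Suricata's. */
#define FEAT_PAYLOAD   (1u << 0)   /* packet has a payload */
#define FEAT_FLOW      (1u << 1)   /* packet has a flow */
#define FEAT_FLOWBITS  (1u << 2)   /* flow has flowbits set */
#define FEAT_HTTP      (1u << 3)   /* flow has HTTP state */

struct sig {
    int id;
    uint8_t mask;   /* features this signature needs; built at startup */
};

struct pkt {
    size_t payload_len;
    int has_flow;
    int flow_has_flowbits;
    int flow_has_http_state;
};

/* Built once per packet, using only cheap checks. */
static uint8_t packet_mask(const struct pkt *p)
{
    uint8_t m = 0;

    if (p->payload_len > 0)
        m |= FEAT_PAYLOAD;
    if (p->has_flow) {
        m |= FEAT_FLOW;
        if (p->flow_has_flowbits)
            m |= FEAT_FLOWBITS;
        if (p->flow_has_http_state)
            m |= FEAT_HTTP;
    }
    return m;
}

/* Copies the ids of signatures that pass the prefilter into out[] and
 * returns their count. Only these would go on to full inspection. */
static size_t prefilter(const struct sig *sigs, size_t n, uint8_t pmask,
                        int *out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++) {
        if ((pmask & sigs[i].mask) != sigs[i].mask)
            continue;               /* a required feature is missing: skip */
        out[k++] = sigs[i].id;
    }
    return k;
}
```

In this sketch a signature that needs flowbits is skipped for every packet whose flow has none, while a signature with mask 0 always passes, since it requires nothing.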

With this filter, using flowbits becomes much more attractive. Most flows don’t have flowbits set, so this effectively excludes all signatures requiring flowbits from being checked almost all the time.

In the current git master (soon to become 1.0.3) this mask is only 8 bits wide, of which only 5 are used. I’m experimenting with more fine-grained bitmasks.

SigGroupHead based masks

One idea I’m currently exploring is whether there is any use in additionally creating a single mask per SigGroupHead. The idea is that if many signatures in a group are alike, the SigGroupHead will have a strong mask, and we can quite often bypass all signature checking for a packet. This would bypass pattern matching as well.
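Assuming the group mask is the bitwise AND of the masks of all signatures in the group (the features every member requires), the group-level check can be sketched in C as follows. group_mask and group_can_bypass are hypothetical names, not Suricata functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group mask: the features every signature in the group
 * requires, i.e. the bitwise AND of all member masks. */
static uint8_t group_mask(const uint8_t *sig_masks, size_t n)
{
    uint8_t g = 0xFF;               /* identity element for AND */
    for (size_t i = 0; i < n; i++)
        g &= sig_masks[i];
    return g;
}

/* Returns 1 if the packet lacks a feature that all signatures in the group
 * need: then none of them can match, and the whole group, pattern matching
 * included, can be bypassed. Returns 0 if the group must be inspected. */
static int group_can_bypass(uint8_t gmask, uint8_t pmask)
{
    return (pmask & gmask) != gmask;
}
```

Note how a single signature with mask 0 drives the group mask to 0, after which no packet can ever bypass the group.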

Preliminary results show that the idea works, but only for small and homogeneous rulesets. For a 38M-packet pcap with just emerging-web.rules, I see about 40% of the packets bypassing all signature checks. For emerging-all.rules it’s less than 1%, and for a larger ruleset (14k sigs) it’s 0%. So it may not be a viable optimization.

More conditions

I’m also experimenting with increasing the number of conditions. So far, I’ve defined about 20. This way all TCP signatures at least have some form of condition set. A single signature with mask 0 (no conditions set) kills the SigGroupHead-based filtering, as the group’s mask is determined by the lowest common denominator. So far I’m not seeing much, if any, gain from using more conditions.

Maybe increasing the mask size to 32 bits undoes the performance gains, or the added complexity of creating the mask for each packet at runtime is too expensive.

SIMD checks

One other thing I’m planning to explore is whether SIMD can help speed up these bit checks. The SSE extensions should be able to do multiple checks at the same time. Here the mask size becomes important as well: SSE currently works on 16 bytes at a time, so with an 8-bit mask I could check 16 sigs at once, but with a 32-bit mask only 4. I’m not sure it’ll be worth it though. CPUs are quite good at bitwise operations, so SIMD instructions might not be faster at all.
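The same width trade-off can be illustrated without SSE intrinsics. As a portable stand-in, this C sketch uses SWAR (“SIMD within a register”) on a plain 64-bit integer: a 64-bit word holds eight 8-bit masks (versus 16 in a 128-bit SSE register), or only two 32-bit masks. The packing layout and the function names swar_prefilter8 and pack_masks are assumptions for illustration, not Suricata code.

```c
#include <stdint.h>

#define LO7 0x7F7F7F7F7F7F7F7FULL
#define HI1 0x8080808080808080ULL

/* Checks eight 8-bit signature masks at once. Byte i of sig_masks holds
 * the mask of signature i. In the result, bit 7 of byte i is set iff
 * signature i passes, i.e. (pmask & mask_i) == mask_i. */
static uint64_t swar_prefilter8(uint64_t sig_masks, uint8_t pmask)
{
    uint64_t pkt = 0x0101010101010101ULL * pmask;  /* broadcast to all bytes */
    uint64_t missing = sig_masks & ~pkt;           /* required but absent bits */

    /* Per byte: the high bit of t is set iff that byte of 'missing' is
     * nonzero. (b & 0x7F) + 0x7F never exceeds 0xFE, so no carry crosses
     * into the neighbouring byte. */
    uint64_t t = ((missing & LO7) + LO7) | missing;
    return ~t & HI1;                               /* set where missing == 0 */
}

/* Packs up to eight masks into one word, signature i in byte i. */
static uint64_t pack_masks(const uint8_t *masks, int n)
{
    uint64_t w = 0;
    for (int i = 0; i < n && i < 8; i++)
        w |= (uint64_t)masks[i] << (8 * i);
    return w;
}
```

Whether this beats a simple scalar loop is exactly the open question raised above; a compiler may already vectorize the scalar version.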

The initial version of the bitmask-based prefilter code is available now in the current git master. If you’re interested, please give it a try and let me know how it works for you!

Suricata 1.0.2 released

September 2nd, 2010

After some well deserved vacation I’m getting back up to speed in Suricata development. Luckily most of our dev team continued to work in my absence, making today’s 1.0.2 release possible.

The main focus of this release was fixing the TCP stream engine. Judy Novak found a number of ways to evade detection. See her blog post describing the issues.

The biggest other change is the addition of a new application layer module. The SSH parser parses SSH sessions and stops detection/inspection of the stream after the encrypted part of the session has started. So this is mainly a module focused on reducing the number of packets that need inspection, just like the SSL and TLS modules.

As a bonus though, we introduced two rule keywords that match on the parsed SSH parameters:

ssh.protoversion will match against the ssh protocol version. I’ll give some examples.

ssh.protoversion:2.0

This will match on 2.0 exactly.

ssh.protoversion:2_compat

This will match on 2, but also on 1.99 and other versions compatible with “2”.

ssh.protoversion:1.

The last example will match on all versions starting with “1.”, so 1.6, 1.7, etc.
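The three ssh.protoversion examples can be summarized as a small matcher. This C sketch is hypothetical and is not how Suricata implements the keyword. Two assumptions are made to reproduce the examples: a value ending in '.' is treated as a prefix while any other value must match exactly, and “compatible with 2” is read as “starts with 2, or is 1.99”.

```c
#include <string.h>

/* Hypothetical matcher for the ssh.protoversion examples; not Suricata's
 * implementation. 'rule' is the keyword value, 'seen' the protocol version
 * parsed from the SSH banner. Returns 1 on match, 0 otherwise. */
static int ssh_protoversion_match(const char *rule, const char *seen)
{
    size_t rlen = strlen(rule);

    if (strcmp(rule, "2_compat") == 0)
        /* assumed: "2", "2.0", ... plus 1.99, which signals v2 support */
        return seen[0] == '2' || strcmp(seen, "1.99") == 0;

    if (rlen > 0 && rule[rlen - 1] == '.')
        return strncmp(seen, rule, rlen) == 0;   /* "1." matches "1.5" */

    return strcmp(seen, rule) == 0;              /* "2.0" matches only "2.0" */
}
```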

ssh.softwareversion will match on the software version identifier. An example:

ssh.softwareversion:PuTTY

This will match only on sessions using the PuTTY SSH client.

Other changes include better HTTP accuracy and better IPS functionality.

For the next release we will focus on further improving overall detection accuracy, improving inline mode further, improving performance and specifically improving CUDA performance. As always, we welcome any feedback. Or if you are interested in helping out, please contact us!

Update: added a link to Judy Novak’s blog post on the TCP evasions.

Suricata 1.0.1 released

July 29th, 2010

After a 1.0 release that certainly didn’t go unnoticed, it’s now time for the first maintenance release. The main focus of this release was improving detection accuracy. A large number of false positives and false negatives were fixed. Read the full announcement here, the list of fixed issues here.

There are still a number of open issues with regard to accuracy. Those will be addressed in 1.0.2, scheduled for late August or early September. We’re also working on CUDA, stream engine and inline mode improvements. Keep an eye on redmine for the open and fixed issues.

I’ll be taking some time off to recharge a bit; the last couple of months have been exhausting. Things are very exciting, so I can hardly wait to get back to improving our little Meerkat! Cheers! :)

On Suricata performance

July 22nd, 2010

Yesterday there was a lot of fuss in the media about Suricata’s performance versus Snort’s. Some claim Suricata is much faster, others claim Snort is much faster.

At this point I really don’t care much. What the Suricata development by the OISF has shown, in my opinion, is that we’ve managed to create a very promising new Open Source project. In a little over a year, funded with about $600k by the US government and with heavy (and growing) industry support, we’ve produced a new IDS/IPS engine that is mostly compatible with Snort but built on an all-new code base and incorporating some very interesting fresh ideas. We’re already seeing a community form around our project, with a lot of support from that new community.

So, about this performance fuss. Who to believe? Is Suricata faster than Snort? Yes, no, ehhh, it depends on how you look at it. Is Suricata faster than Snort on a single core, cycle for cycle, tick for tick? No. It’s pretty clear we aren’t, and I didn’t expect us to be either. But we scale. We’ve had reports of Suricata running on a 32-core box and scaling to use all cores. There, Suricata is much faster. As Martin Roesch wrote on the VRT blog, one can set up Snort on a box to have one instance of Snort per core (or multiple per core). This is in fact the way many appliance builders get high speeds out of it. While this may be feasible for appliance builders, admins we talked to who run their own IDS/IPS think it’s a management nightmare.

As we’re a new project with a fresh codebase, there is going to be a lot of low-hanging fruit in performance optimizations. I’ll give an example here. On a test pcap with a reduced ruleset (about 10k rules), Suricata took about 400s to inspect. With a bigger ruleset (about 14k rules), it suddenly took 1600s! After a little bit of cache profiling, it turned out that the part of the engine where the address part of a signature was inspected was horribly cache inefficient. In less than an afternoon I rewrote it to be more efficient. The result: the same test now completes in under 600s. This code is in the current git master and will be in 1.0.1.

My point here is that there will be lots of room for optimizations, and not just minor stuff. So far we’ve mostly focused on being accurate (we still have work to do here) and on having the algorithms be correct. Hardly any tuning has been done. At our last OISF meeting we got a few very interesting offers of help with serious performance testing and tuning on some really big boxes, state-of-the-art CUDA hardware, 10 Gbit labs, etc. So I expect a lot of progress in the months to follow.

It’s clear that we have work to do. What I’m really excited about is how fast that work is progressing, how much help we’re getting both from our brand new community and the industry, and the openness of our development process.

On a final note, during the development of this project we’ve found a lot of bugs and issues in other tools. Will Metcalf, who runs our QA, has been reporting many issues in Snort and VRT sigs to Sourcefire, in Emerging Threats sigs to the ET community. We’ve found bugs in other tools as well, for example in a neat library called libcap-ng. So everyone benefits from our work! :)

Suricata 1.0.0 released

July 1st, 2010

After many months of hard work by the development team of the OISF, we have just released the first stable release of Suricata: 1.0.0. I’m really proud that we pulled off this stable release, and on time.

I think it’s a good release too. Is it perfect? No, we have a list of known issues that we will continue to work on. So expect a 1.0.1 and maybe more maintenance releases in the following weeks.

On July 16th we will be having a public meeting in San Francisco to discuss the next major development milestone. Everyone is welcome to join us there to bring in new ideas. If you can’t make it, no sweat, you can also send ideas to us privately or discuss them on our mailing lists.

Ohloh

June 30th, 2010

Ohloh is a pretty cool site for keeping track of projects and programmers. It’s an easy way to follow the development of a project and gives a nice indication of how actively it’s being developed. It has some social-networking features too, such as individual developers giving each other “kudos”.

The code analysis is pretty nice: it gives statistics on code base size, growth, comment ratio, languages used, etc. Per developer it tracks quite a few stats as well.

It also makes an estimate of the cost of a project. For the Suricata project it currently estimates a cost of 2.1 million USD. Actual costs are significantly lower: less than half of that. So either we are severely underpaid or the calculation is off quite a bit :)

The per developer code statistics show that I’ve “touched” 131k lines of code out of 148k which confirms what I already knew: I need some vacation…

Anyway, check it out. Vuurmuur is on there, as are Snort and ModSecurity.

Oh by the way, Suricata 1.0 coming out tomorrow!