Posts Tagged ‘performance’

Suricata and PCRE performance

Wednesday, October 12th, 2011

Update: Will Metcalf pointed out I was missing the –enable-utf8 –enable-unicode-properties flags from PCRE, so added these & updated the numbers. Thanks Will.

In the Emerging Threats community the following if often heard: “PCRE is evil”. With this people refer to signatures that use “pure” PCRE matches, meaning without anchoring it to a content pattern match.

A while ago Will Metcalf initiated work to get Suricata to support a new PCRE feature by Herczeg Zoltán: SLJIT. Since then, support for this has found it’s way into the official PCRE release, currently at version 8.20-RC3.

I decided to run a quick benchmark to see how much difference there would be. The results are quite amazing. In my test I used an older Intel Core2 E6600 2.4Ghz on Ubuntu 10.10, a 416MB pcap full of badness (sandnet traffic) and a slightly older ruleset of 11.972 signatures.

The results:

suricata, OS default pcre (8.02)...................: 78s
suricata, pcre-8.20-RC3 (no jit), -O2..............: 80s
suricata, pcre-8.20-RC3 (no jit), -O3 -march=native: 72s
suricata, pcre-8.20-RC3 (jit), -O2.................: 53s

I played some more with GCC 4.6.1 and various optimization levels, but this was the best result so far. Quite surprising because in the past I saw some improvements from using the newer GCC over the OS default of 4.4.5.

Want to try the new PCRE without messing up your system?
./configure --prefix=/opt/pcre-8.20-RC3/ --enable-jit --enable-utf8 --enable-unicode-properties
make
sudo make install

Then recompile Suricata as well:
./configure --enable-pcre-jit --with-libpcre-libraries=/opt/pcre-8.20-RC3/lib/ --with-libpcre-includes=/opt/pcre-8.20-RC3/include/
make
sudo make install

You’ll need the Suricata code from git to take advantage of this.

Please give it a try, it’s free performance!

Suricata development update

Saturday, December 18th, 2010

The last months we’ve been working hard on improving Suricata. So hard actually, that we’ve drifted a bit from our original goal of doing a 1.0.3 “maintenance” release. Instead, the new release will be 1.1beta1. The change to 1.1 is to indicate the large number of changes, the beta1 is to … indicate the large number of changes :)

As you may know, Will Metcalf moved on to join Qualys. A significant loss to our project as Will was one of our founding members and is hard to replace in his role as QA lead. Not having a full time QA person on the team right now is a reason for us to decide we’re in need of a beta cycle for the next release.

So… what kind of improvements are we talking about?

  • Improved parsers, especially the DCERPC parser.
  • New keyword support: http_raw_header, http_stat_msg, http_stat_code.
  • Much improved fast_pattern support, including for http_uri, http_client_body, http_header, http_raw_header.
  • A new default pattern matcher, Aho-Corasick based, that uses much less memory.
  • Lots of small performance updates, including SSE3, SSE4.1 and SSE4.2 optimizations.
  • The signature bitmask prefiltering I wrote about before.
  • We support the reference.config supplied by ET(pro) and VRT now.

So… performance?!

Lots of mention of performance in this list. Did it improve? Yes! As some of you may have read, Npulse has demonstrated 10 Gbps IDS support for Suricata using Napatech (PDF) hardware support. This was on fast hardware, but nothing outrageous. To be honest, I didn’t expect to get there yet. But they did it. Based on a slightly modified Suricata 1.0.1 and about 7k signatures. Our own testing has shown that the code has improved quite a bit since then: ranging from 25% to 67% more packets per second throughput. Btw, native Napatech support is expected to go into our code base sometime in the next few weeks.

Whats left?

We have two major areas where we want more improvement. The first is the inline mode. Due to Suricata’s HTTP and other protocol parsers working statefully on top of the stream reassembly engine, currently all work is done on ack’d data. This means dropping attacks based on keywords such as http_uri is hard. We’re planning a number of changes to the stream engine to address this. More on that in a future post. The second area is the rule language. At this point we still miss a number of keywords to properly support mostly VRT signatures. Keywords like file_data.

Whats next?

The current git master is pretty much what Suricata 1.1beta1 is going to be. The actual release is planned for next week, probably Tuesday or Wednesday. If you can, help us out by trying it and report any issue to us!

Speeding up Suricata with tcmalloc

Thursday, October 21st, 2010

‘tcmalloc’ is a library Google created as part of the google-perftools suite for speeding up memory handling in a threaded program. It’s very simple to use and does work fine with Suricata. Don’t expect magic from it, but it should give you a few percent more speed.

On Ubuntu, install the libtcmalloc-minimal0 package:

apt-get install libtcmalloc-minimal0

Then run Suricata as follows (on a single line):

LD_PRELOAD=”/usr/lib/libtcmalloc_minimal.so.0″ ./src/suricata -c suricata.yaml -i eth0

That is all there is to it. :)

Improving Suricata performance with bitmask based signature prefiltering

Friday, October 1st, 2010

The last weeks I’ve been spending quite a bit of time improving Suricata’s performance, making good progress. I did a lot of optimizations all over the code, but the most significant is a new way of prefiltering signatures for inspection. I’ll briefly explain the concept here.

But first a quick explanation of how Suricata selects signatures for inspection. When Suricata starts, it organizes signatures into groups, called SigGroupHead in the code. To reduce the number of signatures that need inspection for each packet, the grouping is done on quite a few properties: flow direction, protocol, src ip, dst ip, src port, dst port. Even though this grouping is quite aggressive, a single SigGroupHead can still contain many thousands of signatures. For example Emerging Threats web-client sigs will almost all end up in the same SigGroupHead.

To reduce the overhead of checking the signatures a more efficient prefiltering mechanism was added.

The bitmask prefilter

The basic concept is simple. Each signature creates a bitmask at engine initialization time, setting a bit for each “feature” it requires to match. Examples of such features are: needs payload, needs flowbit set, needs flow, needs http state.

Then at runtime, we create a mask for each packet. There we set flags for when the packet has a payload, has a flow associated with it, the flow has flowbits, etc. This operation is quite cheap as it needs to be done for each packet only once and requires only relatively simple checks.

The final step of this process is we compare the mask of each signature in a SigGroupHead against the mask of the packet.

if ((packetmask & sigmask) != sigmask)
skip_this_signature();

Using this filter, using flowbits becomes much more attractive. Most flows don’t have flowbits set, so this effectively excludes all signatures requiring flowbit from being checked almost all the time.

In the current git master (soon to become 1.0.3) this mask is only 8 bits wide of which only 5 are used. I’m experimenting with using more fine grained bitmasks.

SigGroupHead based masks

One idea I’m exploring currently is seeing if there is any use in additionally creating a single mask for a SigGroupHead. The idea here being that if many signatures in a group are alike, the SigGroupHead will have a strong mask and we can bypass all signature checking for a packet quite often. This would bypass pattern matching as well.

Preliminary results show that the idea works, but only for small & homogeneous rulesets. For a 38M pkt pcap, with just emerging-web.rules I see about 40% of the packets bypassing all signature checks. For emerging-all.rules it’s less that 1%, and for a larger ruleset (14k sigs) it’s 0%. So it may not be a viable optimization.

More conditions

I’m also experimenting with increasing the number of conditions. So far, I’ve defined about 20. This way all TCP signatures at least have some form of condition set. A single signature with mask 0 (no conditions set) kills the SigGroupHead based filtering, as it’s mask is determined by the lowest common denominator. So far I’m not seeing much if any gains from using more conditions.

Maybe the increased size of the mask to 32 bits undoes performance gains, or the added complexity of the mask creation at packet runtime is too expensive.

SIMD checks

On other thing I’m planning to explore is to see if SIMD can help speed up these bit checks. The SSE extensions should be able to do multiple checks at the same time. Here the mask size will become important as well. As SIMD currently works with 16 bytes at a time, for a 8 bit mask I could check 16 sigs at once, but for a 32 bit mask only 4 at once. I’m not sure it’ll be worth it though. CPU’s are quite good at doing bitwise operations, to SIMD instructions might not be faster at all.

The initial version of the bitmask based prefilter code is available now in the current git master. If you’re interested, please give a try and let me know how it works for you!

On Suricata performance

Thursday, July 22nd, 2010

Lots of fuzz in the media about Suricata’s performance versus Snort yesterday. Some claiming Suricata is much faster, others claiming Snort is much faster.

At this point I really don’t care much. What the Suricata development by the OISF has shown in my opinion is that we’ve managed to create a very promising new Open Source project out here. In little over a year, funded for about $600k by the US government and with heavy (and growing) industry support, we’ve produced a new IDS/IPS engine mostly compatible with Snort but build on a all new code base an incorporating some very interesting fresh ideas. We’re already seeing a community form around our project with a lot of support from that new community.

So about this performance fuzz. Who to believe? Is Suricata faster than Snort? Yes, no, ehhh, depends on how you look at it. Is Suricata faster than Snort on a single core cycle for cycle, tick for tick? No. It’s pretty clear we aren’t, I didn’t expect us to be either. But we scale. We’ve had reports of running on a 32 core box and scaling to use all cores. There Suricata is much faster. Like Martin Roesch wrote on the VRT blog one can set up Snort on a box to one have instance of Snort per core (or multiple per core). This is in fact the way many appliance builders get to high speeds with it. While this may be feasible for appliance builders, admins we talked to that run their own IDS/IPS think it’s a management nightmare.

As we’re a new project with a fresh codebase, there is going to be a lot of low hanging fruit in performance optimizations. I’ll give an example here. On a test pcap, with a reduced ruleset (about 10k rules), Suricata took about 400s to inspect. Then with a bigger ruleset (about 14k rules), it suddenly took 1600s! After a little bit of cache profiling it turned out that the part of the engine where the address part of a signature was inspected was horribly cache inefficient. In less than an afternoon I rewrote it to be more efficient. Result, the same test now completes in under 600s. This code is in the current git master and will be in 1.0.1.

My point here being that there will be lots of room for optimizations, and not just minor stuff. So far we’ve mostly focused on being accurate (we still have work to do here) and having the algorithms be correct. Hardly any tuning has been done. In our last OISF meeting we’ve gotten a few very interesting help offers for serious performance testing and tuning on some really big boxes, state of the art CUDA hardware, 10GBit labs, etc. So I expect a lot of progress in the months to follow.

It’s clear that we have work to do. What I’m really excited about is how fast that work is progressing, how much help we’re getting both from our brand new community and the industry, and the openness of our development process.

On a final note, during the development of this project we’ve found a lot of bugs and issues in other tools. Will Metcalf, who runs our QA, has been reporting many issues in Snort and VRT sigs to Sourcefire, in Emerging Threats sigs to the ET community. We’ve found bugs in other tools as well, for example in a neat library called libcap-ng. So everyone benefits from our work! :)

Using Modsec2sguil for HTTP transaction logging

Wednesday, August 15th, 2007

Modsec2sguil is currently configured to send alerts to Sguil. ModSecurity can be configured to log any event or transaction, including 200 OK, 302 Redirect, etc. Modsec2sguil distinguishes between alerts and other events by only processing HTTP codes of 400 and higher. Since 0.8-dev2 there is a configuration directive to prevent certain codes, such as 404, from being treated as an alert.

Now I have the following idea. Since ModSecurity can log all events with details of request headers, response headers and POST message body, it may be interesting to just send all these events to Sguil. They should not be appearing as alerts, but having them in the database can perhaps be interesting. I know using flow data and full packet captures the same data can be accessed, but having it in the database makes querying it a lot easier and longer available.

Possible problems are mostly the performance hit the webserver may take for sending all these events to Sguil and the storage requirements in Sguil’s database. I estimate the events are about 1kb in size on average, so on a busy site this may cause the database to grow very rapidly. Of course this behavior would be optional so it can be disabled.

Any thoughts on this idea?