Archive for the ‘TCP’ Category

Improving Snort_inline’s NFQ performance

Wednesday, January 23rd, 2008

When using Snort_inline with NFQ support, it’s likely that at some point you’ve seen messages like these on the console: packet recv contents failure: No buffer space available. When the messages are appearing Snort_inline slows down significantly. I’ve been trying to find out why.

There are a number of setting that influence NFQ performance. One of them is the NFQ queue maximum length. This is a value in packets. Snort_inline takes an argument to modify the buffer length: –queue-maxlen 5000 (note: there are two dashes before queue-maxlen).

That’s not enough though. The following settings increase the buffer that NFQ seems to use for it’s queue. Since I’ve set it this high, I haven’t been able to get a single read error anymore:

sysctl -w net.core.rmem_default=’8388608′
sysctl -w net.core.wmem_default=’8388608′

The values are in bytes. The following values increase buffers for tcp traffic.

sysctl -w net.ipv4.tcp_wmem=’1048576 4194304 16777216′
sysctl -w net.ipv4.tcp_rmem=’1048576 4194304 16777216′

For more details see this page: http://www-didc.lbl.gov/TCP-tuning/linux.html

Setting these values fixed all my NFQ related slowdowns. The values probably work for ip_queue as well. If you use other values, please put them in a comment below.

Thanks to Dave Remien for helping me track this down!

New Snort_inline TCP window normalization code in SVN

Saturday, November 17th, 2007

A while ago I wrote about why the TCP window scaling normalization in Snort_inline was broken by design. I also wrote about a new solution I was working on and testing that would be uploaded to SVN soon. I just committed the patch to SVN. What it does is add two new options to stream4:

norm_window: normalize the TCP window (disabled by default). This is to protect Snort_inline from being forced to queue too many packets.
max_win_size: maximum size of the scaled TCP window. Packets increasing the window beyond the limit are modified.

This option is disabled by default, and the old wscale normalization code is removed, as are the options that configured it. It runs fine on my gateway, without noticeable slowdowns, but I haven’t done any benchmarking so far. Please try this and let me know how it works for you!

Window scaling normalization in Snort_inline broken by design

Tuesday, September 4th, 2007

After debugging some connection problems I found that the wscale normalization concept is flawed. I’ll describe here what is wrong with it and then move on to suggest a different solution I’m currently testing. The problem I was seeing is that some connections to some webservers stalled without an apparent reason.

First a quick reminder of why I originally came up with the wscale normalization. Stream4 originally doesn’t look at the window scaling value when determining the TCP window. This causes it to be wrong about the TCP window in about every connection, which is one of the reasons out of window packets are not dropped (this is actually a gaping evasion hole since these packets are not used in stream reassembly). This is why I decided to add window scaling support to the stream4inline extension. This works great and allows the admin to drop out of window packets. There is a problem associated with it though. The maximal window that is possible with wscaling is 1GB. This would mean that Snort_inline would in the worst case have to queue almost 1GB of data in it’s buffers for a single stream. To prevent this being used by an attacker to attack Snort_inline, I wanted give the admin the option to set a maximal wscale size.

So, why doesn’t replacing the wscale value in packets work? I’ll explain that now. First an example without normalization. Say we have client connecting to a server. The client sends in it’s SYN packet a window of 5840 and a wscale of 5. The server replies with a SYN/ACK with window 5792, wscale 9. Both have a unscaled window in their packet since the wscale won’t be used before both sides have received a packet with the wscale option enabled. The client sends an ACK completing the three way handshake, with a window of 183. That means a scaled window of 5856 (183 x 2^5). The client will now send an actual data packet, using the same window. The server ACK’s the data with a packet with a window of 16, meaning a scaled window of 8192 (16 x 2^9).

Now, what happens when we normalize? Consider the same connection, but now Snort_inline normalizes all wscale values above 2, to 2. The client sends in it’s SYN packet: window of 5840, wscale of 5, but due to the normalization, the server receives it as window of 5840, wscale of 2. The server replies with a SYN/ACK with window 5792, wscale 9, but the client receives it as window 5792, wscale 2. The problem here is that neither the client or the server know that it’s wscale value was been modified. Nor is there a way to make it known. So what then happens is this. When the server wants to say it has a window of 8192, it will send a packet with the window field set to 16 (16 * 2^9 = 8192). But, due to the normalization, it actually says it has a window of 64 (16 * 2^2). Likewise, when the client wants to tell the server it has a window of 5856 (window field set to 183), it actually says it has a window of 732 (183 * 2^2). This completely stalls connections. So why did I only see this on some rare connections? That is because most servers on the internet use low wscale values. The server I ran into issues with however, used a value of 9.

The solution I am now testing is normalizing the scaled window. With this idea Snort_inline takes the full scaled window into account and compares it with a maximum value. If it exceeds it, the window value in the packet is modified taking the wscale value into account. I’ve been running like this for about 2 weeks now, and so far I have seen no stalling connections anymore. There is however quite a drawback to this approach. The window size is a constantly changing value that is adapted in almost every TCP packet. Unlike the wscale normalization, that could be done by modifying the SYN and SYN/ACK packets, the new approach in the worst case has to modify and replace almost every single packet in a stream. This will take more resources from Snort_inline.

I’m interested in hearing other possible solutions to this problem or other drawbacks of my new solution. I will be checking my new solution into SVN soon. I will make sure it is disabled by default. To work around the broken wscale normalization just set it to it’s maximum value, so add ‘norm_wscale_max 14′ to your stream4 configuration line.

Snort_inline and out of order packets

Monday, July 30th, 2007

In Snort_inline’s stream4 modifications, one of the changes is that out of order TCP packets are treated differently from unmodified stream4. This can cause some new alerts to appear and some unexpected behaviour. So I’ll try to explain what happens here.

First of all let me explain quickly what out of order packets are. To put it simple, TCP packets are send out by the source host in a specific order but can arrive in a different order at the destination. Packetloss, link saturation, routing issues are among many things that can cause this. A Snort_inline specific issue is that when Snort_inline can’t keep up with the packets it needs to process, it will drop packets which causes packetloss. These packets will then have to be resent by the sending host.

Out of order packets become a problem when dealing with stream reassembly. Stream reassembly basically is putting all data from the packets in the right order to get the original data as it was sent. We can’t do stream reassembly if we don’t have all packets. Unmodified stream4 basically ignores gaps in the stream. Designed for passive listening for traffic, it has to deal with packetloss differently than Snort_inline.

Next, some definitions of this functionality in Snort_inline. Out-of-order packets: The number of packets that we have in queue that are out of order for a stream. This means they have a higher sequence number than the next in-sequence packet we are expecting. Out-of-order bytes: The number of bytes of the combined data of the out-of-order packets in the stream. Sequence number hole: A gap between two packets, that can be closed by one or more missing packets.

To prevent Snort_inline from using to much memory on bad connections or when an attacker sends lots of out of order packets, Snort_inline can enforce limits to protect itself. Snort_inline can even force a stream to be completely in-order by dropping all packets that are out of order. Sadly, this has a bad effect on the performance of the connections, so you can set certain limits that balance between performance and protection.

When Snort_inline hits these limits, it will (optionally) fire alerts that look like this:

(spp_stream4) TCP out-of-order packets limit reached for stream
(spp_stream4) TCP out-of-order bytes limit reached for stream
(spp_stream4) TCP sequence number holes limit reached for stream

You can disable the alerts by adding the following option to the preprocessor stream4 line: disable_ooo_alerts. The limits themselves can be adjusted by using the following options: max_seq_holes 2, max_ooo_pkts 25, max_ooo_bytes 7000. These are the values I currently use on my home gateway. I got the idea of implementing these limits from this paper by Vern Paxson. However, it seems to me that his suggestion that at max one sequence hole per stream (even per host) was a bit optimistic. Maybe DSL has more packetloss than the university links he studied.

By default Snort_inline uses the settings that were chosen a bit randomly, so they may not fit your usage. Like with the wscaling, please let me know in a comment what values you use!

TCP Window scaling in Snort_inline

Saturday, June 16th, 2007

The TCP window field in the TCP header is only 16 bits, so the maximum window size it can handle is only 64kb. A long time ago this was enough, but nowadays it isn’t, by far. Luckily, this is something the window scaling option fixes. Window scaling is very common these days. Your pc or laptop probably uses it by default. Snort’s stream4 however, does not support it. This means that when tracking and reassembling streams, Snort for most connections has no idea about what data is in window and which is out of window. To make matters worse, the packets that are in window when using wscaling, but appear out of window when the wscaling is not accounted for, are never used in the reassembly process. This makes Snort evadable.

One of the goals when creating the stream4inline modifications, was to be able to drop on all TCP anomalies stream4 detects. For this support for window scaling was added to Stream4, so Snort_inline would be able to drop out of window packets. There is however a big problem with window scaling. With window scaling the TCP window possibly increases to a maximum of 1GB (with the maximum wscale value of 14). Stream4 would thus theoretically have to queue up to 1GB of packet data, per stream. While this is something that is unlikely to happen during normal connections, it is possible. This could then be used by an attacker to attack Snort_inline itself.

To prevent this, I added an option to stream4inline that allows the administrator to set a maximum allowable wscale setting. Any higher setting will be normalized away. In these cases the packet is modified and the wscale lowered to the maximum that is allowed. The hosts talking to each other then think the other accepts only the lower wscale and accepts that setting. This can however have some unexpected consequences. If the link that Snort_inline deals with is high speed, high latency or both, setting the wscale value to low can result in serious performance degradation. Connections that are (way) slower than usual is how this issue shows. In these cases the wscale value needs to be increased.

The default value of Snort_inline 2.6.1.5 is a wscale of 2, which is quite low but works fine on my home DSL connection. To change the setting add ‘norm_wscale_max 5′ to your stream4 configuration line. This will allow for a wscale of up to 5. The maximum value is 14. I’d be interested in what values people use on what types and speeds of lines, so please let me know! We can use it to suggest values in the docs or to set a less insane default value :)

Memory leak fixed in stream4inline

Tuesday, May 22nd, 2007

A few days ago William told me that if he enabled stream4inline on a busy gateway, Snort_inline would consume all memory within hours. The problem went away when disabling stream4inline, so it made sense that the problem would be in there somewhere.

The first suspect was the reassembly cache. The reassembly cache is used to keep a per stream copy of the reassembled packet in memory. While being memory expensive, it greatly speeds up the sliding window stream reassembly process, especially with small packets. The reason for this being the first and primary suspect is that this is the only place where stream4inline code allocates memory. Reviewing the code however, showed no leaks and adding a debug counter to monitor the memory usage also showed that the leak was not in that code.

Next my investigation focused on parts where stream4 behaves differently in stream4inline mode. I initially focused on what happened when stream4 hit it’s memory limit: the memcap. When the configurable memcap is reached, stream4 nukes 5 random sessions. In stream4inline the option to truncate 15 of the sessions was added, where an attempt is made to clear the memory by removing stored packets no longer needed from a stream. If that fails, 5 random sessions are nuked anyway.

Reviewing the truncating of the sessions didn’t show anything obvious to me so I went on to the killing of the sessions. Descending down the code I finally reached the DropSession function, where the memory cleanup for a session is handled. Here it turned out that the DeleteSpd function, used to clear the stored packets in a stream, was not called in stream4inline mode. The reason for this mistake is that with Snort 2.6.1 support for UDP was added to stream4. The merge with the Snort_inline code went wrong because of extra checks added to the DropSession function.

The stupid thing is that when I did the merge, I was already in doubt about it as a comment showed:

/* XXX did I merge this right??? VJ */

Guess I know the answer now: No ;-)

Snort_inline and TCP Segmentation Offloading

Friday, April 20th, 2007

Since a short while I have a gigabit setup at home. My laptop has a e1000 Intel NIC, my desktop a Broadcom NIC.While playing with Snort_inline and netpipe-tcp, I noticed something odd. I got tcp packets that had the ‘Don’t Fragment’ option set, but were still bigger than the mtu size of the link. Snort_inline read packets of up to 26kb from the queue, and wireshark and tcpdump were seeing the packets as well. This was only for outgoing packets on the e1000 NIC. The receiving pc saw the packets split up in multiple packets that were honoring the mtu size. This got me thinking that some form of offloading must be taking place and indeed this was the case:

# ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on

It can be disabled by using the following command:

# ethtool -K eth0 tso off

The large packets caused problems in my stream4inline modifications of Stream4. The code can’t handle the case where the packet is bigger than the sliding window size. So I have some work to do ;-)

Snort_inline patch updated to 2.6.1.2

Wednesday, January 17th, 2007

With the recent Snort vulnerabilities we had to make a choice if we would backport the fixes to our Snort_inline 2.6.0.2 patch or that we would upgrade to 2.6.1.2. Upgrading makes most sense since SourceFire improves Snort with every release, but since the upgrade process has been very painful the last couple of releases, we weren’t really looking forward to it.

Earlier I wrote about my testing with Subversion for Snort_inline, and I found out that using Subversion made the upgrade procedure much easier and much less time consuming. So upgrading it was. Generally there were little changes to the Snort_inline patch required.

One thing however, messed up the way the new stream4inline code works. A new option in Snort’s Stream4, which is enabled by default, is session dropping. The way it works is that when a packet is dropped, the session it belongs to is instructed to drop every packet from that session from that time on.

This makes sense in many cases, but not in all. In stream4inline, we have created options to drop out-of-order packets, out-of-window packets, bad reset packets, and more. Generally, in these cases we want just drop those individual packets, not kill off the session.

Especially killing the session on bad reset packets would be making it easier to kill sessions by third parties. One might argue that sessions writing outside of the window can be killed, but when looking at the out-of-order limits, this can’t be done.

The out-of-order limits are enforced not because it is bad traffic, but to prevent resource starvation attacks against Snort_inline’s stream reassebler. Out-of-order packets will have to be put in the right order before processing, taking CPU time. Also, they have to be queue’d so re-order, taking memory.

By setting out-of-order limits, the burden of getting the stream in order is on the sender of the packets. He will have to retransmit the right packets first, before sending more out-of-order packets. In this case, we don’t want InlineDrop() to kill the entire session. To deal with this, we introduced InlineDropPacketOnly(), that just drops the packet.

A official beta should be out RealSoonNow(tm) ;-)

Snort_inline 2.6 development update

Saturday, December 23rd, 2006

Development of Snort_inline 2.6 experienced a bit of a setback when William and I discovered that the new Stream4inline had some issues with detecting certain attacks. Since we are scanning the reassembled stream certain detection plugins didn’t work as expected. Basically every detection plugin that uses absolute offsets from the packet start is messed up when we scan the reassembled stream only.

This is because the start of the reassembled stream doesn’t match with the start of the last packet added to this stream. Most TCP sigs are using offsets match against the start of the stream, or relative matches. For example a rule like:

alert tcp any any -> $HTTP_SERVERS 80 (msg:”GET request”; content:”GET”; offset:0; depth:3; sid:12345678; rev:1);

matches against the start of the stream since ‘GET’ will be the first data on the stream. In this case the reassembled stream only scanning would have worked fine because the start of the reassembled stream would match the start of the stream. So offset:0 in the reassembled stream points to the stream start, which is what we want in this case.

Things get different however, when we try to match against midstream packets where the rule matches against the actual packet start. One might argue that this is a bad idea in most cases, and I agree. Since TCP moves data stream based and not packet based, hardly any assumptions can be done about packet sizes, etc. Most TCP rules don’t use this, so the problem is fairly limited. An example of a rule that does this is the eDonkey detection sigs in the Bleeding ruleset.

As a solution we came up with the following. We scan every packet individually and in it’s reassembled stream. This is certainly more expensive, but the only way to avoid the evasion problems. I think we can probably add an option to make this behaviour optional, so the admin can choose to be extra safe at the cost of some perfomance.

Update on Snort_inline 2.6.0.2 development

Friday, November 10th, 2006

I have spend the last week trying to find a very annoying bug that caused Snort_inline to go into 100% CPU on certain traffic. It kept working, only my P3 500Mhz home gateway slowed down to between 2kb/s and 25kb/s, while normally it handles the full 325kb/s for my DSL line at around 25% CPU.

Snort comes with a number of performance measurement options. In 2.6 –enable-perfprofiling was introduced. Also, –enable-profile builds Snort for use with gprof. Next to those you can use strace and ltrace with the -c option to see the ammount of time spend in the several functions.

I already knew the problem was related to my new Stream4 code, since running Snort_inline without the ’stream4inline’ option made the problem go away. So my performance debugging and code reviews were focussed on that code. However, the performance statistics showed no functions that took large ammounts of time in Stream4.

(more…)