Juniper SRX MTU / MSS / fragmentation problems with IPsec VPN tunnels
The MSS (Maximum Segment Size) is a TCP parameter through which each side of a connection informs the other side of the largest TCP segment it can receive on that specific connection.
MSS is set only during the TCP 3-way handshake and is carried in the "TCP Options" field of the TCP header, in SYN packets only (SYN and SYN-ACK). tcpdump example:
Code:
11:23:44.450420 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [S], seq 3483318893, win 65535, options [mss 1460,nop,wscale 4,sackOK,TS val 2479291524 ecr 0], length 0
11:23:44.451946 IP 95.233.23.13.80 > 83.72.75.223.20614: Flags [S.], seq 425754303, ack 3483318894, win 0, options [mss 1460], length 0
11:23:44.522227 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [.], ack 1, win 65535, length 0
11:23:44.523656 IP 95.233.23.13.80 > 83.72.75.223.20614: Flags [.], ack 1, win 5840, length 0
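To pull the advertised MSS out of a capture like the one above, here is a minimal Python sketch. It assumes tcpdump's default text output as shown in this post; `advertised_mss` is a hypothetical helper name, not part of tcpdump or any library.

```python
import re
from typing import Optional

# Matches the "mss <n>" entry inside tcpdump's "options [...]" list.
_MSS_RE = re.compile(r"options \[[^\]]*\bmss (\d+)")

def advertised_mss(line: str) -> Optional[int]:
    """Return the MSS advertised in one tcpdump text line, or None if absent."""
    match = _MSS_RE.search(line)
    return int(match.group(1)) if match else None

syn = ("11:23:44.450420 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [S], "
       "seq 3483318893, win 65535, options [mss 1460,nop,wscale 4,sackOK,"
       "TS val 2479291524 ecr 0], length 0")
ack = ("11:23:44.522227 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [.], "
       "ack 1, win 65535, length 0")

print(advertised_mss(syn))  # the SYN carries the MSS option
print(advertised_mss(ack))  # a plain ACK carries no MSS option
```

Only the SYN and SYN-ACK lines return a value; every later packet in the connection returns None.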
Under normal conditions, the Ethernet MTU is 1500 bytes and the MSS (MTU - IP header - TCP header) is 1500 - 20 - 20 = 1460.
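The same arithmetic as a small Python sketch. It assumes IPv4 and TCP headers without options (20 bytes each); `mss_for_mtu` is a name made up for this example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of this MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

print(mss_for_mtu(1500))  # 1460
```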
In any VPN configuration, due to IPsec encapsulation (and, in some cases, GRE), the MTU of the VPN interface is lowered to a safe value, e.g. 1400 in my case.
(Excellent article from Cisco: Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml)
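To get a feel for why the tunnel MTU has to be lower than 1500, here is a rough Python estimate of the on-the-wire size of an ESP tunnel-mode packet. The cipher suite is purely an assumption for illustration (AES-CBC with a 16-byte IV and block size, HMAC-SHA1-96 with a 12-byte ICV, no NAT-T); the actual overhead depends on the proposal your tunnel negotiates, and `esp_tunnel_size` is a hypothetical helper.

```python
OUTER_IP = 20    # new IPv4 header added in tunnel mode
ESP_HEADER = 8   # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2  # pad length (1 byte) + next header (1 byte)

def esp_tunnel_size(inner_len: int, block: int = 16, iv: int = 16, icv: int = 12) -> int:
    """Estimated size of an ESP tunnel-mode packet carrying inner_len bytes."""
    # Inner packet plus trailer is padded up to a multiple of the cipher block size.
    padded = (inner_len + ESP_TRAILER + block - 1) // block * block
    return OUTER_IP + ESP_HEADER + iv + padded + icv

for inner in (1400, 1500):
    outer = esp_tunnel_size(inner)
    print(inner, "->", outer, "fits" if outer <= 1500 else "too big for a 1500-byte link")
```

Under these assumptions a 1400-byte inner packet still fits a 1500-byte physical link after encapsulation, while a full 1500-byte one does not.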
In some situations, when a TCP client accesses an external host through an IPsec VPN tunnel for the first time, it is unaware of the MTU/MSS limits along that specific path.
Let's say our PC 10.0.0.1 tries to connect to a web server at IP 95.233.23.13.
The MTU on the segment between our PC and the first hop is 1500.
The first hop, Juniper SRX-1, sends the packet through an IPsec tunnel over to Juniper SRX-2.
Juniper SRX-2 does source NAT for traffic going from the VPN to the internet.
Connection breakdown.
#1. PC sends a SYN packet to 95.233.23.13 port 80 with MSS 1456 (1500 - 40 bytes of IP/TCP headers - 4 bytes of VLAN encapsulation)
#2. Web server 95.233.23.13 responds with a SYN-ACK packet with MSS set to 1460 (1500 - 40 bytes of IP/TCP headers)
#3. PC sends an ACK packet to 95.233.23.13 (3-way handshake is complete)
#4. PC sends the HTTP GET / request to 95.233.23.13 and sets the PSH flag (the PSH flag in TCP tells the receiving end of the connection to "push" all buffered data to the receiving application)
#5. Web server 95.233.23.13 acknowledges the GET request: its ACK number matches the end of the sequence range of packet #4 (seq 1:111 in packet #4, ack 111 in packet #5)
#6. Web server 95.233.23.13 starts sending data in segments of 1456 bytes (the MSS that the PC told the server it can receive)
#7. Web server 95.233.23.13 sends the rest of the HTTP response and sets the PSH flag (tells the client's TCP stack/kernel to hand all buffered data over to the browser).
This is where the problem occurs: the tunnel interfaces have an MTU of 1400 bytes, so packets carrying 1456-byte segments with the don't-fragment (DF) bit set in the IP header cannot enter the IPsec tunnel (unless the tunnel is set to clear the DF bit, but that is not the purpose of this article).
Thus, SRX-2 drops these large packets and sends an "ICMP fragmentation needed" message back to the web server for each of them (packets #8 and #9 below). tcpdump example of such behavior:
Code:
11:09:49.671733 IP 83.72.75.223.23985 > 95.233.23.13.80: Flags [S], seq 1613827877, win 65535, options [mss 1456,nop,wscale 4,sackOK,TS val 2478456636 ecr 0], length 0
11:09:49.673251 IP 95.233.23.13.80 > 83.72.75.223.23985: Flags [S.], seq 1439513197, ack 1613827878, win 11680, options [mss 1460,nop,wscale 0], length 0
11:09:49.745460 IP 83.72.75.223.23985 > 95.233.23.13.80: Flags [.], ack 1, win 4125, length 0
11:09:49.748949 IP 83.72.75.223.23985 > 95.233.23.13.80: Flags [P.], seq 1:111, ack 1, win 4125, length 110
11:09:49.750956 IP 95.233.23.13.80 > 83.72.75.223.23985: Flags [.], ack 111, win 11570, length 0
11:09:49.871888 IP 95.233.23.13.80 > 83.72.75.223.23985: Flags [.], seq 1:1457, ack 111, win 11570, length 1456
11:09:49.872095 IP 95.233.23.13.80 > 83.72.75.223.23985: Flags [P.], seq 1457:2818, ack 111, win 11570, length 1361
11:09:49.879392 IP 83.72.75.223 > 95.233.23.13: ICMP 10.0.0.1 unreachable - need to frag (mtu 1500), length 36
11:09:49.879397 IP 83.72.75.223 > 95.233.23.13: ICMP 10.0.0.1 unreachable - need to frag (mtu 1500), length 36
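The drop decision can be reproduced with a few lines of Python. This is only a model of the check described above, not how Junos implements it: it assumes 40 bytes of IP/TCP headers (the server's SYN-ACK negotiated no timestamps) and the 1400-byte tunnel MTU from this post; `enters_tunnel` is a made-up name.

```python
TUNNEL_MTU = 1400  # MTU of the tunnel interface in this article
HEADERS = 40       # 20-byte IPv4 header + 20-byte TCP header, no options

def enters_tunnel(tcp_payload: int, df_set: bool = True) -> bool:
    """True if the packet fits the tunnel MTU, or may be fragmented to fit."""
    packet_size = tcp_payload + HEADERS
    return packet_size <= TUNNEL_MTU or not df_set

# TCP payload lengths of the three server-to-client packets in the capture.
for payload in (0, 1456, 1361):
    verdict = "forwarded" if enters_tunnel(payload) else "dropped, ICMP need-to-frag sent"
    print(payload + HEADERS, "bytes:", verdict)
```

Both data packets (1496 and 1401 bytes) exceed 1400, which matches the two ICMP messages at the end of the capture.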
To a normal user this looks like the internet is very slow and that specific website never loads (as packets that don't fit the tunnel get dropped).
To fix this, Junos can be configured to override the MSS parameter of TCP packets entering IPsec tunnels (and, for GRE, in both directions: gre-in / gre-out); this is done on both devices, SRX-1 and SRX-2.
Code:
# set security flow tcp-mss ?
Possible completions:
> all-tcp Enable MSS override for all packets
+ apply-groups Groups from which to inherit configuration data
+ apply-groups-except Don't inherit configuration data from these groups
> gre-in Enable MSS override for all GRE packets coming out of an IPSec tunnel
> gre-out Enable MSS override for all GRE packets entering an IPsec tunnel
> ipsec-vpn Enable MSS override for all packets entering IPSec tunnel
# set security flow tcp-mss ipsec-vpn mss 1360
And here's the tcpdump example with the override in place (both sides now see an MSS of 1360):
Code:
11:23:44.450420 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [S], seq 3483318893, win 65535, options [mss 1360,nop,wscale 4,sackOK,TS val 2479291524 ecr 0], length 0
11:23:44.451946 IP 95.233.23.13.80 > 83.72.75.223.20614: Flags [S.], seq 425754303, ack 3483318894, win 0, options [mss 1360], length 0
11:23:44.522227 IP 83.72.75.223.20614 > 95.233.23.13.80: Flags [.], ack 1, win 65535, length 0
11:23:44.523656 IP 95.233.23.13.80 > 83.72.75.223.20614: Flags [.], ack 1, win 5840, length 0
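The rewrite seen in the capture above can be modeled in Python as a simple clamp. The value 1360 is the 1400-byte tunnel MTU minus 40 bytes of IP/TCP headers. I am assuming here that the override only ever lowers an advertised MSS and never raises it (the usual MSS clamping behavior); `clamp_mss` is a hypothetical function, not Junos code.

```python
TUNNEL_MTU = 1400

def clamp_mss(advertised: int, configured: int = 1360) -> int:
    """Lower the MSS seen in a SYN to the configured ceiling; never raise it."""
    return min(advertised, configured)

for advertised in (1456, 1460, 1200):
    new_mss = clamp_mss(advertised)
    # Largest packet either side will now send: MSS + 40 bytes of IP/TCP headers.
    print(advertised, "->", new_mss, "largest packet:", new_mss + 40)
```

With the clamp in place, the largest packet either end sends is 1400 bytes, which fits the tunnel without fragmentation.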