Juniper SRX NAT64 behavior in relation to DF (Don’t Fragment) bit on incoming IPv4 packets
Juniper SRX NAT64 option natv6v4 no-v6-frag-header
NAT64, like every other NAT technology, translates IP headers. It translates IPv6 headers to IPv4 headers and, for return traffic, IPv4 headers back to IPv6 headers.
Because an IPv6 header contains different fields than an IPv4 header, RFC 6145 (April 2011) was written to define how to translate between the two protocols. One particularly interesting part, which governs the return traffic of a NAT64 flow, is section “4. Translating from IPv4 to IPv6”.
Quote:
4. Translating from IPv4 to IPv6
When an IP/ICMP translator receives an IPv4 datagram addressed to a
destination towards the IPv6 domain, it translates the IPv4 header of
that packet into an IPv6 header. The original IPv4 header on the
packet is removed and replaced by an IPv6 header, and the transport
checksum is updated as needed, if that transport is supported by the
translator. The data portion of the packet is left unchanged. The
IP/ICMP translator then forwards the packet based on the IPv6
destination address.
+-------------+                 +-------------+
|    IPv4     |                 |    IPv6     |
|   Header    |                 |   Header    |
+-------------+                 +-------------+
|  Transport- |                 |  Fragment   |
|   Layer     |      ===>       |   Header    |
|   Header    |                 | (if needed) |
+-------------+                 +-------------+
|             |                 |  Transport- |
~    Data     ~                 |   Layer     |
|             |                 |   Header    |
+-------------+                 +-------------+
                                |             |
                                ~    Data     ~
                                |             |
                                +-------------+
Figure 2: IPv4-to-IPv6 Translation
Path MTU discovery is mandatory in IPv6, but it is optional in IPv4.
IPv6 routers never fragment a packet -- only the sender can do
fragmentation.
When an IPv4 node performs path MTU discovery (by setting the Don't
Fragment (DF) bit in the header), path MTU discovery can operate end-
to-end, i.e., across the translator. In this case, either IPv4 or
IPv6 routers (including the translator) might send back ICMP Packet
Too Big messages to the sender. When the IPv6 routers send these
ICMPv6 errors, they will pass through a translator that will
translate the ICMPv6 error to a form that the IPv4 sender can
understand. As a result, an IPv6 Fragment Header is only included if
the IPv4 packet is already fragmented.
However, when the IPv4 sender does not set the DF bit, the translator
MUST ensure that the packet does not exceed the path MTU on the IPv6
side. This is done by fragmenting the IPv4 packet (with Fragment
Headers) so that it fits in 1280-byte IPv6 packets, since that is the
minimum IPv6 MTU. The IPv6 Fragment Header has been shown to cause
operational difficulties in practice due to limited firewall
fragmentation support, etc. In an environment where the network
owned/operated by the same entity that owns/operates the translator,
the translator MAY provide a configuration function for the network
administrator to adjust the threshold of the minimum IPv6 MTU to a
value that reflects the real value of the minimum IPv6 MTU in the
network (greater than 1280 bytes). This will help reduce the chance
of including the Fragment Header in the packets.
When the IPv4 sender does not set the DF bit, the translator SHOULD
always include an IPv6 Fragment Header to indicate that the sender
allows fragmentation. The translator MAY provide a configuration
function that allows the translator not to include the Fragment
Header for the non-fragmented IPv6 packets.
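In other words, when the IPv4 server does NOT set the DF (Don’t Fragment) bit on its SYN/ACK packet, the translator (the SRX in this case) SHOULD add an IPv6 Fragment Header (next-header: Fragment) to the translated packet, even if that packet is not actually fragmented. The RFC words this as a SHOULD, not a MUST, which is why a translator MAY offer an option to leave the header out.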
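The rules quoted above can be condensed into a small decision function. This is only a sketch of my reading of the RFC text, not of how the SRX implements it: the function name and its parameters are made up for illustration, and it ignores the case where the translator itself has to fragment an oversized packet.

```python
def needs_fragment_header(df_set: bool,
                          already_fragmented: bool,
                          omit_for_unfragmented: bool = False) -> bool:
    """Return True if the translated IPv6 packet carries a Fragment Header.

    omit_for_unfragmented models the optional configuration function the
    RFC allows (hypothetical name, chosen for this sketch).
    """
    if already_fragmented:
        # An IPv4 fragment always needs a Fragment Header after translation.
        return True
    if df_set:
        # DF set and not fragmented: path MTU discovery works end to end,
        # so no Fragment Header is included.
        return False
    # DF clear and not fragmented: SHOULD include a Fragment Header,
    # unless the administrator turned that behavior off.
    return not omit_for_unfragmented


# SYN/ACK without DF, default behavior: Fragment Header is added.
print(needs_fragment_header(df_set=False, already_fragmented=False))
# Same packet with the optional knob enabled: no Fragment Header.
print(needs_fragment_header(df_set=False, already_fragmented=False,
                            omit_for_unfragmented=True))
```

The third parameter corresponds to the "MAY provide a configuration function" sentence at the end of the quote.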
Let’s look at two tcpdump captures of the same connection attempt: the first between the IPv6 host and the SRX, the second between the SRX and the IPv4 server:
Code:
20:01:54.343554 IP6 (hlim 64, next-header: TCP (6), length: 40) 2001::2.48155 > 64:ff9b::c0a8:8199.22: S, cksum 0xaa99 (correct), 3014960686:3014960686(0) win 5760 <mss 1440,sackOK,timestamp 2198826273 0,nop,wscale 7>
20:01:54.345309 IP6 (hlim 62, next-header: Fragment (44), length: 48) 64:ff9b::c0a8:8199 > 2001::2: frag (0x0000d468:0|40) 22 > 48155: S 1861712712:1861712712(0) ack 3014960687 win 65535 <mss 1440,nop,wscale 6,sackOK,timestamp[|tcp]>
Code:
18:01:57.668809 IP (tos 0x0, ttl 62, id 274, offset 0, flags [DF], proto TCP (6), length 60)
    172.16.201.140.48531 > 192.168.129.153.22: Flags [S], cksum 0xc0ce (correct), seq 3014960686, win 5760, options [mss 1440,sackOK,TS val 2198826273 ecr 0,nop,wscale 7], length 0
18:01:57.668852 IP (tos 0x0, ttl 64, id 54376, offset 0, flags [none], proto TCP (6), length 60)
    192.168.129.153.22 > 172.16.201.140.48531: Flags [S.], cksum 0x6936 (incorrect -> 0x584d), seq 1861712712, ack 3014960687, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 3509502850 ecr 2198826273], length 0
As you can see, the SYN/ACK from the server (192.168.129.153.22 > 172.16.201.140.48531) does NOT have the DF bit set (flags [none]). The resulting SYN/ACK from the SRX to the IPv6 host therefore carries an IPv6 Fragment Header (next-header: Fragment (44), length: 48) 64:ff9b::c0a8:8199 > 2001::2).
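If it is not obvious that 64:ff9b::c0a8:8199 in the first capture and 192.168.129.153 in the second are the same server, the mapping can be checked with a few lines of Python. This sketch assumes the NAT64 prefix is the well-known 64:ff9b::/96 (that is what the capture suggests, I did not show the SRX config here), and the helper names are mine.

```python
import ipaddress

# Assumption: the well-known NAT64 prefix, as seen in the capture.
NAT64_PREFIX = ipaddress.IPv6Network("64:ff9b::/96")


def embed(v4: str) -> ipaddress.IPv6Address:
    """Place the 32-bit IPv4 address in the low 32 bits of the /96 prefix."""
    v4_int = int(ipaddress.IPv4Address(v4))
    return ipaddress.IPv6Address(int(NAT64_PREFIX.network_address) | v4_int)


def extract(v6: str) -> ipaddress.IPv4Address:
    """Recover the IPv4 address from an address inside the /96 prefix."""
    addr = ipaddress.IPv6Address(v6)
    if addr not in NAT64_PREFIX:
        raise ValueError(f"{addr} is not inside {NAT64_PREFIX}")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


print(embed("192.168.129.153"))       # 64:ff9b::c0a8:8199
print(extract("64:ff9b::c0a8:8199"))  # 192.168.129.153
```

The last four bytes c0 a8 81 99 are simply 192, 168, 129 and 153 in hex.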
This causes problems in networks or on hosts that discard IPv6 fragments. To work around it, Juniper introduced the following option, which does not appear to be documented:
# set security nat natv6v4 no-v6-frag-header
With this option configured, the SRX places the TCP segment directly in the IPv6 packet, without a Fragment Header:
Code:
20:18:34.887748 IP6 (hlim 64, next-header: TCP (6), length: 40) 2001::2.45103 > 64:ff9b::c0a8:8199.22: S, cksum 0x5baa (correct), 4078084663:4078084663(0) win 5760 <mss 1440,sackOK,timestamp 2199817350 0,nop,wscale 7>
20:18:34.894632 IP6 (hlim 62, next-header: TCP (6), length: 40) 64:ff9b::c0a8:8199.22 > 2001::2.45103: S, cksum 0x5f02 (correct), 4281969091:4281969091(0) ack 4078084664 win 65535 <mss 1440,nop,wscale 6,sackOK,timestamp 2508903567 2199817350>
The SYN/ACK packet (next-header: TCP (6), length: 40) 64:ff9b::c0a8:8199.22 > 2001::2.45103) now has its IPv6 next-header field set to TCP, so no Fragment Header is present.
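The length values in the captures (60 on the IPv4 side, 48 with the Fragment Header, 40 without) can be cross-checked with simple arithmetic. The sketch below assumes a 20-byte IPv4 header without options, which the capture does not show explicitly but which matches the totals; the TCP option sizes are the standard ones.

```python
IPV4_HEADER = 20            # assumption: no IPv4 options
TCP_BASE_HEADER = 20
IPV6_FRAGMENT_HEADER = 8    # fixed size of the IPv6 Fragment Header

# TCP options seen in the server's SYN/ACK, with their sizes in bytes.
syn_ack_options = {"mss": 4, "nop": 1, "wscale": 3, "sackOK": 2, "timestamp": 10}

tcp_segment = TCP_BASE_HEADER + sum(syn_ack_options.values())

# tcpdump prints the total length for IPv4 and the payload length for IPv6.
ipv4_total_length = IPV4_HEADER + tcp_segment
ipv6_payload_with_frag = IPV6_FRAGMENT_HEADER + tcp_segment
ipv6_payload_without_frag = tcp_segment

print(ipv4_total_length)          # 60
print(ipv6_payload_with_frag)     # 48
print(ipv6_payload_without_frag)  # 40
```

So the only difference between the two IPv6 SYN/ACKs is the 8-byte Fragment Header.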
Note 1: tcpdump TCP filters will not match IPv6 fragments, just as they will not match IPv4 fragments.
Note 2: MX MS-MPC/MS-PIC NAT64 has a similar feature to avoid fragmentation on the IPv6 side: https://www.juniper.net/documentation/en_US/junos14.1/topics/task/configuration/nat-stateful-nat64-configuring.html
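To illustrate Note 1, here is a rough Python sketch of why a match that expects the TCP header right after the 40-byte IPv6 header misses a packet with a Fragment Header in between. It is not how tcpdump is implemented; the packets are built by hand with zeroed addresses, and all function names are invented for this example.

```python
import struct

TCP = 6
FRAGMENT = 44


def ipv6_header(next_header: int, payload_len: int) -> bytes:
    # Version 6, zero traffic class and flow label, hop limit 62, zero addresses.
    return struct.pack("!IHBB16s16s", 6 << 28, payload_len, next_header, 62,
                       bytes(16), bytes(16))


def fragment_header(next_header: int, ident: int) -> bytes:
    # Next header, reserved byte, offset 0 with M flag clear, identification.
    return struct.pack("!BBHI", next_header, 0, 0, ident)


def tcp_header(src_port: int, dst_port: int) -> bytes:
    # Minimal 20-byte TCP header with SYN+ACK flags and a zero checksum.
    return struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0, 5 << 4, 0x12,
                       65535, 0, 0)


def naive_src_port(packet: bytes):
    """Fixed-offset match: TCP must follow the IPv6 header directly."""
    if packet[6] != TCP:
        return None
    return struct.unpack_from("!H", packet, 40)[0]


def src_port_skipping_fragment(packet: bytes):
    """Step over one Fragment Header before looking for TCP."""
    next_header, offset = packet[6], 40
    if next_header == FRAGMENT:
        next_header = packet[offset]
        frag_offset = struct.unpack_from("!H", packet, offset + 2)[0] >> 3
        if frag_offset != 0:
            return None  # later fragments carry no TCP header
        offset += 8
    if next_header != TCP:
        return None
    return struct.unpack_from("!H", packet, offset)[0]


tcp = tcp_header(22, 48155)
plain = ipv6_header(TCP, len(tcp)) + tcp
with_frag = ipv6_header(FRAGMENT, 8 + len(tcp)) + fragment_header(TCP, 0xD468) + tcp

print(naive_src_port(plain))                  # 22
print(naive_src_port(with_frag))              # None
print(src_port_skipping_fragment(with_frag))  # 22
```

When capturing with and without no-v6-frag-header, it is safer to filter on the host addresses only so both variants of the SYN/ACK show up.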