Re: [RFC/PATCH] "safer ipv4 reassembly" (fwd)
From: Arthur Kepner <hidden>
Date: 2005-06-28 22:11:39
On Sun, 26 Jun 2005, Herbert Xu wrote:
..... Thanks for writing this patch Arthur.
Likewise, thanks for reviewing it.
quoted
+struct ipc { + struct hlist_node node; + u32 saddr; + u32 daddr; + u8 protocol; + atomic_t refcnt; /* how many ipqs hold refs to us */ + atomic_t seq; /* how many ip datagrams for this + * (saddr,daddr,protocol) since we + * were created */ + struct timer_list timer; + struct rcu_head rcu;Is RCU worth it here? The only time we'd be taking the locks on this is when the first fragment of a packet comes in. At that point we'll be taking write_lock(&ipfrag_lock) anyway. The only other use of RCU in your patch is ip_count. That should be changed to be done in ip_defrag instead. At that point you can simply find the ipc by deferencing ipq, so no need for __ipc_find and hence RCU. The reason you need to change it in this way is because you can't make assumptions about ip_rcv_finish being the first place where a packet is defragmented. With connection tracking enabled conntrack is the first place where defragmentation occurs.
Right, I see that now. (I'm not well acquainted with the conntrack code...) One reason I used RCU for the "ipc" structures is that I wanted to be able to find find them (in ip_rcv_finish()) without locking. Since ip_rcv_finish() is the wrong place to do that, that reason is invalid. There is a (big) advantage to doing this in ip_defrag() - this becomes a no-op for non-fragmented datagrams. The disadvantage is that there could be a situation where you receive: 1) first fragment of datagram X [for a particular (src,dst,proto)] 2) a zillion non-fragmented datagrams [for the same (src,dst,proto)] 3) last fragment of datagram X [for (src,dst,proto)] and no "disorder" would be detected for the datagrams associated with (src,dst,proto), even though the ip id could have wrapped in the meantime. This seems like a very uncommon case, however.
quoted
+#define IPC_HASHSZ IPQ_HASHSZ +static struct { + struct hlist_head head; + spinlock_t lock; +} ipc_hash[IPC_HASHSZ];I'd store ipc entries in the main ipq hash table since they can use the same keys for lookup as ipq entries. You just need to set protocol to zero and map the user to values specific to ipc for ipc entries. One mapping would be to set the top bit of user for ipc entries, e.g. #define IP_DEFRAG_IPC 0x80000000 ipc->user = ipq->user | IP_DEFRAG_IPC; Of course you also need to make sure that the two structures share the leading elements. You can then use the user field to distinguish between ipc/ipq entries.
Hmmm, let me think about combining the ipc/ipq structures, and also the related question of whether to use RCU for the ipc structures. I'll try to spin another version of the patch before the end of the week. -- Arthur