Re: [PATCH v9 net-next 01/23] net/tcp: Prepare tcp_md5sig_pool for TCP-AO
From: Dmitry Safonov <hidden>
Date: 2023-08-11 21:03:02
Also in:
lkml
Hi Eric, On 8/8/23 10:58, Eric Dumazet wrote:
On Wed, Aug 2, 2023 at 7:27 PM Dmitry Safonov [off-list ref] wrote:
[..]
quoted
- hash = crypto_alloc_ahash("md5", 0, CRYPTO_ALG_ASYNC); - if (IS_ERR(hash)) - return; - - for_each_possible_cpu(cpu) { - void *scratch = per_cpu(tcp_md5sig_pool, cpu).scratch; - struct ahash_request *req; - - if (!scratch) { - scratch = kmalloc_node(sizeof(union tcp_md5sum_block) + - sizeof(struct tcphdr), - GFP_KERNEL, - cpu_to_node(cpu)); - if (!scratch) - return; - per_cpu(tcp_md5sig_pool, cpu).scratch = scratch; - } - if (per_cpu(tcp_md5sig_pool, cpu).md5_req) - continue; - - req = ahash_request_alloc(hash, GFP_KERNEL); - if (!req) - return; - - ahash_request_set_callback(req, 0, NULL, NULL); - - per_cpu(tcp_md5sig_pool, cpu).md5_req = req; + scratch_size = sizeof(union tcp_md5sum_block) + sizeof(struct tcphdr); + ret = tcp_sigpool_alloc_ahash("md5", scratch_size); + if (ret >= 0) { + tcp_md5_sigpool_id = ret;tcp_md5_alloc_sigpool() can be called from multiple cpus, yet you are writing over tcp_md5_sigpool_id here without any spinlock/mutex/annotations ?
Yeah, it's writing-only: as long as there was a TCP-MD5 key in the system, sigpool_id returned by tcp_sigpool_alloc_ahash() would stay the same. The only time when it writes a different value - iff all previous MD5 keys were released.
KCSAN would eventually file a report, and a compiler might very well transform this to tcp_md5_sigpool_id = random_value; <window where readers might catch garbage> tcp_md5_sigpool_id = ret;
Agree, haven't thought about the optimizing compiler nightmares. Will add WRITE_ONCE(). [..]
quoted
+ +/** + * sigpool_reserve_scratch - re-allocates scratch buffer, slow-path + * @size: request size for the scratch/temp buffer + */ +static int sigpool_reserve_scratch(size_t size) +{ + struct scratches_to_free *stf; + size_t stf_sz = struct_size(stf, scratches, num_possible_cpus());This is wrong. num_possible_cpus() could be 2, with two cpus numbered 0 and 8. Look for nr_cpu_ids instead.
Hmm, but it seems num_possible_cpus() was exactly what I wanted? As free_old_scratches() does: : while (stf->cnt--) : kfree(stf->scratches[stf->cnt]); So, that's just the number of previously allocated scratches, not per-CPU index.
quoted
+ int cpu, err = 0; + + lockdep_assert_held(&cpool_mutex); + if (__scratch_size >= size) + return 0; + + stf = kmalloc(stf_sz, GFP_KERNEL); + if (!stf) + return -ENOMEM; + stf->cnt = 0; + + size = max(size, __scratch_size); + cpus_read_lock(); + for_each_possible_cpu(cpu) { + void *scratch, *old_scratch; + + scratch = kmalloc_node(size, GFP_KERNEL, cpu_to_node(cpu)); + if (!scratch) { + err = -ENOMEM; + break; + } + + old_scratch = rcu_replace_pointer(per_cpu(sigpool_scratch, cpu), + scratch, lockdep_is_held(&cpool_mutex)); + if (!cpu_online(cpu) || !old_scratch) { + kfree(old_scratch); + continue; + } + stf->scratches[stf->cnt++] = old_scratch; + }I wonder why you are not simply using something around __alloc_percpu()
I presume that was a heritage of __tcp_alloc_md5sig_pool() that allocated scratches this way. In pre-crypto-cloning versions per-CPU ahash_requests were allocated with alloc_percpu(). But now looking closer, if I read pcpu allocator code correctly, it does pcpu_mem_zalloc() for populating per-CPU chunks which depending on pcpu_chunk_struct_size may be allocated with vmalloc() or kzalloc(). IOW, as long as CONFIG_NEED_PER_CPU_KM != n, which is: : config NEED_PER_CPU_KM : depends on !SMP || !MMU The memory, returned by pcpu allocator has no guarantee of being from direct-map. Which I presume defeats the purpose of having any scratch/temporary area for sg/crypto. Am I missing something?
quoted
+ cpus_read_unlock(); + if (!err) + __scratch_size = size; + + call_rcu(&stf->rcu, free_old_scratches); + return err; +} + +static void sigpool_scratch_free(void) +{ + int cpu; + + for_each_possible_cpu(cpu) + kfree(rcu_replace_pointer(per_cpu(sigpool_scratch, cpu), + NULL, lockdep_is_held(&cpool_mutex)));I wonder why bothering about freeing some space ? One scratch buffer per cpu is really small.
When all TCP-MD5/TCP-AO keys removed, why not free some memory? [..]
quoted
+ /* slow-path */ + mutex_lock(&cpool_mutex); + ret = sigpool_reserve_scratch(scratch_size); + if (ret) + goto out; + for (i = 0; i < cpool_populated; i++) { + if (!cpool[i].alg) + continue; + if (strcmp(cpool[i].alg, alg)) + continue; + + if (kref_read(&cpool[i].kref) > 0) + kref_get(&cpool[i].kref); + else + kref_init(&cpool[i].kref);This looks wrong. kref_init() should be called once, after allocation, not based on kref_read().
Well, as long as it's the slow-path and cpool_mutex was grabbed, this essentially re-allocating an entry that had it's last reference released, but wasn't yet destroyed by __cpool_free_entry(). I think, if I interpreted it correctly, it was suggested-by Jakub: https://lore.kernel.org/all/20230106175326.2d6a4dcd@kernel.org/T/#u (local) [..]
quoted
+static void __cpool_free_entry(struct sigpool_entry *e) +{ + crypto_free_ahash(e->hash); + kfree(e->alg); + memset(e, 0, sizeof(*e)); +} + +static void cpool_cleanup_work_cb(struct work_struct *work)Really this looks over complicated to me. All this kref maintenance and cpool_mutex costs :/
Hmm, unsure: I can remove this sigpool thing and just allocate a crypto tfm for every new setsockopt(). Currently, this allocates one tfm per hash algorithm, and without it it would be one tfm per setsockopt()/key. The supported scale currently is thousands of keys per-socket, which means that with sizeof(struct crypto_ahash) == 104, that will increase memory consumption by hundreds of kilobytes to megabytes. I thought that's a reasonable thing to do. Should I proceed TCP-AO patches without this manager thing?
quoted
+{ + bool free_scratch = true; + unsigned int i; + + mutex_lock(&cpool_mutex); + for (i = 0; i < cpool_populated; i++) { + if (kref_read(&cpool[i].kref) > 0) { + free_scratch = false; + continue; + } + if (!cpool[i].alg) + continue; + __cpool_free_entry(&cpool[i]); + } + if (free_scratch) + sigpool_scratch_free(); + mutex_unlock(&cpool_mutex); +}
[..]
Thanks,
Dmitry