Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
From: Joao Martins <hidden>
Date: 2015-05-22 10:25:16
On 19 May 2015, at 17:23, Wei Liu [off-list ref] wrote:
On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:quoted
Introduces persistent grants for TX path which follows similar code path as the grant mapping. It starts by checking if there's a persistent grant available for header and frags grefs and if so setting it in tx_pgrants. If no persistent grant is found in the tree for the header it will resort to grant copy (but preparing the map ops and add them laster). For the frags it will use the^ laterquoted
tree page pool, and in case of no pages it fallbacks to grant map/unmap using mmap_pages. When skb destructor callback gets called we release the slot and persistent grant within the callback to avoid waking up the dealloc thread. As long as there are no unmaps to done the dealloc thread will remain inactive.This scheme looks complicated. Can we just only use one scheme at a time? What's the rationale for using this combined scheme? Maybe you're thinking about using a max_grants < ring_size to save memory?
Yes, my purpose was to allow a max_grants < ring_size to save amount of memory mapped. I did a bulk transfer test with iperf and the max amount of grants in tree was <160 TX gnts, without affecting the max performance; tough using pktgen fills the tree completely. The second reason is to handle the case for a (malicious?) frontend providing more grefs than the max allowed in which I would fallback to grant map/unmap.
Only skim the patch. I will do detailed reviews after we're sure this is the right way to go.quoted
Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 pkt size) measured with pktgen and up to over 48% (21.6 vs 14.5 Gbit/s) measured with iperf (TCP) with 4 parallel flows 1 queue vif, DomU to Dom0. Tests ran on a Intel Xeon E5-1650 v2 with HT disabled. […] int xenvif_init_queue(struct xenvif_queue *queue) { int err, i;@@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue).ctx = NULL, .desc = i }; queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE; + queue->tx_pgrants[i] = NULL; + } + + if (queue->vif->persistent_grants) { + err = init_persistent_gnt_tree(&queue->tx_gnts_tree, + queue->tx_gnts_pages, + XEN_NETIF_TX_RING_SIZE); + if (err) + goto err_disable; } return 0; + +err_disable: + netdev_err(queue->vif->dev, "Could not reserve tree pages."); + queue->vif->persistent_grants = 0;You can just move the above two lines under `if (err)'. Also see below.
In the next patch this is also common cleanup path. I did it this way to avoid moving this hunk around.
quoted
+ return 0; } void xenvif_carrier_on(struct xenvif *vif)@@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)} xenvif_unmap_frontend_rings(queue); + + if (queue->vif->persistent_grants) + deinit_persistent_gnt_tree(&queue->tx_gnts_tree, + queue->tx_gnts_pages);If the init function fails on queue N (N>0) you now leak resources.
Correct. I should return -ENOMEM (on init) and not 0.
quoted
} }diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c index 332e489..529d7c3 100644 --- a/drivers/net/xen-netback/netback.c +++ b/drivers/net/xen-netback/netback.c@@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx)); } +static inline void *page_to_kaddr(struct page *page) +{ + return pfn_to_kaddr(page_to_pfn(page)); +} + #define callback_param(vif, pending_idx) \ (vif->pending_tx_info[pending_idx].callback_struct)@@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)return i & (MAX_PENDING_REQS-1); } +/* Creates a new persistent grant and add it to the tree. + */ +static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree, + struct gnttab_map_grant_ref *gop) +{ + struct persistent_gnt *persistent_gnt; + + persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);xenvif_pgrant_new can be called in NAPI which runs in softirq context which doesn't allow you to sleep.
Silly mistake, I will fix this. The whole point of the tree page pool was that we aren't allowed to sleep both here and in ndo_start_xmit (in patch #6).
quoted
[…] +/* Checks if there's a persistent grant available for gref and + * if so, set it also in the tx_pgrants array that keeps the ones + * in use. + */ +static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue, + grant_ref_t ref, u16 pending_idx, + bool *can_map) +{ + struct persistent_gnt_tree *tree = &queue->tx_gnts_tree; + struct persistent_gnt *persistent_gnt; + bool busy; + + if (!queue->vif->persistent_grants) + return false; + + persistent_gnt = get_persistent_gnt(tree, ref); + + /* If gref is already in use we fallback, since it would + * otherwise mean re-adding the same gref to the tree + */ + busy = IS_ERR(persistent_gnt); + if (unlikely(busy)) + persistent_gnt = NULL; +Under what circumstance can we retrieve a already in use persistent grant? You seem to suggest this is a bug in RX case.
A guest could share try to share the same mapped page in multiple frags, in which case I fallback to map/unmap. I think this is a limitation in the way we manage the persistent gnts where we can only have a single reference of a persistent grant inflight.
quoted
[…] MAX_PENDING_REQS); index = pending_index(queue->dealloc_prod);@@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)smp_wmb(); queue->dealloc_prod++; } while (ubuf); - wake_up(&queue->dealloc_wq); + /* Wake up only when there are grants to unmap */ + if (dealloc_prod_save != queue->dealloc_prod) + wake_up(&queue->dealloc_wq); + spin_unlock_irqrestore(&queue->callback_lock, flags); if (likely(zerocopy_success))@@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)xenvif_tx_build_gops(queue, budget, &nr_cops, &nr_mops); - if (nr_cops == 0) + if (!queue->vif->persistent_grants && + nr_cops == 0)You can just move nr_cops to previous line. Wei.