Re: [PATCH] lightnvm: pblk: Introduce hot-cold data separation

From: Heiner Litz <hidden>
Date: 2019-05-01 20:20:18
Also in: lkml

Javier, Igor,
you are correct. The problem exists if we have a power loss and we
have an open gc and an open user line and both contain the same LBA.
In that case, I think we need to consider four scenarios:

1. user_seq_id > gc_seq_id and user_write after gc_write: No issue
2. user_seq_id > gc_seq_id and gc_write after user_write: Cannot happen,
open user lines are not gc'ed
3. gc_seq_id > user_seq_id and user_write after gc_write: RACE
4. gc_seq_id > user_seq_id and gc_write after user_write: No issue
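
The four cases can be enumerated with a small model. This is only a sketch: it assumes recovery replays lines in ascending seq_id, so the copy in the line with the higher seq_id ends up in the L2P table. The function name and its boolean parameters are made up for illustration and are not pblk code.

```python
def classify(gc_seq_higher: bool, user_write_last: bool) -> str:
    """Classify one scenario for an LBA that is present in both an open
    user line and an open GC line at power loss.

    Assumption: recovery replays lines in ascending seq_id, so the copy in
    the line with the higher seq_id wins.
    """
    if not gc_seq_higher and not user_write_last:
        # Scenario 2: a GC write newer than the user write would mean the
        # open user line was garbage collected, which is ruled out.
        return "cannot happen"
    recovered = "gc" if gc_seq_higher else "user"
    newest = "user" if user_write_last else "gc"
    return "no issue" if recovered == newest else "race"


# Print the four scenarios in the order they are listed above.
for gc_higher in (False, True):
    for user_last in (True, False):
        print(gc_higher, user_last, classify(gc_higher, user_last))
```

Only the combination "GC line has the higher seq_id, user write happened last" recovers the stale copy in this model.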

To address scenario 3, we can do the following:
Whenever a gc line is opened, determine all open user lines and store
them in a field of pblk_line. When choosing a victim for GC, ignore
those lines.
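
A minimal sketch of that idea, under stated assumptions: the `Line` class, its `open_user_lines` field, and the "fewest valid sectors" victim policy are hypothetical stand-ins, not the actual pblk_line layout or the pblk GC policy.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Line:
    line_id: int
    valid_sectors: int
    # Hypothetical field: IDs of the user lines that were open when this
    # line was opened as a GC line (empty for non-GC lines).
    open_user_lines: frozenset = field(default_factory=frozenset)


def open_gc_line(line: Line, open_user_line_ids) -> Line:
    """Snapshot the currently open user lines into the new GC line."""
    line.open_user_lines = frozenset(open_user_line_ids)
    return line


def choose_victim(candidates, gc_line: Line) -> Optional[Line]:
    """Pick the candidate with the fewest valid sectors, skipping every
    line recorded in the GC line's snapshot. Returns None if nothing is
    eligible."""
    eligible = [c for c in candidates
                if c.line_id not in gc_line.open_user_lines]
    return min(eligible, key=lambda c: c.valid_sectors, default=None)


# Line 7 was an open user line when GC line 10 was opened, so it is
# skipped even though it has fewer valid sectors than line 3.
gc = open_gc_line(Line(10, 0), {7})
victim = choose_victim([Line(7, 5), Line(3, 20)], gc)
print(victim.line_id)  # 3
```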

Let me know if that sounds good and I will send a v2.
Heiner

On Tue, Apr 30, 2019 at 11:19 PM Javier González [off-list ref] wrote:


On 26 Apr 2019, at 18.23, Heiner Litz [off-list ref] wrote:

Nice catch Igor, I hadn't thought of that.

Nevertheless, here is what I think: In the absence of a flush we don't
need to enforce ordering so we don't care about recovering the older
gc'ed write. If we completed a flush after the user write, we should
have already invalidated the gc mapping and hence will not recover it.
Let me know if I am missing something.

I think that this problem is orthogonal to a flush on the user path. For example:

   - Write to LBA0 + completion to host
   - […]
   - GC LBA0
   - Write to LBA0 + completion to host
   - fsync() + completion
   - Power Failure

When we power up and do recovery in the current implementation, you
might end up with the old LBA0 mapped in the L2P table.

If we enforce ID ordering for GC lines this problem goes away as we can
continue ordering lines based on ID and then recovering sequentially.
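
The effect of line ordering on recovery can be illustrated with a toy replay. This is a sketch, not the pblk recovery code: it assumes recovery simply sorts lines by ID and replays their sectors in order, and the line IDs (5 and 6) and the data labels are invented for the example. How the ID ordering would actually be enforced is not modelled here.

```python
def recover_l2p(lines):
    """Rebuild an L2P table by replaying lines in ascending ID order.

    `lines` is a list of (line_id, [(lba, data), ...]) tuples; within a
    line, sectors are replayed in the order they were written.
    """
    l2p = {}
    for _line_id, sectors in sorted(lines, key=lambda line: line[0]):
        for lba, data in sectors:
            l2p[lba] = data  # a later replayed copy overwrites an earlier one
    return l2p


# The sequence above: GC moves the old LBA0, then the user overwrites it.
user_line = [(0, "new")]  # user write, completed and fsync()'ed
gc_line = [(0, "old")]    # GC copy of the earlier LBA0 data

# GC line has the higher ID: the stale copy is replayed last.
print(recover_l2p([(5, user_line), (6, gc_line)]))  # {0: 'old'}
# GC line has the lower ID: sequential replay keeps the new data.
print(recover_l2p([(6, user_line), (5, gc_line)]))  # {0: 'new'}
```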

Thoughts?

Thanks,
Javier


On Fri, Apr 26, 2019 at 6:46 AM Igor Konopko [off-list ref] wrote:


On 26.04.2019 12:04, Javier González wrote:


On 26 Apr 2019, at 11.11, Igor Konopko [off-list ref] wrote:

On 25.04.2019 07:21, Heiner Litz wrote:


Introduce the capability to manage multiple open lines. Maintain one line
for user writes (hot) and a second line for gc writes (cold). As user and
gc writes still utilize a shared ring buffer, in rare cases a multi-sector
write will contain both gc and user data. This is acceptable, as on a
tested SSD with minimum write size of 64KB, less than 1% of all writes
contain both hot and cold sectors.

Hi Heiner

Generally I really like these changes. I have been thinking about something similar for a while, so it is very good to see this patch.

I have one question about this patch, since it is not very clear to me: how do you ensure data integrity in the following scenario?
- we have open line X for user data and line Y for GC
- GC writes LBA=N to line Y
- user writes LBA=N to line X
- we have a power failure when neither line X nor line Y has been written completely
- during pblk creation we execute OOB metadata recovery
The question is: how do we distinguish whether LBA=N from line Y or LBA=N from line X is the valid one?
Lines X and Y might have seq_ids in either descending or ascending order, which would create two possible scenarios as well.

Thanks
Igor

You are right, I think this is possible in the current implementation.

We need an extra constraint so that we only GC lines above the GC line
ID. This way, when we order lines on recovery, we can guarantee
consistency. This potentially means that we would need several open
lines for GC to avoid padding in case this constraint forces us to choose a
line with an ID higher than the GC line ID.

What do you think?

I'm not sure yet about your approach, I need to think and analyze this a
little more.

I also believe that we probably need to ensure that the current user data
line seq_id is always above the current GC line seq_id, or something like that.
We also cannot GC any data from lines that are still open, but
I believe that this is already the case right now.


Thanks,
Javier
