Re: Memory window support for rdma_rxe

From: Bob Pearson <hidden>
Date: 2020-08-20 21:51:16

On 8/20/20 2:27 PM, Bob Pearson wrote:
On 8/20/20 2:41 AM, Leon Romanovsky wrote:
On Wed, Aug 19, 2020 at 11:36:54AM -0500, Bob Pearson wrote:
On 8/19/20 12:02 AM, Leon Romanovsky wrote:
On Tue, Aug 18, 2020 at 10:39:46PM -0500, Bob Pearson wrote:
This is a cleaned-up resend of an earlier patch set. This set of patches
implements the memory window verbs and local send operations. Each of these
has been tested at a basic level, and regression tests have been run to
see that basic rxe functionality is OK.
Can you please submit the series together with standard cover-letter
(git format-patch --cover-letter ..) that include diffstat and patch
list.

It is helpful to see the whole picture of expected changes.

Does it pass rdma-core pyverbs tests?

Thanks
Leon,

Thanks for the comments. They are helpful. I haven't worked on rxe or anything else in Linux for about 6-7 years so there are a lot of things that have changed. I have a few questions that you may be able to answer.

The build robot seems to be catching things that a plain make in the kernel tree misses (I think). Is there a way to check whether patches will work before sending them in an email? The most recent attempt had a stray variable declaration left over from some other change, but I never saw a compiler warning.
You can catch most (90%) of the errors reported by kbuild if you use the
latest GCC compiler to prepare your patches. The latest Fedora (32) has
it. Compile your code with allyesconfig, allmodconfig and allnoconfig.

The rest of the errors you can find with the smatch and sparse tools.
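
The advice above can be sketched as a small shell helper. This is only a sketch under assumptions: the SUBDIR path, the RUN switch and the step helper are illustrative names that do not appear in the thread, and the script is meant to be started from the top of a kernel tree. W=1 (extra compiler warnings) and C=1 (run sparse on files being recompiled) are standard kbuild options. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build one kernel subdirectory under three configs with sparse enabled.
# SUBDIR, RUN and step() are illustrative names, not taken from the thread.
SUBDIR=${SUBDIR:-drivers/infiniband/sw/rxe/}
RUN=${RUN:-0}

step() {
    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "$RUN" = "1" ]; then "$@"; fi
}

build_matrix() {
    for cfg in allyesconfig allmodconfig allnoconfig; do
        step make "$cfg" || return 1
        # W=1: extra warnings. C=1: run sparse on files that get recompiled.
        step make W=1 C=1 "$SUBDIR" || return 1
    done
}

build_matrix
```

Running it with RUN=1 executes the builds for real. A smatch pass would need a separate invocation and is left out here.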
I had used --compose rather than --cover-letter and wondered how people got those nice [PATCH 0/N] messages. I'll give it a try.

I've never come to terms with Python (white space shouldn't carry syntax IMHO) and have no idea what pyverbs is doing. How do you run the tests you mention?
https://github.com/linux-rdma/rdma-core/blob/master/Documentation/testing.md#how-to-run-rdma-cores-tests
Bottom line:
1. Download rdma-core
2. Compile on the system with your rxe device, using the build.sh script in
the source tree
3. Run the tests directly from the source code
./build/bin/run_tests.py -v
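
The three steps above can be sketched as one script. Treat it as an outline under assumptions: WORKDIR, RUN and the step helper are illustrative names, and the clone URL is the repository root inferred from the documentation link above. The script prints the commands by default and only runs them when RUN=1.

```shell
#!/bin/sh
# Sketch: fetch rdma-core, build it in place, and run its test suite.
# WORKDIR, RUN and step() are illustrative names, not taken from the thread.
WORKDIR=${WORKDIR:-$HOME/src}
RUN=${RUN:-0}

step() {
    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "$RUN" = "1" ]; then "$@"; fi
}

run_rdma_core_tests() {
    step git clone https://github.com/linux-rdma/rdma-core.git "$WORKDIR/rdma-core" || return 1
    step cd "$WORKDIR/rdma-core" || return 1
    step ./build.sh || return 1
    # -v gives per-test output, as in the command quoted above.
    step ./build/bin/run_tests.py -v
}

run_rdma_core_tests
```

The tests have to run on the machine that has the rxe device configured, since they exercise whatever RDMA devices they find locally.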
I tried to get git send-email to put a version number into the subject lines with -v2 which it happily accepts but it does nothing. In the end I had to edit each email one at a time. Is there an easier way to get e.g. [PATCH v3 xx/yy]?
It is done during the format-patch stage; my command line for the series is:
git format-patch --cover-letter -M -C -v X --subject-prefix "PATCH $TARGET" -o /tmp/
                                     ^^^^ version                 ^^^^ rdma-next or rdma-rc
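
A filled-in variant of the command above might look like the sketch below. The format_series wrapper, its argument order, the RUN switch and the HEAD~2..HEAD range are assumptions made for illustration; the quoted command does not show a revision range, so one is supplied here. With version 3 and target rdma-next, git should produce subjects of the form [PATCH rdma-next v3 1/N], plus a 0/N cover letter.

```shell
#!/bin/sh
# Sketch: wrap the format-patch invocation so version and target are arguments.
# format_series() and RUN are illustrative names, not taken from the thread.
format_series() {
    version=$1          # reroll count, e.g. 3 for "v3"
    target=$2           # tree name, e.g. rdma-next or rdma-rc
    range=$3            # commits to export, e.g. HEAD~2..HEAD (assumed)
    outdir=${4:-/tmp/}

    set -- git format-patch --cover-letter -M -C -v "$version" \
        --subject-prefix "PATCH $target" -o "$outdir" "$range"

    # Print the command; execute it only when RUN=1.
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

format_series 3 rdma-next HEAD~2..HEAD
```

Because the version is applied when the patch files are generated, nothing needs to be edited by hand before passing the resulting files to git send-email.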
Thanks for the help,

Bob Pearson
Interesting. I got the tests working fairly easily but have found bugs in error cases in the responder state machine that I'll have to fix. The test behaves badly (perhaps on purpose) by deallocating the MWs and then banging away sending writes to the now defunct MW. The responder should nak the rkey violation but doesn't. The cause is that do_complete assumes that no errors ever occur and skips out if there isn't a receive wqe to complete, bypassing the ACKNOWLEDGE state. This should also have been seen for MRs if anyone ever did the same thing.

Bob 
The run_tests.py tests are mostly running. There are four test cases that always fail (AH and mcast), but they have nothing to do with MWs. There are also occasional failures caused by timeouts in the INIT->RTR QP transition. These are not reproducible and occur on various tests. I do not believe they have anything to do with the MW code either: when one happens in a MW test case, the test never reaches the MW code.

There were three issues with the MW code that are fixed now. One was a use-before-set of a pointer, one was a difference of interpretation of the IBA specs (I wasn't allowing invalidation of a MW unless it was valid), and the last was the missing acks described above.

Do you know if this is normal behavior for rxe?

I am going to post v3 patch set now.

Bob