Re: [RFC] textsearch infrastructure et al v2
From: Thomas Graf <tgraf@suug.ch>
Date: 2005-05-28 12:35:42
* jamal [ref] 2005-05-28 07:59
I dont have opinions on this since you and Pablo seem to agree on it. What I remember is that libssearch (or whatever that thing Harald was looking at) did it differently (callback invoked and it return a code which said to continue or not).
Same for my infrastructure with the difference that libqsearch uses a single input buffer so no chance for non-linear data.
Also the design should (I think you do already, just double checking) - should be wary of optimizing for a specific algorithmn; i see you have KMP but not Boyer-Moore for example and i am not sure what the repurcassions of above approach are etc etc.
This is a weakness of the current implementation, it could be addresses via the other method that I proposed. The current patch doesn't allow for random access over fragment borders which means that Boyer-Moore would require a temporary buffer with a size of the pattern length in order to do random access over multiple fragments. However, I haven't seen any infrastructure yet that can handle non-linear data with bm wihout a complete prefetch in the beginning though. You could still implement a bm by either using a naive search at the borders or simply define the limitation that you can only look at the first fragment so you'd have to linearize.
quoted
So basically we save the state of the segment fetching instead of saving the state of the search algorithm which would be way too complex for things like regular expression string matching.Can you explain this a little more. Didnt quiet understand. If you are going across skbs, dont you need to save state?
Most people look at it from the angle of iterate over text segments and apply a text search algorithm. I inverted this and said, start the text search algorithm and iterate over all text segments inside the algorithm. This makes many things a lot easier because we only have to save the state of the iterator. This brings up another limitation though, we cannot stop a search in progress and continue later on.
I think the best approach would be to first linearize then search.
This is what I'm trying to avoid. ;->
On ingress as well you are also better off to assemble fragments first on that side before doing text searches.
We can still do this optionally, just because we support it doesn't mean we have to use it.
Perhaps implementing an action like the one used by openbsd to "normalize" packets before a string match. i.e classifier (pkt header) --> normalize --> classifier(string match)-> ?
Infact one could argue that if you are to scan a virus you may need to assemble more than one skb on ingress (essentially a message).
http://oss.sgi.com/archives/netdev/2005-05/msg00298.html See the section after <<coffee break>>
The only other comment i have is on the patch you called naive regexp; I think you should probably call it something else instead of regexp since you invented it.
I was never good at names, any suggestions?