Re: [RFC 1/9] lib/string: Add function to trim duplicate WS
From: Tony Asleson <hidden>
Date: 2020-01-03 14:30:21
Also in:
linux-fsdevel, linux-scsi
On 1/2/20 4:52 PM, Tony Asleson wrote:
On 12/23/19 5:28 PM, Matthew Wilcox wrote:quoted
On Mon, Dec 23, 2019 at 04:55:50PM -0600, Tony Asleson wrote:quoted
+/** + * Removes leading and trailing whitespace and removes duplicate + * adjacent whitespace in a string, modifies string in place. + * @s The %NUL-terminated string to have spaces removed + * Returns the new length + */This isn't good kernel-doc. See Documentation/doc-guide/kernel-doc.rst Compile with W=1 to get the format checked.Indeed, I'll correct it.quoted
quoted
+size_t strim_dupe(char *s) +{ + size_t ret = 0; + char *w = s; + char *p; + + /* + * This will remove all leading and duplicate adjacent, but leave + * 1 space at the end if one or more are present. + */ + for (p = s; *p != '\0'; ++p) { + if (!isspace(*p) || (p != s && !isspace(*(p - 1)))) { + *w = *p; + ++w; + ret += 1; + } + }I'd be tempted to do ... size_t ret = 0; char *w = s; bool last_space = false; do { bool this_space = isspace(*s); if (!this_space || !last_space) { *w++ = *s; ret++; } s++; last_space = this_space; } while (s[-1] != '\0');That leaves a starting and trailing WS, how about something like this? size_t strim_dupe(char *s) { size_t ret = 0; char *w = s; bool last_space = false; do { bool this_space = isspace(*s); if (!this_space || (!last_space && ret)) {Mollie Fitzgerald *w++ = *s; ret++; } s++; last_space = this_space; } while (s[-1] != '\0'); if (ret > 1 && isspace(w[-2])) { w[-2] = '\0'; ret--; } ret--; return ret; }
This function was added so I could strip out extra spaces in the vpd 0x83 string representation, to shorten them before they get added to the structured syslog message. I'm starting to think this is a bad idea as anyone that might want to write some code to use the kernel sysfs entry for a device and search for it in the syslog would have to perturb the id string the same way. I think this change should just be removed from the patch series and leave the IDs as they are. If we really wanted a shorter ID, a better approach would be use a hash of the ID string, but that introduces another level of complexity that isn't helpful to end users. -Tony