Thread (30 messages) 30 messages, 6 authors, 2020-01-10

Re: [RFC 1/9] lib/string: Add function to trim duplicate WS

From: Tony Asleson <hidden>
Date: 2020-01-03 14:30:21
Also in: linux-fsdevel, linux-scsi

On 1/2/20 4:52 PM, Tony Asleson wrote:
On 12/23/19 5:28 PM, Matthew Wilcox wrote:
quoted
On Mon, Dec 23, 2019 at 04:55:50PM -0600, Tony Asleson wrote:
quoted
+/**
+ * Removes leading and trailing whitespace and removes duplicate
+ * adjacent whitespace in a string, modifies string in place.
+ * @s The %NUL-terminated string to have spaces removed
+ * Returns the new length
+ */
This isn't good kernel-doc.  See Documentation/doc-guide/kernel-doc.rst
Compile with W=1 to get the format checked.
Indeed, I'll correct it.
quoted
quoted
+size_t strim_dupe(char *s)
+{
+	size_t ret = 0;
+	char *w = s;
+	char *p;
+
+	/*
+	 * This will remove all leading and duplicate adjacent, but leave
+	 * 1 space at the end if one or more are present.
+	 */
+	for (p = s; *p != '\0'; ++p) {
+		if (!isspace(*p) || (p != s && !isspace(*(p - 1)))) {
+			*w = *p;
+			++w;
+			ret += 1;
+		}
+	}
I'd be tempted to do ...

	size_t ret = 0;
	char *w = s;
	bool last_space = false;

	do {
		bool this_space = isspace(*s);

		if (!this_space || !last_space) {
			*w++ = *s;
			ret++;
		}
		s++;
		last_space = this_space;
	} while (s[-1] != '\0');
That leaves a starting and trailing WS, how about something like this?

size_t strim_dupe(char *s)
{
	size_t ret = 0;
	char *w = s;
	bool last_space = false;

	do {
		bool this_space = isspace(*s);
		if (!this_space || (!last_space && ret)) {Mollie Fitzgerald
			*w++ = *s;
			ret++;
		}
		s++;
		last_space = this_space;
	} while (s[-1] != '\0');

	if (ret > 1 && isspace(w[-2])) {
		w[-2] = '\0';
		ret--;
	}

	ret--;
	return ret;
}
This function was added so I could strip out extra spaces in the vpd
0x83 string representation, to shorten them before they get added to the
structured syslog message.  I'm starting to think this is a bad idea as
anyone that might want to write some code to use the kernel sysfs entry
for a device and search for it in the syslog would have to perturb the
id string the same way.

I think this change should just be removed from the patch series and
leave the IDs as they are.

If we really wanted a shorter ID, a better approach would be use a hash
of the ID string, but that introduces another level of complexity that
isn't helpful to end users.

-Tony
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help