over 4 years ago

It should be noted that standard Common Lisp is somewhat lacking in several important
parts of string processing, and it sometimes shows. Today I needed to heavily
process a large body of plain text, so I will write down here some functions which
are, AFAIK, considered "standard" in modern languages but which are not so easily
accessible and/or intuitive in CL.

In all of the following code snippets, the token input stands for the input string.

  1. Trimming spaces, tabs and newlines from a string

    (string-trim '(#\Space #\Newline #\Return #\Linefeed #\Tab) input)
    

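    All character names are listed in the HyperSpec, section 13.1.7, Character Names. Only Newline and Space are required by the standard; Tab, Return and Linefeed are semi-standard names (and on most implementations #\Linefeed is the same character as #\Newline).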
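If the same set of whitespace characters is trimmed in many places, it can be defined once and reused. A minimal sketch using only standard functions; the names *whitespace-chars*, trim-whitespace and trim-whitespace-left are hypothetical helpers, not part of CL, and the bag assumes an implementation that supports the semi-standard #\Tab and #\Return names.

```lisp
;; Hypothetical helper: the character bag is defined once and reused.
;; #\Tab and #\Return are semi-standard names (supported by SBCL, CCL, etc.).
(defparameter *whitespace-chars*
  '(#\Space #\Tab #\Newline #\Return)
  "Characters treated as whitespace by the trimming helpers below.")

(defun trim-whitespace (input)
  "Remove leading and trailing whitespace from INPUT."
  (string-trim *whitespace-chars* input))

;; STRING-LEFT-TRIM and STRING-RIGHT-TRIM are the standard one-sided variants.
(defun trim-whitespace-left (input)
  "Remove leading whitespace from INPUT, keeping trailing whitespace."
  (string-left-trim *whitespace-chars* input))

;; (trim-whitespace (format nil "  hello~%")) => "hello"
```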

  2. Replacing text with regular expressions

    Provided by the CL-PPCRE library.

    In the next snippet I remove all tokens enclosed in square brackets from the input string:

    (ql:quickload :cl-ppcre)
    (cl-ppcre:regex-replace-all "\\[[^]]+\\]" input "")
    

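    Honestly, I don't know when you would need plain regex-replace, which replaces only the first match, rather than regex-replace-all. Also, note the double escaping of special characters (\\[ instead of \[): the backslash is the escape character inside a Lisp string literal, so it has to be escaped itself before CL-PPCRE ever sees it.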
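If pulling in CL-PPCRE for a single clean-up feels heavy, the same bracket removal can be written with standard sequence functions. This is a swapped-in, regex-free technique, not something CL-PPCRE does internally; remove-bracketed is a hypothetical name, and the sketch assumes the same rule as the regex above: a [ followed by at least one non-] character and then a ].

```lisp
(defun remove-bracketed (input)
  "Return INPUT with every non-empty [...] group removed.
Follows the same rule as the regex \\[[^]]+\\] but without CL-PPCRE."
  (with-output-to-string (out)
    (let ((i 0)
          (len (length input)))
      (loop while (< i len)
            do (let* ((ch (char input i))
                      ;; Position of the first #\] after an opening #\[, or NIL.
                      (close (and (char= ch #\[)
                                  (position #\] input :start (1+ i)))))
                 (if (and close (> close (1+ i)))
                     ;; Skip the whole group, brackets included.
                     (setf i (1+ close))
                     ;; Otherwise copy the character through unchanged.
                     (progn (write-char ch out)
                            (incf i))))))))

;; (remove-bracketed "a [x] b [] c [y") => "a  b [] c [y"
```

Empty brackets and an unclosed [ are left alone, just as the regex would leave them.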

  3. Splitting a string by a separator character

    Provided by the CL-UTILITIES library.

    In the next snippet I split the input string by commas:

    (ql:quickload :cl-utilities)
    (cl-utilities:split-sequence #\, input)
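For comparison, a dependency-free splitter needs only position and subseq. A sketch assuming empty fields should be kept (so "a,,b" yields three fields); split-by-char is a hypothetical name. The library function does more: it also returns a second value, the index where splitting stopped, and accepts keywords such as :remove-empty-subseqs.

```lisp
(defun split-by-char (separator input)
  "Split INPUT at every occurrence of the character SEPARATOR.
Empty fields (between adjacent separators) are kept."
  ;; START begins after the previous separator; END is the next separator
  ;; or NIL, in which case SUBSEQ runs to the end of the string.
  (loop for start = 0 then (1+ end)
        for end = (position separator input :start start)
        collect (subseq input start end)
        while end))

;; (split-by-char #\, "a,,b") => ("a" "" "b")
```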
    
  4. Applying the same modification to every string in a list

    In the next snippet I trim spaces around every string in the list input-list:

    (map 'list
      (lambda (input)
        (string-trim " " input))
      input-list)
    

    However, it is much better to wrap the string transformation in a separate function and pass just the function's name to map:

    (defun trim-spaces (input)
      "Remove trailing and leading spaces from input string"
      (string-trim '(#\Space) input))
    (map 'list #'trim-spaces input-list)
    

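    Do not forget that a string is just a vector of characters, so every sequence function (map, remove-if, position, subseq and so on) works on it, and those functions accept a list of characters such as '(#\a #\b #\c #\d) just as well as the string "abcd". That is also why string-trim accepts its character bag as either " " or '(#\Space). This applies only to sequence functions, however: string functions such as string-upcase need an actual string, not a list of characters.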
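A small sketch of that distinction, using only standard functions (the names drop-vowels, chars-to-string and shout are hypothetical): remove-if returns a result of the same type as its input, coerce turns a list of characters back into a real string for functions like string-upcase, and map with result type 'string builds a new string.

```lisp
(defun drop-vowels (seq)
  "Remove the vowels a, e, i, o, u from SEQ.
SEQ may be a string or a list of characters; the result has the same type."
  (remove-if (lambda (ch) (find ch "aeiou")) seq))

;; (drop-vowels "lisp")             => "lsp"
;; (drop-vowels '(#\l #\i #\s #\p)) => (#\l #\s #\p)

(defun chars-to-string (chars)
  "Turn a list of characters into a real string, e.g. before STRING-UPCASE."
  (coerce chars 'string))

;; MAP with result type 'STRING builds a string from per-character results.
(defun shout (input)
  "Upcase every character of INPUT."
  (map 'string #'char-upcase input))
```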

  5. Removing characters from a string by condition

    In the next snippet I keep only the alphanumeric characters in the input string:

    (remove-if-not #'alphanumericp input)
    

    There is also remove-if, which removes the characters that satisfy the predicate instead of keeping them.

    As with map, you can build arbitrarily complex predicates, either with lambdas or by wrapping them in separate named functions.
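Putting both ideas together, a sketch that names a composite predicate and uses it with remove-if-not, plus the mirror-image remove-if. The helper names word-char-p, clean-text and strip-digits are hypothetical.

```lisp
(defun word-char-p (ch)
  "True for letters, digits and the space character."
  (or (alphanumericp ch) (char= ch #\Space)))

(defun clean-text (input)
  "Keep only letters, digits and spaces in INPUT."
  (remove-if-not #'word-char-p input))

;; REMOVE-IF drops the characters that satisfy the predicate.
;; DIGIT-CHAR-P returns the digit's weight or NIL, which works as a predicate.
(defun strip-digits (input)
  "Remove all decimal digits from INPUT."
  (remove-if #'digit-char-p input))

;; (clean-text "Hello, world! 42") => "Hello world 42"
```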

 