
Bash string split by delimiter constrained by number of characters


I want to use Bash to split a long, space-separated text into lines of at most N characters, breaking only at spaces, but I can't get it to work. The commands below split at a fixed number of characters, not at the delimiter.

echo "The quick fox jumped over the lazy dog" | fold -w 10
echo "The quick fox jumped over the lazy dog" | sed -e 's/.\{9\}/&\n/g'

It would be nice to have this as a function for interactive use in Bash.

Input syntax:

format_text 10 "The quick fox jumped over the lazy dog"

Output:

The quick 
fox jumped 
over the 
lazy dog

Note that with a plain 10-character cut, the third line would take the "l" of "lazy"; the rule of breaking only at spaces prevents that.
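A minimal pure-Bash sketch of the `format_text` interface above, assuming greedy word wrapping (fill each line with as many words as fit) and single-line input text. Unlike the sample output, it does not keep trailing spaces, and a word longer than the width is printed on its own line without being cut.

```shell
#!/usr/bin/env bash

# format_text WIDTH TEXT
# Greedy word wrap: keep appending words to the current line while it
# stays within WIDTH characters, otherwise start a new line.
# A word longer than WIDTH is printed on its own line, unbroken.
format_text() {
    local width=$1 line="" word
    local -a words
    read -r -a words <<< "$2"          # split TEXT on whitespace

    for word in "${words[@]}"; do
        if [[ -z $line ]]; then
            line=$word                 # first word of a new line
        elif (( ${#line} + 1 + ${#word} <= width )); then
            line+=" $word"             # word fits after a space
        else
            printf '%s\n' "$line"      # flush the full line
            line=$word
        fi
    done

    if [[ -n $line ]]; then
        printf '%s\n' "$line"          # flush the last line
    fi
}

format_text 10 "The quick fox jumped over the lazy dog"
```

For the sample sentence this should print `The quick`, `fox jumped`, `over the` and `lazy dog` on four lines.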

Update: The current result is close, but there is an issue with the word slicer that I cannot solve myself: it does not break the line before the limit is exceeded.
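The overshoot comes from `.{N}` matching exactly N characters: when no space follows them, sed starts the match a few characters later, so the text skipped in front ends up on the same line. A sketch of a bounded variant, assuming GNU sed (for `\n` in the replacement) and single spaces between words; `wrap_sed` is a hypothetical helper name:

```shell
#!/usr/bin/env bash

# wrap_sed WIDTH TEXT  (hypothetical helper)
# (.{1,WIDTH}) is greedy: it takes the longest run of at most WIDTH
# characters that ends right before a space or the end of the text.
# That space (or the end) is replaced by a newline; the second
# expression drops the extra newline added after the last chunk.
wrap_sed() {
    local width=$1
    sed -E "s/(.{1,$width})( |\$)/\1\n/g; s/\n\$//" <<< "$2"
}

wrap_sed 10 "The quick fox jumped over the lazy dog"
```

Words longer than WIDTH are never split, because only spaces are turned into newlines, so such lines can still exceed WIDTH.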

#!/bin/bash

printHeader () {
    declare -i line_length=$3

    # Upper and lower fences: repeat $1 and $2 line_length times (Python 3 syntax)
    local upper_fence lower_fence
    upper_fence="$(python3 -c "print('$1' * $line_length)")"
    lower_fence="$(python3 -c "print('$2' * $line_length)")"

    # Slice words by a character counter: break at a space that follows line_length characters
    local regex_counter="s/(.{$line_length}) /\1\n/g"

    # TODO: complete each line with dots and a pipe (not working yet)

    echo "$upper_fence"

    sed -r "$regex_counter" <<< "$4"

    echo "$lower_fence"
}

printHeader "#" "#" 10 "The quick fox jumped over the lazy dog"

Current output (still without the dots and the final pipe):

##########
The quick fox
jumped over
the lazy dog
##########
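The fences only need a character repeated N times, which Bash's builtin printf can do without calling Python. A sketch, assuming a hypothetical helper named `repeat` (not a Bash builtin):

```shell
#!/usr/bin/env bash

# repeat CHAR COUNT  (hypothetical helper):
# print CHAR COUNT times without a trailing newline.
repeat() {
    local out
    printf -v out '%*s' "$2" ''    # COUNT spaces into $out
    printf '%s' "${out// /$1}"     # turn every space into CHAR
}

upper_fence=$(repeat '#' 10)
echo "$upper_fence"                # ##########
```

In Bash 5.2 and later an `&` in the replacement part of `${var//pattern/string}` refers to the matched text, so this sketch is only meant for plain fence characters such as `#`, `=` or `.`.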
Bruno Henrique Peixoto: I added the examples as you suggested.
Bruno Henrique Peixoto: Great question! That part is still open. We could tag the line with a <<-- at the end, or something like that, or simply break the word regardless.
Bruno Henrique Peixoto: But let's assume the line width is greater than the longest word. That seems reasonable for natural-language text.
bac0n: `| fmt -w 11` (I think you have to count the newline too)
Bruno Henrique Peixoto: Sublime answer. It is already OK for me! If I want to add a delimiter marking the line limit, does the code change much?
Answer:
sed -r 's/([^ .]+ [^ .]+) /\1\n/g' <<< "The quick fox jumped over the lazy dog"
The quick
fox jumped
over the
lazy dog

The bracket expression [^ .] matches any single character that is neither a space nor a literal period (inside brackets the . loses its special meaning), and + means one or more of them, so [^ .]+ matches a word. The capture group ([^ .]+ [^ .]+) therefore matches two words separated by a space, i.e. a pattern such as string string. The whole regular expression has an additional space after the group (it could be included in the capture group in order to preserve it).

Using sed's substitute command s, we replace the matched pattern with the content of the first capture group \1 followed by a newline \n, which takes the place of the trailing space. The g flag repeats the substitution for every match on the line, and the -r option enables extended regular expressions.
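A quick way to see that this expression groups words two by two rather than counting characters is to feed it longer words:

```shell
#!/usr/bin/env bash

# Same expression as above: it pairs words, it never measures length,
# so with long words each line grows well beyond 10 characters.
sed -r 's/([^ .]+ [^ .]+) /\1\n/g' <<< "extraordinary circumstances require unconventional solutions today"
```

This should print `extraordinary circumstances`, `require unconventional` and `solutions today`. The fox sentence only yields lines of up to 10 characters because its word pairs happen to be short.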


Update - this is the actual answer:

sed -r 's/(.{8}) /\1\n/g' <<< "How do we know it is going to match the pre-defined number of characters?"
How do we
know it is
going to
match the
pre-defined
number of
characters?

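In this example, (.{8}) captures exactly 8 characters (spaces included) that are followed by a space. Because the match may start after some unmatched characters, each resulting line is at least 8 characters long and can be longer. We can check the actual length of the output lines like this: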

sed -r 's/(.{8}) /\1\n/g' <<< "How do we know it is going to match the pre-defined number of characters?" \
    | awk '{print length}'
9
10
8
9
11
9
11

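With the help of the answers to the question "How to use printf to print a character multiple times?" (awk), we can pad each line with dots and close it with a pipe. Here 12 is the target width; the dot string passed to substr() must be at least as long as the largest padding needed.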

sed -r 's/(.{8}) /\1\n/g' <<< "How do we know it is going to match the pre-defined number of characters?" \
    | awk '{rest=(12 - length); printf "%s%s|\n", $0, substr(".........", 1, rest)}'
How do we...|
know it is..|
going to....|
match the...|
pre-defined.|
number of...|
characters?.|
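The substr() trick only works while the dot string is long enough. A pure-Bash sketch of the same padding, assuming a hypothetical helper `pad_lines` that reads lines on stdin and uses printf to build the run of dots:

```shell
#!/usr/bin/env bash

# pad_lines WIDTH  (hypothetical helper)
# Pad each input line with dots up to WIDTH characters, then append a pipe.
# Lines already WIDTH characters or longer get no dots.
pad_lines() {
    local width=$1 line dots
    while IFS= read -r line || [[ -n $line ]]; do
        # build the right number of spaces, then turn them into dots
        printf -v dots '%*s' $(( width > ${#line} ? width - ${#line} : 0 )) ''
        printf '%s%s|\n' "$line" "${dots// /.}"
    done
}

sed -r 's/(.{8}) /\1\n/g' <<< "How do we know it is going to match the pre-defined number of characters?" \
    | pad_lines 12
```

It should print the same seven lines as the awk version above, with no upper limit on the amount of padding.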

If you want to break words, remove the final space from the regular expression above: /(.{8})/. Here is an example where each line is at most 10 characters long; the second sed command trims the space at the beginning or end of each new line.

sed -r 's/(.{10})/\1\n/g' <<< "How do we know it is going to match the pre-defined number of characters?" \
    | sed -r 's/(^ | $)//g' \
    | awk '{rest=(10 - length); printf "%s%s|\n", $0, substr(".........", 1, rest)}'
How do we.|
know it is|
going to..|
match the.|
pre-define|
d number o|
f characte|
rs?.......|
Bruno Henrique Peixoto: How do we know it is going to match the pre-defined number of characters?
pa4080: Hi, @BrunoHenriquePeixoto. I've updated the answer with a little joke on your question.
Bruno Henrique Peixoto: GREAT! One last wish, genius: could you put the cherry on top by marking the last character position (either {max_val} or {max_val+1})? Some symbol such as | or #, it does not matter.
pa4080: @BrunoHenriquePeixoto, I didn't understand this requirement. Perhaps you need a second expression, such as `sed -r -e 's/(.{8}) /\1\n/g' -e 's/(.)$/\|\1/'`, but I'm not sure. Or, if you want to modify each new line, the laziest way is a second pass: `sed -r 's/(.{8}) /\1\n/g' in-file.txt | sed -r 's/(.)$/\|\1/'`
Bruno Henrique Peixoto: Example: given your profile name "pa4080" (6 characters), a maximum line length of 10 characters, a pipe '|' as delimiter, and dots for the trailing space, the output must be "pa4080....|", without the double quotes.
pa4080: @BrunoHenriquePeixoto, please check the update :)
Bruno Henrique Peixoto: You should be in the Stack Overflow hall of fame.
Bruno Henrique Peixoto: The substr snippet seems a little sloppy. Some repeat routine would suit better. We did a great job!
Bruno Henrique Peixoto: There is an issue with the sed we implemented. Take a look at the main body of the post.
pa4080: Hi, @BrunoHenriquePeixoto, if you want to break the word, try removing the final space from the regex: `/(.{8}) /` => `/(.{8})/`. I've added an update to the answer.
Bruno Henrique Peixoto: I want to give more votes to the answer! :( THANKS
Bruno Henrique Peixoto: votes, hugs, money, calories, credit, stars, whatever counts
pa4080: @BrunoHenriquePeixoto, you can just upvote it by clicking the up arrow :)
Bruno Henrique Peixoto: I will upvote every day of my life.
Bruno Henrique Peixoto: Your reward: https://github.com/brunolnetto/engage