A similar question was asked here but the input and the pattern are different.
The goal is to turn a string with a numbered list into chunks that hold the content without the numbers.
Input:
var string = "3. line A<br>4. line B<br>5. line C<br>6. line3. garbage<br>7. line<br>8. line END";
Regex:
var arr = string.split(/(^\d+\.)|(<br>\d+\.)/).filter(x => x);
Expected output:
array["line A", "line B", "line C", "line3. garbage", "line", "line END"]
But the actual output in JavaScript is:
0: "3."
1: " line A"
2: "<br>4."
3: " line B"
4: "<br>5."
5: " line C"
6: "<br>6."
7: " line3. garbage"
8: "<br>7."
9: " line"
10: "<br>8."
11: " line END"
Why is the pattern included in the results? In PHP the pattern is excluded, which is what I expect, but in JavaScript it is not.
How can I remove the pattern from the final result?
If you're looking for a pure regex solution, I would suggest this regex for .match:
/(?<=\s|^)[a-z][\w.-]*(?:\s+[\w.-]+)*/gmi
var string = "3. line A<br>4. line B<br>5. line C<br>6. line3. garbage<br>7. line<br>8. line END";
var m = string.match(/(?<=\s|^)[a-z][\w.-]*(?:\s+[\w.-]+)*/gmi);
console.log(m);
Explanation:
(?<=\s|^): Assert that the previous character is whitespace or that we are at a line start
[a-z]: Match a letter (ignore case)
[\w.-]*: Match 0 or more word characters, dots, or hyphens
(?:\s+[\w.-]+)*: Match 0 or more words, each preceded by 1+ whitespace characters
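As for why the pattern shows up in the question's output: in JavaScript, String.prototype.split adds the text matched by any capturing groups in the separator to the result array, and the pattern in the question wraps both alternatives in capturing groups. If you would rather stay with .split, here is a minimal sketch that uses a non-capturing group instead. It assumes every item starts with a number followed by a dot and that items are separated by <br>; the variable name parts is only for illustration.

```javascript
var string = "3. line A<br>4. line B<br>5. line C<br>6. line3. garbage<br>7. line<br>8. line END";

// (?:...) groups without capturing, so the separators are not added to the result.
// \s* also consumes the whitespace after the number, so no trimming is needed.
var parts = string
  .split(/(?:^|<br>)\d+\.\s*/)
  .filter(x => x); // drop the empty string produced by the match at the very start

console.log(parts);
```

Because ^ only matches at the start of the string here (no m flag), the "3." inside "line3. garbage" is not treated as a separator. This variant also does not rely on lookbehind, which older JavaScript engines do not support.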