Matching Newlines or Another Character in Ruby Regular Expressions
While working on a recent project, I ran into an interesting problem: how can I make a regular expression match either the end-of-line, or another character?
The context is this: I have a list of values that I want to extract. They are separated by a certain character, but the list is not terminated by that character. For example, consider the string “a thing I want to match; another thing I want to match; a third thing I want to match”. How can we use .scan to get the array [["a thing I want to match"], ["another thing I want to match"], ["a third thing I want to match"]]?
If you’re still learning regular expressions, or aren’t familiar with the subtleties of Ruby regular expressions, you might try ending the pattern with the character class [;$], with the intention that it match either ; or the end of the line, usually represented by $.
However, within a character class, $ always matches the literal character $ (see the first sentence of the first answer on this Stack Overflow question). This means that a regex ending in [;$] will only match the first two targeted strings, because the third one is not followed by a ; or a literal $.
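To make the failure concrete, here is a minimal sketch of that kind of attempt. The exact pattern is my assumption of what such an attempt might look like (a lazy capture followed by [;$]), and the variable names are made up for illustration.

```ruby
# The sample string from the post.
text = "a thing I want to match; another thing I want to match; a third thing I want to match"

# Assumed attempt: lazily capture everything up to a character in [;$].
attempt = text.scan(/(.*?)[;$]/)

# Inside the brackets, $ is a literal dollar sign, not an anchor, so only
# the two semicolon-terminated items are captured.
puts attempt.length # prints 2

# A literal "$" in the input does end a match, which confirms the point.
with_dollar = "first;second$".scan(/(.*?)[;$]/)
p with_dollar # prints [["first"], ["second"]]
```

Note that the second captured item also keeps its leading space, since the pattern does nothing about whitespace after the separator.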
How to solve this? A regex-only option is to move $ out of the brackets and use alternation instead, as in (?:;|$), but as far as I can tell the easiest way around this is to just make use of another Ruby string method, .split. Just call string.split(";"), then use a regex on each of the split strings to do more filtering if need be.
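Here is a rough sketch of the .split approach on the sample string. The .strip call and the wrapping into one-element arrays are my additions to reproduce the shape that .scan would return. The last two lines show, for comparison, a regex-only variant using alternation outside the character class; that variant is not part of the approach described above, and all variable names are hypothetical.

```ruby
text = "a thing I want to match; another thing I want to match; a third thing I want to match"

# Split on the separator, then trim the whitespace left around each item.
items = text.split(";").map(&:strip)

# Wrap each item in its own array to mirror the output shape of .scan
# with a single capture group.
nested = items.map { |item| [item] }
p nested

# For comparison: outside a character class, $ is an anchor again, so
# alternation can match either a semicolon or the end of the line.
scanned = text.scan(/([^;]+)(?:;|$)/).flatten.map(&:strip)
p scanned == items # prints true
```

For a single-character separator like this, .split stays the shorter and more readable of the two.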
Happy coding!