Regex Debugging: Catastrophic Backtracking
Regular Expressions are incredibly powerful, but poorly written patterns can result in a ReDoS (Regular Expression Denial of Service) attack. Before pushing a pattern to production, you should test it thoroughly in a Regex Tester.
What is Catastrophic Backtracking?
Most modern regex engines (like those in JavaScript, Python, and Java) use a backtracking approach. If the engine matches a portion of the string but fails later, it "backtracks" to try alternative combinations.
When you use nested quantifiers (like + or *), the number of possible combinations the engine must check grows exponentially based on the string length.
The Classic Vulnerability
Consider the pattern: ^(a+)+$
And the target string: aaaaaaaaaaaaaaaaaaaaaaaaaaaaX
The engine will match all the as, hit the X, and fail. It will then backtrack, trying to match the first group of as as a single token, and the rest as another. For a string of just 30 characters, the engine will attempt millions of permutations before finally realizing the string is invalid, completely locking up the CPU thread.
How to Fix It
- Avoid Nested Quantifiers: Rewrite patterns to avoid
(a+)+or(.*)*. - Use Atomic Groups (if supported): Atomic groups
(?>...)prevent the engine from backtracking once a match is found. Note: JavaScript does not natively support atomic groups. - Be Specific: Instead of using
.*, use negated character classes. For example, instead of<.*>, use<[^>]*>. This vastly reduces the branching logic the engine must perform.