Using the regular expression backtracking engine for prime number magic
I just saw the most impressive, intelligent use of the regular expression backtracking engine and simply have to re-post it in order to give my own decryption of the infinitely simple expression. In short, it is a regular expression, that only matches non-prime sequences of 1′s. And it goes like this:
/^1?$|^(11+?)\1+$/
I will try to explain it in a simple way and then do some benchmarking on this simple expression, a slightly optimized version of the expression as well as compare it to more optimal algorithms.
I can explain it very simply. The first part of the expression check the special cases of zero or one 1′s. The second part checks whether the number of 1′s is a multiple of another substring of 1′s. It does that by un-greedily matching the shortest possible string of 1′s larger than one (the shortest possible is initially of course two 1′s) as the first group and then checks whether the remaining number of 1′s is evenly divisible by this by a factor larger that zero – that is, if there were six 1′s, the remaining number of 1′s after having matched the first two would be four, which is evenly divisible by two by a factor larger than zero. If we tried to match nine 1′s, the first attempt at matching two 1′s in the first group would fail, as the remaining seven 1′s is not evenly divisible by this. But the regular expression backtracking engine would then backtrack to matching three 1′s in the first group, and then the remaining six 1′s would be evenly divisible by this number, and the pattern matches – thus 9 is not prime as expected.
The penalty? Well, try to match eleven 1′s, the first group would initially be set to two 1′s, and the remaining expression fails, the initial group would then be set to three 1′s, and the remaining expression would still fail, then the initial expression would be set to four 1′s (which is just stupid prime-checking-wise – 2 was not a factor, then how could 4 be?) and it would still fail – and so on until the engine would match all eleven 1′s as the first group and check if the remaining zero 1′s is evenly divisible (by a factor larger than zero of course) which it isn’t and only then would the expression fail. It is thus highly expensive, as there is of course no need to check if any number higher than the square root of the number to check divides the number evenly, if all numbers below the square root doesn’t.
To do some speed-testing, let’s try to check the number 9017 (which is the product of the somewhat large primes 71 and 127 and thus not prime) and the number 9029 (which is next prime after 9017). And for the fun of it, let’s test it in JavaScript right here in the browser:
function check1(number) { var t1 = new Date().getTime(); var str = ""; for (var i = 0; i<number; i++) str += "1"; var t2 = new Date().getTime(); var result = /^1?$|^(11+?)\1+$/.test(str); var t3 = new Date().getTime(); return "Time of generation: "+(t2-t1)+" ms, time of checking: "+(t3-t2)+" ms, result: "+number+" is "+(result ? "not " : "")+"prime"; } check1(9017); // "Time of generation: 2 ms, time of checking: 6 ms, result: 9017 is not prime" check1(9029); // "Time of generation: 2 ms, time of checking: 64 ms, result: 9029 is prime"
64 ms for failing to validate 9029 as not prime is way too long, so let’s simply change “(11+?)” to “(1{2,x}?)” with x being the square root of the given number truncated to an integer:
function check2(number) { var t1 = new Date().getTime(); var str = ""; for (var i = 0; i<number; i++) str += "1"; var sqrt = Math.floor(Math.sqrt(number)); var re = new RegExp("^1?$|^(1{2,"+sqrt+"}?)\\1+$",""); var t2 = new Date().getTime(); var result = re.test(str); var t3 = new Date().getTime(); return "Time of generation: "+(t2-t1)+" ms, time of checking: "+(t3-t2)+" ms, result: "+number+" is "+(result ? "not " : "")+"prime"; } check2(9017); // "Time of generation: 2 ms, time of checking: 11 ms, result: 9017 is not prime" check2(9029); // "Time of generation: 2 ms, time of checking: 12 ms, result: 9029 is prime"
This actually increased the time to check for non-primes (as “{a,b}?” is more expensive than simply “+?” as the recursion needs to keep track of the limits) but greatly reduced the time it took to check for primes, as it only checks up until 95 1′s (95 = floor(sqrt(9029))) and not all the way up to 9029 1′s.
But just for the fun of it, let’s compare with more common check’s – like Sieve of Eratosthenes:
function check3(number) { var t1 = new Date().getTime(); var nonprimes = []; var sqrt = Math.floor(Math.sqrt(number)); var pointer = 2; while (pointer <= sqrt && nonprimes[number] !== true) { for (var i = pointer; i <= number; i += pointer) nonprimes[i] = true; while (++pointer <= sqrt && nonprimes[pointer] === true); } var result = nonprimes[number] == true; var t2 = new Date().getTime(); return "Time of checking: "+(t2-t1)+" ms, result: "+number+" is "+(result ? "not " : "")+"prime"; } check3(9017); // "Time of checking: 1 ms, result: 9017 is not prime" check3(9029); // "Time of checking: 1 ms, result: 9029 is prime"
It is pretty clear, that the regular expression does not in any way compare to an optimal algorithm – but the original regular expression referenced in the top is none the less a very intelligent hack on the regular expression engine mechanics.
No related posts.
Category: JavaScript, Regular Expressions, Trends Comment »
