Does The Faker Package Pass Luhn?

How good is the fake generation?

November 16, 2022

In a class I took about anonymizing data for public release, Rebeca Gonzalez discussed using the Faker Python package to replace real values with fake ones to protect users. One of the functions listed generates fake credit card numbers. If you’re not aware, a valid credit card number must pass a checksum algorithm written (and patented) by Hans Peter Luhn of IBM. The algorithm is not complicated and we’ll work through it now.

Luhn’s Algorithm:
1. Split the number into two parts: the digits and the check value, which is the last digit.
2. For the digits, moving right to left (starting with the digit next to the check value):
   1. If the position is odd (counting from 1), multiply the digit by 2; if the result is greater than 9, subtract 9, which is the same as adding its two digits.
   2. If the position is even, simply keep the digit.
3. Sum the digits after the above computation.
4. Take the sum modulo 10 and subtract it from 10; if that gives 10, use 0.
5. Compare the result against the excluded check value; they should match.
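To make the steps concrete, here is a minimal sketch that works them through for the payload 7992739871, a commonly used Luhn test value. The function name `luhn_check_digit` is made up for this illustration and is not part of Faker.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of payload digits."""
    total = 0
    # Walk the payload right to left; index 0 is the digit next to the check digit.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2          # double every second digit, starting with the rightmost
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    # Distance from the sum up to the next multiple of 10 (0 if already there).
    return (10 - total % 10) % 10


print(luhn_check_digit("7992739871"))  # 3, so the full number is 79927398713
```

The doubled digits here contribute 2, 7, 6, 4 and 9, the untouched ones 7, 9, 7, 9 and 7, for a sum of 67, and 10 - 7 gives the check digit 3.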

We will skip the first step since it will be an easy split for now. We’ll want a function which deals with step 2.

# Apply to each digit
def luhnSingleDigit(digit, even = True):
    if even:
        # even index (0, 2, ...): double, then add the two digits if the result is 10 or more
        n = int(digit) * 2
        if n >= 10: return 1 + n % 10
        return n
    # odd index: return the digit itself
    return int(digit)

luhnSingleDigit(8), luhnSingleDigit(8, even = False)
(7, 8)

Now we’ll need to iterate through the digits, so we’ll need a fake credit card number. I will reverse the order of the digits, since that is easier than counting backwards while we iterate.

from faker import Faker
fake = Faker()
cc = fake.credit_card_number()

# drop the check digit, then reverse the rest
digits, parity = cc[:-1][::-1], cc[-1]
digits, parity
('56086526834343', '5')

We will iterate through the digits with a list comprehension. A list comprehension can include an if-else expression, which we’ll use to tell the code which version of the function to apply at each index.

# Test the filter and make sure it works
[0 if i % 2 == 1 else x for i, x in enumerate(range(20))]
[0, 0, 2, 0, 4, 0, 6, 0, 8, 0, 10, 0, 12, 0, 14, 0, 16, 0, 18, 0]

Now we’ll throw the function in there to test it.

[ luhnSingleDigit(x, even = False) if i % 2 == 1 else luhnSingleDigit(x) for i, x in enumerate(digits) ]
[1, 6, 0, 8, 3, 5, 4, 6, 7, 3, 8, 3, 8, 3]

Next we have to sum them all together; the last digit of that sum is what we need.

theSum = sum([ luhnSingleDigit(x, even = False) if i % 2 == 1 else luhnSingleDigit(x) for i, x in enumerate(digits) ])

Lastly, we take the sum modulo 10, subtract it from 10 (wrapping 10 back to 0) and compare the result to the parity digit. The parity digit is still a string, so it has to be converted to an int before comparing.

check = (10 - theSum % 10) % 10
theSum, check, check == int(parity)
(65, 5, True)

It was a valid credit card number!
Let’s get those steps into a function and time it to see how long it takes. And, let’s get about 100 of these numbers to check just in case it was a fluke.

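def confirmLuhn(number):
    # drop the check digit, then reverse the rest
    digits, parity = number[:-1][::-1], number[-1]
    theSum = sum([
        luhnSingleDigit(x, even = False) if i % 2 == 1
        else luhnSingleDigit(x)
        for i, x in enumerate(digits) ])

    # wrap 10 back to 0 and compare as integers
    return (10 - theSum % 10) % 10 == int(parity)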
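An equivalent way to run the check, sketched here under the assumption that the input is a string of digits, is to leave the check digit in place and test whether the whole Luhn sum is a multiple of 10. `luhn_is_valid` is a hypothetical helper name, and the sample values are widely published test card numbers, not Faker output.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a full card number (payload plus check digit) against Luhn."""
    total = 0
    # Index 0 is the check digit itself, so the doubled digits sit at odd indices.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9      # same as adding the two digits of the product
        total += d
    return total % 10 == 0


samples = ["4111111111111111", "5555555555554444", "378282246310005"]
print([luhn_is_valid(s) for s in samples])   # [True, True, True]
print(luhn_is_valid("4111111111111112"))     # False: last digit changed
```

Folding the check digit into the sum avoids the separate split and the final subtraction.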

One thing I noticed from the output in the lectures was that the fake card numbers had different lengths; is that the case here too?

import numpy as np
numbers = [ fake.credit_card_number() for _ in range(100)]
np.mean([len(n) for n in numbers])
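A mean only hints that the lengths are mixed; a tally by length shows the split directly. The sketch below uses `collections.Counter` on a few published test card numbers as stand-ins, since Faker’s output changes on every run; with the data from this post you would pass `numbers` instead of `sample_numbers`.

```python
from collections import Counter

# Stand-in values (public test card numbers), used here in place of Faker output.
sample_numbers = [
    "4111111111111111",   # 16 digits
    "5555555555554444",   # 16 digits
    "378282246310005",    # 15 digits
    "30569309025904",     # 14 digits
]

# Count how many numbers there are of each length.
length_counts = Counter(len(n) for n in sample_numbers)
print(sorted(length_counts.items()))  # [(14, 1), (15, 1), (16, 2)]
```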

That’s correct. Some of them are shorter than 16 digits. Card networks do not all use the same length (American Express numbers, for example, have 15 digits), but to keep things simple we will filter down to the 16-digit numbers and move on.

filteredNumbers = list(filter( lambda x: len(x) == 16, numbers))

Ouch. That’s a lot of misses. Ok, on to the main event!

%timeit sum([confirmLuhn(c) for c in filteredNumbers])
179 µs ± 820 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
sum([confirmLuhn(c) for c in filteredNumbers])

Faker computes the last digit of each number with the Luhn checksum, so a correct check should count every number in the filtered list as valid, and at least it was fast. Good to know that the generated numbers pass Luhn, even though they are not real card numbers.