Simple Introduction to Memcached

And Maybe a Real World Use
Tags: python, code, data, scientist, analyst, db, cache

Published November 10, 2022

While taking a class surveying the currently most popular NoSQL databases for different use cases, I learned about a technology called Memcached. From the Arch Wiki:

> Memcached (pronunciation: mem-cashed, mem-cash-dee) is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read.

Source

This was a technology developed at LiveJournal to help with - well - caching commonly used values. This is interesting, but why bring it up? Because NoSQL databases use a key-value pair to look up matching values, and that happens to be how Memcached works as well. However, there are some hard limitations, especially around the size of what is allowed to be cached: looking around, the default item size limit appears to be 1MB, and you can configure it higher (up to about 1GB), but that's it.

I was thinking about how you could apply this to Data Science, and it's pretty limited. For one, the only useful things to share across sessions would be either the data you are using or the actual trained model itself. Since the data would almost certainly be larger than the configured limit, that is not of much use - at least for most interesting problems. And, since there is a glut of tooling for hosting applications online, you are very unlikely to need to set up a cache for the model yourself; the hosting tools largely do this for you with instances and such.

But I did have an interesting idea about what I could use this for. After you've worked on enough problems, you're bound to have functions written to solve common tasks. Reusing them means finding that code, copying it into your project, and finally using it for what you want. You could build a Python package just for yourself, but that seems like overkill unless it's a general topic worth sharing with others.

What if these simple functions could instead live in a sort of network-shared library? For example, date formats are something I tend to need to convert when working with Python data frames. And, sadly, there are no nice date-format helpers like there are in R; I miss the lubridate functions, which convert a date into commonly needed formats - for example, ymd(date) gives you a Year-Month-Day representation. I wrote a few lambda functions in Python to do this for me (something like the sketch below), and I would want them accessible while I do data exploration.
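To make that concrete, here is a minimal sketch of the kind of helpers I mean; the names ymd and mdy are just illustrative, my own shorthand rather than functions from any library.

from datetime import datetime

# Illustrative date-formatting helpers (my own names, inspired by lubridate):
ymd = (lambda d: f"{d.year}/{d.month}/{d.day}")
mdy = (lambda d: f"{d.month}/{d.day}/{d.year}")

print(ymd(datetime.now()), mdy(datetime.now()))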

So, how would we go about doing this? First we need to install memcached for our operating system; I have already done this, but the guide from Real Python goes over how you would do it for your own system. Mine being Manjaro, the guide didn't cover it and I had to find instructions on the Arch Wiki. Make sure to start the service, and then we'll start this off.

from pymemcache.client import base
# init a client; make sure the memcached service is already running:
client = base.Client(('localhost', 11211))
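If you want to confirm that the client can actually reach the server, you can ask it for server statistics. This is a quick sanity check, assuming your version of pymemcache exposes the stats() method (recent releases do).

# Quick sanity check that the server is reachable;
# stats() returns a dictionary of server statistics.
print(client.stats())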

Using this is very simple and there really are only two functions to care about: get() and set(). If we want to set a value, we tell the client what the key-value pair is.

client.set('turtles', 'Turtles')
client.get('turtles')
b'Turtles'

And, that’s really all there is to using this from Python!

I would like to point out that the results come back as the bytes type. This is not a problem for that text, but it becomes a problem as soon as you need to operate on the values.

client.set('someNumber', 42)

iLike = client.get('turtles')
count = client.get('someNumber')

print(f'I had {count} {iLike} but when I got 2 more I had {count + 2} {iLike}')
TypeError: can't concat int to bytes

We can solve this, in this case at least, with a cast.

client.set('someNumber', 42)

iLike = client.get('turtles')
count = client.get('someNumber')

print(f'I had {count.decode()} {iLike.decode()} but when I got 2 more I had {int(count) + 2} {iLike.decode()}')
I had 42 Turtles but when I got 2 more I had 44 Turtles

So, can we take a lambda function and put it in memcached?

f = (lambda x: print(f'{x} likes turtles'))
client.set('iLike', f)
client.get('iLike')
b'<function <lambda> at 0x7f9f70829000>'

It accepts it! That's the good news. The bad news is that, since the function was converted to its string representation, it no longer works as a function.

f("He"), client.get('iLike')("He")
He likes turtles
TypeError: 'bytes' object is not callable

And we cannot just decode it to get what we want.

client.get('iLike').decode()("He")
TypeError: 'str' object is not callable

We can work around this by serializing the object and then deserializing it on the other side. We'll use dill, which builds on pickle but, unlike plain pickle, can also serialize lambdas; you may need to install the dill package since it is not part of the standard library, but it is required for this to work.

import dill

s = dill.dumps(f)
client.set('cereal', s)
dill.loads(client.get('cereal'))("He")
He likes turtles

Now we can implement the function I want as a Network Shared Library!

from datetime import datetime
aDate = datetime.now()

# My custom function:
ymd = (lambda x: "{y}/{m}/{d}".format(y=x.year, m=x.month, d=x.day))
s = dill.dumps(ymd)

# Store in 'network library'
client.set('ymd', s)
undo = (lambda key: dill.loads(client.get(key)))

undo('ymd')(aDate)
'2022/11/10'
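To make this pattern a little more ergonomic, you could wrap both sides in tiny helpers. This is just a sketch built on the client and dill calls already shown; the names share_fn and fetch_fn are my own.

# Hypothetical convenience wrappers around the store/retrieve pattern:
share_fn = (lambda name, fn: client.set(name, dill.dumps(fn)))
fetch_fn = (lambda name: dill.loads(client.get(name)))

share_fn('ymd', ymd)
fetch_fn('ymd')(aDate)
'2022/11/10'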

There you go! If you have a spare Raspberry Pi or something, then you too can have a small library of custom functions, shareable over your home network, ready to use!