Testing JavaScript with Python

I was recently tasked with adding Mailcheck.js to some of our production pages, and I want to describe a bit of the process I went through, because I did some things differently and had some fun along the way.

Let’s start with a PSA: do not simply drop Mailcheck onto your website as is! In my experience the default algorithm is way too greedy; it tends to suggest that almost every email should be ____@gmail.com. It is worth taking the time to tweak Mailcheck for your particular user base, because no one wants to see a "correction" for their perfectly valid email address!
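To see why a greedy matcher over-corrects, here is a minimal sketch of the general idea behind Mailcheck-style suggestions: find the closest known domain and suggest it if it is within some distance threshold. This is not Mailcheck's code. It swaps in plain Levenshtein edit distance in place of Mailcheck's sift3 distance, and the domain list, `suggest_domain` and `threshold` are all hypothetical.

```python
# Hypothetical domain list; Mailcheck ships its own defaults.
POPULAR_DOMAINS = ("gmail.com", "yahoo.com", "hotmail.com", "aol.com", "gmx.com")


def levenshtein(a, b):
    """Classic edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def suggest_domain(domain, domains=POPULAR_DOMAINS, threshold=2):
    """Return the closest known domain if it is within `threshold` edits, else None."""
    domain = domain.lower()
    if domain in domains:
        return None  # already a known domain: nothing to correct
    best = min(domains, key=lambda d: levenshtein(domain, d))
    return best if levenshtein(domain, best) <= threshold else None


if __name__ == "__main__":
    print(suggest_domain("gmil.com"))  # a real typo: gmail.com
    print(suggest_domain("mail.com"))  # a real provider, also "corrected" to gmail.com
    print(suggest_domain("mail.com", domains=POPULAR_DOMAINS + ("mail.com",)))  # None
```

The `mail.com` case shows the problem: a legitimate domain sits one edit away from `gmail.com`, so it gets "corrected" unless you add it to the known list or tighten the threshold.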

The first thing I did was dump a ton of emails from our database to create a dataset to work with. I could have used Node to write some scripts to test out the Mailcheck behaviour, but Python is just so much more convenient for numerical analysis. Plus it’s what our data team uses, so I could leverage some of their knowledge and code.

Now for the fun part: I ended up using PyV8, a Python wrapper for Google’s V8 JavaScript engine. With this setup I could slice and dice our production emails in Python and pandas while calling the exact JavaScript Mailcheck algorithm and collecting the results. Once I had tweaked the algorithm, I could take the settings and the new JS code and put them into production.
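The slicing and dicing boils down to running every address through the suggestion function and counting what changes. Here is a small sketch of that kind of harness. With pandas you would express the same thing with `Series.apply` and `value_counts`; this version sticks to the standard library so it runs on its own. `evaluate` and the `fake_mailcheck` stub are hypothetical; in the real setup the callable would wrap the PyV8 call (`run_mailcheck` in the script below).

```python
from collections import Counter


def evaluate(emails, suggest):
    """Measure how often `suggest` proposes a change across a list of emails.

    `suggest` is any callable returning a corrected address or a falsy value.
    Returns (fraction of emails that got a suggestion,
             [(original domain, count), ...] most frequent first).
    """
    suggested = 0
    changed_domains = Counter()
    for email in emails:
        suggestion = suggest(email)
        if suggestion and suggestion != email:
            suggested += 1
            changed_domains[email.rsplit("@", 1)[-1]] += 1
    rate = suggested / float(len(emails)) if emails else 0.0
    return rate, changed_domains.most_common()


def fake_mailcheck(email):
    # Hypothetical stand-in for the JS call so the sketch runs without PyV8
    return email.replace("@gmil.com", "@gmail.com")


if __name__ == "__main__":
    print(evaluate(["a@gmil.com", "b@gmail.com"], fake_mailcheck))
```

Domains near the top of the ranking that are actually legitimate are the ones to whitelist or protect with a tighter threshold.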

Check out this wacky Frankenscript that got the job done (pandas not included):

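import json

import PyV8  # Python wrapper around Google's V8 JavaScript engine

ctxt = None


def init_mailcheck():
  # Create a V8 context and load the (tweaked) mailcheck.js into it
  global ctxt
  ctxt = PyV8.JSContext()
  ctxt.enter()
  with open("mailcheck.js") as f:
    ctxt.eval(f.read())


def run_sift3Distance(s1, s2):
  # json.dumps quotes and escapes the strings so they are valid JS literals
  script = "Mailcheck.mailcheck.sift3Distance(%s, %s)" % (json.dumps(s1), json.dumps(s2))
  return ctxt.eval(script)


def run_splitEmail(email):
  script = "Mailcheck.mailcheck.splitEmail(%s)" % json.dumps(email)
  return ctxt.eval(script)


def run_mailcheck(email):
  # Returns the suggested address as a string, or the raw (falsy) result
  # when Mailcheck has no suggestion
  script = "Mailcheck.mailcheck.run({email: %s})" % json.dumps(email)
  result = ctxt.eval(script)
  if result:
    try:
      result = result.address + '@' + result.domain
    except AttributeError:
      pass
  return result


if __name__ == "__main__":
  init_mailcheck()
  print(run_mailcheck("kevinhughes27@gmil.com"))
  # >>> kevinhughes27@gmail.com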

A Python library for Incremental PCA (pyIPCA)

I extracted some of the useful code and nifty examples from the background work for my thesis into a Python library for your enjoyment. PCA (Principal Component Analysis) is a pretty common data analysis technique; incremental PCA lets you perform the same kind of analysis while consuming the input data one sample at a time rather than all at once.
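To make "one sample at a time" concrete, here is a toy sketch for 2D points like the ellipse example. It is not the algorithm pyIPCA uses: it simply keeps a running mean and covariance with Welford's update and eigen-decomposes the 2x2 covariance in closed form, which yields the same components as batch PCA while only ever looking at one point. `StreamingPCA2D` and its methods are hypothetical; `partial_fit` just borrows the scikit-learn naming convention.

```python
import math


class StreamingPCA2D(object):
    """Toy incremental PCA for 2D points (hypothetical helper, not pyIPCA).

    Only six running numbers are stored, so the raw data never has to be
    held in memory.
    """

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sums of squared / cross deviations
        self.sxy = 0.0
        self.syy = 0.0

    def partial_fit(self, x, y):
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean
        dy = y - self.mean_y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford update: old-mean deviation times new-mean deviation
        self.sxx += dx * (x - self.mean_x)
        self.sxy += dx * (y - self.mean_y)
        self.syy += dy * (y - self.mean_y)
        return self

    def components(self):
        """Return ((var1, var2), (major_axis, minor_axis)), largest variance first."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        a = self.sxx / (self.n - 1)
        b = self.sxy / (self.n - 1)
        c = self.syy / (self.n - 1)
        # Closed-form eigen-decomposition of the symmetric matrix [[a, b], [b, c]]
        half_trace = (a + c) / 2.0
        radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
        theta = 0.5 * math.atan2(2 * b, a - c)  # angle of the major axis
        major = (math.cos(theta), math.sin(theta))
        minor = (-math.sin(theta), math.cos(theta))
        return (half_trace + radius, half_trace - radius), (major, minor)


if __name__ == "__main__":
    pca = StreamingPCA2D()
    for x in range(4):
        pca.partial_fit(x, 2 * x)  # points on the line y = 2x
    print(pca.components())
```

Feeding points from a generator or a file works the same way, since each call to `partial_fit` touches a single sample.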

The code fully conforms to the scikit-learn API, so you should be able to use it anywhere you currently use one of the sklearn.decomposition classes. In fact, this library is sort of on the waiting list for inclusion in sklearn.

[Figure: IPCA on a 2D point cloud shaped like an ellipse]

Check it out if you’re interested and holla at sklearn if you want this feature! github.com/kevinhughes27/pyIPCA

Weekend Project - Install SteamOS

This weekend I championed my way through installing SteamOS (the Debian-based distro from Valve that will be installed on the upcoming Steam boxes). I had to do some pretty crazy stuff to get it working, including dropping out of the automated install to manually inject grub-pc and then compiling the drivers for my wireless card. All in all, it was a triumph!

[Image: steam-os-2]

and then finally:

[Image: steam-os-1]

This was an early beta release, but Valve made some weird choices, like handicapping the standard Debian installer by fully automating it and supporting only UEFI. I was actually a bit disappointed when I finally finished, because the end result is not really different from simply installing Ubuntu and setting Steam's Big Picture mode to auto-start, though I am not sure what exactly I was expecting. At the moment SteamOS is much more for OEMs than for the DIY crowd, but I can see that Valve is heavily invested in Linux at this point, with a ton of additions to their own repositories. Good things are going to come of this; I can feel it!

* Edit *

Almost all the hacking I had to do has now been wrapped up in Ye Olde SteamOSe.

* Edit 2 *

Wow, Valve released an updated version of the beta that fixes a lot of the same problems Ye Olde SteamOSe addressed, and they reportedly collaborated to get this done! This is why Valve is going to win the next generation: working with the community. Full story here.