As I get into the more mathy parts of my thesis, I’ve ended up doing a lot of work porting MATLAB code to Python, and I wanted to share my experience and wisdom for others who might try this as well.
First of all, why am I doing this? Well, I prefer working with open tools. As a friend of mine recently pointed out, if you are using proprietary software to “prove” something, it can’t really be called valid, because you can’t confirm what the software is really doing. A good point indeed. Now, I know there is Octave, and I’ve used it quite a bit. (If you are a MATLAB user I highly recommend checking it out: it’s 99.9% compatible and, of course, totally free and open source.) In fact, that’s what I was using to run the MATLAB code while porting. But I prefer having my code in Python, because I find MATLAB code harder to read and the architecture of MATLAB code just isn’t quite to my liking. It’s also much handier to have everything in Python when I bring the application back to computer vision, because I’ll have OpenCV to work with. And finally, as a personal preference, I like working with “samples as rows” in my data matrices, probably because of the computer vision work I do.
So if you’re like me, porting MATLAB to NumPy, and you’ve found your way here, let’s review the basic task at hand. We’re actually after three main goals, and it’s worth pausing to consider them separately:
1) Port from MATLAB to NumPy
2) Switch from “by cols” mentality to “by rows” mentality
3) Clean up the architecture and probably restructure as a class
Step 1 - Having done this a few times, the first thing I recommend is this: write the code line for line the same, only using NumPy. Yeah, I know you want to get in there and make it nicely formatted as a class or whatever, but trust me: do a one-to-one conversion first and leave it as “by cols”. You can use def myFunc(*args) in Python to mimic MATLAB’s nargin functionality, with len(args) playing the role of nargin. There is a great document on the NumPy website that highlights the main differences and what to use where. Notably, NumPy handles element-wise operations and matrix (dot) products differently from MATLAB: in NumPy, * is element-wise, while in MATLAB * is the matrix product. So read up! I should also mention that for step 1 (we’re leaving it as by cols for now), if you are reshaping the data with built-in functions, NumPy defaults to row-major order while MATLAB uses column-major order. You can make NumPy fill by columns by passing order='F' to either flatten or reshape.
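Here is a small sketch of the three differences from step 1. The function name my_func and its optional scale argument are hypothetical, made up only to show the *args/nargin pattern. The @ operator needs Python 3.5 or later; np.dot(A, B) does the same job on older versions.

```python
import numpy as np

# Mimicking MATLAB's nargin (my_func is a hypothetical example function).
def my_func(*args):
    nargin = len(args)                        # same role as MATLAB's nargin
    x = args[0]
    scale = args[1] if nargin > 1 else 1.0    # default when the caller omits it
    return x * scale

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# MATLAB: A .* B (element-wise)   ->  NumPy: A * B
elementwise = A * B
# MATLAB: A * B  (matrix product) ->  NumPy: A @ B  (or np.dot(A, B))
matprod = A @ B

# MATLAB's reshape fills column by column; NumPy defaults to row-major (order='C').
v = np.arange(1, 7)                              # [1 2 3 4 5 6]
row_major = v.reshape(2, 3)                      # [[1 2 3], [4 5 6]]
col_major = v.reshape(2, 3, order='F')           # [[1 3 5], [2 4 6]], like MATLAB reshape(1:6, 2, 3)
flat_like_matlab = col_major.flatten(order='F')  # same element order as MATLAB's M(:)
```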
Afterwards you’ll want to test your output to make sure it gives the same result. If you end up having to work backwards to find where the Python code deviates numerically, here are a few simple equivalents that can help:
Python/NumPy (with import numpy as np) | MATLAB/Octave
np.savetxt('data.csv', X, delimiter=',') | csvwrite('data.csv', X)
X = np.loadtxt('data.csv', delimiter=',') | X = csvread('data.csv')
print(X)  (print X in Python 2) | disp(X)
input()  (raw_input() in Python 2) | pause()
Also, work with the same input! That should be obvious. Then work your way through using print statements, saving data out, and waiting for user input if required. NumPy and MATLAB format printed output slightly differently, but it is pretty easy to compare the two. (I used to have a formatted print statement for one of them that matched the other, but I lost it, damn. It shouldn’t be too tough to redo if you need it, though, and there may be something online already; I never checked.)
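Instead of eyeballing printed output, you can let NumPy find the first entry that disagrees. This is a minimal sketch: the helper name first_mismatch and its default tolerances are my own assumptions, not anything standard. A tolerance is used because exporting to text can round values. The set_printoptions line only roughly approximates MATLAB’s default “format short” display (4 digits after the decimal point); it is not an exact match.

```python
import numpy as np

# Rough approximation of MATLAB's "format short" display; not an exact match.
np.set_printoptions(precision=4, suppress=True)

def first_mismatch(X_py, X_ml, rtol=1e-5, atol=1e-8):
    """Hypothetical helper: return None if the results agree within tolerance,
    otherwise the first 0-based (row, col) index where they differ.

    X_py is the NumPy result; X_ml is what MATLAB/Octave produced, e.g. loaded
    with np.loadtxt('data.csv', delimiter=',', ndmin=2).
    """
    X_py = np.atleast_2d(np.asarray(X_py, dtype=float))
    X_ml = np.atleast_2d(np.asarray(X_ml, dtype=float))
    if X_py.shape != X_ml.shape:
        # A transposed shape here can hint at a "by cols" vs "by rows" mix-up.
        raise ValueError(f"shape mismatch: {X_py.shape} vs {X_ml.shape}")
    close = np.isclose(X_py, X_ml, rtol=rtol, atol=atol)
    if close.all():
        return None
    row, col = np.argwhere(~close)[0]   # argwhere lists indices in row-major order
    return int(row), int(col)
```

After csvwrite('data.csv', X) on the MATLAB side, you would call first_mismatch(X, np.loadtxt('data.csv', delimiter=',', ndmin=2)) in Python. The returned index is 0-based, so add 1 to each coordinate when looking the entry up in MATLAB.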
Step 2 - Switch from “by cols” to “by rows”. By this I mean that, in a data matrix containing samples, MATLAB users tend to treat each column as a sample, while other people tend to treat each row as a sample. It’s really just a different arrangement of the same data (one matrix is the transpose of the other), but converting code between the two can be tricky. When you switch from the col to the row paradigm, you’ll need to rearrange some of the equations. Get out a pen and paper, write out a simple case of the math both row-wise and col-wise, and check it through. It took me a very long time to find a simple ordering bug that produced a square matrix rather than a scalar. Gah!
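To make the ordering issue concrete, here is a sketch with random data (the sizes d and n are arbitrary choices). The d x d sum of outer products is X X^T when samples are columns but X^T X when samples are rows. Keeping the column-wise ordering u^T v after switching to row vectors silently gives a d x d matrix instead of a scalar, which is the kind of bug described above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5   # arbitrary: d features, n samples

# "By cols" (MATLAB habit): each column is a sample, shape (d, n).
X_cols = rng.standard_normal((d, n))
# "By rows": each row is a sample; the same data is just the transpose, shape (n, d).
X_rows = X_cols.T

# d x d sum of outer products: the factors swap sides when the layout flips.
S_from_cols = X_cols @ X_cols.T     # (d, n) @ (n, d)
S_from_rows = X_rows.T @ X_rows     # also (d, n) @ (n, d), written for rows
assert np.allclose(S_from_cols, S_from_rows)

# Inner product of two samples, column layout: u^T v.
u_col = X_cols[:, [0]]              # column vector, shape (d, 1)
v_col = X_cols[:, [1]]
inner = u_col.T @ v_col             # (1, d) @ (d, 1) -> (1, 1): a scalar, as intended

# Same samples in row layout.
u_row = X_rows[[0], :]              # row vector, shape (1, d)
v_row = X_rows[[1], :]
wrong = u_row.T @ v_row             # (d, 1) @ (1, d) -> (d, d): a square matrix!
right = u_row @ v_row.T             # (1, d) @ (d, 1) -> (1, 1)
assert wrong.shape == (d, d) and right.shape == (1, 1)
assert np.allclose(inner, right)
```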
Step 3 - Now finally, go nuts and format it nicely! Make it a class, encapsulate it, whatever floats your boat! I’ve also found that a lot of MATLAB code was clearly written just to test whether something works, so when you actually go to use it, it’s quite awkward. Restructuring it in Python has improved usability a lot for me.
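As an illustration of step 3, here is a hypothetical ported routine wrapped as a class with samples as rows. Plain PCA via SVD is used purely as an example; it is not taken from any particular MATLAB code. The class and method names are assumptions that loosely follow the fit/transform convention used by scikit-learn.

```python
import numpy as np

class PCA:
    """Hypothetical example of a ported routine wrapped as a class.

    Samples are rows: X has shape (n_samples, n_features).
    """

    def __init__(self, n_components):
        self.n_components = n_components
        self.mean_ = None
        self.components_ = None   # shape (n_components, n_features), one axis per row

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        # SVD of the centred data: the rows of Vt are the principal axes.
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self               # returning self allows PCA(2).fit(X).transform(X)

    def transform(self, X):
        if self.components_ is None:
            raise RuntimeError("call fit() before transform()")
        # (n, d) @ (d, k) -> (n, k): one row of projected coordinates per sample.
        return (np.asarray(X, dtype=float) - self.mean_) @ self.components_.T
```

Usage: Z = PCA(2).fit(X).transform(X). Keeping the fitted mean and axes on the object means new samples can be transformed later without re-running everything, which is the kind of usability that one-off test scripts tend to lack.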
And that’s it. I hope this helps someone out there. I mainly wanted to write this up because I just successfully finished a tricky port and was really stoked about it. I’ve got one more to do, so maybe I’ll refine this post a bit shortly!
tl;dr NumPy >>> MATLAB