Test with big data sets

Every so often I re-learn this lesson: Make sure you test your code with the same amount of data that your users will use.

Developing with small data sets is fine, and most of the time that is what you want to do as you work out the kinks of the code. But when it is time to ship the code, you must test with a large data set.

For rate-my-resume.com I’ve been testing one portion of the code with four documents of different sizes. The spread of sizes is a fluke, however: I chose the documents for their contents, not their size. Today I made a simple change that turned out to have some unforeseen ripple effects.

When I write unit tests, they seem to fall into two categories: tests where I just do a simple check on the size of a return value (e.g. did I get 21 items in the list?), and tests where I check the contents of the resulting data.

As a side note, you should really do both kinds of tests, not one or the other. Simply checking to see that the right number of things was returned is no substitute for making sure that the correct data was actually returned!
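To make the two kinds of tests concrete, here is a minimal sketch using Python’s unittest module. The function extract_keywords is a hypothetical stand-in for whatever code is under test; it is not taken from the real project.

```python
import unittest


def extract_keywords(text):
    """Hypothetical function under test: unique lowercase words, sorted."""
    return sorted({word.strip(".,!?").lower() for word in text.split()})


class ExtractKeywordsTest(unittest.TestCase):
    TEXT = "Python rocks. Java rocks, too!"

    def test_size(self):
        # Cheap check: did we get the expected number of items?
        self.assertEqual(len(extract_keywords(self.TEXT)), 4)

    def test_contents(self):
        # Stronger check: are they actually the right items?
        self.assertEqual(
            extract_keywords(self.TEXT),
            ["java", "python", "rocks", "too"],
        )
```

Saved in a file, these can be run with python -m unittest. The size test is quick to write and catches gross breakage; the contents test is what tells you the data is actually correct.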

Today I was re-running my PyUnit tests and one of the four failed because the size of the returned list wasn’t what was expected. By coincidence this happened to be the biggest data set I was testing with. If I had not had this test, I would have thought everything was OK, but in truth my modified function was returning questionable data!

There are of course other benefits of using larger data sets, mainly seeing how your code works under stress. What works fine for 10 items might not be so great with 1000 items. Testing with a large data set at the end will help catch these problems and will also help you adhere to Knuth’s advice to not optimize code prematurely.
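One cheap way to get bigger data sets is to generate them. The sketch below runs the same size and content checks at several sizes; unique_sorted, make_data and check are all hypothetical names made up for illustration.

```python
import random


def unique_sorted(items):
    """Hypothetical function under test: drop duplicates, return sorted."""
    return sorted(set(items))


def make_data(size, seed=0):
    """Build a repeatable data set of `size` integers with many duplicates."""
    rng = random.Random(seed)
    return [rng.randrange(size) for _ in range(size)]


def check(size):
    """Run the same size and content checks against a data set of `size`."""
    data = make_data(size)
    result = unique_sorted(data)
    # Size check: exactly one result per distinct input value.
    assert len(result) == len(set(data))
    # Content checks: same values as the input, in strictly increasing order.
    assert set(result) == set(data)
    assert all(a < b for a, b in zip(result, result[1:]))
    return len(result)


# Same checks, small and large: what passes at 10 may not pass at 100000.
for size in (10, 1000, 100000):
    check(size)
```

A fixed seed keeps the generated data repeatable, so a failure at one size can be reproduced on the next run.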

So, now I’m off to learn how to trace through python code to find out how such a seemingly simple “fix” to my code could so subtly break it…

More on breaking functional fixedness

Here’s a fun little video that poses an interesting question. If you had $5 and 2 hours, what could you do to raise the most money?

Start up studies: A pop quiz

Aside from various illegal schemes, the people in the story came up with some fairly inventive ideas. I hate to use the phrase “out-of-the-box” but that really sums up the thinking approach the participants used.

Having said that, I thought the restaurant idea was better than the “winning” idea. Why? It provided a service of tangible value to a larger group of people, and is something that is probably reproducible (i.e. you could probably do that over and over).

And as the presenter pointed out, sometimes we put constraints on a problem that are totally of our own making. Breaking free from those can lead to some really interesting (or profitable in this case) solutions!

See also: Overcoming functional fixedness

Rate-my-resume.com is now live

In my last post, I put up a link to a little project I’ve been working on. I finally got around to giving it a proper name. (re)Introducing:

Rate-my-resume.com

Now if you are wondering if your resume is a good match for a particular job posting, you can use my site to find out! At the moment I’m giving the score on a scale from 0 (a total non-match) to 100 (the absolute perfect match). In this economy, the more your resume reflects the skills listed in a particular job, the more likely your resume will be looked at seriously.

If you run your resume through and it gives you a low score, compare your resume with the job posting and try to figure out which keywords are in the job posting but not in your resume. Then, assuming you have the necessary experience, put those keywords into your resume! Be sure to add them in a way that makes sense to a person. After all, humans (especially HR people) don’t like to read fragments and words peppered into someone’s resume.
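If you want to do that keyword comparison by machine, here is a rough sketch. To be clear, this is not how the site computes its score; the tokenizer, the stop-word list and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "in", "with", "to", "for"})


def words(text):
    """Lowercase word set, using a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))


def missing_keywords(resume, job_posting):
    """Words that appear in the job posting but not in the resume."""
    return sorted(words(job_posting) - words(resume) - STOP_WORDS)


resume = "Five years of Python and Django experience."
job_posting = "Python, Django and JUnit experience"
print(missing_keywords(resume, job_posting))  # prints ['junit']
```

A word-set difference like this knows nothing about synonyms or phrases, so treat its output as a list of hints to read through, not a verdict.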

Try out the site with your resume and see how you rank!

p.s. Python rocks!

Matching resumes to jobs

Have you ever looked at a job posting and tried to figure out if you are a good match for that job?

I’ve written a Google App Engine application to try to help people figure that out. Paste in a copy of your resume and a copy of the job description, and it will try to figure out how good a match you would be for that job.

Check it out: http://app.ironboundsoftware.com

I’m really impressed with the Google App Engine environment (go Python!) and had fun writing this. Hopefully this will help people out in their job hunt. Times are tough, and hopefully this little application will help someone get into the perfect job for them.

Try it out and let me know what you think!

Java Set

Having a list of items is pretty useful. Sometimes it’s really useful to have no duplicates in that list. Java helps you to do this via the Set interface and its various implementations.

The Set interface basically defines a class that will hold a set of objects, and in the process not allow duplicates. (If you are ok with having duplicate items look into using something like ArrayList.) Like other things in the Collection family, Sets have an iterator() method that will provide you with an iterator so you can access the items being held by the set.

One example of a Set implementation (and probably one of the most common implementations of Set) is the HashSet. HashSet stores objects passed to it via the add() method in a hash table, placing each one according to its hashCode(), so it makes no guarantee about the order in which the items come back out.

If you need to control the order in which items are read from the Set (e.g. the objects should come out in alphabetical order) then the TreeSet class is the weapon of choice. TreeSet keeps its elements sorted, either by their natural ordering or by a Comparator that you can optionally pass to its constructor. This is enormously useful when your code receives some data from some source (a database, a web source, etc.) and you want to make sure that the data is sorted and is unique.

Sets pop up in several places in Java; one of the most notable is in Maps. Since a map is a key-value data structure, the keys must be unique. As a result, if you call keySet() on a Map you will get the keys for that map as a Set object.

Hero of the week: Stewart Butterfield

Not only did he help create a kick-ass useful website (flickr), but he knows how to respond to mis-directed emails:

http://valleywag.gawker.com/5288759/flickr-founder-calls-nuked-user-a-dick

Lightweight TDD

The more I use unit testing (particularly JUnit) the more I like it. It is a great way of tracking progress in your code, and more importantly making sure you haven’t broken something in the process.

I’m not a huge fan of traditional Test-Driven Development (TDD) though. My biggest complaint is that writing a battery of tests before writing the actual code feels like putting the cart before the horse. I’ve been experimenting with it, and most of the time I’ve found that if my code is structured “correctly” (i.e. well-defined APIs/interfaces, dependency injection, etc.) TDD will work pretty well.

However, I have found that I like to use TDD one test case at a time. Basically I will get the basic framework of my class(es) together, and then as I refine the capabilities of the class, add in a few tests to catch one or two conditions. Then I work on my code to make sure that it is performing as expected. Once everything is going well the tests pass and it is time to move on to the next part of the class.

The big advantage for me in this is that I only have to worry about getting a small number of tests to pass instead of all (or a large number) of them. By breaking the tasks down into smaller pieces I find that I spend less time “dreaming” about how my code would work (and writing tests that don’t accomplish much or have to be re-written as reality sinks in). Instead I’m able to focus on a single problem and solving it.

This is similar to an approach advocated in the unit testing community: when a bug is found, create a test that exposes it. Then fix the code, and the test should prove that the underlying problem is gone. I really like that approach and I have begun doing that as often as I can. So far it has really paid off in terms of making sure my code doesn’t have a bad case of “but-I-already-fixed-that!” bugs.
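Here is a small sketch of that bug-first workflow, written with Python’s unittest rather than JUnit. The average function and its empty-list bug are invented for illustration.

```python
import unittest


def average(values):
    """Hypothetical function that once crashed on an empty list."""
    if not values:  # the fix: guard the empty case
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_empty_list_does_not_crash(self):
        # Written first to reproduce the ZeroDivisionError, then kept
        # around so the bug cannot quietly come back.
        self.assertEqual(average([]), 0.0)

    def test_normal_case_still_works(self):
        # Make sure the fix did not break the ordinary path.
        self.assertEqual(average([1, 2, 3]), 2.0)
```

The first test fails against the unfixed code and passes once the guard is added. After that it costs nothing to keep running.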

Comma Separated Values

Question: How much does python rock?

Answer: More and more every day.

Today I was writing (for what seems like the millionth time) a little script to read a CSV (Comma Separated Values) file. After running into the same issues over and over (picking a delimiter, escaping delimiters, etc.) I decided my sanity was worth the 30 seconds it would take to see if someone else had already written a CSV library. It turns out Python has one built in. Since 2.3. D’oh.

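import csv

# csv.reader expects an open file (or any iterable of lines), not a filename
with open('myfile.csv', newline='') as f:
    lines = list(csv.reader(f))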

That’s all that’s needed to read in a CSV file and have it properly handle the delimiters, even when they are inside quoted text (e.g. something like “$3,000” will be read as the single value $3,000 instead of being split into $3 and 000).
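Here is a quick way to see the quoting behavior without needing a file on disk. The sample data is made up; io.StringIO just stands in for an open file.

```python
import csv
import io

# Made-up sample: the price field is quoted because it contains a comma.
sample = 'item,price\nlaptop,"$3,000"\n'

rows = list(csv.reader(io.StringIO(sample)))
print(rows)  # prints [['item', 'price'], ['laptop', '$3,000']]
```

The quoted field comes back as one value with the comma intact and the surrounding quotes stripped.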

Python rocks again.

Lasik: Still loving it 2 years later

About two years ago I finally decided I was tired of my scratched up glasses and that I would get Lasik eye surgery. I was talking about this with some friends recently and thought I should do a post to talk about how things are now that I’m a few years away from it. I did a lot of research on Lasik before my operation, and a lot of what I found talked about the procedure itself and the immediate time after. I thought I would write this blog post and talk about how things are a while after the surgery.

In short: Pretty good!

As a computer programmer I’m rather attached to my eyesight. I was concerned that, staring at screens all day, I might be in for a rough ride because you will have “dry eyes” after the surgery. Some days were tough, but using the preservative-free eye drops (like the doctor suggests) really helped with this. For me the dry eye problem went away pretty quickly: I would say within a month or two I was only having to put the drops in a few times a week as opposed to a few times a day.

The doctors said my eyes were in good shape and it really showed in the recovery phase. The only thing that seemed to heal slowly for me was my night vision.

Loss of night vision is a common side effect of Lasik. My night vision has gotten better since the surgery, but it took almost a year, and it still doesn’t feel quite the same as it did pre-surgery. The flip side of this is that in low-light conditions, I feel like I can make out some details a little bit sharper than I could before. I know that sounds odd, but as long as there’s some good light, like a quarter-moon or so, I feel like I can see better than I could with my glasses in that same condition.

So, all in all I’m pretty happy with the way things turned out. Some people do suffer from negative side effects for longer or more intensely than others, but I think your best bet is to follow your doctor’s instructions:

  • Moisturize (your doctor will tell you how, usually with preservative-free eye drops)
  • Don’t rub your eyes! :)
  • Use the medicines and creams as directed by the doctor.
  • Take your vitamins, eat your Wheaties, and get lots of rest.

So that is pretty much my follow-up report. I’m glad I did it and I encourage others to talk to their eye doctor if they are thinking about it. I had my surgery at LasikPlus here in Atlanta, Ga., and the staff was great and very helpful. Check them out if you are thinking about it!

How to kill momentum instantly

  1. Get the itch to finish a stale project
  2. Carve out some time to work on stale project
  3. Upgrade your software stack to make sure you are current*
  4. Spend hours figuring out why something simple isn’t working
  5. Curse, debug, repeat
  6. Curse more when you realize that step 3 was where the train went off the rails

Yes, I made the mistake of upgrading the Google App Engine Launcher only to discover that for some reason it doesn’t play nice with the Django Helper project (or apparently the latest version of Django).

*sigh*

I know, I can go and continue development without those tools, but I was really looking forward to playing with the AppEngine (using a Django project).

Some days you are the pigeon, some days you are the statue.