There are some great computer vision Kaggle competitions that you can use to test and develop your skills. In general, the competitions best suited to exercising your lesson 1 skills are those where:
- The images are full color and of similar size to ImageNet (224x224), since fine-tuning from ImageNet is harder to get working when the images are very different
- The task is a classification problem (i.e. deciding which class each image belongs to), since that is what we've learnt to do so far, and is directly supported by our vgg16 object
Note that the easiest way to download data from Kaggle to your server, and to upload submissions to Kaggle, is the Kaggle CLI.
Although most of these competitions are now over, you can still submit to them to see where you would have placed. In general, finishing in the top 25% means you're doing very well, and the top 50% is a reasonable baseline to aim for at first.
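To make the top 25% and top 50% targets concrete, here is a minimal sketch that converts a leaderboard position into a percentile. The function names and the example rank and team count are hypothetical, chosen only for illustration; the real numbers have to be read off the competition's leaderboard.

```python
def leaderboard_percentile(rank, num_teams):
    """Return the share of the leaderboard at or above `rank`, as a percentage.

    rank is 1-based (1 = first place). Both arguments are read off the
    competition leaderboard by hand; nothing here talks to Kaggle.
    """
    if not 1 <= rank <= num_teams:
        raise ValueError("rank must be between 1 and num_teams")
    return 100.0 * rank / num_teams


def describe(rank, num_teams):
    """Map a leaderboard position onto the targets suggested above."""
    pct = leaderboard_percentile(rank, num_teams)
    if pct <= 25:
        return "top 25%: doing very well"
    if pct <= 50:
        return "top 50%: reasonable first baseline"
    return "below the top 50%: keep iterating"


# Hypothetical example: 300th place out of 1,000 teams.
print(describe(300, 1000))
```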
- Dogs vs. Cats Redux: Kernels Edition - This is the new version of the competition that we've been looking at. See if you can create an end-to-end process yourself that gets in the top 50% of the competition
- State Farm Distracted Driver Detection - This competition is very similar in structure to Dogs vs. Cats, so you should find it not too hard to get a reasonable result. Although the competition is over, you can still submit to it to find out where you would have placed
- Right Whale Recognition - Getting a basic entry in shouldn't be too hard, but getting a good result will require thoughtful preprocessing
- Galaxy Zoo - The Galaxy Challenge - Classify the morphologies of distant galaxies in our Universe. A good choice when you're ready to push yourself to the next level, but still relatively easy to handle
- Painter by Numbers - This is a great choice for folks who want to push themselves a little further. We would strongly suggest using the pre-processed dataset provided in this Kaggle forum thread. The basic goal of assigning a painter to each painting shouldn't be too hard, but you then need to go a step further and use those predictions to decide which paintings are by the same painter, which will require some thoughtful model design
- Diabetic Retinopathy Detection - You'll need to handle larger images than we've used so far to get a good result in this competition
- National Data Science Bowl - Build an algorithm for measuring and monitoring plankton populations. Requires working with greyscale rather than full color images; other than that, it is similar to our work so far
- Yelp Restaurant Photo Classification - Given restaurant images, predict 9 different attributes, such as "good for lunch", "outdoor seating", "takes reservations", and "has alcohol." There are 6 GB each of training and test data
- Draper Satellite Image Chronology
- Second Annual Data Science Bowl - Create an algorithm to automatically measure end-systolic and end-diastolic volumes in cardiac MRIs. Requires working with 3D data. The basic approaches we've learnt already will work, but they need a lot of tweaking to handle this task
- Grand Challenge for Biomedical Image Analysis - Has a number of medical image datasets, including the Kaggle Ultrasound Nerve Segmentation competition, which has 1 GB each of training and test data. We haven't learnt how to do segmentation yet, so this competition is best for people who are prepared to do some self-study beyond our curriculum so far
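The Yelp entry in the list above differs from the single-label problems we have handled so far: a restaurant can have several of the 9 attributes at once, so the target is a vector of 0s and 1s rather than one class index. A minimal sketch of that encoding follows. It assumes the labels for a restaurant arrive as a space-separated string of integer ids from 0 to 8; that format, and the function name, are assumptions made for illustration, so check the competition's data description before relying on them.

```python
NUM_ATTRIBUTES = 9  # the competition asks for 9 attributes per restaurant


def encode_labels(label_string, num_labels=NUM_ATTRIBUTES):
    """Turn a string such as "1 2 5" into a multi-hot list of 0s and 1s.

    Assumed input format: space-separated integer ids in [0, num_labels).
    An empty string means the restaurant has none of the attributes.
    """
    target = [0] * num_labels
    for token in label_string.split():
        label = int(token)
        if not 0 <= label < num_labels:
            raise ValueError("unknown label id: %d" % label)
        target[label] = 1
    return target


# Hypothetical example row: attributes 1, 2 and 5 are present.
print(encode_labels("1 2 5"))  # [0, 1, 1, 0, 0, 1, 0, 0, 0]
```

With targets in this form, the usual change on the model side is one sigmoid output per attribute instead of a single softmax, since the attributes are not mutually exclusive.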
Here are other sources of datasets, which include some image datasets:
- Awesome Deep Learning Datasets
- AWS public data sets (includes SpaceNet data)
- Apparel classification with Style
- Quora Question/Answer Pairs
For those with more deep learning background, you may be interested in the following blog posts (related to the above datasets and competitions):
- Interview with the 1st place winner in the Yelp Restaurant Photo competition.
- How to (almost) win Kaggle Competitions - Blog post with 10 tips from a 5-time (almost) winner.