Kaggle (http://kaggle.com) is a website that markets itself as the place to do data science projects.
It has five primary offerings:
- Competitions - Kaggle hosts many different types of competitions, with problems requiring different levels of skill to solve. These range from Featured competitions, which are full-scale challenges that often offer prizes as high as a million dollars, to Getting Started competitions, which are relatively easy and serve as a good introduction to the competition format.
- Datasets - Many people arrive on the Kaggle website via search engines while looking for datasets. There is a wealth of datasets from a range of domains that can support a variety of interesting projects. What strengthens this offering is that the datasets come with good documentation, a built-in discussion system that can act as a useful knowledge base, and easily accessible code from community contributors that does interesting things with the data.
- Kernels - To conduct your data science experiments you'll often need an environment with an installation of your programming language of choice, complete with some kind of scientific stack. With Kaggle Kernels, you can get started on the actual data science without having to maintain your own local installation: it operates entirely through the web browser! Being a cloud environment, it also makes code easier to reproduce and share with collaborators. Of course, once you have a good understanding of your needs, it may be worth investing in your own hardware and local environment.
- Discussion - The website is backed by a strong community, and this is evident in its discussion areas. The many forums range from Questions & Answers, for requesting technical advice from other data scientists, to a feedback area where you can request new features or discuss existing ones.
- Learn - You may want to pick up a new skill related to data science, or simply recap an existing one. Kaggle offers short, free courses in data science, ranging from SQL to Data Visualisation, which helps you produce useful representations of your data and results.
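The Kernels offering above comes with a scientific stack already installed. Once you have a Kernel open, a first cell like the following sketch can confirm which packages are importable. The package list is our own assumption about a typical data science stack, not an official Kaggle manifest; the check itself only uses the standard library's `importlib.util.find_spec`.

```python
import importlib.util
import sys

# Assumed list of packages a data science environment usually provides; adjust as needed.
SCIENTIFIC_STACK = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_stack(packages):
    """Return a dict mapping each top-level package name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


print("Python", sys.version.split()[0])
for name, available in check_stack(SCIENTIFIC_STACK).items():
    print(f"{name:<12} {'available' if available else 'missing'}")
```

Because `find_spec` only locates a package without importing it, the check is quick even for large libraries.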
Let's go through the process of signing up to Kaggle and firing up a Kernel to execute a Hello World program in Python.
Direct your browser to http://kaggle.com and, if you agree to their terms, register an account using one of the sign-up options. You will need to activate your account via a verification e-mail, which should arrive immediately.
In the top navigation, click Kernels. You will be presented with a list of public Kernels submitted and maintained by the community; browsing them is a good way to see how other data scientists do things.
In the top right, click the New Kernel button, which lets you pick between two types of development environment: Script and Notebook. These differ in how the code is executed and how variables are handled at runtime. Let's pick Notebook for the time being, but you may wish to explore Script too.
Once you have selected your Kernel type, you will be taken to a new notebook loaded into the Kaggle Kernel environment. Here you can enter your code directly in either Python or R, which can be toggled using a dropdown in the interface. On the right-hand side you will see various widgets, including Session Information, Versioning, Information, Data Sources, Settings, Documentation, and a link to the API.
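The difference between Script and Notebook can be illustrated with a toy model in plain Python. This is a simplified sketch of the general idea (a notebook keeps one namespace alive between cell runs, while a script starts from scratch on every run), not a description of how Kaggle implements Kernels; `ToyNotebook` and `run_script` are hypothetical names.

```python
class ToyNotebook:
    """Keeps one namespace alive between cell executions, like a notebook session."""

    def __init__(self):
        self.namespace = {}

    def run_cell(self, source):
        # Every cell executes against the same shared namespace.
        exec(source, self.namespace)


def run_script(source):
    """Run a whole program top to bottom in a fresh namespace, like a script."""
    namespace = {}
    exec(source, namespace)
    return namespace


nb = ToyNotebook()
nb.run_cell("counter = 0")
nb.run_cell("counter += 1")
nb.run_cell("counter += 1")  # re-running the same cell: state carries over
print(nb.namespace["counter"])  # 2

program = "counter = 0\ncounter += 1"
print(run_script(program)["counter"])  # 1
print(run_script(program)["counter"])  # 1 again: each run starts fresh
```

The notebook style is convenient for exploration, but the order in which you run cells matters; the script style always produces the same result from the same file.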
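Datasets attached through the Data Sources widget show up as files the notebook can read. The sketch below lists them, assuming they are mounted under `/kaggle/input` (our assumption about the Kernel filesystem layout, so check the Data Sources panel if nothing is printed). Outside Kaggle the directory will not exist and the function simply returns an empty list; `list_input_files` is a hypothetical helper name.

```python
import os

# Assumption: attached Data Sources are mounted under this directory inside a Kernel.
INPUT_DIR = "/kaggle/input"


def list_input_files(root=INPUT_DIR):
    """Return the paths of all files found under `root`, sorted for stable output."""
    paths = []
    # os.walk yields nothing (rather than raising) if `root` does not exist.
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


for path in list_input_files():
    print(path)
```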
To implement and run our Hello World program, let's first remove the code in the existing cell. You can click inside the first cell, select all the code, and hit delete/backspace on your keyboard. Now you're ready to write your program. Type in the following code:
print("Hello World")
Then click the Play button to the left of the cell to execute it. You should see the output, "Hello World", appear below the cell!
That's all there is to getting your Hello World program running within a Kaggle Kernel. If you need to brush up on your Python skills, you can also check out one of the free courses mentioned earlier in this article: https://www.kaggle.com/learn/python.
Adding a new cell
To add a new code cell, click one of the blue "+" buttons: the one with an upward arrow adds a cell above the current cell, and the one with a downward arrow adds a cell below it. You can then run the new cell with its Play button to see its output.
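Because all cells in a Notebook share one running session, a cell added below can use anything the cells above it defined, as long as those cells have been run first. As a small illustration, the two cells are shown together in one block here; in the Kernel, each commented section would go in its own cell. The names `name`, `greet` and `message` are our own.

```python
# Cell 1: define a value and a helper function.
name = "Kaggle"


def greet(who):
    """Build a greeting string for `who`."""
    return f"Hello, {who}!"


# Cell 2 (added with the down-arrow "+" button): reuse what Cell 1 defined.
message = greet(name)
print(message)  # Hello, Kaggle!
```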
In the next article we'll have a look at using a Kaggle Kernel for some machine learning tasks.