AWS install

From Deep Learning Course Wiki

We have a video walkthrough available for setting up an AWS GPU server. It uses the new 'P2' instance type, which is a very good choice for deep learning. The video assumes that you're starting from scratch, and shows you how to set up your AWS account, request the necessary permissions from Amazon, install the AWS command line interface (CLI), and set up the server. If you already have an AWS account, you can skip to the 2 minute mark.

The video references getting setup scripts from platform.fast.ai; however, we now store the most up-to-date versions in the setup folder of the course GitHub repo. We recommend just copying the script you need. If you choose to clone the course repo on your computer, be aware that you won't be running the notebooks from there, and you will need to clone it a second time on your cloud GPU instance once you have it set up.

The AWS installation requires that you have access to create GPU instances, and the video shows how to request this access. To check whether your request has been successful, have a look at the 'g2' and 'p2' lines in this link.

Important note

If you need to use an AWS server prior to Amazon giving you access to GPU instances, you can use a free 't2.large' instance. This is also useful for testing and prototyping, since you can get everything working on a free instance, and then switch to the paid instance when you want to run on a large dataset. To do this, follow the same steps as in the video above for setting up a P2 instance, but download the script setup_t2.sh (https://github.com/fastai/courses/blob/master/setup/setup_t2.sh) instead of the script setup_p2.sh shown in the video. (Note that the 'nvidia-smi' command will not work in this case, since you will not have a GPU.)
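
If you are unsure whether the instance you are logged in to has a working GPU, a small check like the following may help. It is only a sketch: the function name gpu_status and its optional argument are our own illustration, not part of the course scripts, and it simply treats a failing or missing nvidia-smi as "no usable GPU".

```shell
# Hypothetical helper: report whether the GPU driver tool runs successfully.
# The optional first argument (default: nvidia-smi) is the command to try.
gpu_status() {
    local tool="${1:-nvidia-smi}"
    # A missing command or a non-zero exit status both count as "no GPU".
    if "$tool" >/dev/null 2>&1; then
        echo "GPU driver responded"
    else
        echo "no usable GPU detected"
    fi
}
```

Running gpu_status with no arguments on a t2 instance should report that no usable GPU was detected, which is expected there.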

Once you create an instance

We have also provided a handy script that creates AWS aliases. You can use it as follows to create the aliases, start an instance, and connect:

source aws-alias.sh
aws-start
aws-ssh

To stop your instance (important when you are done, so that you don't get charged), run:

aws-stop
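
For reference, here is a rough sketch of the kind of AWS CLI commands such aliases could wrap. This is not the content of aws-alias.sh: the function names are hypothetical, the instanceId variable is assumed to hold your instance ID, and the ubuntu login name is an assumption about the AMI.

```shell
# Hypothetical helpers; assumes $instanceId is set to your EC2 instance ID.

sketch_aws_start() {
    # Start the instance, then block until EC2 reports it as running.
    aws ec2 start-instances --instance-ids "$instanceId" &&
    aws ec2 wait instance-running --instance-ids "$instanceId"
}

sketch_aws_ip() {
    # Look up the instance's current public IP address.
    aws ec2 describe-instances --instance-ids "$instanceId" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' --output text
}

sketch_aws_ssh() {
    # Log in with the key created by the setup script.
    # "ubuntu" as the login user is an assumption.
    ssh -i ~/.ssh/aws-key-fast-ai.pem "ubuntu@$(sketch_aws_ip)"
}

sketch_aws_stop() {
    # Stop (not terminate) the instance so that hourly charges end.
    aws ec2 stop-instances --instance-ids "$instanceId"
}
```

Stopping differs from terminating: a stopped instance can be started again later with its disk intact.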


Should you use AWS?

AWS servers are charged for each hour you are using them - generally about $0.90/hour. You need to remember to go to the AWS console and stop your instance when you're not using it, otherwise you'll continue to get charged for it.

If you would rather pay a fixed monthly fee (about $200) for a server and not have to worry about starting and stopping it, check out the OVH install page.
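
As a rough guide to which option is cheaper, you can work out the break-even point from the two approximate prices above. The sketch below assumes exactly $0.90/hour and $200/month and ignores any other charges; the 60-hour usage figure is just an example.

```shell
# Approximate prices taken from the text above.
hourly_rate=0.90   # USD per hour on AWS
monthly_fee=200    # USD per month for a fixed-fee server

# Hours per month at which both options cost the same.
break_even_hours=$(awk -v fee="$monthly_fee" -v rate="$hourly_rate" \
    'BEGIN { printf "%.0f", fee / rate }')
echo "Break-even: about ${break_even_hours} hours per month"

# Example usage pattern (an assumption): 2 hours a day for 30 days.
usage_hours=60
usage_cost=$(awk -v h="$usage_hours" -v rate="$hourly_rate" \
    'BEGIN { printf "%.2f", h * rate }')
echo "${usage_hours} hours on AWS: ${usage_cost} USD"
```

If you expect to use the server for fewer hours per month than the break-even figure, hourly billing is likely to be the cheaper choice, provided you remember to stop the instance.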

Other Regions

The AWS p2 is only available in regions us-west-2 (Oregon), us-east-1 (N Virginia), and eu-west-1 (Ireland). This means that you will be remotely using a server in one of these locations, and if you live on the other side of the world, your latency may be high. We have created a separate AMI (Amazon Machine Image) and install script for each of these regions. The video and course materials use us-west-2 as the default. If you live in Europe, use the setup_p2_ireland.sh install script.

We will create AMIs for additional regions as AWS makes the p2 more available.
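
Before running a setup script, you may want to confirm that the region you plan to use is one of the three listed above. A minimal sketch, where is_p2_region is a hypothetical helper name and the region list is simply copied from this page:

```shell
# Hypothetical helper: succeed only for the regions listed on this page.
is_p2_region() {
    case "$1" in
        us-west-2|us-east-1|eu-west-1) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_p2_region "$(aws configure get region)" || echo "default region has no p2 AMI listed here" checks the default region of your AWS CLI configuration.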

Common Problems

  • InstanceLimitExceeded: You are not approved for g2 or p2 instances yet, so you will need to use a t2 instance instead. See the Important note above.
  • No such file or directory error when running aws configure in Cygwin: See awscli in cygwin for a solution. This error occurs when awscli has been installed into the Windows Anaconda installation of Python instead of into Cygwin's Python.
  • InvalidKeyPair.NotFound error: this is reported in two different situations:
    • setup_p2.sh: line 13: /Users/.ssh/aws-key-fast-ai.pem: Permission denied: Try rm ~/.ssh/aws-key-fast-ai.pem and re-run the script (this can occur when you've run the script previously, so there is already a key).
    • setup_p2.sh: /home/user/.ssh/aws-key-fast-ai.pem: No such file or directory: You don't have a ~/.ssh directory. To create one and set the correct permissions:
      cd ~
      mkdir .ssh
      chmod 700 ~/.ssh
      
      Then try running the setup script again.
  • An error occurred (AddressLimitExceeded) when calling the AllocateAddress operation: The maximum number of addresses has been reached: An elastic IP address is allocated each time you run the setup script, so you need to release the unused ones. Besides enforcing a limit, Amazon charges for elastic IP addresses that are not in use, but not for ones that are. Go to the AWS user guide and read the section on "Releasing an Elastic IP Address".
  • If you continue getting VpcLimitExceeded or AddressLimitExceeded even after deleting, check the top right corner of your console to make sure that you are in the correct region (in our case, Oregon).
    [Image: EC2 Management Console region.png]
    AWS only shows you instances from one region at a time, and you could be deleting VPCs or ElasticIPs for a different region.
  • When you type nvidia-smi, you get the message Failed to initialize NVML: Driver/library version mismatch. Solution: Restart your instance.
  • In the t2 instance, if you get an error trying to install git that cites liberror-perl, follow the steps in this link.
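
For the AddressLimitExceeded case, unused elastic IPs can also be found from the command line. The sketch below only prints the release commands rather than running them, so that you can review them first. The function names and the us-west-2 default are assumptions for illustration; describe-addresses and release-address are standard aws ec2 subcommands.

```shell
# Hypothetical helper: list allocation IDs of elastic IPs in a region
# that are not associated with any instance or network interface.
list_unused_elastic_ips() {
    local region="${1:-us-west-2}"
    aws ec2 describe-addresses --region "$region" \
        --query 'Addresses[?AssociationId==`null`].AllocationId' --output text
}

# Hypothetical helper: print (but do not run) one release command per
# unused elastic IP.
print_release_commands() {
    local region="${1:-us-west-2}"
    local allocation_id
    for allocation_id in $(list_unused_elastic_ips "$region"); do
        echo "aws ec2 release-address --region $region --allocation-id $allocation_id"
    done
}
```

Call print_release_commands us-west-2 (or another region), check the output, and run only the lines for addresses you are sure you no longer need. Passing the region explicitly also guards against the wrong-region problem described above.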

Starting Over

If your AWS has ended up in a weird state, you may want to delete things and start over. See this guide on Starting Over with AWS.

Spot instances

Spot instances are much cheaper versions of the normal on-demand ones. However, they come with a few quirks. See the guide on AWS Spot instances.

For experts only

If you're already familiar with AWS, you may be interested in the following info:

  • We've provided a script that sets up a VPC, subnet, elastic IP, and P2 instance: [1]
  • The script uses an AMI that we have provided that includes everything you need - ami-bc508adc
  • The AMI was set up using the script [2] - have a look at this script if you want to see what has been set up for you