|
19 | 19 | "It's a Python-based scientific computing package targeted at two sets of audiences:\n",
|
20 | 20 | "\n",
|
21 | 21 | "- A replacement for numpy to use the power of GPUs\n",
|
22 |
| - "- a deep learning research platform that provides maximum flexibility and speed" |
| 22 | + "- a deep learning research platform that provides maximum flexibility and speed\n", |
| 23 | + "\n", |
| 24 | + "**If you want to complete the full tutorial, including training a neural network for image classification, you have to install the `torchvision` package.**" |
23 | 25 | ]
|
24 | 26 | },
|
25 | 27 | {
|
|
88 | 90 | "x.size()"
|
89 | 91 | ]
|
90 | 92 | },
|
| 93 | + { |
| 94 | + "cell_type": "markdown", |
| 95 | + "metadata": {}, |
| 96 | + "source": [ |
| 97 | + "*NOTE: `torch.Size` is in fact a tuple, so it supports the same operations*" |
| 98 | + ] |
| 99 | + }, |
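As a quick illustration of the note in the new cell above (a small sketch, not part of the committed notebook; `x` mirrors the tensor created earlier in the tutorial):

```python
import torch

x = torch.randn(5, 3)          # a 5x3 tensor, as in the cells above
size = x.size()                # torch.Size([5, 3]) behaves like a tuple

rows, cols = size              # tuple unpacking works
print(len(size), size[0], rows * cols)   # -> 2 5 15
```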
91 | 100 | {
|
92 | 101 | "cell_type": "code",
|
93 | 102 | "execution_count": null,
|
|
293 | 302 | "## Autograd: automatic differentiation\n",
|
294 | 303 | "\n",
|
295 | 304 | "The `autograd` package provides automatic differentiation for all operations on Tensors. \n",
|
296 |
| - "It is a define-by-run framework, which means that your backprop is defined by how your code is run. \n", |
| 305 | + "It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different. \n", |
297 | 306 | "\n",
|
298 | 307 | "Let us see this in simpler terms with some examples.\n",
|
299 | 308 | "\n",
|
300 | 309 | "`autograd.Variable` is the central class of the package. \n",
|
301 |
| - "It wraps a Tensor, and afterwards you can run tensor operations on it, and finally call `.backward()`\n", |
| 310 | + "It wraps a Tensor and supports nearly all of the operations defined on it. Once you finish your computation you can call `.backward()` and have all the gradients computed automatically.\n", |
302 | 311 | "\n",
|
303 |
| - "You can access the raw tensor through the `.data` attribute, and after computing the backward pass, a gradient w.r.t. this variable is accumulated into `.grad` attribute.\n", |
| 312 | + "You can access the raw tensor through the `.data` attribute, while the gradient w.r.t. this variable is accumulated into `.grad`.\n", |
304 | 313 | "\n",
|
305 | 314 | "\n",
|
306 | 315 | "\n",
|
307 | 316 | "There's one more class which is very important for autograd implementation - a `Function`. \n",
|
308 | 317 | "\n",
|
309 |
| - "`Variable` and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each variable has a `.creator` attribute that references a `Function` that has created the `Variable` (except for Variables created by the user - these have `creator=None`).\n", |
| 318 | + "`Variable` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each variable has a `.creator` attribute that references the `Function` that has created the `Variable` (except for Variables created by the user - their `creator is None`).\n", |
310 | 319 | "\n",
|
311 | 320 | "If you want to compute the derivatives, you can call `.backward()` on a `Variable`. \n",
|
312 |
| - "If `Variable` is a scalar (i.e. it holds a one element tensor), you don't need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `grad_output` argument that is a tensor of matching shape.\n" |
| 321 | + "If the `Variable` is a scalar (i.e. it holds one element of data), you don't need to specify any arguments to `backward()`; however, if it has more elements, you need to specify a `grad_output` argument that is a tensor of matching shape.\n" |
313 | 322 | ]
|
314 | 323 | },
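A rough sketch of the workflow described above, using the legacy `autograd.Variable` API this revision of the tutorial targets (the variable names are illustrative, not taken from the notebook):

```python
import torch
from torch.autograd import Variable

# Wrap a tensor; requires_grad=True asks autograd to track operations on it
x = Variable(torch.ones(2, 2), requires_grad=True)

y = x + 2                  # created by an operation, so y has a creator
print(y.creator)           # references the Function that produced y; x.creator is None

z = (y * y * 3).mean()     # a scalar Variable
z.backward()               # scalar output: no arguments needed
print(x.grad)              # d(z)/d(x), accumulated into .grad

# For a non-scalar output, pass a grad_output of matching shape
w = x * 2
w.backward(torch.ones(2, 2))
```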
|
315 | 324 | {
|
|
523 | 532 | "outputs": [],
|
524 | 533 | "source": [
|
525 | 534 | "import torch.nn as nn\n",
|
| 535 | + "import torch.nn.functional as F\n", |
| 536 | + "# Some more Python helpers\n", |
| 537 | + "import functools\n", |
| 538 | + "import operator\n", |
526 | 539 | "\n",
|
527 | 540 | "class Net(nn.Container):\n",
|
528 | 541 | " def __init__(self):\n",
|
529 | 542 | " super(Net, self).__init__()\n",
|
530 | 543 | " self.conv1 = nn.Conv2d(1, 6, 5) # 1 input image channel, 6 output channels, 5x5 square convolution kernel\n",
|
531 |
| - " self.pool = nn.MaxPool2d(2,2) # A max-pooling operation that looks at 2x2 windows and finds the max.\n", |
532 | 544 | " self.conv2 = nn.Conv2d(6, 16, 5)\n",
|
533 | 545 | " self.fc1 = nn.Linear(16*5*5, 120) # an affine operation: y = Wx + b\n",
|
534 | 546 | " self.fc2 = nn.Linear(120, 84)\n",
|
535 | 547 | " self.fc3 = nn.Linear(84, 10)\n",
|
536 |
| - " self.relu = nn.ReLU()\n", |
537 | 548 | "\n",
|
538 | 549 | " def forward(self, x):\n",
|
539 |
| - " x = self.pool(self.relu(self.conv1(x)))\n", |
540 |
| - " x = self.pool(self.relu(self.conv2(x)))\n", |
541 |
| - " x = x.view(-1, 16*5*5)\n", |
542 |
| - " x = self.relu(self.fc1(x))\n", |
543 |
| - " x = self.relu(self.fc2(x))\n", |
| 550 | + " x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # Max pooling over a (2, 2) window\n", |
| 551 | + " x = F.max_pool2d(F.relu(self.conv2(x)), 2) # If the size is a square you can only specify a single number\n", |
| 552 | + " x = x.view(-1, self.num_flat_features(x))\n", |
| 553 | + " x = F.relu(self.fc1(x))\n", |
| 554 | + " x = F.relu(self.fc2(x))\n", |
544 | 555 | " x = self.fc3(x)\n",
|
545 | 556 | " return x\n",
|
| 557 | + " \n", |
| 558 | + " def num_flat_features(self, x):\n", |
| 559 | + " return functools.reduce(operator.mul, x.size()[1:])\n", |
546 | 560 | "\n",
|
547 | 561 | "net = Net()\n",
|
548 | 562 | "net"
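To sanity-check the architecture above, one might run a dummy forward and backward pass (an illustrative sketch, not part of the commit; a 1x1x32x32 input is what makes `16*5*5` come out of the second pooling stage):

```python
import torch
from torch.autograd import Variable

input = Variable(torch.randn(1, 1, 32, 32))   # nSamples x nChannels x Height x Width
out = net(input)                              # forward pass through the conv/pool/fc stack
print(out.size())                             # expected: torch.Size([1, 10])

net.zero_grad()                               # clear any previously accumulated gradients
out.backward(torch.randn(1, 10))              # backprop with a random grad_output
```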
|
|
610 | 624 | "source": [
|
611 | 625 | "> #### NOTE: `torch.nn` only supports mini-batches\n",
|
612 | 626 | "The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample. \n",
|
613 |
| - "For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width` \n", |
614 |
| - "*This is done to simplify developer code and eliminate bugs*" |
| 627 | + "For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.\n", |
| 628 | + "\n", |
| 629 | + "> *If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.*" |
615 | 630 | ]
|
616 | 631 | },
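A concrete example of the `unsqueeze` trick mentioned above (a sketch; the 3x32x32 shape is just an example single image):

```python
import torch

single = torch.randn(3, 32, 32)   # one sample: nChannels x Height x Width
batch = single.unsqueeze(0)       # insert a fake batch dimension at position 0
print(batch.size())               # torch.Size([1, 3, 32, 32]) - nSamples x nChannels x Height x Width
```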
|
617 | 632 | {
|
618 | 633 | "cell_type": "markdown",
|
619 | 634 | "metadata": {},
|
620 | 635 | "source": [
|
621 |
| - "##### Review of what you learnt so far:\n", |
| 636 | + "### Recap of all the classes you've seen so far:\n", |
| 637 | + "\n", |
| 638 | + "* `torch.Tensor` - A **multi-dimensional array**.\n", |
| 639 | + "* `autograd.Variable` - **Wraps a Tensor and records the history of operations** applied to it. Has the same API as a `Tensor`, with some additions like `backward()`. Also **holds the gradient** w.r.t. the tensor.\n", |
| 640 | + "* `nn.Module` - Neural network module. **Convenient way of encapsulating parameters**, with helpers for moving them to GPU, exporting, loading, etc.\n", |
| 641 | + "* `nn.Container` - `Module` that is a **container for other Modules**.\n", |
| 642 | + "* `nn.Parameter` - A kind of Variable that is **automatically registered as a parameter when assigned as an attribute to a `Module`**.\n", |
| 643 | + "* `autograd.Function` - Implements **forward and backward definitions of an autograd operation**. Every `Variable` operation creates at least a single `Function` node that connects to the functions that created the `Variable` and **encodes its history**.\n", |
| 644 | + "\n", |
| 645 | + "##### At this point, we covered:\n", |
622 | 646 | "- Defining a neural network\n",
|
623 | 647 | "- Processing inputs and calling backward.\n",
|
624 | 648 | "\n",
|
|
670 | 694 | " -> loss\n",
|
671 | 695 | "```\n",
|
672 | 696 | "\n",
|
673 |
| - "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Variables in the graph will have their `.grad` Tensor accumulated with the gradient.\n", |
| 697 | + "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Variables in the graph will have their `.grad` Variable accumulated with the gradient.\n", |
674 | 698 | " "
|
675 | 699 | ]
|
676 | 700 | },
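In code, that might look like the following sketch (it assumes `net` from above and a scalar `loss` Variable produced by a criterion in the surrounding cells):

```python
net.zero_grad()                # zero the gradient buffers of all parameters

print(net.conv1.bias.grad)     # typically None or all zeros before the backward pass
loss.backward()                # differentiate the whole graph w.r.t. loss
print(net.conv1.bias.grad)     # now holds d(loss)/d(conv1.bias)
```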
|
|
727 | 751 | "```python\n",
|
728 | 752 | "learning_rate = 0.01\n",
|
729 | 753 | "for f in net.parameters():\n",
|
730 |
| - " f.data.sub_(f.grad * learning_rate)\n", |
| 754 | + " f.data.sub_(f.grad.data * learning_rate)\n", |
731 | 755 | "```\n",
|
732 | 756 | "\n",
|
733 | 757 | "However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\n",
|
|
822 | 846 | "transform=transforms.Compose([transforms.ToTensor(),\n",
|
823 | 847 | " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n",
|
824 | 848 | " ])\n",
|
825 |
| - "trainset = torchvision.datasets.CIFAR10(root='/Users/soumith/code/pytorch-vision/test/cifar', \n", |
826 |
| - " train=True, download=True, transform=transform)\n", |
| 849 | + "trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)\n", |
827 | 850 | "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, \n",
|
828 | 851 | " shuffle=True, num_workers=2)\n",
|
829 | 852 | "\n",
|
830 |
| - "testset = torchvision.datasets.CIFAR10(root='/Users/soumith/code/pytorch-vision/test/cifar', \n", |
831 |
| - " train=False, download=True, transform=transform)\n", |
| 853 | + "testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)\n", |
832 | 854 | "testloader = torch.utils.data.DataLoader(testset, batch_size=4, \n",
|
833 | 855 | " shuffle=False, num_workers=2)\n",
|
834 | 856 | "classes = ('plane', 'car', 'bird', 'cat',\n",
|
|
1163 | 1185 | "metadata": {},
|
1164 | 1186 | "source": [
|
1165 | 1187 | "#### Training on the GPU\n",
|
1166 |
| - "The idea is pretty simple. \n", |
1167 |
| - "Just like how you transfer a Tensor on to the GPU, you transfer the neural net onto the GPU." |
| 1188 | + "Just like how you transfer a Tensor on to the GPU, you transfer the neural net onto the GPU.\n", |
| 1189 | + "This will recursively go over all modules and convert their parameters and buffers to CUDA tensors." |
1168 | 1190 | ]
|
1169 | 1191 | },
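Roughly, that amounts to the following sketch (it assumes a CUDA-capable machine and the `inputs`/`labels` tensors from the training loop):

```python
import torch

if torch.cuda.is_available():
    net.cuda()                                      # move all parameters and buffers to the GPU

    # the inputs and targets have to be sent to the GPU at every step as well
    inputs, labels = inputs.cuda(), labels.cuda()
```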
|
1170 | 1192 | {
|
|
1207 | 1229 | "- [More tutorials](https://github.com/pytorch/tutorials)\n",
|
1208 | 1230 | "- [Chat with other users on Slack](https://pytorch.slack.com/messages/beginner/)"
|
1209 | 1231 | ]
|
1210 |
| - }, |
1211 |
| - { |
1212 |
| - "cell_type": "code", |
1213 |
| - "execution_count": null, |
1214 |
| - "metadata": { |
1215 |
| - "collapsed": true |
1216 |
| - }, |
1217 |
| - "outputs": [], |
1218 |
| - "source": [] |
1219 | 1232 | }
|
1220 | 1233 | ],
|
1221 | 1234 | "metadata": {
|
1222 | 1235 | "kernelspec": {
|
1223 |
| - "display_name": "Python 2", |
| 1236 | + "display_name": "Python 3", |
1224 | 1237 | "language": "python",
|
1225 |
| - "name": "python2" |
| 1238 | + "name": "python3" |
1226 | 1239 | },
|
1227 | 1240 | "language_info": {
|
1228 | 1241 | "codemirror_mode": {
|
1229 | 1242 | "name": "ipython",
|
1230 |
| - "version": 2 |
| 1243 | + "version": 3 |
1231 | 1244 | },
|
1232 | 1245 | "file_extension": ".py",
|
1233 | 1246 | "mimetype": "text/x-python",
|
1234 | 1247 | "name": "python",
|
1235 | 1248 | "nbconvert_exporter": "python",
|
1236 |
| - "pygments_lexer": "ipython2", |
1237 |
| - "version": "2.7.12" |
| 1249 | + "pygments_lexer": "ipython3", |
| 1250 | + "version": "3.5.2" |
1238 | 1251 | }
|
1239 | 1252 | },
|
1240 | 1253 | "nbformat": 4,
|
|