Fix typos regarding SGD, and make links specific #2
Merged
Conversation
awesome! thanks jayanth
This was referenced Sep 24, 2022
msaroufim pushed a commit that referenced this pull request on Oct 14, 2022
* Create torchserve_with_ipex_2.rst
* create torchserve-ipex-images-2
* add torchserve-ipex-images-2 png
* add png
* update wording
* update wording
* update wording
* update wording
* add tutorial to matrix and left nav
* Delete placeholder
* link github
* tutorial -> blog
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* tutorial -> blog
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* grammar fix
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* grammar fix
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* grammar fix
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* blog -> tutorial
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* un-tuned -> untuned, submetircs -> sub-metrics
* blog -> tutorial, we'll -> we will
* with torch.autograd.profiler.emit_itt()
* grammar fix
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* grammar fix
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we've -> we have
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* 2 -> two
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* e.g., -> for example, refer to -> see
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* etc -> and more
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll -> we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* we'll we will
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* un-tuned -> untuned
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* take-aways -> conclusion
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* blog -> tutorial, we've -> we have
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* fix linking
  Co-authored-by: Svetlana Karslioglu <[email protected]>
* fix png sizes
* my lin <url>__
* (1) add content under each heading (2) fix heading syntax
* update
* blog -> tutorial
  Co-authored-by: Svetlana Karslioglu <[email protected]>
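One bullet above wraps the tutorial's profiled region in `with torch.autograd.profiler.emit_itt()`. A minimal sketch of that pattern, assuming a PyTorch build with ITT support; the toy model below is an illustrative stand-in, not the tutorial's actual code:

```python
import torch
import torch.nn as nn

# Illustrative stand-in model (not the tutorial's TorchServe handler).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
x = torch.randn(32, 128)

with torch.no_grad():
    # emit_itt() marks each PyTorch operator with an ITT annotation so the
    # region shows up by name in an Intel VTune trace. It raises at runtime
    # if the PyTorch build lacks ITT support.
    with torch.autograd.profiler.emit_itt():
        model(x)
```

Under VTune, the annotated operators then appear as named tasks in the profiler timeline.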
zhuhaozhe added a commit to zhuhaozhe/tutorials that referenced this pull request on Jun 8, 2023
zhuhaozhe added a commit to zhuhaozhe/tutorials that referenced this pull request on Jun 9, 2023
malfet added a commit that referenced this pull request on Jun 16, 2023
Should prevent crashes during NCCL initialization. If `data_parallel_tutorial.py` is executed without this option, it segfaults in `ncclShmOpen` while executing `nn.DataParallel(model)`.

For posterity:

```
% nvidia-smi
Fri Jun 16 20:46:45 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   41C    P0    37W / 150W |    752MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   36C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   41C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

% NCCL_DEBUG=INFO python data_parallel_tutorial.py
Let's use 4 GPUs!
c825878acf65:32373:32373 [0] NCCL INFO cudaDriverVersion 12010
c825878acf65:32373:32373 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
c825878acf65:32373:32373 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
NCCL version 2.14.3+cuda11.7
c825878acf65:32373:32443 [0] NCCL INFO NET/IB : No device found.
c825878acf65:32373:32443 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
c825878acf65:32373:32443 [0] NCCL INFO Using network Socket
c825878acf65:32373:32445 [2] NCCL INFO Using network Socket
c825878acf65:32373:32446 [3] NCCL INFO Using network Socket
c825878acf65:32373:32444 [1] NCCL INFO Using network Socket
c825878acf65:32373:32446 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
c825878acf65:32373:32445 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
c825878acf65:32373:32443 [0] NCCL INFO Channel 00/02 : 0 1 2 3
c825878acf65:32373:32444 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
c825878acf65:32373:32443 [0] NCCL INFO Channel 01/02 : 0 1 2 3
c825878acf65:32373:32443 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
Bus error (core dumped)

(lldb) bt
* thread #1, name = 'python', stop reason = signal SIGBUS
  * frame #0: 0x00007effcd6b0ded libc.so.6`__memset_avx2_erms at memset-vec-unaligned-erms.S:145
    frame #1: 0x00007eff3985e425 libnccl.so.2`ncclShmOpen(char*, int, void**, void**, int) at shmutils.cc:52
    frame #2: 0x00007eff3985e377 libnccl.so.2`ncclShmOpen(shmPath="/dev/shm/nccl-7dX4mg", shmSize=9637888, shmPtr=0x00007efe4a59ac30, devShmPtr=0x00007efe4a59ac38, create=1) at shmutils.cc:61
    frame #3: 0x00007eff39863322 libnccl.so.2`::shmRecvSetup(comm=<unavailable>, graph=<unavailable>, myInfo=<unavailable>, peerInfo=<unavailable>, connectInfo=0x00007efe57fe3fe0, recv=0x00007efe4a05d2f0, channelId=0, connIndex=0) at shm.cc:110
    frame #4: 0x00007eff398446a4 libnccl.so.2`ncclTransportP2pSetup(ncclComm*, ncclTopoGraph*, int, int*) at transport.cc:33
    frame #5: 0x00007eff398445c0 libnccl.so.2`ncclTransportP2pSetup(comm=0x0000000062355ab0, graph=0x00007efe57fe6a40, connIndex=0, highestTransportType=0x0000000000000000) at transport.cc:89
    frame #6: 0x00007eff398367cd libnccl.so.2`::initTransportsRank(comm=0x0000000062355ab0, commId=<unavailable>) at init.cc:790
    frame #7: 0x00007eff398383fe libnccl.so.2`::ncclCommInitRankFunc(job_=<unavailable>) at init.cc:1089
    frame #8: 0x00007eff3984de07 libnccl.so.2`ncclAsyncJobMain(arg=0x000000006476e6d0) at group.cc:62
    frame #9: 0x00007effce0bf6db libpthread.so.0`start_thread + 219
    frame #10: 0x00007effcd64361f libc.so.6`__GI___clone at clone.S:95
```
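For context, the crash is reached through the tutorial's `nn.DataParallel` usage. A minimal sketch of that pattern, with a hypothetical toy module and input shapes standing in for the tutorial's actual code:

```python
import torch
import torch.nn as nn

# Illustrative toy module; the tutorial's real model differs.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(5, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = ToyModel()

if torch.cuda.device_count() > 1:
    # Source of the "Let's use 4 GPUs!" line in the log above.
    print(f"Let's use {torch.cuda.device_count()} GPUs!")
    # Replicating the model across GPUs brings up NCCL; the bus error above
    # happened inside ncclShmOpen during that initialization (a SIGBUS while
    # memsetting /dev/shm is typically a sign of an undersized shm mount in
    # a container).
    model = nn.DataParallel(model)

model.to(device)
out = model(torch.randn(8, 5, device=device))
print(out.size())
```

On a multi-GPU machine it is the first forward pass through the wrapped model that exercises the NCCL shared-memory transport shown in the backtrace.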