WIP: Resolve "Upgrade to pytorch-1.0.1"
Closes #17 (closed)
mentioned in issue #20 (closed)
added 1 commit
- 1c87ea37 - [recipe] changed hardcoded pytorch and torchvision version, replaced with jinja variables
- Resolved by Amir MOHAMMADI
-    - torchvision >=0.2.1
+    - pytorch {{ pytorch }}
+    - torchvision {{ torchvision }}
   run:
     - python
     - setuptools
-    - numpy >=1.11
+    - numpy
     - docopt
-    - pytorch =0.4.1
-    - torchvision >=0.2.1
-    - matplotlib
-    - h5py
-    - tensorflow >=1.4
+    - pytorch
+    - torchvision
@heusch it's better to pin everything here to make sure your packages don't break in the future. Something like:
    - {{ pin_compatible('numpy') }}
    - {{ pin_compatible('docopt') }}
    - {{ pin_compatible('pytorch') }}
    - {{ pin_compatible('torchvision') }}
    ...
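For anyone following the thread, here is a rough Python sketch of the kind of spec that pin_compatible is expected to render: a lower bound at the version found at build time, and an upper bound at the next release of the pinned level. It is only an illustration, not conda-build's implementation; the function name pin_compatible_sketch is hypothetical, and it assumes purely numeric, dot-separated versions.

```python
def pin_compatible_sketch(name, version, max_pin="x"):
    """Illustrative only: build a '>=lower,<upper' spec for a package.

    `version` is the version present at build time (e.g. "1.16.2").
    `max_pin` says how many version fields are kept for the upper bound:
    "x" keeps the major field, "x.x" keeps major and minor.
    Assumes purely numeric, dot-separated versions.
    """
    fields = [int(part) for part in version.split(".")]
    keep = len(max_pin.split("."))       # number of fields kept in the bound
    upper = fields[:keep]
    upper[-1] += 1                       # bump the last kept field
    upper_str = ".".join(str(field) for field in upper)
    # The ".0a0" suffix also excludes pre-releases of the next version.
    return f"{name} >={version},<{upper_str}.0a0"


if __name__ == "__main__":
    print(pin_compatible_sketch("numpy", "1.16.2"))
    print(pin_compatible_sketch("pytorch", "1.0.1", max_pin="x.x"))
```

With a pin like this, the run requirement follows whatever version the package was built against instead of a hardcoded number in the recipe.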
changed this line in version 4 of the diff
+    - pytorch {{ pytorch }}
+    - torchvision {{ torchvision }}
   run:
     - python
     - setuptools
-    - numpy >=1.11
+    - numpy
     - docopt
-    - pytorch =0.4.1
-    - torchvision >=0.2.1
-    - matplotlib
-    - h5py
-    - tensorflow >=1.4
+    - pytorch
+    - torchvision
+    - matplotlib {{ matplotlib }}
changed this line in version 4 of the diff
added 1 commit
- 7a162672 - [recipe] replaced jinja variable with pin_compatible
     - pytorch {{ pytorch }}
     - torchvision {{ torchvision }}
   run:
-    - python
-    - setuptools
-    - numpy
-    - docopt
-    - pytorch
-    - torchvision
-    - matplotlib {{ matplotlib }}
+    - python {{ pin_compatible('python') }}
+    - setuptools {{ pin_compatible('setuptools') }}
added 1 commit
- b77b60b0 - [recipe] changed the requirements to be compliant to what is stated in bob.devtools doc ..
After a bunch of trials and errors (mostly errors) on pinning with jinja variables and pin_compatible in the conda recipe (including what is advised here: https://www.idiap.ch/software/bob/docs/bob/bob.devtools/master/templates.html#conda-recipe), I decided to step back for a while. Feel free to keep on trying ;)
Maybe @andre.anjos has a solution to this ?
I manually pinned the version in the master branch (it needs to pass the CI for reasons related to BATL) and got the following error in the CI:
Traceback (most recent call last):
  File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/bin/train_cnn.py", line 7, in <module>
    from bob.learn.pytorch.scripts.train_cnn import main
  File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/bob/learn/pytorch/scripts/train_cnn.py", line 42, in <module>
    import torch
  File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/torch/__init__.py", line 102, in <module>
    from torch._C import *
ImportError: /local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/torch/lib/libtorch.so.1: undefined symbol: nvrtcGetProgramLogSize
Apparently, it is related to the usage of the library on the CPU and/or GPU ...
There is a related issue (with a solution) on PyTorch's GitHub: https://github.com/pytorch/pytorch/issues/14973. Basically, the solution is linking to libnvrtc.so and libcuda.so. I don't think it's possible to do that on the CI (or is it?). A possible workaround is presented here: https://stackoverflow.com/questions/55665606/how-to-fix-importerror-home-lib-libtorch-so-1-undefined-symbol-nvrt, but it requires pytorch packages that are currently not in the conda defaults channels. So, how do we proceed with that?
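To check whether a given machine (the CI runner, for instance) can see these two libraries at all, a small diagnostic along the following lines might help. It is only a sketch using the standard ctypes module; the helper names locate_cuda_libs and try_load are hypothetical, and it does not fix the import error, it only reports what the system loader finds.

```python
import ctypes
import ctypes.util


def locate_cuda_libs(names=("nvrtc", "cuda")):
    """Map each short library name to what the system loader finds, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


def try_load(names=("nvrtc", "cuda")):
    """Report, per library, whether it was found and could be loaded."""
    status = {}
    for name, path in locate_cuda_libs(names).items():
        if path is None:
            status[name] = "not found"
            continue
        try:
            ctypes.CDLL(path)            # attempt to actually load the library
            status[name] = "loaded"
        except OSError as exc:
            status[name] = f"found but failed to load: {exc}"
    return status


if __name__ == "__main__":
    for name, state in try_load().items():
        print(f"lib{name}: {state}")
```

If both come back as "not found" on the CI, that would be consistent with the undefined nvrtcGetProgramLogSize symbol above, since there is no CUDA runtime for libtorch to resolve it against.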
Ok, apparently the tentative workaround (discussed via email and using pytorch-cpu) does not work either. I hence suggest reverting bob-devel to pytorch 0.4.1, when everything was working smoothly. We should keep it there until a new (and CI-compatible) version of pytorch is released in the conda defaults channels.
@ageorge @amohammadi @andre.anjos @tlaibacher What do you think ?
@heusch downgrading bob-devel is a lot of work. We have updated our source code to new dependencies. For example we had to change our code to update to click 7, ffmpeg 4, and tensorflow 1.13. If reverted, some of our packages will break.
What you guys really need is a conda package for bob.learn.pytorch that works with both pytorch and pytorch-cpu. Here it is: bob.learn.pytorch-0.0.4b0-py36ha615f63_34.tar.bz2
mentioned in issue #17 (closed)