WIP: Resolve "Upgrade to pytorch-1.0.1"

Closed Guillaume HEUSCH requested to merge 17-upgrade-to-pytorch-1-0-1 into master
2 unresolved threads

Closes #17 (closed)

Merge request reports

Pipeline #29642 failed for ace6dfb0 on 17-upgrade-to-pytorch-1-0-1

Closed by Guillaume HEUSCH 5 years ago (May 3, 2019 10:43am UTC)

Merge details

  • The changes were not merged.

Activity

  • -    - torchvision >=0.2.1
    +    - pytorch {{ pytorch }}
    +    - torchvision {{ torchvision }}
       run:
         - python
         - setuptools
    -    - numpy >=1.11
    +    - numpy
         - docopt
    -    - pytorch =0.4.1
    -    - torchvision >=0.2.1
    -    - matplotlib
    -    - h5py
    -    - tensorflow >=1.4
    +    - pytorch
    +    - torchvision
  • +    - pytorch {{ pytorch }}
    +    - torchvision {{ torchvision }}
       run:
         - python
         - setuptools
    -    - numpy >=1.11
    +    - numpy
         - docopt
    -    - pytorch =0.4.1
    -    - torchvision >=0.2.1
    -    - matplotlib
    -    - h5py
    -    - tensorflow >=1.4
    +    - pytorch
    +    - torchvision
    +    - matplotlib {{ matplotlib }}
  • added 1 commit

    • 7a162672 - [recipe] replaced jinja variable with pin_compatible

  • Amir MOHAMMADI @amohammadi started a thread on commit 7a162672
  •      - pytorch {{ pytorch }}
         - torchvision {{ torchvision }}
       run:
    -    - python
    -    - setuptools
    -    - numpy
    -    - docopt
    -    - pytorch
    -    - torchvision
    -    - matplotlib {{ matplotlib }}
    +    - python {{ pin_compatible('python') }}
    +    - setuptools {{ pin_compatible('setuptools') }}
  • added 1 commit

    • 54560aae - [recipe] fix pin_compatible stuff

  • added 1 commit

    • b77b60b0 - [recipe] changed the requirements to be compliant to what is stated in bob.devtools doc ..

  • added 1 commit

    • d5c12df4 - [recipe] removed jinja variable for h5py

  • added 1 commit

    • ace6dfb0 - [recipe] removed pin_compatible for docopt

  • Author Maintainer

    After a lot of trial and (mostly) error with pinning via jinja variables and pin_compatible in the conda recipe (including what is advised here: https://www.idiap.ch/software/bob/docs/bob/bob.devtools/master/templates.html#conda-recipe), I decided to step back for a while. Feel free to keep trying ;)

    Maybe @andre.anjos has a solution to this?
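
For readers following the pinning attempts: as far as the conda-build documentation describes it, `pin_compatible('name')` renders to the package name followed by a version range derived from the version installed at build time, so it is normally used as the whole requirement line. The sketch below is a simplified Python emulation of that rendering, not conda-build's implementation. The helper `emulate_pin_compatible` and the version numbers are hypothetical, and the exact upper-bound string produced by real conda-build may differ between versions.

```python
def emulate_pin_compatible(name, installed_version, max_pin="x"):
    """Hypothetical, simplified stand-in for conda-build's pin_compatible.

    Keeps the installed version as the lower bound and bumps the last
    component retained by max_pin ("x" = major, "x.x" = major.minor)
    to form the upper bound.
    """
    keep = len(max_pin.split("."))
    upper = [int(part) for part in installed_version.split(".")[:keep]]
    upper[-1] += 1
    upper_bound = ".".join(str(part) for part in upper)
    # The package name is part of the rendered string.
    return f"{name} >={installed_version},<{upper_bound}"


# Assumed build-time versions, for illustration only.
whole_line = f"- {emulate_pin_compatible('numpy', '1.11.2', max_pin='x.x')}"
doubled_name = f"- python {emulate_pin_compatible('python', '3.6.8')}"

print(whole_line)    # - numpy >=1.11.2,<1.12
print(doubled_name)  # - python python >=3.6.8,<4
```

Under this reading, a requirement written as `- python {{ pin_compatible('python') }}` would repeat the package name once rendered, whereas `- {{ pin_compatible('python') }}` would not.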

  • Author Maintainer

    I manually pinned the version in the master branch (it needs to pass the CI for reasons related to BATL) and got the following error in the CI:

    Traceback (most recent call last):
      File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/bin/train_cnn.py", line 7, in <module>
        from bob.learn.pytorch.scripts.train_cnn import main
      File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/bob/learn/pytorch/scripts/train_cnn.py", line 42, in <module>
        import torch
      File "/local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/torch/__init__.py", line 102, in <module>
        from torch._C import *
    ImportError: /local/builds/bob/bob.learn.pytorch/miniconda/conda-bld/bob.learn.pytorch_1556527043921/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.6/site-packages/torch/lib/libtorch.so.1: undefined symbol: nvrtcGetProgramLogSize

    Apparently, it is related to whether the library is used on the CPU and/or the GPU: the missing symbol, nvrtcGetProgramLogSize, belongs to NVRTC (NVIDIA's CUDA runtime compilation library), which suggests that a CUDA-enabled libtorch is being loaded in an environment where libnvrtc is not linked or not available.

    There is a related issue (with a solution) on PyTorch's GitHub: https://github.com/pytorch/pytorch/issues/14973. Basically, the fix is to link against libnvrtc.so and libcuda.so. I don't think it's possible to do that on the CI (or is it?). A possible workaround is presented here: https://stackoverflow.com/questions/55665606/how-to-fix-importerror-home-lib-libtorch-so-1-undefined-symbol-nvrt, but it requires pytorch packages that are currently not in the conda defaults channel.

    So, how do we proceed?
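
One way to check whether a machine (a CI runner, for instance) can see the library that provides the missing symbol is to probe it with Python's standard `ctypes` module. This is a minimal diagnostic sketch, assuming a system where `ctypes.util.find_library` can locate shared libraries. The helper name `library_exports` is hypothetical, and the check only tells whether libnvrtc is discoverable system-wide, which is not the same as how libtorch.so resolves its symbols at load time.

```python
import ctypes
import ctypes.util


def library_exports(lib_name, symbol):
    """Return True/False if the library loads and does/does not export
    `symbol`; return None if the library cannot be found or loaded."""
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Attribute lookup on a CDLL resolves the symbol and raises
    # AttributeError when it is absent, so hasattr() is enough here.
    return hasattr(lib, symbol)


result = library_exports("nvrtc", "nvrtcGetProgramLogSize")
if result is None:
    print("libnvrtc was not found or could not be loaded")
elif result:
    print("libnvrtc is available and exports nvrtcGetProgramLogSize")
else:
    print("libnvrtc loaded but does not export nvrtcGetProgramLogSize")
```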

  • Author Maintainer

    OK, apparently the tentative workaround (discussed via email, using pytorch-cpu) does not work either. I therefore suggest reverting bob-devel to pytorch 0.4.1, with which everything was working smoothly, and keeping it there until a new (and CI-compatible) version of pytorch is released in the conda defaults channel.

    @ageorge @amohammadi @andre.anjos @tlaibacher What do you think?

  • @heusch downgrading bob-devel is a lot of work. We have already updated our source code for the new dependencies: for example, we had to change our code to move to click 7, ffmpeg 4, and tensorflow 1.13. If bob-devel is reverted, some of our packages will break.

    What you guys really need is a conda package for bob.learn.pytorch that works with both pytorch and pytorch-cpu. Here it is: bob.learn.pytorch-0.0.4b0-py36ha615f63_34.tar.bz2
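
On the code side, working with both a CUDA build and a CPU-only build usually comes down to choosing the device at run time rather than assuming a GPU. The sketch below is an illustration under that assumption, not code from bob.learn.pytorch. `pick_device` is a hypothetical helper, and it also guards the import so that a failure like the one in the traceback above is reported without a crash.

```python
def pick_device():
    """Return a torch.device, or None when torch cannot be imported."""
    try:
        import torch  # may raise ImportError, as in the traceback above
    except ImportError as exc:
        print(f"torch is not importable: {exc}")
        return None
    # torch.cuda.is_available() returns False on CPU-only builds,
    # so the same code path works for both kinds of package.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


device = pick_device()
print(f"selected device: {device}")
```

Models and tensors would then be moved with `.to(device)` when `device` is not None.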

  • mentioned in issue #17 (closed)
