rclone – transfer files to/from cloud storage

Introduction

FAS and several other Harvard schools have access to Google Apps for Harvard, which includes Google Drive with unlimited storage. FAS RC cluster users can use Google Drive as cloud backup storage for important data files on FAS RC file systems. Files in Google Drive can be shared with internal and external collaborators.

rclone is a convenient and performant command-line tool for transferring files and synchronizing directories directly between FAS RC file systems and Google Drive (or other supported cloud storage).

If you are eligible, and don't already have a Google Apps for Harvard account, see the Google Apps for Harvard Getting Started page. If you require help or support for your Harvard Google account or for Google Drive itself, please contact HUIT (ithelp@harvard.edu).

Configuring rclone

rclone must be configured before first use. rclone can be granted access to a limited scope of Google Drive (e.g., read-only, or only files that rclone creates). The following rclone config command grants the rclone application access only to the files and folders that it creates in Google Drive; to grant full (read/write) access to your Google Drive, omit the scope drive.file arguments. See the Scopes section of the rclone Google Drive documentation for more information on other scopes.

[fasrcuser@boslogin02 ~]$ module load rclone
[fasrcuser@boslogin02 ~]$ rclone config create gdrive drive config_is_local false scope drive.file
2019/10/17 17:03:49 NOTICE: Config file "/n/home10/fasrcuser/.config/rclone/rclone.conf" not found - using defaults
Remote config
Use auto config?
* Say Y if not sure
* Say N if you are working on a remote or headless machine
Auto confirm is set: answering No, override by setting config parameter config_is_local=true
If your browser doesn't open automatically go to the following link: https://accounts.google.com/o/oauth2/auth?access_type=offline&client_id=123456789012.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&state=abcdef123456789abcdef123456789ab
Log in and authorize rclone for access
Enter verification code>

Next:

  1. Copy the URL that begins with https://accounts.google.com/ into a web browser window.
  2. Sign in with your Google Apps for Harvard account credentials.
  3. Click "Allow" to allow rclone to access your Google Drive.
  4. Copy the resulting verification code, and paste into the terminal window containing the FASRC cluster SSH session.

The resulting rclone configuration will be stored in your FAS RC home directory at ${HOME}/.config/rclone/rclone.conf. This file includes an OAuth2 access token; access granted to rclone through this token may be revoked from the Third-party apps with account access page of your Google Account.
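Because the configuration file holds an access token, it may be worth confirming after setup that the file is readable only by you and that the remote was actually created. The following is a minimal sketch, not part of rclone itself: the function name check_rclone_conf is hypothetical, and the default path is the one given above.

```shell
# Hypothetical helper (not part of rclone): restrict the rclone config file to
# its owner, then list the remotes defined in it.
check_rclone_conf() {
    local conf="${1:-${HOME}/.config/rclone/rclone.conf}"
    if [ ! -f "$conf" ]; then
        echo "no rclone config found at $conf" >&2
        return 1
    fi
    # The file contains an OAuth2 token, so allow access by the owner only.
    chmod 600 "$conf"
    # Expected to list "gdrive:" if the remote was created as shown above.
    rclone listremotes --config "$conf"
}
```

Calling check_rclone_conf with no argument checks the default location; a different config file path can be passed as the first argument.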

Using rclone

rclone supports many subcommands (see the complete list of rclone subcommands). A few commonly used subcommands (assuming a Google Drive remote configured as gdrive):

Listing / moving / deleting objects

rclone command                                                      analogous Unix command
rclone lsf gdrive:fasrc/subfolder                                   ls fasrc/subdir
rclone lsf --format stp --separator ' ' gdrive:fasrc/subfolder      ls -l fasrc/subdir
rclone mkdir gdrive:fasrc/subfolder                                 mkdir fasrc/subdir
rclone move gdrive:fasrc/subfolder1/file1 gdrive:fasrc/subfolder2/  mv fasrc/subdir1/file1 fasrc/subdir2/
rclone rmdir gdrive:fasrc/subfolder                                 rmdir fasrc/subdir
rclone delete gdrive:fasrc/file                                     rm fasrc/file
rclone purge gdrive:fasrc/subfolder                                 rm -r fasrc/subdir

Transferring data

Small data transfers may be done on FAS RC cluster login nodes, while large data transfers should be done within an interactive job so that data transfer is done from a compute node; e.g.:

srun -p test --pty --mem 1G -t 6:00:00 /bin/bash

Operands with the gdrive: prefix (assuming a Google Drive has been configured as gdrive) access Google Drive storage, while operands without gdrive: refer to a path on the FAS RC file system.

rclone copy gdrive:sourcepath destpath
rclone copy sourcepath gdrive:destpath

If sourcepath is a file, copy it to destpath.
If sourcepath is a directory/folder, recursively copy its contents to destpath. Contents of destpath that are not in sourcepath will be retained.
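After copying a directory, it can be reassuring to confirm that everything arrived intact. One possible approach, sketched below, is to follow the copy with rclone check, which compares the files in the source against the destination. The function name copy_and_verify is hypothetical; the --one-way flag is used on the assumption that the destination may legitimately contain extra files, since copy retains them.

```shell
# Hypothetical wrapper: copy a directory, then verify the destination.
copy_and_verify() {
    local src="$1" dest="$2"
    # Stop here if the copy itself reports an error.
    rclone copy --progress "$src" "$dest" || return 1
    # --one-way: only require that every source file exists and matches in the
    # destination; extra files already in the destination are not reported.
    rclone check --one-way "$src" "$dest"
}
```

For example, copy_and_verify sourcedir gdrive:destfolder would copy sourcedir to Google Drive and return a non-zero status if either step fails.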

rclone sync --progress gdrive:sourcefolder destdir
rclone sync --progress sourcedir gdrive:destfolder

Replace contents of destdir/destfolder with the contents of sourcedir/sourcefolder (deleting any files not in the source).
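Because sync deletes files from the destination, previewing the operation first can prevent accidental data loss. The sketch below uses rclone's --dry-run flag, which reports what would be copied or deleted without changing anything, and only proceeds after confirmation. The function name safe_sync and the prompt wording are hypothetical.

```shell
# Hypothetical wrapper: preview a sync with --dry-run, then ask before running it.
safe_sync() {
    local src="$1" dest="$2" answer
    # --dry-run only reports what would be copied or deleted; nothing is changed.
    rclone sync --dry-run "$src" "$dest" || return 1
    printf 'Proceed with sync of %s to %s? [y/N] ' "$src" "$dest"
    read -r answer
    case "$answer" in
        y|Y) rclone sync --progress "$src" "$dest" ;;
        *)   echo "sync cancelled" ;;
    esac
}
```

For example, safe_sync sourcedir gdrive:destfolder first lists the planned changes and performs the real sync only if the answer is y.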

Mounting Google Drive on a FAS RC compute node

Alternatively, rclone mount can make a Google Drive (or a subfolder of it) available on a FAS RC compute node as a regular file system, so that common commands used with a POSIX file system (such as cp, mv, and ls) work on it, with limitations.

The directory on the FAS RC node at which the Google Drive will be made available as a file system (i.e., the mountpoint) must be on a node-local file system (such as /scratch) to avoid permissions issues when unmounting the file system. In particular, the mountpoint must not be within a file system in the /n/ directory, as these are all remote / network file systems.

The following example demonstrates this capability:

$ module load rclone
$ rclone lsf gdrive:fasrc/
cactus:2019.03.01--py27hdbcaa40_1.sif
ifxpong:1.4.7-ood.sif
jbrowse:1.16.5_2019-06-14.sif
subfolder/
$ mkdir /scratch/$USER
$ mkdir -m 700 /scratch/$USER/gdrive
$ rclone mount gdrive:fasrc /scratch/$USER/gdrive &
[1] 68913
$ ls -l /scratch/$USER/gdrive/
total 543900
-rw-r--r-- 1 fasrcuser fasrcgroup 495247360 May  1 16:27 cactus:2019.03.01--py27hdbcaa40_1.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 50700288 Aug 22 16:05 ifxpong:1.4.7-ood.sif
-rw-r--r-- 1 fasrcuser fasrcgroup 11005952 Jun 14 15:16 jbrowse:1.16.5_2019-06-14.sif
drwxr-xr-x 1 fasrcuser fasrcgroup 0 Oct 24 10:21 subfolder
$ fusermount -uz /scratch/$USER/gdrive/
[1]+  Done                    rclone mount gdrive:fasrc /scratch/$USER/gdrive

Comments:

  • The mountpoint (/scratch/$USER/gdrive) is created with appropriate permissions (via mkdir -m 700) to ensure only the owner has access.
  • The rclone mount command is executed asynchronously ("in the background") using the & operator.
  • fusermount -uz explicitly unmounts the Google Drive (causing the rclone mount process to terminate).
    • This performs a "lazy unmount", which requests that the OS perform the unmount once no process has its current working directory within the directory tree rooted at the mountpoint. To guard against accidentally leaving the directory mounted if a job or interactive session is terminated prematurely, set the working directory of the shell that issued the rclone mount command to the gdrive mountpoint and then immediately issue the fusermount -uz command; e.g.:
      rclone mount gdrive:fasrc /scratch/$USER/gdrive &
      cd /scratch/$USER/gdrive && fusermount -uz .

      Then /scratch/$USER/gdrive will be unmounted automatically when the shell process terminates or its working directory changes to a directory outside of /scratch/$USER/gdrive:

      cd ..
      [1]+ Done rclone mount gdrive:fasrc /scratch/$USER/gdrive
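In a script, the mount-then-lazy-unmount pattern above could be wrapped in a small function that also waits for the mount to become available before changing into it. This is a sketch under stated assumptions: the function name mount_gdrive and the 30-second default wait are hypothetical, and it assumes the mountpoint utility (from util-linux) is available on the node.

```shell
# Hypothetical helper: mount a remote, wait until the mount is ready, then
# change into it and request a lazy unmount so it is released automatically.
mount_gdrive() {
    local remote="$1" mnt="$2" max_wait="${3:-30}" tries=0 pid
    mkdir -p -m 700 "$mnt"          # mountpoint accessible by the owner only
    rclone mount "$remote" "$mnt" &
    pid=$!
    # Poll once per second until the directory is reported as a mountpoint.
    until mountpoint -q "$mnt"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_wait" ]; then
            kill "$pid" 2>/dev/null  # give up and stop the background rclone
            return 1
        fi
        sleep 1
    done
    # The lazy unmount takes effect once this shell leaves the mountpoint or exits.
    cd "$mnt" && fusermount -uz .
}
```

For example, mount_gdrive gdrive:fasrc /scratch/$USER/gdrive leaves the calling shell inside the mounted directory; changing to another directory or exiting the shell then releases the mount.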
      

Limitations

At most 2 file transfers to Google Drive can be initiated per second. Consider bundling many small files into a .zip or .tar(.gz) file.

Other Google Drive limitations are listed in the rclone Google Drive documentation.
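As an illustration of the bundling suggestion, the following sketch packs a directory into a single .tar.gz archive so that only one file needs to be transferred. The function name bundle_dir and the paths in the usage comment are placeholders, not part of any FAS RC tooling.

```shell
# Hypothetical helper: pack a directory of many small files into one .tar.gz
# so that only a single file has to be transferred to Google Drive.
bundle_dir() {
    local dir="$1" archive="$2"
    # -C switches to the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Example usage (placeholder paths; gdrive is the remote configured earlier):
#   bundle_dir /scratch/$USER/results /scratch/$USER/results.tar.gz
#   rclone copy /scratch/$USER/results.tar.gz gdrive:fasrc/
```

The archive can later be unpacked with tar -xzf after copying it back from Google Drive.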

CC BY-NC-SA 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.