A Self-Rescue Guide to Downloading SRA Data

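Still envious of friends across the strait whose SRA downloads fly? Still upset that wget keeps leaving you with incomplete files? Tried the official download tools and found them painfully slow too? Here is a self-rescue guide to SRA downloads that you can try.

You need two tools:

  • SRA Toolkit
  • IBM Aspera, a high-speed file transfer tool

Because this is a minimalist rescue guide, nothing gets explained. I just give the links; anything you don't understand, study on your own (or don't, your call).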

SRA Toolkit: https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc

aspera: https://support.asperasoft.com/hc/en-us

Aspera's official instructions for downloading NCBI data:

https://support.asperasoft.com/hc/en-us/articles/216125898-Downloading-data-from-NCBI-via-the-command-line

SRA Toolkit's official instructions for using aspera:

https://www.ncbi.nlm.nih.gov/books/NBK242625/

https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=prefetch

Before the rescue: quick setup

  • Download aspera (choose the Linux version)

https://downloads.asperasoft.com/en/downloads/8?list

  • Install aspera
wget https://download.asperasoft.com/download/sw/connect/3.8.1/ibm-aspera-connect-3.8.1.161274-linux-g2.12-64.tar.gz
# Careful: the version number may have changed, so don't copy the command above verbatim
tar zxvf ibm-aspera-connect-3.8.1.161274-linux-g2.12-64.tar.gz
bash ibm-aspera-connect-3.8.1.161274-linux-g2.12-64.sh
# Default install path: /home/user/.aspera
  • Install SRA Toolkit. The exact commands are omitted; just make sure you install the latest version :)
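
Before moving on, it may be worth confirming that the installer actually put the two files the download step relies on where you expect them. The sketch below is only that, a sketch: `check_aspera` is a hypothetical helper name, and the layout it checks (`connect/bin/ascp` and `connect/etc/asperaweb_id_dsa.openssh` under `~/.aspera`) is assumed from the default install path mentioned above.

```shell
# Hypothetical helper (not part of aspera or SRA Toolkit): check that the
# ascp binary and the private key exist under an aspera connect directory.
# Assumes the default layout: <base>/bin/ascp and <base>/etc/asperaweb_id_dsa.openssh
check_aspera() {
    local base="${1:-$HOME/.aspera/connect}"
    local ascp="$base/bin/ascp"
    local key="$base/etc/asperaweb_id_dsa.openssh"
    local status=0

    if [ ! -x "$ascp" ]; then
        echo "missing or not executable: $ascp" >&2
        status=1
    fi
    if [ ! -f "$key" ]; then
        echo "missing key file: $key" >&2
        status=1
    fi
    return "$status"
}

# Usage: check_aspera && echo "aspera looks installed"
```

If the check fails, the paths printed to stderr tell you which file to look for; an install in a non-default location can be checked by passing its connect directory as the first argument.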

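The rescue proper

The few Chinese-language tutorials currently out there on downloading SRA data with aspera are long-winded and disorganized. Do yourself a favor and stop reading them.

Remember, the real rescue takes only two steps. Any article that writes out a long string of steps is just messing with you.

  1. Write the SRR accessions you want to download into a file, srr.txt, one SRR ID per line

  2. Download with prefetch from SRA Toolkit and set the transfer method to ascp. The command is below; look up what each option means in the documentation (or don't)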

prefetch -t ascp -a "/home/user/.aspera/connect/bin/ascp|/home/user/.aspera/connect/etc/asperaweb_id_dsa.openssh" --option-file srr.txt -O /opt/user/ncbi

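In the -a option you must give the absolute path to ascp and the absolute path to the private key, separated by |. With a standard installation you only need to replace user with your own username.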
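
If you would rather not type those two paths by hand, the two steps can be wrapped roughly as follows. This is a minimal sketch under assumptions: `validate_srr_list` and `ascp_arg` are hypothetical helper names, the accession pattern (SRR, ERR or DRR followed by digits) is my assumption about what a valid line looks like, and the paths assume the default install location. The prefetch command itself is the same one shown above.

```shell
# Hypothetical helper: fail if the accession list contains lines that do not
# look like run accessions (SRR/ERR/DRR followed by digits). Blank lines are ignored.
validate_srr_list() {
    local file="$1"
    local bad
    bad=$(grep -v -E '^[[:space:]]*$' "$file" | grep -v -E '^[SED]RR[0-9]+$' || true)
    if [ -n "$bad" ]; then
        echo "unexpected lines in $file:" >&2
        echo "$bad" >&2
        return 1
    fi
    return 0
}

# Hypothetical helper: build the value for the -a option,
# "<absolute path to ascp>|<absolute path to private key>".
ascp_arg() {
    local base="${1:-$HOME/.aspera/connect}"
    printf '%s|%s\n' "$base/bin/ascp" "$base/etc/asperaweb_id_dsa.openssh"
}

# Usage (same command as above, with the -a value generated):
#   validate_srr_list srr.txt && \
#     prefetch -t ascp -a "$(ascp_arg)" --option-file srr.txt -O /opt/user/ncbi
```

The validation is deliberately strict: a line with trailing whitespace or Windows line endings is reported rather than silently passed along.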

Testing the rescue

I downloaded eight SRR files, averaging about 5 GB each. The timings were as follows:

2018-09-05T14:14:33 prefetch.2.9.2: 1) Downloading 'SRR******'...
2018-09-05T14:14:33 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:16:58 prefetch.2.9.2: fasp download succeed
2018-09-05T14:16:58 prefetch.2.9.2: 1) 'SRR******' was downloaded successfully

2018-09-05T14:17:01 prefetch.2.9.2: 2) Downloading 'SRR******'...
2018-09-05T14:17:01 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:19:25 prefetch.2.9.2: fasp download succeed
2018-09-05T14:19:25 prefetch.2.9.2: 2) 'SRR******' was downloaded successfully

2018-09-05T14:19:28 prefetch.2.9.2: 3) Downloading 'SRR******'...
2018-09-05T14:19:28 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:22:31 prefetch.2.9.2: fasp download succeed
2018-09-05T14:22:31 prefetch.2.9.2: 3) 'SRR******' was downloaded successfully

2018-09-05T14:22:35 prefetch.2.9.2: 4) Downloading 'SRR******'...
2018-09-05T14:22:35 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:25:14 prefetch.2.9.2: fasp download succeed
2018-09-05T14:25:14 prefetch.2.9.2: 4) 'SRR******' was downloaded successfully

2018-09-05T14:25:17 prefetch.2.9.2: 5) Downloading 'SRR******'...
2018-09-05T14:25:17 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:26:46 prefetch.2.9.2: fasp download succeed
2018-09-05T14:26:46 prefetch.2.9.2: 5) 'SRR******' was downloaded successfully

2018-09-05T14:26:49 prefetch.2.9.2: 6) Downloading 'SRR******'...
2018-09-05T14:26:49 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:28:13 prefetch.2.9.2: fasp download succeed
2018-09-05T14:28:13 prefetch.2.9.2: 6) 'SRR******' was downloaded successfully

2018-09-05T14:28:16 prefetch.2.9.2: 7) Downloading 'SRR******'...
2018-09-05T14:28:16 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:29:56 prefetch.2.9.2: fasp download succeed
2018-09-05T14:29:56 prefetch.2.9.2: 7) 'SRR******' was downloaded successfully

2018-09-05T14:30:00 prefetch.2.9.2: 8) Downloading 'SRR******'...
2018-09-05T14:30:00 prefetch.2.9.2: Downloading via fasp...
SRR******
2018-09-05T14:31:58 prefetch.2.9.2: fasp download succeed
2018-09-05T14:31:58 prefetch.2.9.2: 8) 'SRR******' was downloaded successfully
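
To turn the timestamps in the log into per-file durations, a small helper like the one below is enough. It is a sketch: `elapsed_seconds` is a hypothetical name, and it assumes both timestamps fall on the same day and are given as HH:MM:SS.

```shell
# Hypothetical helper: seconds elapsed between two HH:MM:SS timestamps
# (same-day timestamps assumed; no handling of midnight rollover).
elapsed_seconds() {
    awk -v s="$1" -v e="$2" 'BEGIN {
        split(s, a, ":")
        split(e, b, ":")
        print (b[1] * 3600 + b[2] * 60 + b[3]) - (a[1] * 3600 + a[2] * 60 + a[3])
    }'
}

# First file in the log: started 14:14:33, finished 14:16:58
elapsed_seconds 14:14:33 14:16:58   # prints 145
```

For a file of roughly 5 GB, 145 seconds works out to somewhere around 35 MB/s; the exact figure depends on the real file size, which the log does not show.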

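There you go: 5 GB files, even on (you know what kind of) network, took between roughly 1.5 and 3 minutes each, and all eight finished in about 17 minutes.
Rescue successful. Good luck!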


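Author: 思考问题的熊

Copyright notice: unless otherwise stated, all articles on this blog are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0).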
