While using fastq-dump from sratools, I ran into a wall of error messages:
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: mbedtls_ssl_get_verify_result returned 0x4008 (  !! The certificate is not correctly signed by the trusted CA  !! The certificate is signed with an unacceptable hash.  )
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '130.14.29.110' from '192.168.120.54'
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - Failed to create TLS stream for 'www.ncbi.nlm.nih.gov' (130.14.29.110) from '192.168.120.54'
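2024-04-22T01:38:42 fastq-dump.2.9.2 err: item not found while constructing within virtual database module - the path 'ERR5242993' cannot be opened as database or table
The messages point at a certificate problem, but Googling got me nowhere. First, I had not touched this server's configuration in a long time, so why would an obscure certificate issue suddenly show up? Second, I checked the certificates using the methods suggested online and still could not find the cause.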
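I then turned my attention to sratools itself and found a post with a very useful hint: the version of fastq-dump did not match the version of sratools itself.
I checked, and that was indeed the case: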
(base) [wsx@xu2 debug]$ prefetch --help
Usage:
  prefetch [options] <SRA accession> [...]
  Download SRA files and their dependencies
  prefetch [options] --cart <kart file>
  Download cart file
  prefetch [options] <URL> --output-file <FILE>
  Download URL to FILE
  prefetch [options] <URL> [...] --output-directory <DIRECTORY>
  Download URL or URL-s to DIRECTORY
  prefetch [options] <SRA file> [...]
  Check SRA file for missed dependencies and download them
Options:
  -T|--type <value>                Specify file type to download. Default: sra 
  -t|--transport <http|fasp|both>  Transport: one of: fasp; http; both 
                                   [default]. (fasp only; http only; first try 
                                   fasp (ascp), use http if cannot download 
                                   using fasp). 
  --location <value>               Location of data. 
  -N|--min-size <size>             Minimum file size to download in KB 
                                   (inclusive). 
  -X|--max-size <size>             Maximum file size to download in KB 
                                   (exclusive). Default: 20G 
  -f|--force <yes|no|all|ALL>      Force object download: one of: no, yes, 
                                   all, ALL. no [default]: skip download if the 
                                   object if found and complete; yes: download 
                                   it even if it is found and is complete; all: 
                                   ignore lock files (stale locks or it is 
                                   being downloaded by another process use 
                                   at your own risk!); ALL: ignore lock files, 
                                   restart download from beginning. 
  -r|--resume <yes|no>             Resume partial downloads: one of: no, yes 
                                   [default]. 
  -C|--verify <yes|no>             Verify after download: one of: no, yes 
                                   [default]. 
  -p|--progress                    Show progress. 
  -H|--heartbeat <value>           Time period in minutes to display download 
                                   progress. (0: no progress), default: 1 
  --eliminate-quals                Don't download QUALITY column. 
  -c|--check-all                   Double-check all refseqs. 
  -S|--check-rs <yes|no|smart>     Check for refseqs in downloaded files: one 
                                   of: no, yes, smart [default]. Smart: skip 
                                   check for large encrypted non-sra files. 
  -o|--order <kart|size>           Kart prefetch order when downloading 
                                   kart: one of: kart, size. (in kart order, by 
                                   file size: smallest first), default: size. 
  -R|--rows <rows>                 Kart rows to download (default all). Row 
                                   list should be ordered. 
  --perm <PATH>                    PATH to jwt cart file. 
  --ngc <PATH>                     PATH to ngc file. 
  --cart <PATH>                    To read kart file. 
  -a|--ascp-path <ascp-binary|private-key-file>  Path to ascp program and 
                                   private key file (asperaweb_id_dsa.putty) 
  --ascp-options <value>           Arbitrary options to pass to ascp command 
                                   line. 
  -o|--output-file <FILE>          Write file to FILE when downloading 
                                   single file. 
  -O|--output-directory <DIRECTORY>  Save files to DIRECTORY/ 
  -h|--help                        Output brief explanation for the program. 
  -V|--version                     Display the version of the program then 
                                   quit. 
  -L|--log-level <level>           Logging level as number or enum string. One 
                                   of (fatal|sys|int|err|warn|info|debug) or 
                                   (0-6) Current/default is warn 
  -v|--verbose                     Increase the verbosity of the program 
                                   status messages. Use multiple times for more 
                                   verbosity. Negates quiet. 
  -q|--quiet                       Turn off all status messages for the 
                                   program. Negated by verbose. 
  --option-file <file>             Read more options and parameters from the 
                                   file. 
prefetch : 3.0.2
(base) [wsx@xu2 debug]$ which fast
fasterq-dump             fasterq-dump-orig.3.0.2  fastq-dump.3.0.2         fastq-load.3             
fasterq-dump.3           fastq-dump               fastq-dump-orig.3.0.2    fastq-load.3.0.2         
fasterq-dump.3.0.2       fastq-dump.3             fastq-load               
(base) [wsx@xu2 debug]$ fastq-dump --version
fastq-dump : 2.9.2
I went on to check ~/.bashrc and the environment variables:
$ echo $PATH
/data3/wsx/miniconda3/bin:/usr/local/bin:/data3/wsx/miniconda3/condabin:/data3/wsx/bin:/data3/wsx/soft/sratoolkit/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/bin:/data3/wsx/.local/bin:/data3/wsx/bin
The PATH is clearly off. According to the logic in ~/.bashrc, /data3/wsx/soft/sratoolkit/bin should come very early, but conda activation moved /usr/local/bin ahead of it. As a result, fastq-dump resolved to a system-wide copy (version 2.9.2) rather than the one shipped with the toolkit, which is what produced the mismatch.
[wsx@xu2 share]$ which fastq-dump
/usr/local/bin/fastq-dump
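To see every copy that PATH can reach, not only the first one that `which` reports, a small helper along these lines may be useful. This is a minimal sketch: the function name `list_on_path` is my own illustration and not part of sratools, and in bash the builtin `type -a fastq-dump` gives a similar listing.

```shell
# Print every executable named $1 found in the colon-separated list $2
# (defaults to $PATH), in lookup order. Empty PATH entries are skipped.
# list_on_path is a hypothetical helper name, not an sratools command.
list_on_path() {
  local name="$1"
  local path_value="${2:-$PATH}"
  local dir
  local IFS=:
  for dir in $path_value; do
    if [ -n "$dir" ] && [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  return 0
}

# Show each copy of fastq-dump together with the version it reports.
list_on_path fastq-dump | while IFS= read -r exe; do
  printf '%s\t%s\n' "$exe" "$("$exe" --version 2>/dev/null | tr -d '\n')"
done
```

The first line printed is the copy the shell will actually run; any later lines are shadowed copies.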
My fix is to explicitly run export PATH=$HOME/soft/sratoolkit/bin:$PATH before dumping data, which moves the correct path to the front.
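Since the PATH above already carries many repeated entries, a slightly more careful variant is to move the directory to the front and drop its later occurrences. The sketch below assumes a helper name, `path_prepend`, that I made up for illustration; it only removes duplicates of the directory being prepended (and empty entries), and leaves everything else in place.

```shell
# Print the PATH-like string $2 with directory $1 placed first and any
# other occurrence of $1 removed. Empty entries are dropped as well.
# path_prepend is a hypothetical helper name used for illustration.
path_prepend() {
  local dir="$1"
  local path_value="$2"
  local entry
  local result="$dir"
  local IFS=:
  for entry in $path_value; do
    if [ -n "$entry" ] && [ "$entry" != "$dir" ]; then
      result="$result:$entry"
    fi
  done
  printf '%s\n' "$result"
}

PATH=$(path_prepend "$HOME/soft/sratoolkit/bin" "$PATH")
export PATH
hash -r  # make the shell forget previously cached command locations
```

The plain export does the same job for a one-off script; the helper mainly keeps PATH from growing each time the line is run.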
In a script, it looks like this:
#!/bin/bash
cd /data3/wsx/share/gcap_debug
mkdir -p /data3/wsx/share/gcap_debug/temp
export PATH=$HOME/soft/sratoolkit/bin:$PATH
for i in ERR5242993 ERR5243012
do
  echo handling $i
  parallel-fastq-dump -t 20 --tmpdir temp -O gcap_debug/ --split-3  --gzip -s $i
done
Run this way, the problem no longer occurs:
$ bash 0-dump-sra.sh 
handling ERR5242993
SRR ids: ['ERR5242993']
extra args: ['--split-3', '--gzip']
tempdir: temp/pfd_ocitp7xi
ERR5242993 spots: 31931880
blocks: [[1, 1596594], [1596595, 3193188], [3193189, 4789782], [4789783, 6386376], [6386377, 7982970], [7982971, 9579564], [9579565, 11176158], [11176159, 12772752], [12772753, 14369346], [14369347, 15965940], [15965941, 17562534], [17562535, 19159128], [19159129, 20755722], [20755723, 22352316], [22352317, 23948910], [23948911, 25545504], [25545505, 27142098], [27142099, 28738692], [28738693, 30335286], [30335287, 31931880]]
Failed to call external services.
Read 1596594 spots for ERR5242993
Written 1596594 spots for ERR5242993
Read 1596594 spots for ERR5242993
Written 1596594 spots for ERR5242993
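Read 1596594 spots for ERR5242993
The takeaway: an error message and the root cause of the error can sometimes point in opposite directions. Besides reasoning from the error itself, it pays to explore other possible angles too.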
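One way to catch this kind of mismatch early would be a guard at the top of the dump script that compares the versions reported by prefetch and fastq-dump. This is only a sketch under assumptions: the helper names `extract_version` and `same_version` are hypothetical, and the parsing assumes the `name : x.y.z` output format seen in the transcripts above.

```shell
# Pull the version number out of text such as "fastq-dump : 2.9.2".
# Assumes the "name : x.y.z" format shown earlier; prints nothing if
# no such line is present.
extract_version() {
  printf '%s\n' "$1" | sed -n 's/^.*: *\([0-9][0-9.]*\).*$/\1/p' | head -n 1
}

# Succeed only if both commands run and report the same version.
same_version() {
  local a b
  a=$(extract_version "$("$1" --version 2>/dev/null)")
  b=$(extract_version "$("$2" --version 2>/dev/null)")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

if ! same_version prefetch fastq-dump; then
  echo "warning: prefetch and fastq-dump versions differ or a tool is missing; check PATH" >&2
fi
```

With the PATH from this post, the guard would have printed the warning, because prefetch reported 3.0.2 while fastq-dump reported 2.9.2.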