在使用 sratools 的 fastq-dump 时出现了成片的问题信息:
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: mbedtls_ssl_get_verify_result returned 0x4008 ( !! The certificate is not correctly signed by the trusted CA !! The certificate is signed with an unacceptable hash. )
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - ktls_handshake failed while accessing '130.14.29.110' from '192.168.120.54'
2024-04-22T01:38:42 fastq-dump.2.9.2 sys: connection failed while opening file within cryptographic module - Failed to create TLS stream for 'www.ncbi.nlm.nih.gov' (130.14.29.110) from '192.168.120.54'
2024-04-22T01:38:42 fastq-dump.2.9.2 err: item not found while constructing within virtual database module - the path 'ERR5242993' cannot be opened as database or table问题指向了证书。但谷歌后依然不得其解,一是很久没有动过这个服务器的配置,怎么会出现证书之类的搞不清的问题呢?二是 按照网络上的方法检查了证书也还是没有找到原因。
后续将问题的研究转向 sratools 本身,发现这个帖子给了非常有用的提示:fastq-dump的版本和sratools本身的版本不一致。
我检查后发现的确如此:
(base) [wsx@xu2 debug]$ prefetch --help
Usage:
prefetch [options] <SRA accession> [...]
Download SRA files and their dependencies
prefetch [options] --cart <kart file>
Download cart file
prefetch [options] <URL> --output-file <FILE>
Download URL to FILE
prefetch [options] <URL> [...] --output-directory <DIRECTORY>
Download URL or URL-s to DIRECTORY
prefetch [options] <SRA file> [...]
Check SRA file for missed dependencies and download them
Options:
-T|--type <value> Specify file type to download. Default: sra
-t|--transport <http|fasp|both> Transport: one of: fasp; http; both
[default]. (fasp only; http only; first try
fasp (ascp), use http if cannot download
using fasp).
--location <value> Location of data.
-N|--min-size <size> Minimum file size to download in KB
(inclusive).
-X|--max-size <size> Maximum file size to download in KB
(exclusive). Default: 20G
-f|--force <yes|no|all|ALL> Force object download: one of: no, yes,
all, ALL. no [default]: skip download if the
object if found and complete; yes: download
it even if it is found and is complete; all:
ignore lock files (stale locks or it is
being downloaded by another process use
at your own risk!); ALL: ignore lock files,
restart download from beginning.
-r|--resume <yes|no> Resume partial downloads: one of: no, yes
[default].
-C|--verify <yes|no> Verify after download: one of: no, yes
[default].
-p|--progress Show progress.
-H|--heartbeat <value> Time period in minutes to display download
progress. (0: no progress), default: 1
--eliminate-quals Don't download QUALITY column.
-c|--check-all Double-check all refseqs.
-S|--check-rs <yes|no|smart> Check for refseqs in downloaded files: one
of: no, yes, smart [default]. Smart: skip
check for large encrypted non-sra files.
-o|--order <kart|size> Kart prefetch order when downloading
kart: one of: kart, size. (in kart order, by
file size: smallest first), default: size.
-R|--rows <rows> Kart rows to download (default all). Row
list should be ordered.
--perm <PATH> PATH to jwt cart file.
--ngc <PATH> PATH to ngc file.
--cart <PATH> To read kart file.
-a|--ascp-path <ascp-binary|private-key-file> Path to ascp program and
private key file (asperaweb_id_dsa.putty)
--ascp-options <value> Arbitrary options to pass to ascp command
line.
-o|--output-file <FILE> Write file to FILE when downloading
single file.
-O|--output-directory <DIRECTORY> Save files to DIRECTORY/
-h|--help Output brief explanation for the program.
-V|--version Display the version of the program then
quit.
-L|--log-level <level> Logging level as number or enum string. One
of (fatal|sys|int|err|warn|info|debug) or
(0-6) Current/default is warn
-v|--verbose Increase the verbosity of the program
status messages. Use multiple times for more
verbosity. Negates quiet.
-q|--quiet Turn off all status messages for the
program. Negated by verbose.
--option-file <file> Read more options and parameters from the
file.
prefetch : 3.0.2
(base) [wsx@xu2 debug]$ which fast
fasterq-dump fasterq-dump-orig.3.0.2 fastq-dump.3.0.2 fastq-load.3
fasterq-dump.3 fastq-dump fastq-dump-orig.3.0.2 fastq-load.3.0.2
fasterq-dump.3.0.2 fastq-dump.3 fastq-load
(base) [wsx@xu2 debug]$ fastq-dump --version
fastq-dump : 2.9.2继续查看了 ~/.bashrc 和环境变量:
$ echo $PATH
/data3/wsx/miniconda3/bin:/usr/local/bin:/data3/wsx/miniconda3/condabin:/data3/wsx/bin:/data3/wsx/soft/sratoolkit/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/lib/rstudio-server/bin/postback/postback:/usr/bin:/data3/wsx/.local/bin:/data3/wsx/bin发现环境变量很有问题,在 ~/.bashrc 的设置逻辑里 /data3/wsx/soft/sratoolkit/bin 应该是很靠前的, 但 conda 的激活将 /usr/local/bin 提前了,这就导致了 fastq-dump 使用的是系统的一个版本,从而产生了这种不一致的情况。
[wsx@xu2 share]$ which fastq-dump
/usr/local/bin/fastq-dump
我的解决办法就是在 dump 数据前,显式地运行 export PATH=$HOME/soft/sratoolkit/bin:$PATH 命令将正确的路径前提。
在脚本中的使用情况如下:
#!/bin/bash
cd /data3/wsx/share/gcap_debug
mkdir -p /data3/wsx/share/gcap_debug/temp
export PATH=$HOME/soft/sratoolkit/bin:$PATH
for i in ERR5242993 ERR5243012
do
echo handling $i
parallel-fastq-dump -t 20 --tmpdir temp -O gcap_debug/ --split-3 --gzip -s $i
done这样去运行就没有这个问题了:
$ bash 0-dump-sra.sh
handling ERR5242993
SRR ids: ['ERR5242993']
extra args: ['--split-3', '--gzip']
tempdir: temp/pfd_ocitp7xi
ERR5242993 spots: 31931880
blocks: [[1, 1596594], [1596595, 3193188], [3193189, 4789782], [4789783, 6386376], [6386377, 7982970], [7982971, 9579564], [9579565, 11176158], [11176159, 12772752], [12772753, 14369346], [14369347, 15965940], [15965941, 17562534], [17562535, 19159128], [19159129, 20755722], [20755723, 22352316], [22352317, 23948910], [23948911, 25545504], [25545505, 27142098], [27142099, 28738692], [28738693, 30335286], [30335287, 31931880]]
Failed to call external services.
Read 1596594 spots for ERR5242993
Written 1596594 spots for ERR5242993
Read 1596594 spots for ERR5242993
Written 1596594 spots for ERR5242993
Read 1596594 spots for ERR5242993总结下来是:报错信息和错误的根源有时候南辕北辙。除了从问题本身思考,也要从其他可能得角度去探索。