This command-line tool lets users query peptide sequences against their own customized protein sequence database.

The tool provides two major functions:
1. Given a protein sequence database in FASTA format, create a Lucene index for it.
2. Query peptide sequences against that index. The query can be:
   a. a single peptide sequence or a comma-separated list of peptide sequences, or
   b. a file, either in FASTA format or as a list of peptide sequences, one sequence per line.
The executable jar can be downloaded at http://alanine.bioinformatics.udel.edu/peptidematch_new/downloads/PeptideMatchCMD_1.0.jar. 
The source code is also available at http://alanine.bioinformatics.udel.edu/peptidematch_new/downloads/PeptideMatchCMD_src_1.0.zip.

Run from the executable jar:

$ java -jar PeptideMatchCMD_1.0.jar -h
Command line options: -h 
usage: java -jar PeptideMatchCMD_1.0.jar [options]
            Available options:
            ------------------
 -a,--action <arg>       The action to perform ("index" or "query").
 -d,--dataFile <arg>     The path to a FASTA file to be indexed.
 -e,--LeqI               Treat Leucine (L) and Isoleucine (I) as
                         equivalent (default: no).
 -f,--force              Overwrite the indexDir (default: no).
 -h,--help               Print this message.
 -i,--indexDir <arg>     The directory where the index is stored.
 -l,--list               The query peptide sequence file is a list of
                         peptide sequences, one sequence per line
                         (default: no).
 -o,--outputFile <arg>   The path to the query result file.
 -Q,--queryFile <arg>    The path to the query peptide sequence file in
                         either FASTA format or a list of peptide
                         sequences, one sequence per line.
 -q,--query <arg>        One peptide sequence or a comma-separated list of
                         peptide sequences.

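When the tool is driven from a script, the options listed above can be assembled programmatically. The following is a minimal sketch in Python, not part of the distribution: the function name `build_query_command` and its parameters are hypothetical, and only flags that appear in the help output are used. It builds the argument list without running anything.

```python
from typing import List, Optional, Sequence


def build_query_command(
    index_dir: str,
    output_file: str,
    peptides: Optional[Sequence[str]] = None,
    query_file: Optional[str] = None,
    is_list_file: bool = False,
    leq_i: bool = False,
    jar: str = "PeptideMatchCMD_1.0.jar",
) -> List[str]:
    """Assemble a 'query' invocation using only the documented flags."""
    # Exactly one query source is allowed: inline peptides (-q) or a file (-Q).
    if (peptides is None) == (query_file is None):
        raise ValueError("give exactly one of peptides or query_file")
    if peptides is not None and len(peptides) == 0:
        raise ValueError("peptides must not be empty")

    cmd = ["java", "-jar", jar, "-a", "query", "-i", index_dir]
    if peptides is not None:
        cmd += ["-q", ",".join(peptides)]
    else:
        cmd += ["-Q", query_file]
        if is_list_file:
            cmd.append("-l")  # file is one peptide per line, not FASTA
    if leq_i:
        cmd.append("-e")      # treat L and I as equivalent
    cmd += ["-o", output_file]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_query_command(
        "sprot_index", "out.txt", peptides=["AAFGGSGGR", "GVPDIR"], leq_i=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java and the jar are available.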
Compile from source:
$ unzip PeptideMatchCMD_src_1.0.zip
$ cd PeptideMatchCMD_src_1.0
$ ant
$ java -jar PeptideMatchCMD_1.0.jar -h
		
Tutorial 

Create a Lucene index from a protein sequence database in FASTA format:
$ java -jar PeptideMatchCMD_1.0.jar -a index -d uniprot_sprot.fasta -i sprot_index 
Command line: -a index -d uniprot_sprot.fasta -i sprot_index 
Indexing to directory "sprot_index" ...
Indexing "uniprot_sprot.fasta" ...
Indexing "uniprot_sprot.fasta" finished
Time used: 00 hours, 06 mins, 31.215 seconds

Query a single peptide sequence:
$ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
Quering...

AAFGGSGGR	has 1 match

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.457 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524
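The MatchStart and MatchEnd columns appear to be 1-based, inclusive positions in the subject sequence: the 9-residue query AAFGGSGGR spans positions 516 to 524. The sketch below illustrates that coordinate convention with a plain substring scan. It is not the tool's Lucene-based search, and the function name `find_matches` and the toy sequence are hypothetical.

```python
from typing import List, Tuple


def find_matches(peptide: str, protein: str) -> List[Tuple[int, int]]:
    """Return (start, end) of every exact occurrence, 1-based and inclusive."""
    hits = []
    pos = protein.find(peptide)
    while pos != -1:
        # Convert Python's 0-based offset to a 1-based inclusive range.
        hits.append((pos + 1, pos + len(peptide)))
        # Advance by one so overlapping occurrences are also reported.
        pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_protein = "MKAAFGGSGGRLLAAFGGSGGR"  # made-up sequence, not from UniProt
    for start, end in find_matches("AAFGGSGGR", toy_protein):
        print(start, end)
```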

Query a comma-separated list of peptide sequences:
$ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
Quering...

AAFGGSGGR	has 1 match
GVPDIR	has 4 matches

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.493 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
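The result file is tab-separated, and the lines beginning with `#` hold the command line and the column header. A minimal Python sketch for loading such a file follows. The function name `read_results` and the dictionary keys are hypothetical, and the column layout is assumed only from the sample output shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def read_results(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group match rows by query peptide, skipping '#' comment lines."""
    by_query: Dict[str, List[dict]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Rows may carry a trailing tab, so only the first five fields are used.
        fields = line.split("\t")
        query, subject, length, start, end = fields[:5]
        by_query[query].append({
            "subject": subject,
            "subject_length": int(length),
            "match_start": int(start),
            "match_end": int(end),
        })
    return dict(by_query)
```

A typical call would be `with open("out.txt") as fh: results = read_results(fh)`, after which `results["GVPDIR"]` holds one dictionary per matched protein.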

Query a comma-separated list of peptide sequences, treating leucine (L) and isoleucine (I) as equivalent:
$ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
Quering...

AAFGGSGGR	has 1 match
GVPDIR	has 13 matches

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.513 seconds

$ cat out.txt 
#Command line: -a query -i sprot_index -q AAFGGSGGR,GVPDIR -e -o out.txt 
##Query	Subject	SubjectLength	MatchStart	MatchEnd	MatchedLEqIPositions
AAFGGSGGR	sp|P35908|K22E_HUMAN	639	516	524	
GVPDIR	sp|Q9CK59|Y1775_PASMU	92	45	50	
GVPDIR	sp|A0R5Z2|GLFT1_MYCS2	302	182	187	186
GVPDIR	sp|Q7D4V6|GLFT1_MYCTU	304	179	184	183
GVPDIR	sp|B1Y8E7|PYRB_LEPCP	320	194	199	
GVPDIR	sp|A5GDX3|RECF_GEOUR	364	126	131	130
GVPDIR	sp|P96919|EX5A_MYCTU	575	138	143	142
GVPDIR	sp|Q17QV2|MON1A_BOVIN	555	441	446	445
GVPDIR	sp|Q2QZ37|OBGM_ORYSJ	528	500	505	504
GVPDIR	sp|B4SHE6|MURD_PELPB	464	252	257	
GVPDIR	sp|Q9M1G3|LRK16_ARATH	669	595	600	599
GVPDIR	sp|Q5U3H2|SV421_DANRE	808	575	580	579
GVPDIR	sp|A6H5Y3|METH_MOUSE	1253	1147	1152	1151
GVPDIR	sp|Q6FX42|ATR_CANGA	2379	1135	1140
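With -e, the extra MatchedLEqIPositions column seems to give the subject position at which an L was matched to an I (or the reverse): for GVPDIR matched at 182 to 187, the reported position 186 lines up with the fifth query residue, I. The sketch below reproduces that behaviour on a toy sequence by mapping I to L before comparing. It is an illustration under that assumption, not the tool's implementation, and `find_matches_leqi` is a hypothetical name.

```python
from typing import List, Tuple


def find_matches_leqi(peptide: str, protein: str) -> List[Tuple[int, int, List[int]]]:
    """Find matches with L and I treated as equivalent.

    Returns (start, end, swapped_positions), all 1-based subject coordinates.
    swapped_positions lists where the original residues differ (L vs. I).
    """
    norm_pep = peptide.replace("I", "L")
    norm_prot = protein.replace("I", "L")
    hits = []
    pos = norm_prot.find(norm_pep)
    while pos != -1:
        window = protein[pos:pos + len(peptide)]
        # After normalization only L/I differences can remain.
        swapped = [pos + k + 1
                   for k, (q, s) in enumerate(zip(peptide, window))
                   if q != s]
        hits.append((pos + 1, pos + len(peptide), swapped))
        pos = norm_prot.find(norm_pep, pos + 1)
    return hits


if __name__ == "__main__":
    toy_protein = "AAGVPDLRKKGVPDIR"  # made-up sequence
    for hit in find_matches_leqi("GVPDIR", toy_protein):
        print(hit)
```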

Query peptides from a FASTA file:
$ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -Q query.fasta -e -o out.txt 
Command line: -a query -i sprot_index -Q query.fasta -e -o out.txt 
Quering...

example_1	has 1 match
example_2	has 1 match
example_3	has 1 match
example_4	has 1 match
example_5	has 1 match
example_6	has 1 match
example_7	has 1 match
example_8	has 1 match
example_9	has 1 match
example_10	has 1 match

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.724 seconds
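For FASTA input, matches are reported under each record's identifier (example_1, example_2, and so on) rather than under the peptide sequence. The contents of query.fasta are not shown in this document, so the sketch below uses made-up records to illustrate how a FASTA file maps identifiers to peptides. `read_fasta` is a hypothetical helper, and taking the first whitespace-delimited token of the header as the identifier is an assumption.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    identifier: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            # Assumption: the identifier is the first token after '>'.
            header = line[1:].split()
            identifier = header[0] if header else ""
            chunks = []
        elif identifier is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


if __name__ == "__main__":
    demo = [">pep_a demo record", "AAFGG", "SGGR", ">pep_b", "GVPDIR"]
    for name, seq in read_fasta(demo):
        print(name, seq)
```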

Query peptides from a list file, one peptide per line:
$ java -jar PeptideMatchCMD_1.0.jar -a query -i sprot_index -Q query.list -l -e -o out.txt 
Command line: -a query -i sprot_index -Q query.list -l -e -o out.txt 
Quering...

AAFGGSGGR	has 1 match
ELEVQSEDGTFAK	has 1 match
FEDPAEGEDTLVEK	has 1 match
FSDGLITPDFLAK	has 1 match
GAPEFWAAR	has 1 match
GVIEANGGKVEK	has 1 match
HIPVYVSEEMVGHKFGEFSPTR	has 1 match
HNDVNFGTQDHNR	has 1 match
IGFYLTTCPR	has 1 match
ILVGQGNDGVAFVK	has 1 match

Query is finished.
The result is saved in "out.txt".
Time used: 00 hours, 00 mins, 00.752 seconds

