Sequence-based searching is an essential activity in biotechnology in the current era of the genomic
revolution, in which the sequence information of biologics holds the key to many scientific questions.
The commercial launch of sequence-based products, technologies, or kits requires a comprehensive analysis
of patents and published applications that claim sequences and their applications, including modifications,
mutations, methods of preparing such products, and related uses of the sequences.
Sequence searching is a niche domain that requires specialized databases and the technical expertise
to search and interpret sequence alignments. Major roadblocks in conducting sequence-based
freedom-to-operate (FTO) searches include information scattered across multiple databases, the
difficulty of reducing database noise using sequence-based parameters, and a lack of sequence analysis
expertise combined with claim interpretation skills.
Sequence searches are an integral part of patent searches covering the following subject matter:
- Antibody sequences (including variable regions and CDRs)/ Antibody-Drug conjugates
- Recombinant proteins/ Fusion proteins/ enzymes/ peptides
- Recombinant vector/ components (e.g. Promoter, Enhancer, Terminator)
- Genomic sequences (e.g. Coronavirus)/ Gene Sequences/ Gene editing tools (CRISPR-Cas9)
- Primers & Probes (including multiple primers for multiplex PCRs)
- siRNA, Epitope, Motif search
- Modified/ Non-natural bases
Below are some key measures that help improve recall in sequence-based FTO searches.
Executing searches across multiple databases:
In many past searches, the shortlisted references were identified from different database sources,
which leads to the conclusion that no single database provides complete coverage of sequence
information. The most important aspect of a comprehensive sequence-based FTO search is therefore to
combine commercial and no-fee databases in the search methodology. Industry-standard sequence
databases with good coverage may be used, including STN (Registry/ DGENE files), GenomeQuest, The
Lens, and NCBI BLAST. A multi-database strategy helps overcome limitations such as gaps in jurisdiction
coverage, database-specific sequence compilation errors, and user errors in selecting the sequence
alignment algorithm or its parameters.
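As a rough illustration of the multi-database strategy, the sketch below merges hit lists exported from several databases and flags publications that only one source returned. The database labels, the publication numbers, and the helper names `normalize_publication_number` and `merge_hits` are hypothetical, and the normalization is deliberately simplistic; real publication-number formats (kind codes, leading zeros, year prefixes) would likely need more careful handling.

```python
from collections import defaultdict


def normalize_publication_number(raw: str) -> str:
    """Upper-case and drop spaces/punctuation so formatting variants compare equal."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())


def merge_hits(hits_by_database: dict) -> dict:
    """Map each normalized publication number to the set of databases that returned it."""
    merged = defaultdict(set)
    for database, publication_numbers in hits_by_database.items():
        for number in publication_numbers:
            merged[normalize_publication_number(number)].add(database)
    return dict(merged)


# Hypothetical hit lists; these publication numbers are invented for illustration.
hits = {
    "database_a": ["US 2020/0123456 A1", "EP1234567B1"],
    "database_b": ["US20200123456A1", "WO2019/111111A1"],
}

merged = merge_hits(hits)

# References that a single-database search would have missed from the other source.
single_source = {pub for pub, dbs in merged.items() if len(dbs) == 1}
```

Publications that appear in `single_source` are the ones that illustrate the coverage gap: each would have been missed had only the other database been searched.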
Understanding partial sequence matches:
The search scope determines whether a project requires only complete sequence matches (those with 80%
or more query cover) or both complete matches and partial/ fragment matches (lower query cover but
high % identity to the query sequence). This distinction should be well understood before excluding
any results that have lower query cover but high identity. There are many instances in which a partial
match is critical to the FTO search, for instance when claims are directed to a specific protein
domain, an epitope, or the variable regions of an antibody. Claim language should also be considered
before excluding references based on lower query cover, as claims may recite limitations on sequence
size using terms such as "fragments", "parts", or "at least" a given sequence length. As a note of
caution, we suggest that searchers be especially careful when they come across patent references that
have lower query cover but are 100% identical to the query sequence.
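The triage described above can be sketched as a small classification step. This is only an illustration under stated assumptions: the 80% query-cover cutoff comes from the text, while the 90% identity cutoff, the category labels, and the function names `query_cover` and `classify_hit` are hypothetical. Query cover is computed here from a single alignment; search tools may aggregate several aligned segments per hit, so their reported values can differ.

```python
def query_cover(query_start: int, query_end: int, query_length: int) -> float:
    """Percent of the query spanned by one alignment (1-based, inclusive coordinates)."""
    return 100.0 * (query_end - query_start + 1) / query_length


def classify_hit(cover_pct: float, identity_pct: float,
                 cover_cutoff: float = 80.0, identity_cutoff: float = 90.0) -> str:
    """Label a hit as a complete match, a partial match needing review, or low priority."""
    if cover_pct >= cover_cutoff:
        return "complete"
    if identity_pct >= identity_cutoff:
        # Low cover but high identity: may correspond to a claimed domain, epitope, or CDR.
        return "partial-review"
    return "low-priority"


# Hypothetical hit: residues 1-120 of a 450-residue query aligned at 100% identity.
cover = query_cover(1, 120, 450)
label = classify_hit(cover, 100.0)
```

With these assumed cutoffs the hypothetical hit is kept for manual review rather than discarded, which mirrors the caution about references with lower query cover but 100% identity.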
Including focused keyword-based searches:
Key references are often missed in FTO searches because searchers rely entirely on sequence searches
and ignore the keyword aspect of the search. A focused keyword search has to be an integral part of a
sequence search to capture instances in which claims do not recite SEQ IDs but instead claim
mutations at specific positions or use abbreviations for such modifications. A focused keyword search
increases recall for the technology and helps identify references that are not covered by sequence
search databases. It should also be kept in mind that sequence search databases may not cover
sequences comprehensively and accurately.
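One way to picture a focused keyword pass is a pattern that picks out position-specific substitutions written in common amino acid notation, such as A123T or Gly45Asp, from claim text. The sketch below is a minimal illustration, not a complete mutation grammar: the pattern, the function name `find_mutations`, and the sample claim are assumptions, and abbreviations that happen to share the same shape (a letter, digits, a letter) would produce false positives that need manual review.

```python
import re

ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
THREE_LETTER = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|"
                "Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Matches substitutions such as "A123T" (one-letter) or "Gly45Asp" (three-letter).
MUTATION_PATTERN = re.compile(
    rf"\b(?:[{ONE_LETTER}]\d+[{ONE_LETTER}]"
    rf"|(?:{THREE_LETTER})\d+(?:{THREE_LETTER}))\b"
)


def find_mutations(claim_text: str) -> list[str]:
    """Return every substitution-like token found in the claim text, in order."""
    return MUTATION_PATTERN.findall(claim_text)


# Invented claim wording, for illustration only.
claim = ("A variant polypeptide comprising the substitutions A123T and "
         "Gly45Asp relative to the parent enzyme.")
mutations = find_mutations(claim)
```

Tokens found this way can then seed keyword queries for references that claim the mutation without reciting a sequence identifier.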
Executing different sequence search algorithms/ tools:
Understanding the sequence search algorithm and selecting the proper tool improves the chances of
capturing all potentially relevant results. We suggest combining different sequence search
algorithms/ tools to increase the overall coverage of the search. The most popular approaches for
such searches are BLAST similarity searches and exact sequence/ sub-sequence search algorithms, which
retrieve hits containing exactly the query sequence. Another key aspect is to include, based on the
search requirement, other useful BLAST programs apart from the most frequently used blastn and
blastp, e.g. tblastn (which searches a translated nucleotide database using a protein query sequence).
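To show how an exact sub-sequence search differs from a similarity search, the sketch below looks for every exact occurrence of a short nucleotide query, such as a primer, on both strands of a subject sequence. It is a simplified illustration and not how any particular database implements its search; the function names and the sequences are hypothetical, only the unambiguous bases A, C, G, and T are handled, and a query that equals its own reverse complement would be reported on both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_exact(query: str, subject: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each exact occurrence of query in subject."""
    query, subject = query.upper(), subject.upper()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = subject.find(pattern)
        while start != -1:
            hits.append((start, strand))
            # Step by one so overlapping occurrences are also found.
            start = subject.find(pattern, start + 1)
    return hits


# Invented sequences, for illustration only.
hits = find_exact("GACC", "TTGACCAAGGTCTT")
```

Unlike a similarity search, this approach returns nothing for a subject that differs from the query by even a single base, which is why the two kinds of search complement each other.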
Additional searches:
The claiming of biological sequences in the field of biotechnology is rapidly evolving, and searchers
should understand the different ways in which such patents are drafted. Additional searches (key
assignee, key inventor, forward citations, backward citations, and similarity searches based on the
references identified during the earlier steps) are key to ensuring that all documents potentially
important to the technology domain have been captured.
The aim of a comprehensive FTO search is to provide a complete, up-to-date list of all the patents
and published applications that may hinder the launch of the intended commercial product/
technology/ process. Following a well-structured and well-researched methodology ensures a
comprehensive FTO report that reduces the risk of missing a potentially relevant reference.
Considering the high stakes for corporates, start-ups, and academics with respect to sequence-based
commercial products/ technologies, we suggest integrating these search strategies into an FTO search
methodology for a robust and comprehensive FTO search report.