May 11, 2021

A comprehensive approach to sequence-based freedom-to-operate (FTO) search

Sequence-based searching is an essential activity in biotechnology in the current era of the genomic revolution, in which the sequence information of biologics holds the key to many scientific questions. The commercial launch of such products, technologies, or kits requires a comprehensive analysis of patents and published applications that claim the sequences and their applications, including modifications, mutations, methods of preparing such products, and related uses of the sequences. Sequence searching is a niche domain that requires specialized databases and the technical expertise to search and interpret sequence alignments. The major roadblocks in conducting sequence-based FTO searches include information scattered across multiple databases, the difficulty of reducing database noise using sequence-based parameters, and a shortage of searchers who combine sequence analysis expertise with claim interpretation skills.

Sequence searches are an integral part of patent searches and cover the following:

  • Antibody sequences (including variable regions and CDRs)/ Antibody-Drug conjugates
  • Recombinant proteins/ Fusion proteins/ enzymes/ peptides
  • Recombinant vector/ components (e.g. Promoter, Enhancer, Terminator)
  • Genomic sequences (e.g. Coronavirus)/ Gene sequences/ Gene editing tools (CRISPR-Cas9)
  • Primers & Probes (including multiple primers for multiplex PCRs)
  • siRNA, epitope, and motif searches
  • Modified/ Non-natural bases


Below are some key measures that help improve recall in sequence-based FTO searches.

Searching multiple databases:
In past searches, shortlisted references have repeatedly been identified from different database sources, which leads to the conclusion that no single database provides complete coverage of sequence information. The most important aspect of a comprehensive sequence-based FTO search is therefore to combine commercial and no-fee databases in the search methodology. Industry-standard sequence databases with good coverage may be used, including STN (Registry/ DGENE files), GenomeQuest, The Lens, and NCBI BLAST. A multi-database strategy overcomes limitations such as gaps in jurisdiction coverage, database-specific sequence compilation errors, and possible errors by the user in selecting the sequence alignment algorithm or its parameters.
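
As a rough illustration of why results from several sources are combined, the sketch below merges hit lists exported from different databases and flags references that only one source returned. The function names, the database labels, and the very simple number normalization are assumptions made for this example; real exports differ in format, and real publication numbers need more careful handling (kind codes, leading zeros, family members).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def normalize_publication(number: str) -> str:
    """Upper-case a publication number and drop spaces and common separators.

    Assumption: this simple clean-up is enough to match the same document
    across sources. It does not reconcile kind codes or patent families.
    """
    return re.sub(r"[\s\-/.,]", "", number).upper()


def merge_hits(hits_by_database: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized publication number to the databases that returned it."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for database, publications in hits_by_database.items():
        for publication in publications:
            merged[normalize_publication(publication)].add(database)
    return dict(merged)


def single_source_hits(merged: Dict[str, Set[str]]) -> List[str]:
    """Publications returned by exactly one database, in sorted order."""
    return sorted(pub for pub, sources in merged.items() if len(sources) == 1)
```

Passing the output of `merge_hits` to `single_source_hits` lists the references that a single-database search could have missed, which is a quick way to check how much each source contributes to a given project.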

Understanding partial sequence matches:
The search scope determines whether a project requires only complete sequence matches (those with a query cover of 80% or more) or both complete matches and partial/ fragment matches (lower query cover but high % identity to the query sequence). This distinction should be well understood before any result with lower query cover but high identity is excluded. There are many instances in which a partial match is critical to the FTO search, for example when claims are directed to a specific protein domain, an epitope, or the variable regions of an antibody. Claim language should also be considered before references are excluded on the basis of lower query cover, as claims may recite limitations on sequence size using terms such as "fragments", "parts", or a minimum sequence length. As a note of caution, we suggest that searchers be very careful when they come across patent references that have lower query cover but are 100% identical to the query sequence.
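
The triage described above can be sketched in a few lines of code. This is only an illustration: the `Hit` fields and the category labels are hypothetical, the 80% query cover threshold is the one mentioned in the text, and the 95% identity threshold is an assumed value that a real project would set according to the claim scope.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    publication: str         # patent or application number (illustrative)
    query_length: int        # length of the query sequence
    aligned_start: int       # 1-based start of the alignment on the query
    aligned_end: int         # 1-based end of the alignment on the query (inclusive)
    percent_identity: float  # identity over the aligned region


def query_cover(hit: Hit) -> float:
    """Percentage of the query sequence covered by the alignment."""
    aligned = hit.aligned_end - hit.aligned_start + 1
    return 100.0 * aligned / hit.query_length


def classify(hit: Hit, cover_cutoff: float = 80.0,
             identity_cutoff: float = 95.0) -> str:
    """Sort a hit into a review bucket instead of silently discarding it.

    The identity cutoff is an assumption for this sketch, not a standard.
    """
    if query_cover(hit) >= cover_cutoff:
        return "complete"
    if hit.percent_identity >= identity_cutoff:
        # Short but near-identical: may read on a domain, epitope, or CDR claim.
        return "partial-review"
    return "low-priority"
```

The point of the middle bucket is that a hit with low query cover but very high identity is routed to manual claim review rather than filtered out by a coverage threshold alone.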

Including focused keyword-based searches:
Key references are often missed in FTO searches because searchers rely completely on sequence searches and ignore the keyword aspect of the search. A focused keyword search has to be an integral part of a sequence search to capture instances in which claims do not recite SEQ IDs but instead claim mutations at specific positions or use abbreviations for such modifications. A focused keyword search increases recall for the technology and helps identify references that are not covered by the sequence search databases. It also bears noting that sequence search databases may not cover sequences comprehensively or accurately.
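
One way to prepare such keywords is to expand a point mutation written in one-letter shorthand into the other forms in which a claim might express it. The sketch below is a minimal example under that assumption; the function name and the choice of variants are illustrative, and a real keyword strategy would add database-specific proximity operators, truncation, and further phrasings.

```python
import re
from typing import List

# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

# Matches substitutions such as "N297A": original residue, position, new residue.
SUBSTITUTION = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def substitution_keywords(mutation: str) -> List[str]:
    """Return keyword variants for a single amino acid substitution."""
    match = SUBSTITUTION.match(mutation.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    original, position, replacement = match.groups()
    return [
        f"{original}{position}{replacement}",
        f"{THREE_LETTER[original]}{position}{THREE_LETTER[replacement]}",
        f"position {position}",
    ]
```

For example, `substitution_keywords("N297A")` yields the one-letter form, the three-letter form, and a position phrase, each of which can be combined with the protein name in a text query.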

Executing different sequence search algorithms/ tools:
An understanding of the sequence search algorithm and a proper selection of tools improve the chances of capturing all potentially relevant results. We suggest combining different sequence search algorithms/ tools to increase the overall coverage of the search. The most popular tools for such searches include BLAST similarity searches and exact sequence/ sub-sequence search algorithms, which retrieve hits containing exactly the query sequence. Another key aspect is to include, depending on the search requirement, other useful BLAST programs besides the most frequently used blastn and blastp, e.g. tblastn (to search translated nucleotide databases using a protein query sequence).
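
To make the difference from a similarity search concrete, here is a minimal sketch of an exact sub-sequence search for nucleotide sequences that checks both strands. It is a plain string search written for illustration, not the algorithm used by any of the databases named above, and the function names are assumptions.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence made of A, C, G, and T."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_exact(query: str, subject: str) -> List[Tuple[int, str]]:
    """Return (1-based start, strand) for every exact occurrence of the query.

    The minus strand is searched by looking for the reverse complement of the
    query in the subject, so a palindromic query is reported on both strands.
    """
    if not query:
        raise ValueError("query must not be empty")
    query, subject = query.upper(), subject.upper()
    hits = []
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        start = subject.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            # Step forward by one so overlapping occurrences are also found.
            start = subject.find(pattern, start + 1)
    return hits
```

Unlike a BLAST similarity search, this kind of search returns nothing for a single mismatch, which makes it well suited to short queries such as primers and probes and a complement to, rather than a replacement for, alignment-based tools.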

Additional searches:
The claiming of biological sequences in biotechnology is rapidly evolving, and searchers should understand the different ways in which such patents are drafted. Additional searches (key assignee, key inventor, forward citations, backward citations, and similarity searches based on the references identified in the earlier steps) are key to ensuring that all documents potentially important to the technology domain have been captured.

The aim of a comprehensive FTO search is to provide a complete, up-to-date list of all patents and published applications that may hinder the launch of the intended commercial product, technology, or process. Following a well-structured and well-researched methodology produces a comprehensive FTO report and reduces the risk of missing a potentially relevant reference. Considering the high stakes for corporates, start-ups, and academics with respect to sequence-based commercial products and technologies, we suggest integrating these search strategies into the FTO search methodology for a robust and comprehensive FTO search report.