May 11, 2021
Sequence based search are key in establishing patentability of novel sequences as well as understanding freedom aspects for commercialization of biotechnology based products and technologies. These searches are key to many aspects of biotechnology and life sciences including Therapeutic drugs (Monoclonal Antibody sequences, Recombinant proteins, Peptides, SiRNA), Diagnostics tests/ kits (Primers, Probes, Genetic sequences, Epitopes), Industrial enzymes, Expression systems (Vectors, Promoters, Terminators, Markers) , Research Tools (CRISP-Cas9, etc.). Modified sequences are key in many new inventions and include aspects like mutations, substitutions, as well as use of synthetic modified bases (e.g. psuedouridine has been used in several mRNA vaccines).
Below are the key steps to be conducted during a sequence search:
1. Identification of Correct sequence information
Sequence search starts with confirming the correct information of nucleotide or amino acid sequence for the query sequence. It is important to cross-check the sequence information with respective research team in-case of novel sequences or reliable sources (e.g. regulatory agencies) while picking sequence information of already available sequences (e.g. during development of biosimilar products).
2. Selection of appropriate databases and search algorithms
Since no single database provides comprehensive coverage of the sequence data, we recommend a combination of commercial databases for example, databases hosted on STN (Registry, DGENE), GenomeQuest and No-fee databases (The Lens, NCBI Blast). Further, only a few databases cover Non-patent Literature including scientific publications which include Registry and NCBI database.
Both, similarity search (e.g. Blast) and exact match/ subsequence search are the most frequently used search algorithms. Many experienced sequence searchers recommend a combination of blast similarity search and exact sequence search to prepare a robust and comprehensive dataset for sequence based analysis. Further, additional algorithm parameters should be adjusted based on individual sequence characteristics (including word size, matrix, etc.).
3. Analysis of sequence alignments
Alignment of query sequence with subject sequences (database derived sequence) should be analyzed in detail to shortlist potentially relevant references with respect to the input query sequence. Two key alignment parameters used for sequence analysis include:
% Identity - Match between query and subject sequence for the aligned portion
% Query cover - % of query sequence being mapped to subject sequence in an alignment
Cut-off values for these search parameter vary in range of 50% to 100% depending on search requirements.
Sample sequence Alignment: % Identity: 97% & Query Cover: 100%
Above representation provides a sample sequence alignment highlighting search alignment parameters.
4. Interpreting claim language during sequence analysis
Both sequence alignment as well as claim language should be considered during the analysis of sequence based dataset. Various restrictions in claims of the patents/ published applications should be understood – e.g. % identity limitations, mutations/ substitutions at specific positions, coverage of parts/ fragments of the claimed sequence. Claims related to primers/ probes frequently have coverage on fragments/ contiguous nucleotide bases.
Patent/Published Application | Claimed subject matter |
---|---|
WO2021055395A1 (Assignee: Novozymes) |
Isolated purified polypeptide having beta glucanase activity, wherein polypeptide has at least 70% identity to at least 99% identity to polypeptide of SEQ ID No: 8. |
WO2014042693A1 (Assignee: LS9) |
Variant of ester synthase polypeptide comprising SEQ ID NO: 2, wherein the peptide is genetically engineered to have at least one mutation at an amino acid position selected from positions including 4, 5, 7, 15, 80, 147, 166, 266, 395, 466, etc. |
WO2015095392A1 (Assignee: Genentech) |
Anti-cluster of differentiation 3 (CD3) antibody, comprising multiple hypervariable regions (HVRs)
(a) an HVR-H1 comprising the amino acid sequence of SEQ ID NO: 1 ; (b) an HVR-H2 comprising the amino acid sequence of SEQ ID NO: 2; (c) an HVR-H3 comprising the amino acid sequence of SEQ ID NO: 3; |
WO2015051456A1 (Assignee: Univ. of Prince Edward Island) |
Method of virus detection wherein primer is selected from an isolated nucleic acid having at least 80% identity to the sequence of SEQ ID NO: 4 or 9, or fragments thereof of at least 15 contiguous nucleotide |