Perl regular expression: lines must contain ADFHKMPRTWCEGILNQSVY and nothing else
Can someone help me? I need a regular expression that matches only lines containing the letters ADFHKMPRTWCEGILNQSVY and nothing else.
I need to loop through lines of text that look like this:
    >gi|46450118|gb|AAS96767.1| Women with a family of proteins [Desulfovibrio vulgaris str. Hildenborough]
    MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
    APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
    RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
    EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
    TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
    TDAAPGRTQ
    >gi|46450117|gb|AAS96766.1| iron-sulfur protein CooF [Desulfovibrio vulgaris str. Hildenborough]
    MNHEELFVIQAEAEKCRACRKCELACIASHNNLTIKEAAKKRTVFAPRVHVVKTDEVKMPVQCRQCKDAP
    CARVCPTRALVQDDGVVTMRAQFCAACRLCIMACPYGAISLSFIGLPEEDEAGAMHGREVAVRCDLCSEW
    RAREGKSSCACVEACPTKALHMVPLAEARGRHQ
    >gi|46450116|gb|AAS96765.1| hydrogenase nickel incorporation protein HypA [Desulfovibrio vulgaris str. Hildenborough]
    MHEASIVAGIMRIVEEEAARHDVTRIARVRLRVGLLTGVEPRTLTACFELYSEGTVAEGASLDLETVPAL
    GTCHACGATFDLHRRCFACPTCGNDDITLEGGRELTIAGLEVPQPEGATA
    >gi|46450115|gb|AAS96764.1| carbon monoxide-induced hydrogenase CooH, known [Desulfovibrio vulgaris str. Hildenborough]
    MSTPDSTTQTWTLPVGPLHVALEEPMYFKLDVDGEIVRNVEITAGHVHRGMEALAMRRNLFQNIVLTERV
    CSLCSNSHPFTYCMAVEHLAGIEVPARADHLRVVAEEIKRTASHLFNVAILAHIIGFKSLFMHVMEVREI
    MQDIKETVYGNRMDLAANCIGGVKYDVDAELLAMLLAGLDKVERNAREIYRIYASDPMVTGRTTGIGVLP
    PDEARRFGVVGPVARGSGLAVDVRRDVPYAAYPQLSFDVITEEGCDVRARALVRLREVFESISIIRQCVA
    TLPEGAMTVIMPEIPAGQSVARSEAPRGELMYYLRTDGTDIPNRLKWRVPSYMNWDALGVMMRDANVADI
    PLIVNSIDPCISCTER
    >gi|46450114|gb|AAS96763.1| hydrogenase, CooU subunit, putative [Desulfovibrio vulgaris str. Hildenborough]
    MPDNALTAPLATALDALAEAEGFTWTRDAHGNAYGWLRLAERDTLPEAARLLAEGGARLATVTAYDPVRE
    PGVPRQEIAYHFDVHGTTLTVTVVLDPECPSVPSITPHFRNADWNEREFMEMYDIAVPGHPNPRRLFLDE
    KLDAGIMNTIIPLSTMTNGASTQNLWERILAARPGDKA
    >gi|46450113|gb|AAS96762.1| hydrogenase, CooX subunit, putative [Desulfovibrio vulgaris str. Hildenborough]
    MFGFLKVLARNVLKGPSTDPFPFAEAHTPARFRGQVRLDPALCVGCAICHHVCAGGAINIAEREDGSGYD
    FTVWHNTCALCGLCRHYCPTGAITLSNDWHNAHLQSQKYDWCERQFVPFMQCEGCGAHIRPLPPQLAARA
    YGPGGFDFASFMRLCPSCRQLAAARADVHIPEASAMPAAPAGHADEPAIREGDATAVTVKGDETPATGVQ
    Q
The header lines all start with '>', so I could simply skip those. However, I want to be doubly sure that I get the right lines, so I would also like a regex that matches only lines containing ADFHKMPRTWCEGILNQSVY and nothing else.
Cheers,
Stephen
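Since the header lines all start with '>', one way to get the double check asked for above is to handle headers explicitly and flag any other line that fails the pattern instead of silently dropping it. This is only a sketch: classify_line and the sample lines are illustrative names and data, not part of any existing script.

```perl
use strict;
use warnings;

# The allowed letters from the question, anchored at both ends of the line.
my $allowed_re = qr/^[ADFHKMPRTWCEGILNQSVY]+$/;

# Hypothetical helper: label a line as a FASTA header, a sequence line,
# or something unexpected that deserves a closer look.
sub classify_line {
    my ($line) = @_;
    return 'header'   if $line =~ /^>/;
    return 'sequence' if $line =~ $allowed_re;
    return 'other';
}

# Small hard-coded sample; a real script would read these from a file.
my @lines = (
    ">gi|46450116|gb|AAS96765.1| hydrogenase nickel incorporation protein HypA\n",
    "GTCHACGATFDLHRRCFACPTCGNDDITLEGGRELTIAGLEVPQPEGATA\n",
    "MVDLXSRKK\n",    # contains X, which is not in the allowed set
);

for my $line (@lines) {
    my $kind = classify_line($line);
    if ($kind eq 'sequence') {
        print $line;                      # keep valid sequence lines
    } elsif ($kind eq 'other') {
        warn "Unexpected line: $line";    # report anything suspicious on STDERR
    }
}
```

Headers are skipped quietly, sequence lines are printed, and anything else produces a warning, so a stray character in the data is noticed rather than lost.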
You only need a regular expression anchored at the beginning and the end of the line. Here is an example script:
    use strict;
    use warnings;

    # Print only the lines made up entirely of the allowed letters.
    while (<DATA>) {
        if (/^[ADFHKMPRTWCEGILNQSVY]+$/) {
            print $_;
        }
    }

    __DATA__
    >gi|46450118|gb|AAS96767.1| Women with a family of proteins [Desulfovibrio vulgaris str. Hildenborough]
    MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
    APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
    RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
    EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
    TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
    TDAAPGRTQ
Output:
    MVDLSRKKTQALLPTDILFQTPYWAQVKTRLGMESHAFDIRSSGPWGDVLVLLRRFGRHRVAIVPQGPEV
    APPHEDYGVYLESFSLALAEGLGPDVAFIRYDLPWVSPYADEMHDEGWNAFPEARLRELRMNMGTRHWNL
    RKSFQDLTVASSLVVDITGEEAAVLERMKPKTRYNIGLARRKGVAVREVGRESLPQFHALYRQTAIRNGF
    EPCSITHFSAMFHALCDGAGSTELLFLLATHGTDILAGCIVGLAGRTANFLYGASGNVKRNLMAPYLMHW
    TAMCHARDRGCHDYEMGAVPPGHDPAHPFHGLYRFKTGFGGRVALRSGSWDYPLDHAAYRDFCNAESLYR
    TDAAPGRTQ
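One caveat, which depends on how the input file was produced: in Perl, $ also matches just before a final newline, which is why the script works on lines read from <DATA> without chomp. If the file has Windows (CRLF) line endings, every line ends in "\r\n"; the \r is not in the character class, so no line would match. A minimal sketch, assuming you want to accept both LF and CRLF endings (is_sequence_line is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the line, minus its line ending, consists
# only of the allowed letters and is not empty.
sub is_sequence_line {
    my ($line) = @_;                 # work on a copy of the argument
    $line =~ s/\r?\n\z//;            # drop one trailing LF or CRLF
    return $line =~ /\A[ADFHKMPRTWCEGILNQSVY]+\z/ ? 1 : 0;
}

print is_sequence_line("TDAAPGRTQ\r\n") ? "match\n" : "no match\n";
```

Here \A and \z anchor to the absolute start and end of the string, so once the line ending has been stripped explicitly, no other character (including a second newline) can slip through.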
Deconstructing the regex, we have:
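- ^ matches the beginning of the string
- [ADFHKMPRTWCEGILNQSVY] matches any single one of the listed letters
- + repeats that character class one or more times
- $ matches the end of the string (in Perl, also just before a trailing newline)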
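To see why each piece of the pattern matters, here is a small sketch (the variable names are purely illustrative) that tests a header line from the data with and without the anchors, and an empty line with + versus *:

```perl
use strict;
use warnings;

# A header line from the question's data (single quotes: no interpolation).
my $header = '>gi|46450118|gb|AAS96767.1| [Desulfovibrio vulgaris str. Hildenborough]';

# Without anchors the class only has to match somewhere in the string,
# so the uppercase "AAS" in the accession number is already enough.
my $unanchored = ($header =~ /[ADFHKMPRTWCEGILNQSVY]+/) ? 1 : 0;

# With ^ and $ every character from start to end must be in the class,
# and the leading '>' makes the match fail.
my $anchored = ($header =~ /^[ADFHKMPRTWCEGILNQSVY]+$/) ? 1 : 0;

# + requires at least one letter, so an empty line is rejected;
# * would accept zero letters and let empty lines through.
my $empty_plus = ('' =~ /^[ADFHKMPRTWCEGILNQSVY]+$/) ? 1 : 0;
my $empty_star = ('' =~ /^[ADFHKMPRTWCEGILNQSVY]*$/) ? 1 : 0;

print "unanchored: $unanchored, anchored: $anchored\n";         # unanchored: 1, anchored: 0
print "empty with +: $empty_plus, empty with *: $empty_star\n";  # empty with +: 0, empty with *: 1
```

Dropping either anchor lets header lines through, and swapping + for * would also accept blank lines.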