Help with a script to replace lines in one file using values from another

Discussion in 'Programming & Software Development' started by treeplant, Apr 18, 2018.

  1. treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    I am trying to use the sed or awk commands on Linux, but I am not sure whether it is possible to achieve the outcome I am looking for.

    Other solutions such as Perl are also fine; I just need somebody to tell me whether this is possible.

    My problem:

    I have 2 files:
    1. Main.php - a script that populates a database with sample data
    2. Updatenames.txt - the updated user names

    Main.php has many fields that start the same way; only 3 need to be modified.

    Main.php
    1 text,Australia, #user country
    2 text,john, #first name
    3 text,doe, #surname
    4 text,john@domain.com #email
    5 text,active, #user status

    Updatenames.txt
    bob smith bob@domain.com

    I want a script that scans Main.php and, when it reaches line 2 (or the comment at the end of line 2), replaces the text between the commas (,john,) with the first word on the first line of Updatenames.txt.
    Next, it should find line 3 (or the comment at the end of line 3) and replace the word between the commas (,doe,) with the second word on line 1 of Updatenames.txt, and apply the same logic to line 4 with the third word.

    So the final result in the PHP file will be:

    1 text,Australia, #user country
    2 text,bob, #first name
    3 text,smith, #surname
    4 text,bob@domain.com #email
    5 text,active, #user status

    I think the best candidates for the job are the sed or awk commands on Linux. Can anybody help me achieve this, or suggest a better solution, please?
     
  2. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    My thinking is to scan the txt file and put each word into its own array, so there are 3 arrays (A, B, C) with 1 word each. Then go through the PHP file and replace everything between the commas on line 2 with the content of array A, on line 3 with array B, and on line 4 with array C.
    Then loop to the next line of 3 words in the txt file and replace the next custom line numbers in the PHP file.

    Can this be done in bash scripting, or can anybody help me program this logic?
     
  3. neRok

    neRok Member

    Joined:
    Aug 19, 2006
    Messages:
    2,658
    Location:
    Perth NOR
    I can't help with the Linux commands, but it would be easy with Python, and doesn't Linux come with Python?

    But first, is there more than 1 name in each file, or is what you have posted the entire contents?
     
  4. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    Yes, Python would be very suitable too.

    Yes, there is a total of 90 names.
    The PHP file has about 8000 lines.

    The txt file has 90 lines; each line has a first name, surname and email.

    I can manually enter into the Perl script the line numbers of all the PHP lines that need to be edited.

    So I think the logic will be like this:

    Start loop
    Make arrays A, B, C from line 1 of the txt file
    Scan the PHP file and insert A into line 20 between the commas
    B into line 21 between the commas
    C into line 22 between the commas

    Continue the loop: refill A, B, C from line 2 of the txt file
    Insert A into line (next instance)
    ......
    Until line 90 in the txt file

    I can continue to build the loop once I have the start.
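
    The loop described above could be sketched in Python roughly as follows. This is only a minimal sketch, assuming each name occupies three consecutive lines, that every editable line has a comma on both sides of the value, and that the starting line numbers are entered by hand. The names replace_field, apply_updates and start_lines are hypothetical and chosen purely for illustration.

```python
def replace_field(line, new_value):
    """Return line with the text between its first two commas replaced."""
    parts = line.split(',', 2)
    if len(parts) < 3:
        return line  # fewer than two commas: leave the line untouched
    parts[1] = new_value
    return ','.join(parts)


def apply_updates(php_lines, name_lines, start_lines):
    """Write each 'first surname email' triple into three consecutive lines.

    start_lines holds the 1-based line number of each first-name line
    (hypothetical: these are the numbers that would be entered by hand).
    """
    result = list(php_lines)
    for start, name_line in zip(start_lines, name_lines):
        values = name_line.split()
        for offset, value in enumerate(values[:3]):
            index = start - 1 + offset
            result[index] = replace_field(result[index], value)
    return result


php = [
    "1 text,Australia, #user country",
    "2 text,john, #first name",
    "3 text,doe, #surname",
    "4 text,john@domain.com, #email",
    "5 text,active, #user status",
]
updated = apply_updates(php, ["bob smith bob@domain.com"], start_lines=[2])
print("\n".join(updated))
```

    With 90 names, start_lines would simply hold 90 numbers, one per line of the txt file, in the same order.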
     
  5. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    Alternatively, it could be done this way, which is smarter but more difficult:

    21 text,name, #name1
    22 text,surname, #surname1
    23 text,email, #email1
    ...
    ...
    61 text,name, #name2
    62 text,surname, #surname2
    63 text,email, #email2


    The Perl script makes arrays A, B, C
    The Perl script scans the PHP file for:
    #name1 - replace everything to its left between the commas with array A
    #surname1 - same as above, replace with B
    Same method for C

    Then refill arrays A, B, C from line 2 of the txt file
    And use the same method as above for #name2, #surname2, #email2

    This way there is no need to manually enter line numbers.

    This assumes it is possible to match the comment (#name) and work backwards, going right to left, to replace the word between the commas.
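
    A minimal Python sketch of this comment-matching idea, assuming the comments are numbered exactly as in the sample above (#name1, #surname1, #email1, ...) and that each line of the txt file holds three whitespace-separated words. LINE_RE and update_by_comment are hypothetical names used only for illustration.

```python
import re

# Matches e.g. "21 text,name, #name1". Group 1 is the text up to and
# including the comma before the value, group 2 is the ", #label<number>"
# tail, group 3 the label and group 4 the record number.
LINE_RE = re.compile(r'^(.*,)[^,]*(, ?#(name|surname|email)(\d+))\s*$',
                     re.IGNORECASE)

COLUMNS = {'name': 0, 'surname': 1, 'email': 2}


def update_by_comment(php_text, name_lines):
    """Replace the value on every line whose comment names a known record."""
    records = [line.split() for line in name_lines]
    out = []
    for line in php_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            record_index = int(m.group(4)) - 1  # "#name1" -> first txt line
            if 0 <= record_index < len(records):
                value = records[record_index][COLUMNS[m.group(3).lower()]]
                line = m.group(1) + value + m.group(2)
        out.append(line)
    return "\n".join(out)


php_text = "\n".join([
    "21 text,name, #name1",
    "22 text,surname, #surname1",
    "23 text,email, #email1",
    "40 text,active, #user status",
    "61 text,name, #name2",
    "62 text,surname, #surname2",
    "63 text,email, #email2",
])
names = ["bob1 smith1 bob1@domain.com", "bob2 smith2 bob2@domain.com"]
result = update_by_comment(php_text, names)
print(result)
```

    Because the record number comes from the comment itself, the order and position of the lines in the PHP file do not matter in this sketch.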
     
    Last edited: Apr 19, 2018
  6. neRok

    neRok Member

    Joined:
    Aug 19, 2006
    Messages:
    2,658
    Location:
    Perth NOR
    Here's something that runs on Python 3.6. It uses a regex to find chunks of first name, surname and email, and replaces them with the next entry in the updatenames file. If the names run out in the updatenames file, it leaves that chunk unchanged. If you wanted to delete that chunk instead, you would need to extend the regex to also grab the lines either side of the chunk. Really though, you should probably have all this in a CSV file, and just have PHP read and parse that CSV file and use that data to populate your database.

    main.php
    Code:
    1 text,Australia, #user country
    2 text,john, #first name
    3 text,doe, #surname
    4 text,john@domain.com, #email
    5 text,active, #user status
    6 text,Australia, #user country
    7 text,john, #first name
    8 text,doe, #surname
    9 text,john@domain.com, #email
    10 text,active, #user status
    11 text,Australia, #user country
    12 text,john, #first name
    13 text,doe, #surname
    14 text,john@domain.com, #email
    15 text,active, #user status
    16 text,Australia, #user country
    17 text,this, #first name
    18 text,won't, #surname
    19 text,change@domain.com, #email
    20 text,active, #user status
    updatenames.txt
    Code:
    bob1 smith1 bob1@domain.com
    bob2 smith2 bob2@domain.com
    bob3 smith3 bob3@domain.com
    python script
    Code:
    import re
    
    with open('main.php', 'r', encoding='utf-8') as f:
        main_php = f.read()
    
    # Read the replacement names up front so the file is closed straight away.
    with open('updatenames.txt', 'r', encoding='utf-8') as f:
        updatenames_txt = iter(f.read().splitlines())
    
    def replace(match):
        try:
            update_str = next(updatenames_txt)
        except StopIteration:
            # What to do if run out of replacement names?
            return match[0] # Return original text.
        first, surname, email = update_str.split()
        new_lines = match[1]+first+match[2]+surname+match[3]+email+match[4]
        return new_lines
    
    # Raw string, so backslash sequences such as \d reach the regex engine unchanged.
    pattern = r'^(\d+ \w+,)[^,]+(, ?#first name\r?\n^\d+ \w+,)[^,]+(, ?#surname\r?\n^\d+ \w+,)[^,]+(, ?#email\r?\n)'
    
    new_main_php = re.sub(pattern, replace, main_php, flags=re.MULTILINE)
    
    with open('new_main.php', 'w', encoding='utf-8') as f:
        f.write(new_main_php)
    
    the output (new_main.php)
    Code:
    1 text,Australia, #user country
    2 text,bob1, #first name
    3 text,smith1, #surname
    4 text,bob1@domain.com, #email
    5 text,active, #user status
    6 text,Australia, #user country
    7 text,bob2, #first name
    8 text,smith2, #surname
    9 text,bob2@domain.com, #email
    10 text,active, #user status
    11 text,Australia, #user country
    12 text,bob3, #first name
    13 text,smith3, #surname
    14 text,bob3@domain.com, #email
    15 text,active, #user status
    16 text,Australia, #user country
    17 text,this, #first name
    18 text,won't, #surname
    19 text,change@domain.com, #email
    20 text,active, #user status
     
  7. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    Just did some testing, and this is exactly the solution to my problem. Thank you very much for creating this script for me.
     
  8. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    I have noticed that if I add a line between any of the lines that are modified, so that they are no longer consecutive, the script stops working.

    Would the line responsible for this be

    pattern = '^(\d+ \w+,)[^,]+(, ?#first name\r?\n^\d+ \w+,)[^,]+(, ?#surname\r?\n^\d+ \w+,)[^,]+(, ?#email\r?\n)'


    Is that because it is trying to match the whole pattern of the 3 lines being together?

    This is something for me to consider for the future. Is it also easily modifiable, so that other lines can later be placed between the 3 lines that are modified?
     
  9. neRok

    neRok Member

    Joined:
    Aug 19, 2006
    Messages:
    2,658
    Location:
    Perth NOR
    The pattern line is a regular expression. It's a powerful way to match text. You should probably do some reading up on them. I'll explain the components in the pattern for you anyway...

    ^ matches the start of a line.
    \d+ \w+, finds some digits followed by a space, then some word characters (a word) followed by a comma, thus it matches 2 text,
    [^,]+ finds one or more characters that are not a comma, thus it matches the text between the commas that you want to replace, eg john
    , ?#first name finds a comma, followed by an optional space, followed by the literal text '#first name', thus it matches , #first name
    \r?\n finds a line break (enter / newline / whatever you want to call it) in either Windows or Linux format.
    Then the pattern repeats for the surname and email lines. The round brackets () mark out capturing groups, which you can reference later (the script does this by joining them together with the new data in between).
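
    To make the capturing groups concrete, here is a small sketch that runs the pattern from the script against a single record and prints what each group holds. The sample text and the variable names are only for illustration.

```python
import re

# Same pattern as in the script, written as a raw string and split in two
# purely for readability.
pattern = (r'^(\d+ \w+,)[^,]+(, ?#first name\r?\n^\d+ \w+,)[^,]+'
           r'(, ?#surname\r?\n^\d+ \w+,)[^,]+(, ?#email\r?\n)')

sample = (
    "2 text,john, #first name\n"
    "3 text,doe, #surname\n"
    "4 text,john@domain.com, #email\n"
)

match = re.search(pattern, sample, flags=re.MULTILINE)

# The four groups hold the text that is kept; the gaps between them are
# the three values that get replaced.
for number in range(1, 5):
    print(number, repr(match.group(number)))

# Joining the groups with new values in between rebuilds the three lines.
rebuilt = (match.group(1) + "bob" + match.group(2) + "smith"
           + match.group(3) + "bob@domain.com" + match.group(4))
print(rebuilt, end="")
```

    Note that groups 2 and 3 each span a line break, which is why the three lines have to be directly underneath each other for the pattern to match.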

    If you start adding more lines between the 3 lines, or change the order of the 3 lines, then the pattern is going to become messy and unreliable. In that case, I would probably match the entirety of a person's data, say from country to country (eg lines 1-5, 6-10, 11-15, 16-20), and then, once you have that whole chunk, replace the bits you need.

    Here's a pattern for that (ps, I test my regex out here: https://regexr.com/3o7lq)
    Code:
    ^\d.+#user country[\s\S]+?(?=(?:^\d.+#user country)|\Z|^[^\d]|^$)
    ^\d you should recognise
    .+ searches for any characters except new lines, and it will do so up to #user country
    [\s\S]+? searches for anything including new lines; the +? makes it match as few characters as possible
    (?= ) is a positive look-ahead, which means it looks ahead for some other string without including it in the match
    In this case, it causes the +? to keep growing until the look-ahead is satisfied (ie, until it has grabbed all the lines)
    The bit inside the look-ahead is a bit tricky, because I had to include a few options (the | is an or character):
    (?:^\d.+#user country) searches for the next country line
    \Z will not work in all regex implementations, but in Python it means end of file/string.
    ^[^\d] searches for a line that doesn't start with a digit (this one might be problematic if you have lines that start with a space and then a digit, or have some extra blank lines)
    ^$ searches for an empty line (^ being start, $ being end, and that means nothing in between).
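
    As a quick check of how this pattern carves the file into per-person chunks, here is a small sketch using re.findall with the MULTILINE flag. The sample data, including the extra line 3, is made up for illustration.

```python
import re

chunk_pattern = r'^\d.+#user country[\s\S]+?(?=(?:^\d.+#user country)|\Z|^[^\d]|^$)'

sample = (
    "1 text,Australia, #user country\n"
    "2 text,john, #first name\n"
    "3 text,extra, #some new line\n"
    "4 text,doe, #surname\n"
    "5 text,john@domain.com, #email\n"
    "6 text,active, #user status\n"
    "7 text,Australia, #user country\n"
    "8 text,jane, #first name\n"
    "9 text,roe, #surname\n"
    "10 text,jane@domain.com, #email\n"
    "11 text,active, #user status\n"
)

# The pattern has no capturing groups, so findall returns each whole chunk.
chunks = re.findall(chunk_pattern, sample, flags=re.MULTILINE)

for chunk in chunks:
    chunk_lines = chunk.splitlines()
    print(len(chunk_lines), "lines, starting with:", chunk_lines[0])
```

    The first chunk stops where the look-ahead finds the next country line, and the second stops at the end of the string, so the extra line in the first record does not upset the matching.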

    That should give you a good start. You will now need to update the script: edit the replace() function so that it does 3 separate replacements on each chunk that re.sub() passes to it.
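
    One possible shape for that updated replace() function is sketched below. It is a sketch under assumptions rather than a finished script: it assumes the field lines still end in #first name, #surname and #email, and the names CHUNK, LABELS and make_replacer are hypothetical. The sample data is inlined instead of being read from main.php and updatenames.txt.

```python
import re

CHUNK = r'^\d.+#user country[\s\S]+?(?=(?:^\d.+#user country)|\Z|^[^\d]|^$)'
LABELS = ('first name', 'surname', 'email')


def make_replacer(name_lines):
    """Build a replace() callback that consumes one name line per chunk."""
    names = iter(name_lines)

    def replace(match):
        chunk = match[0]
        try:
            values = next(names).split()
        except StopIteration:
            return chunk  # no names left: keep the chunk as it is
        for label, value in zip(LABELS, values):
            # Swap the text between the commas on the line ending in "#<label>".
            field = r'^(\d+ \w+,)[^,\n]+(, ?#' + re.escape(label) + r')$'
            chunk = re.sub(field, lambda m: m[1] + value + m[2], chunk,
                           count=1, flags=re.MULTILINE)
        return chunk

    return replace


sample = (
    "1 text,Australia, #user country\n"
    "2 text,john, #first name\n"
    "3 text,extra, #some new line\n"
    "4 text,doe, #surname\n"
    "5 text,john@domain.com, #email\n"
    "6 text,active, #user status\n"
    "7 text,Australia, #user country\n"
    "8 text,jane, #first name\n"
    "9 text,roe, #surname\n"
    "10 text,jane@domain.com, #email\n"
    "11 text,active, #user status\n"
)
names = ["bob smith bob@domain.com"]

result = re.sub(CHUNK, make_replacer(names), sample, flags=re.MULTILINE)
print(result, end="")
```

    Each field is replaced on its own line, so extra lines inside a chunk, or a different order of the three fields, should not matter. As in the original script, chunks are left unchanged once the names run out.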
     
    Last edited: Apr 20, 2018
  10. OP
    OP
    treeplant

    treeplant Member

    Joined:
    Nov 7, 2007
    Messages:
    612
    This is really good information. Thank you for not only writing the script for me but, more importantly, explaining it, which is the most useful part for me to learn from and expand on.

    This is an excellent mini guide; I am saving it for future reference.
     
