Programming questions: Python

Discussion in 'The Common Room' started by jesheezy, Aug 14, 2015.

  1. jesheezy

    jesheezy NPC

    Messages:
    8
    Likes Received:
    1
    This is one of those cases where my problem is a symptom, not the cause.
    The cause is that I have several points of confusion about how matching and calling work.
    The help documentation for the re module in Python has me greatly confused.
    Are they using re.search() to search the re object, or invoking the re module? How about compile?

    If you want to see the code where I'm having problems, here it is, but remember the real problem is the one above:

    Everything is great until I start to use re.
    Here is where I try to take two variables (e.g. a pmfreqx of 24) and build a pattern to identify the frequency:
    Code:
    dict_month = {60:'Quin',36:'Trien',24:'Bi-an',12:'Annual(?<!Bi-)(?<!Semi-)',6:'Semi-An',3:'Quart',2:'Bi-Month',1:'Month(?<!Bi-)'}
    dict_week = {2:'Bi-Week',1:'Week(?<!Bi-)'}
    
    if pmfreq == "Weeks":
            freq_search_var = re.compile(dict_week[pmfreqx], flags=re.IGNORECASE)
    elif pmfreq == "Months":
            freq_search_var = re.compile(dict_month[pmfreqx], flags=re.IGNORECASE)
    Everything up to this point executes successfully.
    Here is where the debugger stops (on the 2nd line of this block):
    Code:
            compid_match = 0
            compid_match = task_row[1].match(re.compile(compid))
    
    After trudging through some ideas for debugging, I'm kind of lost at this point. This error is where I last left off and I'm not sure how to address it.

     
    Last edited: Aug 15, 2015
  2. Tigers

    Tigers High Priest Staff Member

    Messages:
    8
    Likes Received:
    6
    Hi!

    The reason the second line doesn't work, going on the error message you posted, is that task_row[1] is a string, and .match() is a method of a regex pattern object, i.e. the thing that is returned from re.compile(compid). If you switch them around it ought to work:
    Code:
    compid_match = re.compile(compid).match(task_row[1])
    As a future debugging tip, if you're ever in doubt about what type a variable is you can always invoke the type() function on the variable. E.g:
    Code:
    print(type(task_row[1]))
    would print <class 'str'> (in Python 3, which your open(..., encoding=...) calls suggest you are using)
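
    To tie this back to the original question about re.search() versus compile: as far as I understand it, the module-level functions and the pattern-object methods do the same job, just with the arguments arranged differently. A minimal sketch, where the text and pattern are made up for illustration:

```python
import re

# Hypothetical task description, standing in for task_row[1].
text = "PUMP-01 Bi-Annual inspection"

# Module-level function: the pattern string comes first, then the text to search.
module_result = re.search("pump-01", text, flags=re.IGNORECASE)

# Compiled pattern object: the same search, called as a method on the pattern.
pattern = re.compile("pump-01", flags=re.IGNORECASE)
method_result = pattern.search(text)

# Both return a match object (or None when nothing matches).
print(module_result.group(), method_result.group())

# A plain string has no .match() method, so calling it on task_row[1] cannot work.
print(hasattr(text, "match"))
```

    Either form is fine; compiling mostly pays off when the same pattern is reused many times.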

    It also helps if the code snippet you submit is runnable from the get-go, even though it might crash. This is a good read on making it easier for people to debug any question you might pose in the future!
     
  3. jesheezy

    jesheezy NPC

    Okay, I see: match needs to be called on a regex object and given the string it should look in. Thanks a bunch!

    Sorry, this is my 5th day programming with Python and I have no experience with regular expressions, so I'm kind of new to it.
    Elf gave me a regex tutorial that I'm trudging through.
     
  4. Bemoliph

    Bemoliph High Priest Staff Member

    Messages:
    35
    Likes Received:
    14
    Cool, welcome to Python Land!

    Here are some additional resources for regex in Python specifically:
    • Google Developers Tutorial - A plain language introduction to regex and Python.
    • PyMOTW's Tutorial - Another plain language intro with A LOT more examples.
    • re HOWTO - The regex tutorial on the official Python site. A bit lengthy and technical.
    • re module - The official docs. It's all there, but probably hard to read.
    For some extra advice specific to your code, be careful with .match(). It only matches at the beginning of the string, whereas .search() finds a match anywhere in the string:
    Code:
    >>> import re
    
    >>> regex = re.compile("asd", flags=re.IGNORECASE)
    >>> text = "Hello, I am an asd thing!"
    
    >>> # .match() tries to match at the start of the string.
    >>> # It fails here because "asd" isn't right at the beginning, so it returns None.
    >>> print(regex.match(text))
    None
    
    >>> # .search() tries to match anywhere in the string.
    >>> # It found stuff and gave back a "match object" with the details.
    >>> match = regex.search(text)
    >>> match.group()
    'asd'
     
    Last edited: Aug 15, 2015
  5. Elf

    Elf Immortal Staff Member

    Messages:
    105
    Likes Received:
    6
    Location:
    Clark County, WA
    Glad @Tigers and @Bemoliph were able to help you get that sorted! Sorry -- been out at work and then pretty tired today.

    Just looking at your code snippet, once you get any other issues sorted out, you may consider precompiling the regex pattern objects and storing those in the dictionary instead of just the strings. This would avoid the need to compile it for each match.

    For example:
    Code:
    def re_compile_dict(d, f):
      for k in d:
        d[k] = re.compile(d[k], flags=f)
    
    [...]
    
    dict_month = {60:'Quin',36:'Trien',24:'Bi-an',12:'Annual(?<!Bi-)(?<!Semi-)',6:'Semi-An',3:'Quart',2:'Bi-Month',1:'Month(?<!Bi-)'}
    dict_week = {2:'Bi-Week',1:'Week(?<!Bi-)'}
    
    re_compile_dict(dict_month, re.IGNORECASE)
    re_compile_dict(dict_week, re.IGNORECASE)
    
    [...]
    
    if pmfreq == "Weeks":
      freq_search_var = dict_week[pmfreqx]
    elif pmfreq == "Months":
      freq_search_var = dict_month[pmfreqx]
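
    A separate thing worth checking in those dictionary patterns is where the lookbehinds sit. A lookbehind tests the text immediately before its own position, so placed after 'Annual' it only ever sees 'ual' and can never exclude anything. A small sketch of the difference, using made-up task descriptions and assuming the intent is for 'Annual' not to match inside 'Bi-Annual' or 'Semi-Annual':

```python
import re

# Lookbehinds after the word: they inspect the text just before that point,
# which is always "...ual", so they never reject anything.
trailing = re.compile(r"Annual(?<!Bi-)(?<!Semi-)", re.IGNORECASE)

# Lookbehinds before the word: they inspect what precedes "Annual".
leading = re.compile(r"(?<!Bi-)(?<!Semi-)Annual", re.IGNORECASE)

# Hypothetical task descriptions for illustration only.
samples = ["PUMP-01 Annual PM", "PUMP-01 Bi-Annual PM", "PUMP-01 Semi-Annual PM"]

trailing_hits = [s for s in samples if trailing.search(s)]
leading_hits = [s for s in samples if leading.search(s)]

print(trailing_hits)  # all three descriptions match
print(leading_hits)   # only the plain "Annual" description matches
```

    If that assumption about the intent holds, the same change would apply to 'Month(?<!Bi-)' and 'Week(?<!Bi-)'.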
    
     
  6. Elf

    Elf Immortal Staff Member

    I think you have the loops down okay, but the problem is that you are exhausting readerTaskHeader the first time around.

    So, when you set up readerTaskHeader at the beginning of the program, it starts at the beginning of that file. On the first go-around of the outer loop, the inner loop reads to the end of that file. Then on the next iteration of the outer loop, the reader is still at the end of the file, so there is nothing more to read and the inner loop exits immediately.

    A simple but inelegant solution is to set up readerTaskHeader right before the inner loop rather than at the top of the code.

    A better way to do it, since you seem to plan on reusing the contents of that file multiple times (which presumably does not change during execution), would be to read the contents of that file into a list at the beginning (i.e. get it in memory), then just iterate over that list in the inner loop. For example:
    Code:
    taskHeader = open('taskheader.csv', encoding='windows-1252')
    readerTaskHeader = csv.reader(taskHeader)
    taskHeaders = []
    for row in readerTaskHeader:
      taskHeaders.append(row)
    
    [...]
    
      for task_row in taskHeaders:
       [...]
    
    I admit though I have not fully looked through the logic of what you are trying to do, so let me know if that is off base.
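
    To see the exhaustion effect in isolation, here is a small self-contained sketch. It uses an in-memory io.StringIO with made-up rows as a stand-in for taskheader.csv, so nothing here is your real data:

```python
import csv
import io

# Hypothetical stand-in for taskheader.csv so the sketch runs without a file.
task_file = io.StringIO("T1,PUMP-01 Annual\nT2,PUMP-02 Monthly\n")
reader = csv.reader(task_file)

first_pass = [row for row in reader]    # consumes every row in the reader
second_pass = [row for row in reader]   # the reader is exhausted: no rows left

print(len(first_pass), len(second_pass))

# Reading the rows into a list once gives something that can be looped over repeatedly.
task_file.seek(0)
task_headers = list(csv.reader(task_file))

counts = []
for _ in range(3):
    counts.append(sum(1 for _row in task_headers))

print(counts)
```

    The second pass over the reader comes back empty, while the list yields the same rows on every pass.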
     
  7. jesheezy

    jesheezy NPC

    Another python question which I suspect may be caused by a fundamental misunderstanding of python loops:
    I successfully got it to iterate once.
    I debugged by using a lot of debug print lines to catch the logic, and I found that it loops 1 and 2 the first time as expected, then after that it only performs loop 1.

    Eventually I tracked down that it's resetting at the compid_match

    What am I missing here?

    the gist:
    Code:
    for row in readerCurrentFile: #LOOP 1
        # iterates through readerCurrentFile to define search variables
        [...]
        for task_row in readerTaskHeader: #LOOP 2
            # searches each row iteratively through readerTaskHeader
           # Match compid
                 #if no match, continue <<<- This is where it goes back to 1st loop
            [...]
            # Match task frequency
                 #if no match, continue
            [...]
            # once both of the above matches check out, will grab data I need (task_no)
            task_no = ""
            task_no = task_row[0]
            if task_row:
                break
        [...]
        # writes PM code
        print("Successful write of PM schedule row")
        print(compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ": " + pmid + " " + task_no)
    
    Both Loops:
    Code:
    import csv
    import re
    #Writes schedule
    csvNewPMSchedule = open('new_pm_schedule.csv', 'a', newline='')
    writerNewPMSchedule = csv.writer(csvNewPMSchedule)
    
    # Dictionaries of PM Frequency
    def re_compile_dict(d,f):
        for k in d:
            d[k] = re.compile(d[k], flags=f)
    
    dict_month = {60:'Quin',36:'Trien',24:'Bi-An',12:'Annual(?<!Bi-)(?<!Semi-)',6:'Semi-An',3:'Quart',2:'Bi-Month',1:'Month(?<!Bi-)'}
    dict_week = {2:'Bi-Week',1:'Week(?<!Bi-)'}
    dict_freq_names = {'60Months':'Quintennial','36Months':'Triennial','24Months':'Bi-Annual','12Months':'Annual','6Months':'Semi-Annual','3Months':'Quarterly','2Months':'Bi-Monthly','1Months':'Monthly','2Weeks':'Bi-Weekly','1Weeks':'Weekly'}
    
    re_compile_dict(dict_month,re.IGNORECASE)
    re_compile_dict(dict_week, re.IGNORECASE)
    
    # Unique Task Counter
    task_num = 0
    total_lines = 0
    
    #Error catcher
    error_in_row = []
    
    #Blank out all rows
    pmid = 0
    compid = 0
    comp_desc = 0
    pmfreqx = 0
    pmfreq = 0
    pmfreqtype = 0
    
    # PM Schedule Draft (as provided by eMaint)
    currentFile = open('pm_schedule.csv', encoding='windows-1252')
    readerCurrentFile = csv.reader(currentFile)
    
    # Loop 1
    for row in readerCurrentFile:
        if row[0] == "pmid":
            continue
        #defines row items
        pmid = row[0]
        compid = row[1]
        comp_desc = row[2]
        #quantity of pm frequency
        pmfreqx_temp = row[3]
        #unit of pm frequency, choices are: Months, Weeks
        pmfreq = row[4]
        #pmfreqtype is currently only static not sure what other options we have
        pmfreqtype = row[5]
        #pmnextdate is the next scheduled due date from this one. we probably need logic later that closes out any past due date
        pmnextdate = row[6]
        # Task Number This is what we want to change
        # pass
        # We want to change this to task header's task_desc
        sched_task_desc = row[8]
        #last done date
        last_pm_date = row[9]
        #
        #determines frequency search criteria
        #
        try:
            pmfreqx = int(pmfreqx_temp)
        except (TypeError, ValueError):
                print("Invalid PM frequency data, Skipping row " + pmid)
                error_in_row.append(pmid)
                continue
        #
        #defines frequency search variable
        #
        freq_search_var = ""
        if pmfreq == "Weeks":
            freq_search_var = dict_week[pmfreqx]
        elif pmfreq == "Months":
            freq_search_var = dict_month[pmfreqx]
        if not freq_search_var:
            print("Error in assigning frequency" + compid + " " + str(pmfreqx) + " " + pmfreq)
            error_in_row.append(pmid)
            continue
        #defines Equipment ID Search Variable
        print(compid + " frequency found: " + str(pmfreqx) + " " + str(pmfreq))
        compid_search_var = re.compile(compid,re.IGNORECASE)
        #
        # Matching function - search taskHeader for data
        #
        #PM Task Header Reference 
        taskHeader = open('taskheader.csv', encoding='windows-1252')
        readerTaskHeader = csv.reader(taskHeader)
        for task_row in readerTaskHeader:
            # task_row[0]: taskHeader pm number
            # task_row[1]: taskHeader task_desc
            # task_row[2]: taskHeader_task_notes
            #
            # search for compid
            compid_match = ""
            compid_match = compid_search_var.search(task_row[1])
            if not compid_match:
                print(task_row[1] + " does not match ID for " + compid + ", trying next row.") #debug 2
                continue # <<< PROBLEM IS RIGHT HERE
            print("Found compid " + task_row[1]) # debug line
            #
            freq_match = ""
            freq_match = freq_search_var.search(task_row[1])
            if not freq_match:
                print(task_row[1] + " does not match freq for " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ", trying next row."ask #debug line
                continue
            print("Frequency Match: " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)]) # freq debug line
            #
            task_no = ""
            print("Assigning Task Number to " + task_row[0])
            task_no = task_row[0]
            if task_row:
                break
        #
        #error check
        #
        if not task_no:
            print("ERROR IN SEARCH " + compid + " " + pmid)
            error_in_row.append(pmid)
            continue
        #
        # Writes Rows
        #
        writerNewPMSchedule.writerow([pmid,compid,comp_desc,pmfreqx,pmfreq,pmfreqtype,pmnextdate,task_no,sched_task_desc,last_pm_date])
        print("Successful write of PM schedule row")
        print(compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ": " + pmid + " " + task_no)
        print("==================================================================")
    # for row in error_in_row:
    #    writerNewPMSchedule.writerow(["Error in row:",str(error_in_row[row])])
    #    print("Error in row: " + str(error_in_row[row]))
    print("Finished")
    
    
    # def pm_searcher(k,r):
    #    m = k.search(r)
    #    if not m:
    #        continue
    
     
    Last edited: Aug 17, 2015
  8. Elf

    Elf Immortal Staff Member

    Merged the two threads due to continuing course of discussion and answers. See if post #6 helps.

    Sorry, just saw you split it out intentionally! But I had already posted a reply here. Anyways, let me know how that goes.
     
  9. jesheezy

    jesheezy NPC

    I was researching on Stack Overflow and found that Python closes out the file after the loop, so I added that in. I corrected the code, but the problem still persists.
     
  10. Elf

    Elf Immortal Staff Member

    Hm, so the intent is that continue goes to the next row from "for task_row in readerTaskHeader"?

    I believe it should do that -- to my understanding continue/break apply to the loop they are immediately in rather than any outer containing loops. You are saying that, after that, it does not proceed to the next row but rather breaks out of the loop?
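
    A tiny sketch of that behavior, with made-up loop values rather than your CSV rows:

```python
visited = []

for outer in ["A", "B"]:
    for inner in [1, 2, 3]:
        if inner == 1:
            continue   # moves on to inner == 2; the outer loop is untouched
        visited.append((outer, inner))
        if inner == 2:
            break      # leaves only the inner loop
    # Execution resumes here, then the outer loop moves on to its next item.

print(visited)
```

    Both outer items get processed, which shows that neither continue nor break reached past the inner loop.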

    At this point it seems like some sort of a logic problem. Hard to assist without having the full dataset, etc. What I would do is run the program under a debugger and set a breakpoint there. There is pdb, the command line debugger, but you may also see if you can find a nice IDE that has an integrated debugger that shows things graphically. Perhaps @Bemoliph or @Tigers have suggestions?

    Once you do that and set the breakpoint, just check the state of the variables to see what is not expected.
     
  11. Bemoliph

    Bemoliph High Priest Staff Member

    For Python IDEs on Windows, PyCharm has a free community edition that works well and has a debugger. There are also plugins for more "generic" IDEs you might already be using, like PyDev for Eclipse, NBPython for NetBeans, or even Visual Studio. See here and here for more options.

    If IDEs aren't your thing, good free text editors on Windows include Notepad++, Sublime Text 3, or Atom.io. You won't get much code inspection or auto-complete and almost definitely no debugging this way, though.
     
  12. jesheezy

    jesheezy NPC

    I was previously using Notepad++ and IDLE.
    Ran it in PyCharm and it gives this error:
    which is located here:
    Code:
            if not freq_match:
                print(task_row[1] + " does not match freq for " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ", trying next row." #debug line
                continue
     
    Last edited: Aug 17, 2015
  13. Elf

    Elf Immortal Staff Member

    Oh, easy one. You have an unmatched parenthesis on the preceding line.

    Might be the sort of thing pylint would catch?
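
    Short of installing a linter, Python's own parser can flag this kind of thing before the script runs. A rough sketch using the standard ast module; the two source strings and the has_valid_syntax helper are made up for illustration:

```python
import ast

# Hypothetical two-line snippets: in the first, print( is never closed.
broken = 'print("trying next row."\nnext_step = 1\n'
fixed = 'print("trying next row.")\nnext_step = 1\n'


def has_valid_syntax(source):
    """Return True if Python can parse the source, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(has_valid_syntax(broken))   # False
print(has_valid_syntax(fixed))    # True
```

    Depending on the Python version, the reported error position may be the line after the unclosed parenthesis, which would be consistent with the error showing up on the continue line.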
     
  14. Bemoliph

    Bemoliph High Priest Staff Member

    You're missing the closing ) on print(). It should look like:
    Code:
    print(task_row[1] + " does not match freq for " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ", trying next row.") #debug line
    Since you're using PyCharm, pay attention to red, yellow, and green squiggles under the code (think MS Word misspellings), and to the similarly colored lines on the scrollbar. These point out issues you can mouse over for suggestions.


    EDIT: To expand:
    • Red squiggles point out syntax errors you MUST fix before the script can run.
    • Yellow squiggles point out warnings like "you never actually use this" or bad formatting.
    • Green squiggles point out spelling errors. You can turn it off for variable names in Settings > Editor > Code Style > Inspections > Spelling: Type > uncheck "Process code".
     
    Last edited: Aug 17, 2015
  15. jesheezy

    jesheezy NPC

    Okay, when I opened up PyCharm I must have been in the wrong window, because apparently I had been typing into the code, lol.
    I deleted that extra text, and it runs fine in PyCharm. However, running it in IDLE causes the aforementioned problem. ????
     
  16. Bemoliph

    Bemoliph High Priest Staff Member

    PyCharm uses whatever Python interpreter you have installed, so it should run the same. Are you sure you're running the newest code in both?
     
  17. jesheezy

    jesheezy NPC

    Does anyone have recommendations on IDE that has UML diagramming capabilities?
     
  18. Elf

    Elf Immortal Staff Member

    UML diagramming is tough. I think a lot of the more comprehensive tools I have seen for it have gone unmaintained since the early 2000s, ArgoUML being the one off the top of my head. Basically, for me it is either Dia (Linux/X) or the ubiquitous Visio (Windows), and just drawing them graphically instead of generating them from text. I certainly do not touch UML-to-code generation; my approach to UML is to use it lightly.

    If you really want it inside of an IDE for some reason, NetBeans apparently has a UML plugin, as does Eclipse, but I have never used the UML tools in either.
     
  19. Bemoliph

    Bemoliph High Priest Staff Member

    NetBeans UML support was dropped over 5 years ago in 6.5 and is no longer compatible post-6.7 (now 8.0). I checked out some of the other NetBeans plugins, but they didn't really work or do what I hoped.

    Eclipse's UML support worked well enough with the right plugin back in 2011-2012, but I've forgotten which plugin it was.
     
  20. Elf

    Elf Immortal Staff Member

    Ah, well, there you go. I think UML has really dropped out of favor lately which explains the state of the tools. But I do think there is a place for using it at a basic level; just perhaps not to the depth taught in the average CS curriculum.