Evernote scraping 3

After extracting the contents into multiple lists, the next step is to go through each individual list, which contains the lift data for one day. To do this, I created a loop to iterate over each individual list and a nested loop to iterate through the values inside it.

for lists in threelist2:
    for value in lists:
        ...  # the checks on each value go here

Within the nested loop, I added a few if statements to sort out whether a given value is a lift or a date, using the regex variables from earlier.

        if regex_dates.search(value):
            print(f"{value}, Date")

        if regex_squat.search(value):
            print(f"{value}, lift")

I used print statements before moving on to a CSV file so I could spot any problems by reading the output in my command prompt. The f-strings help make the printed statements clearer. The value variable is simply a single item in an individual list, so putting it in the f-string prints that item. Depending on which regex matches, a lift or date tag is printed alongside the value.
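To make the tagging step easier to try out, here is a minimal, self-contained sketch of it. The two regex patterns and the sample data are made up for illustration, since the real regex_dates, regex_squat and threelist2 come from the earlier articles. The tag_values helper is also hypothetical and only collects the lines so they can be checked before printing.

```python
import re

# Hypothetical patterns; the real ones were defined in the earlier article.
regex_dates = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
regex_squat = re.compile(r"squat", re.IGNORECASE)

# Made-up sample data standing in for threelist2.
threelist2 = [
    ["12/03/2019", "Squat 3x5 80kg"],
    ["14/03/2019", "Squat 3x5 82.5kg"],
]


def tag_values(nested):
    """Return the lines the nested loop would print, in order."""
    lines = []
    for lists in nested:
        for value in lists:
            if regex_dates.search(value):
                lines.append(f"{value}, Date")
            if regex_squat.search(value):
                lines.append(f"{value}, lift")
    return lines


for line in tag_values(threelist2):
    print(line)
```

Collecting the lines in a list first is only a convenience for this sketch. Printing directly inside the loop, as in the snippets above, works the same way.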

[Image: lift data ex.png]

I decided to print the values like this because it lets me see how the program categorised each value. For example, if I see a lift value attached to a date tag, I know there is a problem that I need to check.

As I was running the program, I found that a few values do not fall under the lift or date category, so they would be labelled incorrectly. I added a new if statement to label these values as ‘non-lift’, which is separate from lift or date.
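One way the extra ‘non-lift’ label could look is sketched below. This is an assumption about the shape of the check, not the exact code from my script: the label function, the patterns and the sample values are all hypothetical. Anything that matches neither the date regex nor the lift regex falls through to ‘non-lift’.

```python
import re

# Hypothetical patterns for illustration only.
regex_dates = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")
regex_squat = re.compile(r"squat", re.IGNORECASE)


def label(value):
    """Tag a string as 'Date', 'lift' or 'non-lift'."""
    if regex_dates.search(value):
        return "Date"
    if regex_squat.search(value):
        return "lift"
    # Neither pattern matched, so the value gets its own category.
    return "non-lift"


# Made-up sample values.
sample = ["12/03/2019", "Squat 3x5 80kg", "Felt tired today"]
for value in sample:
    print(f"{value}, {label(value)}")
```

Returning early from each branch means a value only ever gets one tag, which makes a wrong label easier to spot in the output.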

Another problem I had to deal with was nested lists. As mentioned in the previous article, values that were shifted to previous lifts turned into lists rather than staying as strings. The solution I found is to use the isinstance function to sort out the lists from the strings. isinstance checks whether an item is a certain type of object, in my case a list. Inside the isinstance check, I use the master regex, which contains all of the lift patterns in a single pattern, to decide whether the value is a lift or a non-lift value.

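        if isinstance(value, list):
            print(f"{value[0]}, nested-list")

            # Check the string inside the nested list, not the list itself.
            if master_regex.search(value[0]):
                print(f'{value[0]}, lift')
            else:
                print(f'{value[0]}, non-lift')

            # Skip the remaining checks for this value.
            continue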

value[0] accesses the single item in the nested list. Printing value on its own would just print the nested list itself, and passing it to the regex would not work, since the regex takes strings as input.
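The sketch below shows the nested-list handling on its own, including what happens if the list itself is handed to the regex. The master_regex pattern, the label_nested helper and the sample data are hypothetical stand-ins, as the real master regex was built in the earlier article.

```python
import re

# Hypothetical stand-in for the master regex from the earlier article.
master_regex = re.compile(r"squat|bench|deadlift", re.IGNORECASE)


def label_nested(value):
    """Label a one-item nested list by checking the string inside it."""
    inner = value[0]
    return "lift" if master_regex.search(inner) else "non-lift"


# Made-up data for one day: one plain string and two nested lists.
day = ["Squat 3x5 80kg", ["Bench 3x5 60kg"], ["Felt tired today"]]

for value in day:
    if isinstance(value, list):
        print(f"{value[0]}, {label_nested(value)}")
        continue
    print(f"{value}, string")

# Passing the list itself to the regex raises a TypeError,
# because search() expects a string.
try:
    master_regex.search(["Bench 3x5 60kg"])
except TypeError as error:
    print(f"TypeError: {error}")
```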

Tobi Olabode