Client received a few Excel files of subscribers that needed to be sorted. They’re needing to pull the last name out and put it in its own column, so they can easily look up the subscribers. The issue with the Excel files is they want to sort the subscribers by last name in alphabetical order; however, the names are only listed under full name. To make it worse the last word in that cell might not be the last name, it could be ‘JR’ or even the town ‘Chicago’. Here’s a fake name example of a bad record:
“J T & Jane Q. Smith Jr. St. Louis”
The logic behind the program (high level):
The program opens the excel file and adds the 2 new columns ‘Last Name’ and ‘First Name’ these will be the 2 columns we’ll be filling in for sorting. In the above example we’ll be getting:
Last Name: Smith; First Name: J T & Jane Q.
They have a ‘Company Name’ column and to get that to be sorted I made a straight copy of that cell over to ‘Last Name’.
To check if the city is in the ‘Full Name’ column:
To find out if the city name is in ‘Full Name’, they have a city column I originally thought I can compare the 2 but found that some people had moved since the town was added to their name. An example of this would be if from the example above would say ‘St. Louis’ in ‘Full Name’ but under town it would be ‘Chicago’. To get around this I’m pulling all the towns into a set; a set only keeps the unique names. We’ll use this to run through the ‘Full Name’ to compare if that town is in the ‘Full Name’.
Another issue with the towns is some are 2 words like ‘St. Louis’. Since the town names were always the last. I pulled the last two words from each ‘Full Name’ and checked that against the town names if the town is found, then set it to have the last two words removed.
Also needed to do a search if there was a suffix to the name, like SR. or JR. this was a pre-made list then compared and removed if found, had to go back and add more as I found them such as a business in the wrong spot with a LTD at the end.
The main purpose for this is to get the client something that is better setup then what they had. This will save a lot of time from somebody having to do all of this manually. It will still take a person to look through the file to finalize it.
GitHub for heavily commented code = https://github.com/amacher/excelPullLastName/blob/main/PullLastName.py
*To help them quickly get through I created a column in Excel and used a RIGHT() code to pull the last name in cell information.