Updating an Excel file in Python

December 27, 2021

Client received a few Excel files of subscribers that needed to be sorted. They’re needing to pull the last name out and put it in its own column, so they can easily look up the subscribers. The issue with the Excel files is they want to sort the subscribers by last name in alphabetical order; however, the names are only listed under full name. To make it worse the last word in that cell might not be the last name, it could be ‘JR’ or even the town ‘Chicago’. Here’s a fake name example of a bad record:

“J T & Jane Q. Smith Jr. St. Louis”

The logic behind the program (high level):

The program opens the excel file and adds the 2 new columns ‘Last Name’ and ‘First Name’ these will be the 2 columns we’ll be filling in for sorting. In the above example we’ll be getting:

Last Name: Smith; First Name: J T & Jane Q.

They have a ‘Company Name’ column and to get that to be sorted I made a straight copy of that cell over to ‘Last Name’.

To check if the city is in the ‘Full Name’ column:

To find out if the city name is in ‘Full Name’, they have a city column I originally thought I can compare the 2 but found that some people had moved since the town was added to their name. An example of this would be if from the example above would say ‘St. Louis’ in ‘Full Name’ but under town it would be ‘Chicago’. To get around this I’m pulling all the towns into a set; a set only keeps the unique names. We’ll use this to run through the ‘Full Name’ to compare if that town is in the ‘Full Name’.  

Another issue with the towns is some are 2 words like ‘St. Louis’. Since the town names were always the last. I pulled the last two words from each ‘Full Name’ and checked that against the town names if the town is found, then set it to have the last two words removed.

Also needed to do a search if there was a suffix to the name, like SR. or JR. this was a pre-made list then compared and removed if found, had to go back and add more as I found them such as a business in the wrong spot with a LTD at the end.

The main purpose for this is to get the client something that is better setup then what they had. This will save a lot of time from somebody having to do all of this manually. It will still take a person to look through the file to finalize it.

GitHub for heavily commented code = https://github.com/amacher/excelPullLastName/blob/main/PullLastName.py

*To help them quickly get through I created a column in Excel and used a RIGHT() code to pull the last name in cell information.

Related Articles

Tic-Tac-Toe in JavaScript

Tic-Tac-Toe in JavaScript

To take a break from the other projects this weekend I worked on a Tic-Tac-Toe game in JavaScript. The basic instructions I had for myself was it needed to be able to be played by 1 or 2 people. I started with creating the game for 2 players. Thinking that if I can...

read more
Save posts to a WordPress Database – Working Notes

Save posts to a WordPress Database – Working Notes

This is notes/overview of a project I started on to add several posts to WordPress (WP) from HTML files without having to copy and paste each file. These are just notes from that dropped project and the basic Python script to do this (github). Make sure you fully test...

read more
CMYK converter added

CMYK converter added

CMYK is a tricky one, since each device would interpret the colors differently. Had almost decided not to even include it but thought it could be useful at times. Looking around I found a basic formula that most people were using to give a rough idea for the color. I...

read more

Pin It on Pinterest

Share This