I was recently asked by a friend for some help making an upload for a microtik router. There was only one problem, there were 16,346 items in his list. He had given me a list of websites formatted to block url's using a host file on individual machines.
The list was cleaned up some, but there were still parts that were commented out. I was given a text file. The first thing I did was change the extension to a csv file. I made a copy of the text file and then renamed "blocklist.txt" to "blocklist.csv".
For those new to csv files, they can be opened and edited in Microsoft Excel. They are single worksheet files. If a new worksheet is added to the file, it will not save correctly and data will be lost. Because a csv file is a comma delimited file, any text following a comma will be pushed in to the next column. Fortunately url's do not have comma's in them. If there are any comma's in the file, they would be part of a note, which we need to remove anyways. I just selected columns B through H and deleted everything in those columns to ensure they were empty.
One other drawback to csv files, cell equations do not persist. When saved, the equations are changed to the value of the equation in the cells. At this point, I had to save the csv file as an xlsx file so I could come back later to make changes if I needed too. This is easily done by going to "Save As" -> "Excel Workbook". Here is where the real work begins.
I inserted some column heads in the first row so I can sort my data using the "Filter" button from the "Data" ribbon. I then used "=IFERROR( first equation, second equation )" to return numeric values on items in the column A. With "IFERROR", if the first equation returns an error, it will go to the second equation to return a value. The first equation is a "FIND" function. The notes are all preceded by a "#". My equation will be "=FIND( "#", A2)", where A2 is the field I am searching in for the "#". Not every field has a "#" and will produce an error if it does not find one. This brings us to the second equation, a simple length function. "LEN( A2 )" where A2 is the field we are checking. This does nothing more than return the number of characters in the field.
All together the equation looks like:
=IFERROR( FIND( "#", A2 ) - 1, LEN( A2 ) )
If I were to write it out, it would look like this:
=if there is an error in ( ( my find function that gets the number of characters until it finds the first "#", subtracting one because I don't want the "#" counted as part of the length of this field ) then goto ( my length function that just tells me how many characters are in this field ) )
Column "C" will contain the base url list from column "A", minus the note's. We will use the "=LEFT( text, number )" function, which will return the text from left to right for the number of characters defined in the function. The function will look like "=LEFT( A2, B2 )".
Unfortunately there are spaces left after the text. To get rid of this white space we will use the "=TRIM( field )" function. This TRIM function will go around our LEFT function to reduce the number of columns we will need to use. The function looks like:
=TRIM( LEFT( A2, B2 ) )
=remove the spaces from the front and back of ( text from left to right ( from this text , at this number of places ) )
Now we have our base url. If all we needed was just a list we would be done, but the microtik router requires a script to run, not a list. The list must be formatted like:
In field E2, place the text "add dst-host=", and field E3 place " action=deny". Column "D" will contain the formatted list. It is a simple equation, "=E$2&C2&E$3". The "$" placed after the "E" in the equation will keep the row number from changing when copying the equation to the below fields.
Now the url's are formatted, we can copy the list in column "D" and paste into a text file.
Taking raw data and formatting it into a script that a microtik could understand took about 5 minutes to create using Excel. This is just another example of the flexibility that Excel has, and demonstrates some of the file types that it supports. It is the simple projects that I don't want to spend a lot of time on that makes me respect Excel more every time I use it.