Scraping and Finding Ordered words in a dictionary in Python
In this tutorial, we will learn how to scrape a word list and find the ordered words in it using Python. An ordered word is a word whose letters appear in alphabetical order. We will use two separate functions: one for scraping and one for finding the ordered words.
For example, aam and aals are ordered words, while abacus is not. We are going to use a ‘.txt’ file that acts as a dictionary and contains a list of words.
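The definition above can be checked directly. The snippet below is a minimal sketch, not the function used later in this tutorial: `is_ordered_word` is a hypothetical helper name, and it assumes lowercase words, so that sorting the letters gives alphabetical order.

```python
def is_ordered_word(word):
    # Hypothetical helper: a word is ordered when sorting its letters
    # leaves it unchanged (assumes the word is all lowercase).
    return list(word) == sorted(word)


# The three example words from the text above
for w in ["aam", "aals", "abacus"]:
    print(w, is_ordered_word(w))
```

This is shorter than the loop-based check written later, at the cost of sorting every word.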
Installing the libraries
pip install requests
The entire code is divided into 2 sections:
- Firstly, we will scrape the URL which contains the .txt file.
- Secondly, we will write a function to get the ordered words from the .txt file.
Scraping the website
We need to scrape the website which contains a dictionary of words. The data that we will scrape is in a .txt file.
import requests

def scrapeWords():
    # Plain-text word list hosted on GitHub
    scrape_url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"
    scrapeData = requests.get(scrape_url)
    # .content holds the response body as bytes
    listofwords = scrapeData.content
    # Decode the bytes and split on whitespace to get a list of words
    listofwords = listofwords.decode("utf-8").split()
    return listofwords

The requests.get function sends an HTTP GET request to the URL and returns a response object. The content attribute of that response holds the body of the file as raw bytes. We then decode the UTF-8 encoded bytes into a string and split the string on whitespace to turn it into a list of words.
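To see the decode-and-split step without making a network request, here is a small sketch. The `sample_bytes` value is an invented stand-in for the bytes that the response's content attribute would hold, not the real file.

```python
# Hypothetical sample standing in for the bytes returned by the request
sample_bytes = b"a\r\naa\r\naaa\r\naah\r\n"

# Bytes -> str
text = sample_bytes.decode("utf-8")

# split() with no arguments splits on any run of whitespace,
# so it handles "\n" and "\r\n" line endings alike
words = text.split()
print(words)
```

Because split() is called without arguments, the result contains no empty strings, even when the text ends with a newline.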
Finding Ordered Words Function
def isOrdered():
    collection = scrapeWords()
    # Only check the first 100 words
    collection = collection[:100]
    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
        # Skip words shorter than three letters
        if (len(word) < 3):
            continue
        # Compare each letter with the one that follows it
        while i < l:
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1
        if (result == 'Word is ordered'):
            print(word,': ',result)
We assign the list returned by ‘scrapeWords()’, which we defined earlier, to a variable called ‘collection’. We then keep just the first 100 words of the file.
If a word is only one or two letters long, we skip it, since it is too short for the check to be meaningful. The ‘for’ loop goes through every word in the collection list, and the ‘while’ loop compares each letter with the one that follows it. As soon as a letter comes later in the alphabet than its successor, the word is marked as not ordered and the loop stops. Equal neighbouring letters, as in aam, still count as ordered.
Here we print only the ordered words. To print the unordered words instead, change the final ‘if’ condition as follows:
if (result != 'Word is ordered'):
    print(word,': ',result)
The above code prints the unordered words. You can also change how many words are checked by changing the slice ‘collection[:100]’.
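If you would rather not edit the function each time, both choices can be turned into parameters. The following is a sketch under assumptions, not part of the original code: `split_by_order`, `limit` and `min_length` are hypothetical names, and the function takes a list of words as input, so it can be tried on a small sample without downloading anything.

```python
def split_by_order(words, limit=100, min_length=3):
    """Hypothetical helper: return (ordered, unordered) for the first `limit` words."""
    ordered, unordered = [], []
    for word in words[:limit]:
        # Skip words that are too short to be interesting
        if len(word) < min_length:
            continue
        # Ordered means no letter is greater than the letter after it
        if all(a <= b for a, b in zip(word, word[1:])):
            ordered.append(word)
        else:
            unordered.append(word)
    return ordered, unordered


# Small made-up sample; only the first 5 entries are examined
sample = ["a", "aa", "aam", "aals", "abacus", "abbey"]
ordered, unordered = split_by_order(sample, limit=5)
print(ordered)
print(unordered)
```

Returning two lists instead of printing inside the loop lets the caller decide what to display, and the result of ‘scrapeWords()’ could be passed in as the `words` argument.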
Now to finally execute the entire function we use the following code:
if __name__ == '__main__':
    isOrdered()

When the file is run as a script, this block calls isOrdered(), which in turn calls scrapeWords(). No input is needed.
Entire Code
import requests

def scrapeWords():
    scrape_url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"
    scrapeData = requests.get(scrape_url)
    listofwords = scrapeData.content
    listofwords = listofwords.decode("utf-8").split()
    return listofwords

def isOrdered():
    collection = scrapeWords()
    collection = collection[:100]
    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
        if (len(word) < 3):
            continue
        while i < l:
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1
        if (result == 'Word is ordered'):
            print(word,': ',result)

if __name__ == '__main__':
    isOrdered()