Scraping and Finding Ordered words in a dictionary in Python

In this tutorial, we will learn how to scrape a list of words from the web and find the ordered words in it using Python. An ordered word is a word whose letters appear in alphabetical order. The program uses two functions: one to scrape the words and one to find the ordered words.
For example, aam and aals are ordered words, while abacus is not, because the a in its third position comes after b. We are going to use a ‘.txt’ file that works like a dictionary of English words.
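The definition can be checked directly with a short sketch that leaves out the scraping. It assumes lowercase words like the ones in the word list. `is_ordered` is a hypothetical helper name and is not part of the tutorial's code. Repeated letters count as ordered, as in aam.

```python
def is_ordered(word):
    """Return True if every letter is <= the letter that follows it."""
    # zip(word, word[1:]) pairs each letter with its right-hand neighbour
    return all(a <= b for a, b in zip(word, word[1:]))


for w in ["aam", "aals", "abacus"]:
    print(w, is_ordered(w))
# aam True
# aals True
# abacus False
```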

Installing the library

pip install requests

The code is divided into two sections:

  1. First, we will scrape the .txt file from its URL.
  2. Second, we will write a function that finds the ordered words in that file.

Scraping the website

We need to scrape the website which contains a dictionary of words. The data that we will scrape is in a .txt file.

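import requests

def scrapeWords():
    scrape_url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"
    # Send an HTTP GET request and keep the response
    scrapeData = requests.get(scrape_url)
    # .content holds the response body as raw bytes
    listofwords = scrapeData.content
    # Decode the bytes to text and split it on whitespace into a list of words
    listofwords = listofwords.decode("utf-8").split()
    return listofwords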

The requests.get() function sends an HTTP GET request to the URL above and returns a response object. The response's content attribute holds the body of the response, which here is the text file, as raw bytes. We then decode these bytes as UTF-8 text and split the resulting string on whitespace to turn it into a list of words.
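Below is a slightly more defensive version of the scraping step. It is our own variant and not the tutorial's original code. It adds a request timeout (10 seconds, an arbitrary choice) and uses `raise_for_status()` to raise an error for HTTP failures. It also separates the download from the parsing, so the parsing can be tried without a network connection. The names `fetch_words`, `parse_words` and `WORDS_URL` are hypothetical.

```python
WORDS_URL = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"


def parse_words(raw):
    """Decode the downloaded bytes and split them into a list of words."""
    # str.split() with no argument splits on any run of whitespace,
    # so both "\n" and "\r\n" line endings are handled
    return raw.decode("utf-8").split()


def fetch_words(url=WORDS_URL, timeout=10):
    """Download the word file and return its words as a list."""
    # Imported here so parse_words() can be used even without requests installed
    import requests

    response = requests.get(url, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()  # raise requests.HTTPError for 4xx/5xx responses
    return parse_words(response.content)
```

Calling `fetch_words()[:100]` would give the same first 100 words that the tutorial's function works with. The only difference is that a failed download raises an exception rather than being parsed as if it were the word list.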

Finding Ordered Words Function

def isOrdered():
    collection = scrapeWords()
    collection = collection[:100]  # check only the first 100 words

    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
        # Skip one- and two-letter words
        if (len(word) < 3):
            continue
        # Compare each letter with the next one
        while i < l:
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1

        if (result == 'Word is ordered'):
            print(word,': ',result)

We assign the list returned by the ‘scrapeWords()’ function, defined earlier, to a variable called ‘collection’. We keep only the first 100 words of the file.

If a word has only one or two letters, we skip it because it is too short for the check to be meaningful. For every other word picked by the ‘for’ loop, the ‘while’ loop compares each letter with the next one using their ord() values. As soon as a letter is greater than the letter after it, the word is marked as not ordered and the loop stops.
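The same logic can also be written without managing the index `i` by hand. In Python, comparing two one-character strings with `>` compares their Unicode code points, so `word[i] > word[i + 1]` gives the same result as the `ord()` comparison. The sketch below is our own rewrite and not the original function. The names `is_ordered_word` and `find_ordered` and the `min_len` parameter are hypothetical. It takes the word list as an argument so it can be tried on a hand-written sample without downloading anything.

```python
def is_ordered_word(word):
    """Return True if no letter is followed by a smaller one."""
    for i in range(len(word) - 1):
        # Same test as ord(word[i]) > ord(word[i+1]) in isOrdered()
        if word[i] > word[i + 1]:
            return False
    return True


def find_ordered(words, min_len=3):
    """Return the words of at least min_len letters whose letters are in order."""
    return [w for w in words if len(w) >= min_len and is_ordered_word(w)]


sample = ["a", "aa", "aah", "aahed", "aal", "aalii", "aam", "abacus"]
print(find_ordered(sample))  # ['aah', 'aal', 'aam']
```

Returning a list instead of printing inside the loop makes the result easy to count, sort, or write to a file later.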

Here we print only the ordered words. To print the unordered words instead, change the ‘if’ condition in the last block of the function as follows:

if (result != 'Word is ordered'):
    print(word,': ',result)

The code above prints the unordered words. You can also check more or fewer words by changing the slice collection[:100].
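You can avoid editing the `if` condition and the slice by hand each time by passing both choices in as parameters. The sketch below is an illustration under that assumption. `split_by_order` and its `limit` and `min_len` parameters are hypothetical names, and the word list is passed in rather than scraped. It also uses a different test from the tutorial's loop: a word is treated as ordered when its letters equal `sorted(word)`. This gives the same answer but costs O(n log n) per word instead of O(n).

```python
def split_by_order(words, limit=100, min_len=3):
    """Split the first `limit` words into (ordered, unordered) lists.

    Words shorter than min_len are skipped, as in isOrdered().
    """
    ordered, unordered = [], []
    for word in words[:limit]:
        if len(word) < min_len:
            continue
        # sorted(word) returns the letters in alphabetical order as a list;
        # the word is ordered exactly when that equals its own letters
        if list(word) == sorted(word):
            ordered.append(word)
        else:
            unordered.append(word)
    return ordered, unordered


ordered, unordered = split_by_order(["abc", "bca", "ab", "deer", "cab"], limit=4)
print("ordered:", ordered)      # ordered: ['abc', 'deer']
print("unordered:", unordered)  # unordered: ['bca']
```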

Finally, to run the whole program, we use the following code:

if __name__ == '__main__': 
    isOrdered()

With this block in place, running the file needs no input: it calls isOrdered(), which in turn calls scrapeWords() to download the word list. The block runs only when the file is executed directly, not when it is imported as a module.
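You may want to choose the number of words, or whether to list unordered words, when the script is run instead of by editing the code. In that case, the `__main__` block can read command-line options with Python's standard `argparse` module. The sketch below shows only the argument parsing. The `--limit` and `--unordered` option names and the `parse_args` helper are our own assumptions and not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Print ordered words from a word list.")
    parser.add_argument("--limit", type=int, default=100,
                        help="number of words to check (default: 100)")
    parser.add_argument("--unordered", action="store_true",
                        help="print unordered words instead of ordered ones")
    return parser.parse_args(argv)


# Passing a list simulates: python ordered_words.py --limit 500 --unordered
args = parse_args(["--limit", "500", "--unordered"])
print(args.limit, args.unordered)
```

Inside the `__main__` block you would call `parse_args()` with no arguments and use `args.limit` in place of the hard-coded 100.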

Entire Code

import requests

def scrapeWords():
    scrape_url = "https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt"
    # Download the file and split its text into a list of words
    scrapeData = requests.get(scrape_url)
    listofwords = scrapeData.content
    listofwords = listofwords.decode("utf-8").split()
    return listofwords

def isOrdered():
    collection = scrapeWords()
    collection = collection[:100]  # check only the first 100 words

    for word in collection:
        result = 'Word is ordered'
        i = 0
        l = len(word) - 1
        # Skip one- and two-letter words
        if (len(word) < 3):
            continue
        # Stop at the first letter that is greater than the next one
        while i < l:
            if (ord(word[i]) > ord(word[i+1])):
                result = 'Word is not ordered'
                break
            else:
                i += 1
        if (result == 'Word is ordered'):
            print(word,': ',result)

if __name__ == '__main__':
    isOrdered()

You may also want to read the following tutorial:

Count the number of unique characters in a string in Python
