Detect and Extract URL from String in Java

In this tutorial, we will look at a common text-processing task in Java. Suppose you are given a long piece of text that contains some URLs, and your job is to pull those URLs out. How would you do that?
One solution is to use regular expressions together with the Pattern and Matcher classes in Java. The regular expression describes what a URL looks like, and the Matcher searches the given String for every part that fits that description so that we can extract it.

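To detect URLs we can use the following regular expression:
regex = "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]"
Every symbol has a meaning. \\b (a doubled backslash because it sits inside a Java string) marks a word boundary. (https://|www[.]) requires the match to start with https:// or www. The first bracketed set followed by * allows any number of characters that may appear in a URL. The last bracketed set requires the match to end with a character from a smaller set that leaves out punctuation such as . , ; : ! ? so that a full stop right after a URL is not included. Because the prefix must be https:// or www., a URL written with plain http:// is not picked up in full.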
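To see how the pattern behaves, here is a minimal sketch that runs it against a few short inputs. The class name UrlPatternDemo, the helper firstMatch() and the sample sentences are made up for this illustration; only the pattern itself comes from the tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPatternDemo {

    // Same pattern as in the tutorial text, compiled once.
    static final Pattern URL_PATTERN = Pattern.compile(
            "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the first URL-like match in text, or null if there is none.
    static String firstMatch(String text) {
        Matcher matcher = URL_PATTERN.matcher(text);
        // group() returns the text of the whole match found by find().
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // The full stop at the end of the sentence is not part of the match.
        System.out.println(firstMatch("Visit https://www.isro.gov.in."));      // prints https://www.isro.gov.in
        // A www. prefix is enough to start a match.
        System.out.println(firstMatch("Mail is at www.example.com/inbox today")); // prints www.example.com/inbox
        // No match at all.
        System.out.println(firstMatch("No links here"));                        // prints null
    }
}
```

The first call shows why the pattern ends with a second, smaller character set: the trailing full stop is left out of the extracted URL.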

Algorithm:

  1. Provide a String containing URLs in the main method.
  2. Create a list for storing the URLs fetched from the string.
  3. Define the pattern used to detect URLs.
  4. Compile that pattern with the help of the Pattern class in Java.
  5. Match the pattern against the string using the Matcher class so that we can extract every part of the string that matches.
  6. For each match that is found, add that URL to the list we created.
  7. Lastly, print the URLs in the list, or print a message if the list is empty.
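The steps above can also be packaged as a method that hands the list back to the caller and leaves the printing to main. This is only a sketch under that assumption: the class name UrlListSketch, the method name findUrls() and the sample sentence are hypothetical and are not part of the program shown later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlListSketch {

    // Hypothetical helper: collects every match of the tutorial's pattern into a list.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();                       // step 2
        String regexString =                                         // step 3
                "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
        Pattern pattern = Pattern.compile(regexString, Pattern.CASE_INSENSITIVE); // step 4
        Matcher matcher = pattern.matcher(text);                     // step 5
        while (matcher.find()) {                                     // step 6
            urls.add(text.substring(matcher.start(0), matcher.end(0)));
        }
        return urls; // empty list when nothing matched
    }

    public static void main(String[] args) {
        List<String> urls = findUrls("Docs at https://mail.google.com/ and www.isro.gov.in");
        // Step 7: report the result, or a message when the list is empty.
        System.out.println(urls.isEmpty() ? "No URL found" : urls);
    }
}
```

Returning an empty list instead of a special value keeps the caller's code simple, since it can check isEmpty() or loop over the result directly.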
public static void main(String[] args) {
    String string = "Welcome to isro website https://www.isro.gov.in for your  https://www.geeksforgeeks.org with the help of  https://mail.google.com/";
    extractUrl(string);
}

In the main method, we have provided a string with URLs and passed that string as an argument to the method extractUrl().

List<String> list = new ArrayList<>();

We are creating a list in the method to store fetched URLs.

String regexString = "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
Pattern pattern = Pattern.compile(regexString,Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(string);

Next, we define the pattern that detects a URL and compile it with the compile() method of the Pattern class. The Pattern.CASE_INSENSITIVE flag makes the match ignore letter case. We then create a Matcher for the input string. Each time the matcher finds a URL, we add it to the list with the help of the code below.

while (matcher.find()) {
    list.add(string.substring(matcher.start(0),matcher.end(0)));
}

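We are using the substring(start, end) method of the String class, where matcher.start(0) is the position at which the whole match begins and matcher.end(0) is the position just after it ends. Finally, a conditional statement checks whether any URL was found: if so, we print each one, otherwise we print a message.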

if (list.isEmpty()) {
    System.out.println("Empty list");
    return;
}
for (String str : list) {
    System.out.println(str);
}
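As a side note, the Matcher class also has a group() method that returns the text of the current match, so it can stand in for the substring call used in the loop. The sketch below compares the two for every match; the class name GroupDemo, the method groupEqualsSubstring() and the sample sentence are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {

    // Hypothetical check: returns true only if at least one match was found
    // and group() agrees with the substring call for every match.
    static boolean groupEqualsSubstring(String text) {
        Pattern pattern = Pattern.compile(
                "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        boolean found = false;
        while (matcher.find()) {
            found = true;
            String viaSubstring = text.substring(matcher.start(0), matcher.end(0));
            String viaGroup = matcher.group(); // text of the whole match
            if (!viaSubstring.equals(viaGroup)) {
                return false;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(groupEqualsSubstring("see https://www.geeksforgeeks.org now")); // prints true
    }
}
```

Either form works; group() is simply shorter to write.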

Java program to detect and extract URL from a string

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class Practice {

    public static void main(String[] args) {
        String string = "Welcome to isro website https://www.isro.gov.in for your  https://www.geeksforgeeks.org with the help of  https://mail.google.com/";
        extractUrl(string);
    }

    // Finds every URL in the given string and prints each one on its own line.
    public static void extractUrl(String string) {
        // List that stores the URLs fetched from the string.
        List<String> list = new ArrayList<>();

        // Pattern for text that starts with https:// or www.
        String regexString = "\\b(https://|www[.])[A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
        Pattern pattern = Pattern.compile(regexString, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(string);

        // Add every match to the list.
        while (matcher.find()) {
            list.add(string.substring(matcher.start(0), matcher.end(0)));
        }

        // Print a message if nothing was found, otherwise print the URLs.
        if (list.isEmpty()) {
            System.out.println("Empty list");
            return;
        }
        for (String str : list) {
            System.out.println(str);
        }
    }
}

That’s all for this tutorial.
