Whether you are parsing structured data, processing user input, extracting meaningful content, or cleaning and normalizing data, you often need a way to pull out the text that sits between quotes.
Here are four ways to extract a string between quotes in Python:
- Using a regular expression
- Using str.split()
- Using a custom parser
- Using startswith(), endswith(), and replace()
Method 1: Using a regular expression
Python’s “re” module provides the re.findall() method. The pattern used here matches an opening double quote, captures zero or more characters that are not a double quote, and then matches the closing double quote.
import re


def quote_extraction(text):
    # Capture every run of non-quote characters enclosed in double quotes
    return re.findall(r'"([^"]*)"', text)


main_str = 'The festival of "Paryushana" is the "best" and "Holiest"'
print(quote_extraction(main_str))
Output
['Paryushana', 'best', 'Holiest']
Because the pattern contains one capturing group, findall() returns a list of the captured substrings, without the surrounding double quotes.
I would highly recommend this approach because it finds all matching substrings, not just the first one, and it is concise and readable. However, this pattern doesn’t handle escaped quotes within the string (e.g., "like \"this\""), because it treats every double quote as a delimiter.
Time Complexity: O(n), where n is the length of the input string.
Space Complexity: O(m), where m is the total length of all matched substrings.
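The escaped-quote limitation mentioned above can be worked around with a slightly longer pattern. The following is a minimal sketch, assuming that a backslash escapes the character after it; the names ESCAPED_QUOTE_PATTERN and quote_extraction_escaped() are hypothetical and not part of the original examples.

```python
import re

# Hypothetical pattern for illustration: inside the quotes, accept either
# a backslash followed by any character, or any single character that is
# neither a double quote nor a backslash.
ESCAPED_QUOTE_PATTERN = re.compile(r'"((?:\\.|[^"\\])*)"')


def quote_extraction_escaped(text):
    """Return quoted substrings, keeping backslash escapes as written."""
    return ESCAPED_QUOTE_PATTERN.findall(text)


# Raw string so the backslashes reach the function unchanged
sample = r'She said "like \"this\"" and then "bye"'
for item in quote_extraction_escaped(sample):
    print(item)
```

With this sample the loop prints two items: the first quoted section with its backslashes still in place, and then bye. The sketch does not remove the backslashes from the result, so a further clean-up step would be needed if unescaped text is wanted.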
Method 2: Using str.split()
We can use the str.split() method to split the string at each double quotation mark, which returns a list. In that list, the parts at odd indices are the ones that sat between a pair of quotes, so a list comprehension over those indices collects all the quoted strings.
def quote_extraction_split(text):
    # After splitting on '"', the quoted text sits at the odd indices
    parts = text.split('"')
    return [parts[i] for i in range(1, len(parts), 2)]


main_str = 'The festival of "Paryushana" is the "best" and "Holiest"'
print(quote_extraction_split(main_str))
Output
['Paryushana', 'best', 'Holiest']
If you know that the input strings are well formatted and all quotes are properly paired, you can use this approach. However, it also does not handle escaped quotes, and it will include an empty string wherever two quotes sit next to each other (""). With an unpaired quote, everything after the last quote is returned as if it were quoted.
Time Complexity: O(n), where n is the length of the input string.
Space Complexity: O(n + m), where m is the size of the quoted strings.
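To make the pairing assumption explicit, the split-based approach can check the number of parts before extracting anything. This is only a sketch: quote_extraction_split_checked() is a hypothetical helper, and raising ValueError or dropping empty strings is one possible design choice rather than something the approach requires.

```python
def quote_extraction_split_checked(text):
    """Split-based extraction that rejects unbalanced quotes.

    Hypothetical helper for illustration only.
    """
    parts = text.split('"')
    # A string with k quote characters splits into k + 1 parts, so an
    # even (balanced) number of quotes always gives an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError("unbalanced double quotes in input")
    # Odd indices hold the quoted text; drop the empty strings produced
    # by two adjacent quotes.
    return [part for part in parts[1::2] if part]


print(quote_extraction_split_checked('a "" b "kept"'))  # ['kept']
```

Calling the helper on a string such as 'say "hello' raises ValueError instead of silently returning hello.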
Method 3: Using a custom parser
In this approach, we write a custom function called “quote_extraction_custom()” that scans the string one character at a time and tracks whether it is currently inside quotes. Each quoted section becomes its own list element, so multiple quoted strings in the input are handled correctly. Empty quoted sections are skipped.
def quote_extraction_custom(s):
    result = []
    current_word = ""
    in_quotes = False
    for char in s:
        if char == '"':
            # Every double quote toggles the inside/outside state
            in_quotes = not in_quotes
            if not in_quotes and current_word:
                # End of a non-empty quoted section
                result.append(current_word)
                current_word = ""
        elif in_quotes:
            current_word += char
    return result


main_str = 'The festival of "Paryushana" is the "best" and "Holiest"'
print(quote_extraction_custom(main_str))
Output
['Paryushana', 'best', 'Holiest']
Time Complexity: O(n), where n is the length of the input string.
Space Complexity: O(n + m), where m is the total length of all quoted substrings.
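One advantage of a hand-written parser is that extra rules are easy to add. The sketch below is a hypothetical variant, quote_extraction_unescape(), which assumes that a backslash makes the next character literal and that an unterminated quote should be reported as an error. Unlike the function above, it keeps empty quoted sections.

```python
def quote_extraction_unescape(text):
    """Extract double-quoted substrings, resolving backslash escapes.

    Hypothetical variant for illustration; the escape rule is an assumption.
    """
    result = []
    current = []        # characters of the quoted section being read
    in_quotes = False
    escaped = False     # True when the previous character was a backslash

    for char in text:
        if escaped:
            # The character after a backslash is taken literally
            if in_quotes:
                current.append(char)
            escaped = False
        elif char == "\\":
            escaped = True
        elif char == '"':
            if in_quotes:
                # Closing quote: store the finished section
                result.append("".join(current))
                current = []
            in_quotes = not in_quotes
        elif in_quotes:
            current.append(char)

    if in_quotes:
        raise ValueError("unterminated quoted section")
    return result


sample = r'She said "like \"this\"" and "bye"'
print(quote_extraction_unescape(sample))  # ['like "this"', 'bye']
```

Collecting characters in a list and joining them once per section avoids building a new string on every character, which is a common choice for this kind of loop.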
Method 4: Using startswith(), endswith(), and replace()
You can combine string methods such as split(), startswith(), endswith(), and replace() with a list comprehension to build a list containing only the words that are enclosed in double quotes.
def quote_extraction_string_methods(text):
    # Split the string into whitespace-separated words
    words = text.split()
    # Keep the words wrapped in double quotes, then strip the quotes
    quoted_words = [word.replace('"', '') for word in words
                    if word.startswith('"') and word.endswith('"')]
    return quoted_words


main_str = 'The festival of "Paryushana" is the "best" and "Holiest"'
print(quote_extraction_string_methods(main_str))
Output
['Paryushana', 'best', 'Holiest']
This approach is efficient and easy to read, but it only works when each quoted item is a single word set off by whitespace. A quoted phrase that contains spaces, or a closing quote followed directly by punctuation, will be missed.
Time Complexity: O(n), where n is the length of the input string.
Space Complexity: O(n + m), where m is the total size of the quoted strings.
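Because this method works word by word, it is worth checking how it behaves on input that is not made of single quoted words. The snippet below repeats the same logic as the function above so that it runs on its own; the sample sentence is made up for illustration.

```python
def quote_extraction_string_methods(text):
    # Same logic as the function above: keep whitespace-separated words
    # that both start and end with a double quote.
    words = text.split()
    return [word.replace('"', '') for word in words
            if word.startswith('"') and word.endswith('"')]


sample = 'He said "hello world" and "bye", then "ok"'
print(quote_extraction_string_methods(sample))  # ['ok']
```

The phrase "hello world" is split into two words, neither of which both starts and ends with a quote, and "bye", ends with a comma, so only ok is returned. For input like this, the regular expression or the custom parser is likely a better fit.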
That’s all!