To convert a Python set to a NumPy array efficiently, use the np.fromiter() method. It builds the array straight from the set's iterator, so a large set does not have to be copied into an intermediate list first, which reduces memory overhead and can improve performance.
Another option is np.array(list(my_set)). It is simple to write, but it creates an intermediate list, which makes it less efficient for large sets.
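For homogeneous numeric data, NumPy arrays are more memory efficient than Python's built-in containers because the values are stored in a single typed buffer. Since a set does not allow duplicate elements, the output NumPy array will also contain only unique elements. A set is also unordered, so the order of the elements in the array is not guaranteed unless you sort them.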
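As a quick check of the uniqueness point, the sketch below builds a set from a list with duplicates and converts it. The variable names are illustrative only, and the sorted() call is added just to make the printed order deterministic.

```python
import numpy as np

values = [3, 1, 3, 2, 1]   # illustrative input with duplicates
unique_set = set(values)   # duplicates are dropped here

# Sort so the element order is deterministic
array = np.array(sorted(unique_set))

print(array)       # [1 2 3]
print(array.size)  # 3
```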
Method 1: Using np.fromiter()
The np.fromiter() method takes an iterable (here, the set) and a required dtype argument, and returns a one-dimensional NumPy array.
```python
import numpy as np

my_set = {x for x in range(1000000)}

# Printing the whole set would dump a million values, so print its size instead
print(len(my_set))
# Output: 1000000

print(type(my_set))
# Output: <class 'set'>

array = np.fromiter(my_set, dtype=np.int32)

print(array)
# Example output (set iteration order is not guaranteed):
# [     0      1      2 ... 999997 999998 999999]

print(type(array))
# Output: <class 'numpy.ndarray'>
```
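If you want to see how the two approaches compare on your own machine, a minimal timing sketch could look like the one below. The helper names with_fromiter and with_list are made up for this illustration, and the set is smaller than the one above to keep the run short. Which approach is faster can depend on the NumPy version, the dtype, and the size of the set, so treat the numbers as something to measure rather than assume.

```python
import timeit
import numpy as np

my_set = set(range(100_000))  # smaller set to keep the timing run short

def with_fromiter():
    # Builds the array directly from the set's iterator
    return np.fromiter(my_set, dtype=np.int64, count=len(my_set))

def with_list():
    # Builds an intermediate list first, then the array
    return np.array(list(my_set), dtype=np.int64)

# Both approaches read the set in the same iteration order
same = np.array_equal(with_fromiter(), with_list())
print(same)

t_fromiter = timeit.timeit(with_fromiter, number=20)
t_list = timeit.timeit(with_list, number=20)
print(f"fromiter: {t_fromiter:.4f}s, array(list): {t_list:.4f}s")
```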
Pre-allocation
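The count argument in the np.fromiter() method tells NumPy how many elements to read from the iterable. When it is given, NumPy can allocate the output array once instead of resizing it as elements arrive. Passing count=len(my_set) ensures that all of the set's elements are converted to the NumPy array.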
```python
import numpy as np

my_set = {11, 24, 43, 33, 21, 19}

print(my_set)
# Example output (set order may vary): {33, 19, 21, 24, 43, 11}

print(type(my_set))
# Output: <class 'set'>

# count lets NumPy pre-allocate the output array
array = np.fromiter(my_set, dtype=np.float64, count=len(my_set))

print(array)
# Example output (same order as the set): [33. 19. 21. 24. 43. 11.]

print(type(array))
# Output: <class 'numpy.ndarray'>
```
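To make the role of count more concrete, here is a small sketch of what happens when count does not match the size of the set. The variable names first_three and too_long_failed are illustrative only.

```python
import numpy as np

my_set = {11, 24, 43, 33, 21, 19}

# count smaller than the set: only the first 3 elements yielded are read
first_three = np.fromiter(my_set, dtype=np.int64, count=3)
print(first_three.shape)  # (3,)

# count larger than the set: NumPy raises a ValueError
try:
    np.fromiter(my_set, dtype=np.int64, count=len(my_set) + 1)
    too_long_failed = False
except ValueError as exc:
    too_long_failed = True
    print("ValueError:", exc)
```

Since a set is unordered, which three elements end up in first_three is not something to rely on.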
Method 2: Using numpy.array()
The numpy.array() function creates an array from a sequence such as a list or tuple. A set is not a sequence, so if you pass the set to np.array() directly, you do not get a 1D array of its elements; NumPy wraps the whole set in a zero-dimensional array of dtype object. The data type (dtype) is inferred from the elements, but you can set it explicitly.
Before using numpy.array(), convert the input set into a list using list(), then pass the list to the numpy.array() method. Note that list() does not sort anything; it keeps whatever order the set iterates in.
```python
import numpy as np

my_set = {1, 2, 3, 4}

print(my_set)
# Example output: {1, 2, 3, 4}

print(type(my_set))
# Output: <class 'set'>

# Convert the set to a list first so NumPy sees a sequence of elements
array = np.array(list(my_set))

print(array)
# Example output (order follows the set's iteration order): [1 2 3 4]

print(type(array))
# Output: <class 'numpy.ndarray'>
```
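One thing to watch for: passing a set straight to np.array(), without list(), does not raise an error, but the result is a zero-dimensional object array that holds the set itself. The short sketch below shows the difference; the names wrapped and converted are illustrative only.

```python
import numpy as np

my_set = {1, 2, 3, 4}

wrapped = np.array(my_set)           # set passed directly
print(wrapped.ndim, wrapped.dtype)   # 0 object
print(wrapped.item() == my_set)      # True: the array just holds the set

converted = np.array(list(my_set))   # set converted to a list first
print(converted.ndim, converted.shape)  # 1 (4,)
```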
Mixed Data Types
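If your input set contains mixed data types, pass dtype=object so that each element keeps its original Python type. Without it, np.array() looks for a common type for the list and, for a mix like the one below, converts every element to a string. Convert the set to a list first; passing the set itself yields a zero-dimensional object array that only wraps the set.
```python
import numpy as np

my_set = {"Krunal", 21, True}

print(my_set)
# Example output (set order may vary): {True, 21, 'Krunal'}

print(type(my_set))
# Output: <class 'set'>

# Convert to a list and request dtype=object to keep the original types
array = np.array(list(my_set), dtype=object)

print(array)
# Example output (order may vary): [True 21 'Krunal']

print(array.dtype)
# Output: object
```
You can see from the above output that the final array's data type is object.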
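The dtype you get for mixed data depends on whether you ask for object explicitly. As a hedged illustration (the names items, inferred and as_object are made up for this example), compare the inferred dtype with an explicit dtype=object:

```python
import numpy as np

my_set = {"Krunal", 21, True}
items = list(my_set)

inferred = np.array(items)                  # NumPy picks a common type
as_object = np.array(items, dtype=object)   # elements keep their Python types

print(inferred.dtype.kind)  # U (a Unicode string dtype)
print(as_object.dtype)      # object
print(sorted(type(x).__name__ for x in as_object))  # ['bool', 'int', 'str']
```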
Getting a Deterministic Order via Sorting
A set has no order of its own to preserve, so if you need the elements in a predictable order, pass the set to the sorted() function, which returns a sorted list.
In the next step, we pass that list to the numpy.array() method to get an array.
```python
import numpy as np

my_set = {3, 2, 4, 1}

print(my_set)
# Example output (set order may vary): {1, 2, 3, 4}

print(type(my_set))
# Output: <class 'set'>

# Sorting before conversion
array = np.array(sorted(my_set))

print(array)
# Output: [1 2 3 4]

print(type(array))
# Output: <class 'numpy.ndarray'>
```
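Sorting can also happen after the conversion. The sketch below shows both routes side by side; option B combines np.fromiter() with the array's in-place sort() method and is offered as an alternative, not as the approach this article prescribes.

```python
import numpy as np

my_set = {3, 2, 4, 1}

# Option A: sort in Python, then convert
a = np.array(sorted(my_set))

# Option B: convert first, then sort the array in place
b = np.fromiter(my_set, dtype=np.int64, count=len(my_set))
b.sort()

print(a)  # [1 2 3 4]
print(b)  # [1 2 3 4]
```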
Specifying Data Type (dtype)
If you want to control the output array data type, you can use the dtype argument in the np.array() method.
```python
import numpy as np

my_set = {3, 2, 4, 1}

print(my_set)
# Example output (set order may vary): {1, 2, 3, 4}

print(type(my_set))
# Output: <class 'set'>

# Changing the array type to float32
array = np.array(list(my_set), dtype=np.float32)

print(array)
# Example output (order follows the set's iteration order): [1. 2. 3. 4.]

print(type(array))
# Output: <class 'numpy.ndarray'>

print(array.dtype)
# Output: float32
```
In the above code, we changed the numpy array’s type to float32.
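The choice of dtype also determines how much memory the array uses. As a small illustration (the variable names are made up for this example), the same four values take 4 bytes as int8 and 32 bytes as float64:

```python
import numpy as np

my_set = {3, 2, 4, 1}
items = list(my_set)

small = np.array(items, dtype=np.int8)    # 1 byte per element
wide = np.array(items, dtype=np.float64)  # 8 bytes per element

print(small.itemsize, small.nbytes)  # 1 4
print(wide.itemsize, wide.nbytes)    # 8 32

# astype() returns a converted copy of an existing array
as_float32 = small.astype(np.float32)
print(as_float32.dtype)  # float32
```

A narrow dtype such as int8 only makes sense when every value fits in its range (-128 to 127 for int8).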
Handling Empty Sets
If your input set is empty, the output numpy array would be empty, too.
```python
import numpy as np

# Initializing an empty set
my_set = set()

print(my_set)
# Output: set()

print(type(my_set))
# Output: <class 'set'>

# Converting an empty set to numpy array
# Explicitly set dtype to int; an empty list would otherwise default to float64
array = np.array(list(my_set), dtype=int)

print(array)
# Output: []

print(type(array))
# Output: <class 'numpy.ndarray'>

print(array.dtype)
# Output: int64 (on most 64-bit platforms)
```
In the above code, the data type of the output array is the integer type we requested (int64 on most 64-bit platforms), not float64, which is what np.array() picks by default for an empty list.
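For comparison, here is a short sketch of how an empty set behaves with and without an explicit dtype. The names default and typed are illustrative only.

```python
import numpy as np

empty = set()

default = np.array(list(empty))              # no dtype given
typed = np.fromiter(empty, dtype=np.int64)   # dtype is mandatory for fromiter

print(default.dtype, default.shape)  # float64 (0,)
print(typed.dtype, typed.shape)      # int64 (0,)
```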
Structured Arrays
If you are working with a set of tuples, you can convert it to a structured numpy array using the np.array() and sorted() functions.
```python
import numpy as np

# Creating a set of tuples
my_set = {(1, 'a'), (2, 'b')}

print(my_set)
# Example output (set order may vary): {(2, 'b'), (1, 'a')}

print(type(my_set))
# Output: <class 'set'>

# Field names and types for the structured array
dt = [('id', int), ('name', 'U1')]

# Converting a set of tuples to structured array
array = np.array(sorted(my_set), dtype=dt)

print(array)
# Output: [(1, 'a') (2, 'b')]

print(type(array))
# Output: <class 'numpy.ndarray'>
```
The above output shows that we get a structured array from the set of tuples, where each record has an id field and a name field.
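Once the data is in a structured array, each field can be read by name. A minimal sketch, reusing the same set and dtype as above:

```python
import numpy as np

my_set = {(1, 'a'), (2, 'b')}
dt = [('id', int), ('name', 'U1')]
array = np.array(sorted(my_set), dtype=dt)

print(array['id'])        # [1 2]
print(array['name'])      # ['a' 'b']
print(array[0]['name'])   # a
print(array.dtype.names)  # ('id', 'name')
```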
Conclusion
For large sets, np.fromiter() is recommended for efficiency, while np.array(list(my_set)) remains a simple alternative for small sets. In both cases, sort the elements if you need a predictable order.