Dmytro Hrimov

Senior software engineer

How to: Copy data in Python

Understanding Shallow vs Deep Copy in Python

When working with data structures in Python, particularly lists and dictionaries, you may find yourself needing to copy an object. However, it's crucial to understand the difference between shallow and deep copying first to avoid unintended side effects in your code.

What is a Shallow Copy

A shallow copy creates a new object, but it inserts references into it to the objects found in the original. This means that while the new object is a separate instance, the elements within it are still references to the same objects as those in the original. In Python, you can create a shallow copy using the copy module:

import copy

original_list = [1, 2, [3, 4]]
shallow_copied_list = copy.copy(original_list)

In this example, shallow_copied_list is a new list, but it contains references to the same nested list [3, 4] found in original_list. Modifying the nested list in either original_list or shallow_copied_list will affect both lists.

What is a Deep Copy

In contrast, a deep copy creates a new object and recursively copies all objects found in the original, creating fully independent copies. This ensures that modifications to the new object do not affect the original object. You can create a deep copy using the copy module as well:

import copy

original_list = [1, 2, [3, 4]]
deep_copied_list = copy.deepcopy(original_list)

Here, deep_copied_list is a completely independent copy of original_list, including the nested list. Changes to the nested list in deep_copied_list will not affect original_list, and vice versa.

How to Make Shallow Copies in Python

In addition to using the copy module from the example above, there are several other ways to create shallow copies of objects in Python. Here are a few common methods:

Using the copy() Method

Python lists and dictionaries have a built-in copy() method that returns a shallow copy of the original object:

original_dict = {'a': 1, 'b': {'c': 2}}
shallow_copied_dict = original_dict.copy()

print(shallow_copied_dict)  # Output: {'a': 1, 'b': {'c': 2}}

Using List Slicing

You can create a shallow copy of a list using slicing. This method is concise and commonly used for copying lists:

original_list = [1, 2, [3, 4]]
shallow_copied_list = original_list[:]

print(shallow_copied_list)  # Output: [1, 2, [3, 4]]

Using the list() Function

Another way to create a shallow copy of a list is by using the list() function:

original_list = [1, 2, [3, 4]]
shallow_copied_list = list(original_list)

print(shallow_copied_list)  # Output: [1, 2, [3, 4]]

Using the dict() Function

To create a shallow copy of a dictionary, you can use the dict() function:

original_dict = {'a': 1, 'b': {'c': 2}}
shallow_copied_dict = dict(original_dict)

print(shallow_copied_dict)  # Output: {'a': 1, 'b': {'c': 2}}

How to Make Deep Copies in Python

Other than using copy module, like in the example above, here are some other options:

Using Custom Deep Copy Functions

For specific use cases, you might want to create a custom deep copy function. This can be useful if you need to handle certain types of objects in a particular way or if you want to avoid importing the copy module.

Here's an example of a simple custom deep copy function for lists:

def custom_deepcopy(obj):
    if isinstance(obj, list):
        return [custom_deepcopy(item) for item in obj]
    else:
        return obj

original_list = [1, 2, [3, 4]]
deep_copied_list = custom_deepcopy(original_list)

print(deep_copied_list)  # Output: [1, 2, [3, 4]]

This function checks if the object is a list and recursively copies each element. For non-list objects, it returns the object itself.

Using JSON Serialization

Another method to create deep copies is by using JSON serialization and deserialization. This method involves converting the object to a JSON string and then back to a Python object. It's particularly useful for data structures that are easily representable in JSON format.

Here's how you can use JSON serialization for deep copying:

import json

original_list = [1, 2, [3, 4]]
deep_copied_list = json.loads(json.dumps(original_list))

print(deep_copied_list)  # Output: [1, 2, [3, 4]]

This approach works well for lists and dictionaries but may only be suitable for some complex objects or objects that can be serialized to JSON.

I am not suggesting it is a good Pythonic way of doing things, but it is still an option!

Copying Custom Objects

When working with custom objects, you can define your own deep copy method within the class. This method allows you to control exactly how the object and its attributes are copied.

Here's an example:

class CustomObject:
    def __init__(self, value, nested):
        self.value = value
        self.nested = nested

    def __deepcopy__(self, memo):
        # Create a new instance of CustomObject
        copied_object = CustomObject(self.value, copy.deepcopy(self.nested, memo))
        return copied_object

original_object = CustomObject(1, [2, 3])
deep_copied_object = copy.deepcopy(original_object)

print(deep_copied_object.nested)  # Output: [2, 3]

In this example, the __deepcopy__ method is defined within the CustomObject class to handle deep copying of its attributes.

Conclusion

Understanding the difference between shallow and deep copying in Python is essential for managing data structures and avoiding unintended side effects in your code. Shallow copies create new objects with references to the original elements, while deep copies create fully independent copies of the original objects. By using the appropriate copying method, such as list slicing, the copy() method, or custom deep copy functions, you can ensure your data is managed correctly according to your specific use case.