Concurrency/Multithreading in Realm (RealmSwift Part 4)

https://realm.io/docs/swift/latest/

In this part we will cover the following topics:

  1. The Need for Concurrency in a Database
  2. Concurrency in Realm
  3. Showcase of Improved Application Responsiveness Using Concurrency
  4. Performance Comparison between CoreData and Realm
  5. Three Rules of Threading in Realm
  6. Recovering Unused Disk Space Taken by Realm, Using Both Automatic and Manual Compaction
  7. Tradeoff between Performance and Realm File Size Using an Effective Batch Size
  8. Realm autorefresh feature

This is a continuation of the previous part. If you know the basic operations in Realm, you can follow along from here.

Why Concurrency

Suppose you are importing hundreds or thousands of records from the bundle into Realm on the main thread during the first launch of your application. The consequences could be dramatic. For example, your application could be killed by Apple’s watchdog for taking too long to launch. The import also slows down the UI significantly and might even freeze it completely, which is not a good user experience. With concurrency you can move the import to another thread, which keeps the main thread free, so the user can keep interacting with the app while the work happens in the background.

Concurrency in Realm

Concurrency is the ability to work with the data on more than one queue at the same time. The work submitted to these queues is executed on background threads. Realm makes concurrency easy for developers while still ensuring data consistency.

Realm uses MVCC (Multiversion Concurrency Control) to provide concurrency without breaking the I (Isolation) of ACID. Realm makes a slight change to the original MVCC design: only a single writer can be operating at any time, and it always works on the latest version. It cannot work on an earlier one.

MVCC solves an important concurrency problem: in all databases there comes a time when someone is reading from a database while someone else is writing to it (for example, different threads can be reading or writing to the same database). This can create an inconsistency in the data, for example when you’re reading records while a write operation is only partially complete. If the database allows this, the result you get back will be inconsistent with what eventually ends up in the database. With MVCC, each reader works on a consistent version (snapshot) of the data while the writer prepares the next one.

Getting Started

Download the starter project titled “RealmConcurrency”. When the application first launches, you are taken to the login screen shown in Figure 1. When you tap Export, it loads the data of 47472 Google Play apps into Realm.

Figure 1

Project Structure

ViewController → Responsible for the login screen. It has an Export button. Tapping it loads data from the csv file into Realm objects and persists them through a batch insert.

googleplaystore.csv → Contains information on all the Google Play apps, 47472 records in total, as shown in Figure 2.

Figure 2

Observation / Problem

Make sure you delete the application first. Run the application and the login screen will appear. Tap the text field and start typing; it works fine. Now tap the Export button at the top left of the navigation bar and immediately start typing again. You will observe that the UI hangs. The export operation takes several seconds (approx. 4 sec), and it prevents the UI from responding to events such as typing.

Figure 3

What is happening

Everything from creating 47472 Realm objects from the csv to persisting them on disk was done on the main thread, as shown in Figure 4, and that is why the UI froze. When you tapped the Export button, the following things happened:

  1. Loaded all the contents of the csv file, which contains 47472 records, into the data variable on the main thread
  2. Created GooglePlay Realm objects on the main thread, populated their properties, and appended them to the realmBatches array
  3. Batch inserted the Realm objects to disk using the realm.add<S: Sequence> method on the main thread

As you can see in the console logs in Figure 3, this task blocked the main thread for approx. 4 sec, and you can imagine how users will react when your application freezes for almost 4 seconds.

Note: We did the same operation with CoreData in this part, and it took 6 seconds on the main thread. Performance (due to its zero-copy architecture) is one of the reasons to choose Realm. The CoreData sqlite file, however, is only 6.4 MB.

Figure 4

Before we actually fix it, we will perform some experiments. As you can see in Figure 3, after we inserted 47472 objects the default.realm file size is 76 MB, which is huge compared to CoreData; we will come back to this later. Now, as shown in Figure 5, we deleted all Realm objects using the deleteAll() method, but the file did not shrink (it is still 76 MB). The Realm documentation explains why:

You should expect a Realm database to take less space on disk than an equivalent SQLite database, but in order to give you a consistent view of your data, Realm operates on multiple versions of a Realm. This can cause the Realm file to grow disproportionately if the difference between the oldest and newest version of data grows too big.

Realm will automatically remove the older versions of data if they are not being used anymore, but the actual file size will not decrease. The extra space will be reused by future writes.

If needed, the extra space can be removed by compacting the Realm file.

Figure 5

As shown in Figure 6, we recovered the space. The process is to copy the db to a temporary location, then copy it back and use the new default.realm file. The actual data on disk is now reduced to 6.3 MB, slightly less than what CoreData takes. There are a few things to note:

  1. You can also use Realm’s automatic compaction feature (Realm Docs / compacting-realms), as shown in Figure 6.1. Keep in mind that, to provide its performance, concurrency and safety advantages, a Realm file is always larger than the total size of the objects stored within it
  2. We used autoreleasepool to make sure the Realm instance is released when the pool ends, before the file is replaced, as suggested by this .

Note: Run the compaction code periodically. Running it on every app launch might be often enough for most applications.
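The copy-and-replace process described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the compacted copy is produced with Realm's writeCopy(toFile:), and the helper name compactDefaultRealm and the temporary file name are hypothetical. It also assumes no other Realm instance for this file is open while it runs.

```swift
import Foundation
import RealmSwift

/// Hypothetical helper: replaces the default Realm file with a compacted copy.
/// Assumes nothing else in the process has this Realm file open.
func compactDefaultRealm() throws {
    guard let realmURL = Realm.Configuration.defaultConfiguration.fileURL else { return }

    // Hypothetical temporary file next to the original.
    let tempURL = realmURL.deletingLastPathComponent()
        .appendingPathComponent("default-compact.realm")

    let fileManager = FileManager.default
    // writeCopy(toFile:) fails if the destination already exists.
    if fileManager.fileExists(atPath: tempURL.path) {
        try fileManager.removeItem(at: tempURL)
    }

    // The pool ensures the Realm instance is released before the file is swapped.
    try autoreleasepool {
        let realm = try Realm()
        try realm.writeCopy(toFile: tempURL)
    }

    // Replace the original file with the compacted copy.
    try fileManager.removeItem(at: realmURL)
    try fileManager.moveItem(at: tempURL, to: realmURL)
}
```

Calling compactDefaultRealm() early at launch, before any other Realm is opened, is one way to satisfy the "no open instances" assumption.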

Figure 6

As shown in Figure 6.1, on every application launch Realm compacts the file if it is over 70 MB in size and less than 50% ‘used’. In our case the file size is 75 MB, which is over 70 MB, and the space used is 6.3 MB, which is far less than 50%, so Realm automatically compacted it to the actual size.

Note: if another process is accessing the Realm, compaction will be skipped even if the configuration block’s conditions were met. That’s because compaction cannot be safely performed while a Realm is being accessed.
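A configuration along these lines would express the "over 70 MB and less than 50% used" rule. It is a sketch based on the shouldCompactOnLaunch block of Realm.Configuration; the function names shouldCompact and configureAutomaticCompaction are hypothetical and only there to keep the rule readable.

```swift
import RealmSwift

/// Hypothetical helper: true when the file is larger than 70 MB
/// and less than half of it is actually in use.
func shouldCompact(totalBytes: Int, usedBytes: Int) -> Bool {
    let seventyMB = 70 * 1024 * 1024
    return totalBytes > seventyMB
        && Double(usedBytes) / Double(totalBytes) < 0.5
}

/// Installs the rule as the default configuration. Realm evaluates the block
/// the first time the file is opened after launch.
func configureAutomaticCompaction() {
    let config = Realm.Configuration(shouldCompactOnLaunch: { totalBytes, usedBytes in
        shouldCompact(totalBytes: totalBytes, usedBytes: usedBytes)
    })
    Realm.Configuration.defaultConfiguration = config
}
```

With the numbers from this article (about 75 MB total, about 6.3 MB used) the rule returns true, so the file would be compacted on the next launch.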

Figure 6.1

As you can see in Figure 7, we save each Realm object to disk on every iteration, which means we performed 47472 transactions (db hits), compared to the previous approach where we performed only one transaction and saved everything in a batch. Because of this, the main thread was blocked for approx. 192 sec, but the file size is 7.8 MB, far less than with the batch insert. 😧 There are a couple of things to note:

  1. For every write transaction, Realm performs many tasks, including validating the transaction, checking consistency, sending notifications to observers, and refreshing the snapshots of all threads currently accessing the same file (if autorefresh is true). This overhead might seem negligible, but with 47472 transactions it adds up to a serious performance cost, which is why the import takes 192 sec
  2. Why is the file size small compared to the previous one? For the answer, I will quote this stackoverflow answer: “Realm’s memory layout algorithm requires that the file size be at least 8x the size of the largest single blob stored in the Realm file. When you add 40,000 objects in one transaction, you end up with a single transaction log entry that’s around 5MB in size. This means that the file has to be at least 40MB in size in order to store it. When you add one object in 40,000 transactions, you still end up with a single transaction log entry only this time it’s on a hundred or so bytes in size. This happens because when Realm commits a transaction, it attempts to first reclaim unused transaction log entries before allocating space for new entries. Since the Realm file is not open elsewhere, the previous entry can be reclaimed as each new commit is performed. realm/realm-core#2343 tracks improving how Realm stores transaction log entries to avoid the significant overallocation you’re seeing.”

Note: Since Realm 2.0, transaction logs and data are stored in the same file, with the extension realm.

Solution: While this issue remains open on Realm, split the difference between the two approaches and add groups of objects per write transaction. Batches of 1000 worked well in our test.
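The batching idea can be sketched as below. This is an illustration under assumptions, not the project's code: the GooglePlay model here has a single hypothetical name property, and the helper names batches(of:size:) and insert(_:batchSize:) are made up for the example.

```swift
import RealmSwift

// Hypothetical stand-in for the project's GooglePlay model.
final class GooglePlay: Object {
    @objc dynamic var name = ""
}

/// Splits `items` into consecutive arrays of at most `size` elements.
func batches<T>(of items: [T], size: Int) -> [[T]] {
    precondition(size > 0, "batch size must be positive")
    return stride(from: 0, to: items.count, by: size).map { start in
        Array(items[start..<min(start + size, items.count)])
    }
}

/// Persists the objects using one write transaction per batch,
/// instead of one per object or one for the whole data set.
func insert(_ objects: [GooglePlay], batchSize: Int = 1000) throws {
    let realm = try Realm()
    for batch in batches(of: objects, size: batchSize) {
        try realm.write {
            realm.add(batch)
        }
    }
}
```

With 47472 objects and a batch size of 1000 this performs 48 write transactions, which sits between the two extremes measured above.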

Figure 7

As shown in Figure 7.1, we inserted in batches of 1000, which takes 6.9 sec. Without compaction the file size is 9.4 MB, which includes the transaction log history along with the data.

Figure 7.1

As shown in Figure 8, we recovered the space. The process is the same as before: copy the db to a temporary location, then copy it back and use the new default.realm file. The actual data on disk is now reduced to 6.3 MB, slightly less than what it took previously with the one-by-one insertion.

Figure 8

Improve Responsiveness Using Concurrency

Now we come to our main objective, which is to remove the unresponsiveness of the UI. As you can see, by moving the export work off the main thread we reduced the time the main thread is blocked from 4 sec to 0 sec and fixed the unresponsiveness of the UI. There is one thing to note:

We used autoreleasepool. Wrapping work in autorelease pools is beneficial since GCD makes no guarantee as to how timely its own autorelease pools will be drained. Keeping Realm instances alive in undrained autorelease pools will result in a higher than necessary footprint, and other related issues.

As documented: Realm read transaction lifetimes are tied to the memory lifetime of Realm instances. Avoid “pinning” old Realm transactions by using auto-refreshing Realms and wrapping all use of Realm APIs from background threads in explicit autorelease pools.

Threading Rule 1

Use autoreleasepool when working with Realm on a background thread.
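A background import that follows this rule might look like the sketch below. The GooglePlay model with a single name property and the function importInBackground(names:completion:) are hypothetical; the project's real export code reads the csv and fills more properties.

```swift
import Foundation
import RealmSwift

// Hypothetical stand-in for the project's GooglePlay model.
final class GooglePlay: Object {
    @objc dynamic var name = ""
}

/// Creates and persists objects on a background queue, then reports back
/// on the main queue so the UI can be updated.
func importInBackground(names: [String], completion: @escaping () -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        // The explicit pool releases the Realm instance as soon as the work is done.
        autoreleasepool {
            do {
                // The Realm is opened on the same thread that uses it.
                let realm = try Realm()
                let objects = names.map { name -> GooglePlay in
                    let app = GooglePlay()
                    app.name = name
                    return app
                }
                try realm.write {
                    realm.add(objects)
                }
            } catch {
                print("Import failed: \(error)")
            }
        }
        DispatchQueue.main.async(execute: completion)
    }
}
```

The main thread only schedules the work and receives the completion callback, so typing in the text field stays responsive during the import.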

Figure 9

Threading Rule 2

If you create an instance of a Realm, you can use it only on the same thread. If you want to use the same Realm on another thread, you have to create a new Realm instance on that other thread; otherwise the application will crash, as shown in Figure 10.

Figure 10

Threading Rule 3

If you get a Realm object, list, results, or any other Realm type from a Realm, you can use it only on the same thread. If you want to use the same object on another thread, you have to re-fetch it or pass it through a ThreadSafeReference; otherwise the application will crash, as shown in Figure 11.

Figure 11

As shown in Figure 12, we passed an object from one thread to another using ThreadSafeReference. Another option is to observe the changes using Realm’s reactive architecture!
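Handing an object over with ThreadSafeReference could look roughly like this. It is a sketch under assumptions: the GooglePlay model with a name property and the function rename(_:to:) are hypothetical, and the object passed in is assumed to be managed (already added to a Realm).

```swift
import Foundation
import RealmSwift

// Hypothetical stand-in for the project's GooglePlay model.
final class GooglePlay: Object {
    @objc dynamic var name = ""
}

/// Renames a managed object on a background queue.
/// Must be called on the thread that owns `app`.
func rename(_ app: GooglePlay, to newName: String) {
    // The reference is created on the owning thread...
    let reference = ThreadSafeReference(to: app)

    DispatchQueue.global(qos: .utility).async {
        autoreleasepool {
            do {
                let realm = try Realm()
                // ...and resolved once on the destination thread.
                // resolve(_:) returns nil if the object was deleted in the meantime.
                guard let resolved = realm.resolve(reference) else { return }
                try realm.write {
                    resolved.name = newName
                }
            } catch {
                print("Rename failed: \(error)")
            }
        }
    }
}
```

A ThreadSafeReference can be resolved at most once, so a new reference is needed for every hand-off.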

Figure 12

Why can’t Realm objects be passed across threads

Since Realm is based upon a zero‑copy architecture, all objects are live and auto‑updating. If Realm allowed objects to be passed across threads, Realm would not be able to ensure data consistency because various threads could be attempting to change an object’s data at undetermined points in time. The data could become inconsistent very quickly. One thread may need to write to a value while another one is reading from it, and vice versa. This becomes problematic very quickly and you can no longer trust which thread has the correct object data.

Realm autorefresh

If the autorefresh property is true, the current Realm is updated automatically when changes happen on other threads. If set to true (the default), changes made on other threads will be reflected in this Realm on the next cycle of the run loop after the changes are committed. If set to false, you must manually call refresh() on the Realm to get the latest data.

As shown in Figure 13, we added a new object to the main thread Realm while polling on a background thread for the changes, but the background thread Realm did not see them even though its autorefresh property is true. The reason, as documented: “background threads do not have an active run loop and you will need to manually call refresh() in order to update to the latest version, even if autorefresh is set to true.”

Figure 13

As shown in Figure 14, manually calling refresh() updates the background thread Realm with the latest committed data. There is no need to call refresh() manually on the main thread, since the main thread has a run loop (provided autorefresh is true).
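The polling experiment with a manual refresh can be sketched as follows. The GooglePlay model, the function pollCount(times:onCount:) and the one-second interval are hypothetical choices for illustration only.

```swift
import Foundation
import RealmSwift

// Hypothetical stand-in for the project's GooglePlay model.
final class GooglePlay: Object {
    @objc dynamic var name = ""
}

/// Polls the number of stored objects from a background queue.
/// `onCount` is called on the background queue after each poll.
func pollCount(times: Int, onCount: @escaping (Int) -> Void) {
    DispatchQueue.global(qos: .background).async {
        autoreleasepool {
            guard let realm = try? Realm() else { return }
            for _ in 0..<times {
                // This thread has no run loop, so autorefresh does not advance
                // the Realm; refresh() moves it to the latest committed version.
                realm.refresh()
                onCount(realm.objects(GooglePlay.self).count)
                Thread.sleep(forTimeInterval: 1)
            }
        }
    }
}
```

Removing the refresh() call from the loop reproduces the stale reads described for Figure 13.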

Figure 14

Senior iOS Engineer | HungerStation | Delivery Hero