Unlimited Offline Access: How We Improved Tarteel’s Performance by Offloading JSON Transformation to SQL

Offline features were causing slowdowns and crashes with increased user data. We improved performance by offloading calculations to a multithreaded native module and switching from in-memory JSON to SQLite for data storage. While migrating to SQL, we found a trick that sped up data retrieval by 69%.


Background

Memorizing the Quran, known as Hifz, demands a dedicated and consistent time commitment. In today's fast-paced world, many people make use of small pockets of time throughout their day — whether on the go or away from home — to memorize and review. However, we also recognize that a fast, stable internet connection isn't always within reach for everyone. At Tarteel, our mission is to empower Muslims to engage with the Quran — whether through memorization, revision, or other forms — anytime and anywhere. That's why one of our guiding principles in developing the Tarteel app is ensuring that its core features are fully accessible offline.

About a year after adding offline support to some of our features, a few of our power users started reporting that the app was becoming extremely slow and frequently crashing. These were users who had been using Tarteel extensively for many months — some for more than 4 hours every day! We recognized this as a critical warning sign; if we didn’t fix the issue, even our less active users would eventually be affected as their activity history grew. In this blog post, we explore some of the efforts we took to restore Tarteel’s performance for users with large amounts of account data. In particular, we highlight a key insight we gained regarding data retrieval from SQLite, which resulted in a 69% performance improvement.

Statistics Engine

One key area of functionality that we’ve already made available offline is our statistics engine. Whether it’s updating your streak, tracking your progress, or finding out what you need to recite next to achieve your goal, all of this is calculated in the app and works offline. Unfortunately, for our power users (some of whom recite to Tarteel’s AI for more than 4 hours every day!), the app’s performance declined over time.

To understand why the performance was declining, we have to consider how the Tarteel app is built. We build the Tarteel app using React Native, a popular framework for building cross-platform apps using JavaScript. React Native’s architecture does not support multithreading on its main JavaScript thread. As users accumulated more account data in the app, calculating statistics took longer, blocking the main JavaScript thread and causing the app to feel slow and unresponsive. Additionally, our app previously stored all user data in-memory as JSON, leading to increased memory usage as account data grew and further impacting performance.

The solution we settled on consisted of two theoretically straightforward components, each solving one of the two problems mentioned above:

  1. Perform calculations in a native module, where we don’t block the JavaScript thread and can take advantage of multithreading.
  2. Store user data on disk using SQLite instead of in memory, accessing it only when needed and retrieving as little data as possible.

Optimizing SQL query time

The benefit of keeping data in memory as JSON is that access is very fast for a JavaScript application. Storing the data in SQLite and retrieving it on demand means we need to be mindful of how much data each query returns. As we migrated the app from in-memory JSON to SQLite, we found that for certain features it’s unavoidable to fetch and process a large number of rows from SQLite in JavaScript. Unfortunately, our initial implementation proved too slow, and the processing done in JavaScript was noticeably blocking the JavaScript thread.

The impact was particularly noticeable when retrieving the user’s QuranSession list. A QuranSession is the object we use to represent a period of reading, reciting, or listening to the Quran, and it has a one-to-many relationship with any recitation mistakes made during that session. Many of our active users have tens of thousands of Quran sessions and tens of thousands of mistakes. Below we discuss how we debugged the performance of retrieving all active (not deleted) Quran sessions and their associated mistakes in JavaScript, and the solution we found to improve it.

Initial Performance Breakdown

We started by breaking down the performance of our getActiveQuranSessions function into individual steps for a user with 19,394 active Quran sessions and 2,796 mistakes. The function was retrieving the data and returning it as a JS object with keys being the session ids and values being the session data, including an array of mistake objects if any. Below is the outcome from a sample performance measurement:

 LOG [446.30 ms] - (1) retrieve the active sessions from SQLite
 LOG [274.17 ms] - (2) map the retrieved sessions into the desired JS object structure
 LOG [ 61.71 ms] - (3) retrieve the mistakes of the active sessions from SQLite
 LOG [ 40.06 ms] - (4) map the retrieved mistakes into the desired JS object structure
 LOG [822.24 ms] - getActiveQuranSessions total

The above logs revealed that the most time-consuming operations were fetching sessions (1) and mapping them to the desired format (2). Given that our dataset contained 19,394 sessions and 2,796 mistakes, it was evident that the volume of data being processed was a major contributor to the function’s latency. Of particular concern was that the mapping and grouping of sessions as well as mistakes was being done on the JavaScript thread, making it a blocking operation and further impacting the user experience.
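In simplified form, the mapping and grouping in steps (2) and (4) looked something like the sketch below. The names and row shapes are illustrative, not our actual code:

```javascript
// Sketch of the old mapping logic done on the JavaScript thread.
// sessionRows and mistakeRows stand in for plain row objects
// returned by the SQLite driver.
function buildSessionMap(sessionRows, mistakeRows) {
  const sessions = {};

  // (2) Map each session row into the desired shape, keyed by session id.
  for (const row of sessionRows) {
    sessions[row.id] = { ...row, mistakes: [] };
  }

  // (4) Group each mistake under its parent session.
  for (const mistake of mistakeRows) {
    const session = sessions[mistake.sessionId];
    if (session) session.mistakes.push(mistake);
  }

  return sessions;
}

const result = buildSessionMap(
  [{ id: "s1", createdAt: "2024-01-01" }],
  [{ id: "m1", sessionId: "s1", mistakeType: "WORD" }]
);
// result.s1.mistakes now holds the single grouped mistake
```

With ~19,000 sessions, both loops run on the JavaScript thread, which is why steps (2) and (4) together account for over 300ms of blocking work.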

Exploring Potential Solutions

To get a better understanding of the SQL query time, we explored four different changes:

  1. Removing ORDER BY and WHERE Clauses: We first measured the raw SELECT performance by removing the ORDER BY and WHERE clauses. This didn't lead to any significant improvement in query time.
  2. Using Prepared Statements: We tried using prepared statements, but the performance remained roughly the same.
  3. Using Drizzle: Experimenting with Drizzle, a SQL query builder, yielded slightly slower performance (~1000ms function time).
  4. Only retrieving one column: Retrieving only the id column of the QuranSession table led to the session query time dropping to 71.50ms.

While experimenting with Drizzle, we noticed that there wasn't much difference in performance whether or not we included the mistakes in our sessions query. This observation led us to investigate further how joins and data transformations were being handled. From experimenting with retrieving a single column, we confirmed that the amount of data, and in particular the number of columns retrieved, had an impact on query time (as we expected). We further confirmed this by selecting the id into multiple columns with select id as id1, id as id2, ..., id as id8 and observing the query time rise to 313ms.

The JSON Transformation Hack

While analyzing the SQL query generated by Drizzle, we found it used a creative approach to transform the mistakes data into JSON within the query itself. This involved using json_group_array and json_array to structure the data as JSON:

SELECT
    "id",
    "createdAt",
    ...
    (
        SELECT
            COALESCE(
                json_group_array(
                    json_array(
                        "id",
                        "sessionId",
                        "mistakeType",
                        ...
                    )
                ),
                json_array()
            ) AS "data"
        FROM
            "mistake" AS "session_mistakes"
        WHERE
            "session_mistakes"."sessionId" = "session"."id"
    ) AS "mistakes"
FROM
    "session"
WHERE
    "session"."isDeleted" = ?
ORDER BY
    "session"."startedAt" DESC;

By transforming the mistakes into JSON within the query, we offloaded the processing for the mapping logic from JavaScript to the SQLite native layer, thereby reducing the load on the JavaScript thread.
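On the JavaScript side, each session row now arrives with its mistakes column already serialized as a JSON string, so the per-mistake grouping loop collapses into a single JSON.parse per row. A rough sketch (row shapes and names are assumptions, not our actual code):

```javascript
// Sketch of the JS side once mistakes come back as a JSON string column.
function mapSessionRow(row) {
  return {
    ...row,
    // json_group_array(json_array(...)) yields a string like
    // '[["m1","s1","WORD"]]'; parsing it replaces the grouping loop in JS.
    mistakes: JSON.parse(row.mistakes).map(([id, sessionId, mistakeType]) => ({
      id,
      sessionId,
      mistakeType,
    })),
  };
}

const mapped = mapSessionRow({
  id: "s1",
  createdAt: "2024-01-01",
  mistakes: '[["m1","s1","WORD"]]',
});
```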

Implementing Full JSON Transformation

We extended this approach to transform the entire session row into a JSON object, ensuring each row returned by the query was already structured as a JSON object:

SELECT
    json_object(
        'id', "id",
        'createdAt', "createdAt",
        ...
        'mistakes', (
            SELECT
                COALESCE(
                    json_group_array(
                        json_object(
                            'id', "id",
                            'sessionId', "sessionId",
                            'mistakeType', "mistakeType",
                            ...
                        )
                    ),
                    json_array()
                )
            FROM
                "mistake" AS "session_mistakes"
            WHERE
                "session_mistakes"."sessionId" = "session"."id"
        )
    ) AS "session"
FROM
    "session"
WHERE
    "session"."isDeleted" = ?
ORDER BY
    "session"."startedAt" DESC;

Results and Performance Gains

Moving the JSON transformation from the JS layer to the SQLite layer yielded significant performance improvements:

Old approach:
 LOG [451.14 ms] - (1) getting sessions
 LOG [280.96 ms] - (2) mapping sessions rows to js objects
 LOG [ 65.39 ms] - (3) getting mistakes
 LOG [ 37.83 ms] - (4) mapping mistakes rows to js objects
 LOG [837.46 ms] - total

New approach:
 LOG [154.94 ms] - (1) getting sessions (including mistakes) as json strings
 LOG [100.98 ms] - (2) parsing sessions json strings to js objects
 LOG [256.60 ms] - total

By transforming the session rows and their associated mistakes into JSON objects within SQLite, the total query time dropped from ~837ms to ~257ms, a 69% improvement, and the work being done in the JS thread dropped from ~319ms to ~101ms, a 68% improvement. This improvement came from reducing the number of returned columns to a single JSON column per row and cutting the JavaScript overhead for mapping and transformation operations.

Conclusion

This experience highlighted the power of offloading data transformation to the database layer. By leveraging SQLite's JSON functions, we not only achieved a significant performance boost but also simplified our codebase. This approach is widely applicable in scenarios where large datasets need to be processed and transformed efficiently.

If you're encountering similar performance challenges, consider leveraging native database capabilities to handle data transformations, thereby reducing the load on your application layer. This strategy can lead to substantial performance improvements and a more responsive application. Even beyond data transformations, we’ve also seen significant improvements from moving calculations (such as those for streaks) natively into the database.

While we're pleased with the lessons learned and the enhancements we've made, we remain committed to further optimizing the Tarteel app's performance and expanding its offline capabilities. If the idea of helping tens of millions of Muslims connect with the Quran while tackling these kinds of technical challenges excites you, we invite you to apply to join our team: https://t.zip/careers. To experience the improved performance for yourself, download the app here: download.tarteel.ai.