Tuesday, March 11, 2025

C++11 faster than C++03: Why and How



C++11 introduced several features and improvements that can make code written in C++11 faster than code written in C++03. Here are some key reasons:

1. Move Semantics

  • C++11 introduced move semantics, which allow the resources of temporary objects to be "moved" rather than copied. This reduces unnecessary deep copies of objects, especially for classes that manage resources like dynamic memory, file handles, or sockets. This can lead to significant performance improvements in applications that create and destroy many temporary objects.

2. Rvalue References

  • Move semantics are enabled by rvalue references (denoted by &&), which allow developers to distinguish between objects that can be moved (temporaries) and those that cannot. This optimization minimizes the overhead of copying large objects.

3. Improved Concurrency Support

  • C++11 introduced a standardized memory model and threading library (<thread>, <mutex>, <future>, <atomic>, etc.). This allows for more efficient and portable multithreaded code. The memory model also helps in writing lock-free and wait-free algorithms, which can improve performance in multi-core systems.

4. Optimized Standard Library

  • The Standard Template Library (STL) was updated in C++11 to take advantage of new language features like move semantics and improved algorithms. For example, many standard containers (e.g., std::vector, std::string) benefit from move operations, reducing overhead in common operations like reallocation or sorting.

5. constexpr

  • The constexpr keyword in C++11 allows the evaluation of functions and expressions at compile time rather than runtime. This can lead to significant performance gains by reducing runtime computation and improving opportunities for compiler optimizations.

6. Uniform Initialization

  • C++11 introduced uniform initialization (using curly braces {}), which gives one consistent syntax for initializing any type, rejects narrowing conversions at compile time, and avoids pitfalls of the older syntax such as the most vexing parse.

7. auto and type inference

  • The auto keyword allows the compiler to infer the type of a variable. While this doesn't directly improve performance, it can enable better optimizations by the compiler, especially when used with template-heavy code.

8. Range-based for Loop

  • The range-based for loop (for (auto &x : container)) introduced in C++11 makes iteration over containers cleaner and less error-prone by eliminating manual indexing and iterator management; the compiler-generated iteration is at least as efficient as the handwritten equivalent.

9. nullptr

  • The introduction of nullptr in C++11 provides a type-safe null pointer constant. Unlike NULL (an integer constant), it has its own type, which removes ambiguity in overload resolution and prevents bugs caused by a null pointer silently matching an integer parameter.

10. Lambda Expressions

  • Lambda expressions provide a concise way to define anonymous functions directly within code, which can lead to better optimization opportunities by the compiler, especially when used in algorithms.

11. New Containers and Smart Pointers

  • C++11 added new containers like std::array (a stack-allocated array) and smart pointers (std::unique_ptr, std::shared_ptr), which can help manage resources more efficiently and avoid performance issues related to manual memory management.

12. Improved Compiler Optimizations

  • The introduction of these features and the move towards modern C++ coding practices have also led to better compiler optimizations. Modern C++ compilers are designed to take full advantage of the language features introduced in C++11, leading to faster and more optimized code compared to C++03.

Overall, the combination of these features and improvements allows C++11 code to be more efficient and often faster than equivalent C++03 code.

my bashrc

# Source global definitions

if [ -f /etc/bashrc ]; then

. /etc/bashrc

fi


if [ -e "$HOME"/.bash_aliases ]; then

source "$HOME"/.bash_aliases

fi


# User specific environment

if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]

then

PATH="$HOME/.local/bin:$HOME/bin:$PATH"

fi

export PATH


ORACLE_HOME=/opt/oracle

export ORACLE_HOME

ORACLE_SID=OTP01

export ORACLE_SID

JAVA_HOME=/usr/lib/jvm/java-1.8.0

export JAVA_HOME



alias ll='ls -l | more'

alias lt='ls -t | more'

alias lsa='ls -a | more'

alias lla='ls -la | more'

alias lR='ls -R | more'

alias ltr='ls -ltr | more'

alias ldir='ls -l | grep ^d | more'

alias c='clear'

alias h='history 200'

alias find='find 2>/dev/null' # 2>/dev/null redirects stderr to /dev/null, silencing errors such as "Permission denied"

alias sq='sqlplus username/password@schemaname'

alias srcrc='source ~/.bashrc'

alias myps='ps -aef | grep dagaa | grep -v vscode | more' # displays processes for dagaa except for vscode

alias mypsv='ps -aef | grep dagaa | grep vscode | more' # displays vscode processes for dagaa (renamed: a second "myps" would silently override the one above)

alias gs='git status'

alias gb='git branch'

alias gl='git log'

alias gd='git diff'

alias run_send_to_mnc='cdmnc && python send_to_mnc.py http://dc2dvxdsklv19.ipa.lmeft.co.uk:38501'

Google Test for code that uses a random number generator

 Writing tests for code that uses a random number generator (RNG) can be tricky, as random behavior introduces non-determinism into your tests. However, you can make testing easier by controlling or "mocking" the randomness. There are several strategies you can use to test code that involves random numbers, particularly when using a testing framework like Google Test.

Here are the steps to test code that uses a random number generator:

1. Refactor to Allow Injecting the RNG

If possible, refactor your code to allow the random number generator to be injected (dependency injection). This allows you to substitute the random number generator with a mock or predictable RNG during testing.

Example Code to Refactor:

cpp
#include <random>

class RandomGenerator {
public:
    virtual ~RandomGenerator() {}  // virtual destructor for safe deletion via base pointer
    virtual int generate() = 0;
};

class RealRandomGenerator : public RandomGenerator {
public:
    int generate() override {
        // Re-seeding on every call is simple but slow; a member engine
        // seeded once would be better in real code.
        std::random_device rd;
        std::mt19937 gen(rd());
        std::uniform_int_distribution<> dis(1, 100);
        return dis(gen);
    }
};

class MyClass {
private:
    RandomGenerator* rng;
public:
    MyClass(RandomGenerator* rng) : rng(rng) {}
    int getRandomValue() { return rng->generate(); }
};

In this refactored code, RandomGenerator is an abstract class that is implemented by RealRandomGenerator. The MyClass class takes a pointer to a RandomGenerator in its constructor, allowing you to inject different implementations for testing purposes.

2. Write Google Test for Injected RNG

Now, you can write a test that uses a mock RNG. In Google Test, you can use Google Mock to create a mock class for RandomGenerator that returns predictable values.

Step-by-Step Google Test Example:

cpp
#include <gtest/gtest.h>
#include <gmock/gmock.h>

// Mock class for RandomGenerator
class MockRandomGenerator : public RandomGenerator {
public:
    MOCK_METHOD(int, generate, (), (override));
};

// Test for MyClass
TEST(MyClassTest, GetRandomValue) {
    MockRandomGenerator mockRng;

    // Set up the mock to return predictable values
    EXPECT_CALL(mockRng, generate())
        .WillOnce(testing::Return(42))   // first call returns 42
        .WillOnce(testing::Return(99));  // second call returns 99

    MyClass obj(&mockRng);

    // Test that the first call returns 42
    EXPECT_EQ(obj.getRandomValue(), 42);
    // Test that the second call returns 99
    EXPECT_EQ(obj.getRandomValue(), 99);
}

Key Elements of the Test:

  1. MockRandomGenerator: The mock class simulates the behavior of the RandomGenerator. We use Google Mock's MOCK_METHOD macro to create the mock method generate().

  2. EXPECT_CALL: We use EXPECT_CALL to specify that the generate() method should be called, and we use WillOnce(testing::Return(value)) to define what value it should return when called.

  3. Testing: After injecting the mock into the class, we test the behavior of the MyClass methods. We verify that the correct value is returned when the random number is generated, allowing us to validate the logic without involving true randomness.

3. Test with Real RNG (Optional)

If you want to test the real random number generator as well, you can do so by controlling the seed or by using a known seed value in the random generator.

cpp
TEST(MyClassTest, RealRandomValue) {
    RealRandomGenerator realRng;
    MyClass obj(&realRng);

    // Since random numbers are unpredictable, we can't check the exact
    // number, but we can test that the value falls within the range.
    int value = obj.getRandomValue();
    EXPECT_GE(value, 1);
    EXPECT_LE(value, 100);
}

In this test, we’re not testing for a specific random number (since it's truly random), but we can verify that the generated value falls within the expected range (1 to 100 in this case).

4. Use a Fixed Seed for Predictability

If you want to make the tests more deterministic but still use a real RNG, you can fix the random number generator's seed to a known value. This way, the sequence of random numbers generated will always be the same across runs.

cpp
class FixedSeedRandomGenerator : public RandomGenerator {
public:
    int generate() override {
        static std::mt19937 gen(12345);  // fixed seed for predictability
        std::uniform_int_distribution<> dis(1, 100);
        return dis(gen);
    }
};

TEST(MyClassTest, FixedSeedRandomValue) {
    FixedSeedRandomGenerator fixedRng;
    MyClass obj(&fixedRng);

    // The sequence is now identical across runs on the same platform, so
    // the first values can be captured once and then hard-coded into
    // EXPECT_EQ checks. Note that uniform_int_distribution's algorithm is
    // implementation-defined, so the exact values may differ between
    // standard libraries.
    int first = obj.getRandomValue();
    int second = obj.getRandomValue();
    EXPECT_GE(first, 1);
    EXPECT_LE(first, 100);
    EXPECT_GE(second, 1);
    EXPECT_LE(second, 100);
}

5. Testing Edge Cases

You can test edge cases like generating the minimum or maximum possible random values:

cpp
TEST(MyClassTest, EdgeCaseRandomValue) {
    MockRandomGenerator mockRng;

    // Use a single EXPECT_CALL with chained WillOnce clauses to keep the
    // order explicit (separate EXPECT_CALLs on the same method are
    // matched newest-first in Google Mock, which would reverse the order).
    EXPECT_CALL(mockRng, generate())
        .WillOnce(testing::Return(1))     // min value
        .WillOnce(testing::Return(100));  // max value

    MyClass obj(&mockRng);

    // Test the min value
    EXPECT_EQ(obj.getRandomValue(), 1);
    // Test the max value
    EXPECT_EQ(obj.getRandomValue(), 100);
}

Summary:

To effectively test code that involves randomness:

  • Refactor your code to allow dependency injection of the RNG.
  • Use Google Mock to mock or control the behavior of the RNG.
  • Use a fixed seed for reproducible tests.
  • Validate that your code behaves correctly even with predictable/random values within expected ranges.

By following these strategies, you can create reliable and deterministic tests for code that involves random number generation.

what are spin locks in c threading

 A spin lock in the context of C threading (or any other threading model) is a type of synchronization primitive used to protect shared resources from concurrent access in a multi-threaded environment. The term "spin" comes from the fact that the thread attempting to acquire the lock "spins" in a loop, repeatedly checking if the lock is available, without relinquishing the CPU.

How Spin Locks Work:

A spin lock is typically implemented using atomic operations like compare-and-swap (CAS) or test-and-set. Here’s a basic outline of how it works:

  1. Acquire: A thread attempts to acquire the lock by checking if the lock is free (typically stored in a boolean or integer flag).

    • If the lock is free (e.g., the flag is 0 or false), the thread changes the flag to indicate it has acquired the lock (e.g., sets the flag to 1 or true).
    • If the lock is already taken (e.g., the flag is 1 or true), the thread will spin and keep checking the flag, re-trying to acquire the lock in a loop.
  2. Release: When the thread is done using the shared resource, it releases the lock by resetting the flag to indicate the lock is available again.

Example of a Simple Spin Lock Implementation in C:

c
#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

// Define the spinlock using an atomic integer.
atomic_int lock = 0;

// Spinlock acquire: loop until the exchange returns 0 (lock was free).
void spinlock_acquire(atomic_int *lock) {
    while (atomic_exchange_explicit(lock, 1, memory_order_acquire) != 0) {
        // If the lock is taken, the thread "spins" here (busy-wait loop).
    }
}

// Spinlock release: set the flag back to 0.
void spinlock_release(atomic_int *lock) {
    atomic_store_explicit(lock, 0, memory_order_release);
}

// A function that will be run by each thread
void* thread_func(void* arg) {
    spinlock_acquire(&lock);
    printf("Thread %ld has acquired the lock\n", (long)arg);
    // Critical section: perform operations on the shared resource here.
    printf("Thread %ld is releasing the lock\n", (long)arg);
    spinlock_release(&lock);
    return NULL;
}

int main() {
    pthread_t threads[5];

    // Create multiple threads
    for (long i = 0; i < 5; i++) {
        pthread_create(&threads[i], NULL, thread_func, (void*)i);
    }

    // Join threads to wait for their completion
    for (int i = 0; i < 5; i++) {
        pthread_join(threads[i], NULL);
    }
    return 0;
}

Key Components:

  • Atomic Operations: The spin lock relies on atomic operations to ensure that the check and set of the lock state is done safely without interference from other threads.
    • In this example, atomic_exchange_explicit is used to atomically set the lock and get the previous value (whether the lock was already taken or not).
  • Spin-Waiting: If the lock is taken (the atomic operation shows that the lock is already set), the thread spins by continually checking the lock state in a busy-wait loop until it is available.
  • Critical Section: The code inside the critical section (between acquire and release) is where threads access shared resources that must be protected by synchronization.

Advantages of Spin Locks:

  1. Simplicity: Spin locks are relatively simple to implement. They require only basic atomic operations and a loop to "spin" while waiting for the lock to become available.
  2. Low Latency in Contention-Free Scenarios: If a thread is unlikely to face contention (e.g., if the lock is rarely contested), spin locks can be very efficient because the thread can immediately acquire the lock without having to involve the operating system or incur the overhead of blocking or context switching.

Disadvantages of Spin Locks:

  1. CPU Wasting: If a thread keeps spinning while waiting for the lock to be released, it wastes CPU time. This is inefficient if the lock is held for a long time, especially on systems with multiple cores where other threads could have been running.
  2. Starvation: Spin locks can lead to starvation, where some threads may never acquire the lock if other threads are continuously acquiring and releasing it in a high-contention scenario.
  3. Inefficiency Under High Contention: If many threads are competing for the same lock, spin locks can cause significant inefficiencies because all threads waste CPU time trying to acquire the lock, even when it's not available.

When to Use Spin Locks:

  • Low Contention: Spin locks are most effective when the lock is expected to be held for a very short duration (e.g., in a scenario where lock contention is rare or low). This way, the overhead of acquiring and releasing the lock is minimal compared to the cost of putting the thread to sleep and then waking it up.
  • Real-Time Systems: In systems where you can't afford the overhead of sleeping threads, spin locks may be useful (although care should be taken to avoid wasting too much CPU time).
  • Single-Core Systems or Lock-Free Data Structures: Spin locks can also be useful in cases where a single thread may quickly acquire and release the lock, or in lock-free algorithms where spinning is part of the design.

Alternatives to Spin Locks:

  • Mutexes: A mutex (short for "mutual exclusion") is a more common synchronization primitive, which, unlike spin locks, involves putting the thread to sleep if the lock is unavailable. This can be more efficient in high-contention scenarios, as it avoids wasting CPU time while waiting for the lock.
  • Read-Write Locks: A read-write lock allows multiple threads to read a shared resource concurrently but ensures exclusive access when a thread needs to modify the resource. This can be more efficient in read-heavy scenarios.
  • Semaphores and Condition Variables: These are other synchronization primitives that provide more control over blocking and signaling between threads.

Conclusion:

A spin lock is a simple and lightweight synchronization mechanism used in multithreaded environments, especially in low-contention scenarios. While it can offer performance benefits in certain situations, it's generally less efficient than alternatives like mutexes or condition variables when there is significant contention for resources. Spin locks should be used cautiously, with careful consideration of the potential impact on CPU usage and performance.

what is a core dump

 A core dump (or core file) is a file that captures the memory contents of a running process at a particular point in time, typically when the process crashes or encounters a serious error, such as a segmentation fault or illegal instruction.

What is a core dump?

  • It is essentially a snapshot of a process’s memory, including the call stack, memory allocations, and the state of the program at the time of the crash.
  • It helps developers analyze the cause of a crash by examining the state of the program when the error occurred.

Why does a core dump occur?

  • A core dump typically happens when a process encounters a fatal error that causes it to terminate unexpectedly. This might include:
    • Segmentation faults (segfaults): Trying to access memory that the process is not allowed to.
    • Illegal instructions: Executing invalid machine instructions.
    • Memory access violations: Trying to access memory in a way that is not allowed (e.g., reading from or writing to protected memory).
    • Unhandled exceptions or errors in certain environments (especially in C/C++ applications).

Key Contents of a Core Dump:

  • Process Memory: A snapshot of the process's memory, including heap and stack memory.
  • Registers: The values of CPU registers at the time of the crash.
  • Stack Trace: The call stack showing the functions or methods that were executed leading up to the crash.
  • Program Counter (PC): The instruction pointer indicating where the crash occurred in the code.
  • Thread Information: Information about the state of the threads (if the process is multi-threaded).

How to Generate a Core Dump:

  • Linux/Unix Systems:

    • On many Unix-like systems, core dumps can be enabled or disabled using system settings like ulimit. For example, to allow core dumps, you can use the following command in the shell:
      bash
      ulimit -c unlimited
      This allows the system to generate core dumps without any size limit. Where core files are written is controlled by the kernel's core pattern (/proc/sys/kernel/core_pattern), which can point to a directory such as /var/crash/.
    • Core dumps are often generated automatically when a program crashes. The file is typically named core or core.<pid>.
  • Windows Systems:

    • On Windows, the system generates minidumps by default when a process crashes. A full memory dump can also be configured to be written.
    • Tools like Windows Error Reporting (WER) or DbgHelp can be used to capture and analyze crash dumps.

How to Analyze a Core Dump:

  • GDB (GNU Debugger): On Linux, you can use a debugger like GDB to analyze a core dump file. For example:

    bash
    gdb /path/to/executable /path/to/core

    This will load the executable and the core dump file into GDB, where you can inspect the state of the program, including the call stack, variable values, and other information.

  • Other Debuggers: Other debuggers or analysis tools like LLDB (for macOS), WinDbg (for Windows), or specialized analysis tools can also be used to inspect the core dump.

Core Dump Use Cases:

  1. Debugging: Developers use core dumps to diagnose and debug why a process crashed. By analyzing the core dump, they can figure out what part of the code or which memory areas caused the crash.
  2. Post-mortem Analysis: If a program crashes in production and developers are not able to reproduce the issue, core dumps help them recreate the situation for further analysis.
  3. Crash Reporting: In some systems, core dumps are automatically uploaded to a server or logging system for analysis.

Handling Core Dumps:

  • Security Concerns: Core dumps can contain sensitive information, such as passwords, encryption keys, or private data from the process’s memory. As a result, it's essential to restrict access to core dumps and configure proper security measures.
  • Size Considerations: Core dumps can be large, depending on the size of the process’s memory. You may want to configure how large core dumps can be or where they should be stored.
  • Disabling Core Dumps: On some systems, you may want to disable core dumps entirely to avoid storing potentially large files. This can be done using system settings (e.g., ulimit -c 0 in Unix/Linux).

Conclusion:

A core dump is a valuable diagnostic tool for developers to investigate the state of a program after a crash, helping them understand the reason behind the crash and fix bugs. However, it's important to handle core dumps carefully due to their potential size and sensitivity of the data they contain.

L1 L2 L3 cache size

 The sizes of L1, L2, and L3 caches can vary depending on the CPU architecture and model. However, I can give you general guidelines for typical cache sizes in modern processors:

L1 Cache

  • Size: Usually between 32 KB to 128 KB per core.
  • Speed: Fastest cache level, with access time typically around 1 to 2 CPU cycles.
  • Structure: Split into L1 Data Cache (L1d) and L1 Instruction Cache (L1i), each typically ranging from 16 KB to 64 KB in modern processors.
  • Purpose: Stores data and instructions that are most frequently used by the processor, providing extremely fast access.

L2 Cache

  • Size: Typically between 256 KB to 2 MB per core.
  • Speed: Slower than L1, but still much faster than main memory. Access time typically around 3 to 10 CPU cycles.
  • Structure: A unified cache (holds both data and instructions).
  • Purpose: Acts as a secondary cache to store data that isn't in L1 but is likely to be used soon. It helps reduce the frequency of accesses to slower L3 or main memory.

L3 Cache

  • Size: Ranges from 2 MB to 64 MB, shared among all cores on a CPU.
  • Speed: Slower than L2, with access time typically around 10 to 20 CPU cycles or more, depending on the architecture.
  • Structure: Usually a shared cache, meaning that all cores in a multi-core processor have access to the same L3 cache.
  • Purpose: Serves as a last-level cache, storing data that is less likely to be reused soon but still faster than accessing the main memory.

Examples by Processor Family:

  1. Intel Core i7/i9 (Coffee Lake, Comet Lake)

    • L1 Cache: 64 KB per core (32 KB data + 32 KB instruction)
    • L2 Cache: 256 KB per core
    • L3 Cache: 12 MB shared (for an 8-core processor)
  2. AMD Ryzen 7 5800X (Zen 3)

    • L1 Cache: 64 KB per core (32 KB data + 32 KB instruction)
    • L2 Cache: 512 KB per core
    • L3 Cache: 32 MB shared (for an 8-core processor)
  3. Apple M1 (ARM-based)

    • L1 Cache: 192 KB instruction + 128 KB data per performance core (efficiency cores have smaller L1 caches)
    • L2 Cache: 12 MB shared by the performance cores (the efficiency cluster has its own smaller L2)
    • L3 Cache: No L3 cache on M1 chip. Some ARM chips use L2 as the last level of cache.

Key Considerations for Performance:

  • Access Speed: L1 cache is the fastest and closest to the core, so the processor can access it almost instantly. L2 is a bit slower but still much faster than main memory, while L3 is slower and shared among all cores.
  • Cache Hits/Misses: If your working data fits in L1 or L2, the CPU will access it much faster, minimizing latency. If it doesn't fit, data may be fetched from L3 or the main memory, leading to slower performance.
  • Cache Design: Some processors may implement more sophisticated designs, such as inclusive (L1 + L2 + L3 store the same data) or exclusive (L1, L2, and L3 store different data), which affects how efficiently caches work.

Conclusion:

  • L1 Cache: Small but extremely fast (32 KB - 128 KB per core).
  • L2 Cache: Larger but slower than L1 (256 KB - 2 MB per core).
  • L3 Cache: Largest but slowest of the three (2 MB - 64 MB shared).

For low-latency applications, it’s crucial to keep the most frequently accessed data in L1 and L2 cache to avoid slow accesses to L3 or even slower main memory.

C++ low latency server, how much data in kilo bytes can I load into memory

 In a C++ low-latency server, the amount of data you can load into memory while maintaining low-latency performance depends on several factors, such as:

1. Available System Memory and Architecture

  • The physical RAM available on the machine plays a major role. For example, on a 64-bit system with 16 GB of RAM, you could theoretically load several gigabytes of data into memory. However, it’s essential to ensure that the data stays within the optimal working set for low-latency access. Too much memory usage can lead to paging or swapping, which drastically impacts performance.

2. Data Access Patterns

  • Cache locality is crucial for low-latency performance. The more your data fits into the CPU's cache (L1, L2, and L3), the faster it can be accessed.
  • If your dataset is too large to fit into the cache (for example, a few megabytes or more), it can lead to cache misses, which will increase latency.
  • Memory access patterns should be sequential and cache-friendly (e.g., avoid random access across large datasets). If data needs to be accessed in a random order, it can increase memory access latency.

3. CPU and Memory Bandwidth

  • The CPU clock speed and memory bandwidth will affect how fast data can be loaded and accessed. A fast CPU with high memory bandwidth will be able to process more data with lower latency.

4. Memory Overhead

  • When loading data into memory, you must account for the overhead of your data structures, such as metadata, alignment padding, etc. In certain cases, the memory overhead of libraries or objects can increase the effective data load.

5. Latency Sensitivity of Your Application

  • For very low-latency systems (e.g., real-time trading, high-frequency trading, or other financial applications), even microsecond-level delays matter. Here, keeping the hot working set small (tens to hundreds of kilobytes, so it fits in cache) can be essential.
  • For other types of servers, you might be able to load several megabytes into memory without noticeable latency increases.

6. System Load and Other Processes

  • The amount of memory being consumed by other processes on the system impacts available memory for your server. If your system is under heavy load, the operating system may swap out parts of memory, which can hurt performance. In such cases, a lower memory footprint ensures that your process can access its data quickly.

7. Operating System and Memory Management

  • Operating systems have different mechanisms for managing memory and process priorities. In Linux, for example, you can use madvise() to provide the OS with hints about memory usage patterns to optimize performance. Other OS-specific settings (e.g., lock pages in memory, huge pages) can also help reduce latency.

How much data can you load into memory?

For low-latency performance, the data should ideally be small enough that it fits within the CPU cache and the system’s memory.

  • Small-scale servers (e.g., small microservices, real-time trading, etc.): 100 KB to a few MBs of memory can usually be managed without significantly affecting latency.
  • Larger-scale systems with heavy memory demands (e.g., big data, high-throughput services): You can load tens to hundreds of MBs, but this depends on memory access patterns, the use of optimizations like memory pooling, and the system’s hardware.

In general, aim to keep the memory access time low by minimizing memory fragmentation and accessing data in a cache-friendly manner. If latency is your priority, err on the side of smaller memory footprints (especially when optimizing for caches).

Practical Tips:

  1. Test and Profile: It’s essential to profile your application with actual data loads and measure the latency. Tools like perf (Linux) or built-in profilers in your IDE can help identify where latency bottlenecks occur.

  2. Memory Pooling: Use memory pools (preallocate one large block and hand out fixed-size chunks from it via a custom allocator) to avoid fragmentation and improve cache performance.

  3. Use CPU caches effectively: Try to keep your hot data (the most frequently accessed) small enough to fit in the L1 or L2 cache.

  4. Consider Huge Pages: On some systems, you can use large page sizes (huge pages) to reduce memory access latency for large datasets.

Conclusion:

For low-latency C++ servers, aim for a memory footprint of tens to a few hundred kilobytes, but this can increase depending on your server's workload. Memory access patterns, CPU cache utilization, and system resources are the key considerations when scaling up the data loaded into memory. Always profile your specific use case to find the optimal balance.