Former Member

sybase thread model and cpu/core/thread at system level

Hi community,

I would like to have your feedback about the usage of Sybase thread engines vs. Solaris chips/CPUs/threads, and whether there are any settings that should be avoided.

As far as I understand, the Sybase engines are now threads (when the configuration parameter 'kernel mode' is set to 'threaded', which is my case), meaning that at the system level we see only one process:

my_host simon /tmp/

bash$ ps -ef | grep dataserver

s157 9192 9190 9 Apr 11 ? 1033:21 /my_host/sybase/ase157/MYHOST/ASE-15_0/bin/dataserver -d/my_host/sybase/ase157

bash$ prstat -L -p 9192

PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/LWPID

9192 s157 95G 95G sleep 1 0 2:54:48 0.0% dataserver/22

9192 s157 95G 95G sleep 12 0 2:24:04 0.0% dataserver/20

9192 s157 95G 95G sleep 1 0 2:18:13 0.0% dataserver/19

...

9192 s157 95G 95G sleep 13 0 0:00:07 0.0% dataserver/1

Total: 1 processes, 276 lwps, load averages: 0.19, 0.43, 0.38


Note: by the way, is there a way to correlate the 'prstat -L' output with the 'number of max online engines'?

So we have threads at the Sybase level.

On the other hand, most servers I'm working on are Solaris 10 zones, and with those T6 processors I have a chip containing cores that can be multi-threaded.

This means the output of mpstat shows the threads dedicated to my zone, and I could potentially have two threads on the same chip, etc.

Are there any recommendations on how cores/threads at the system level should be configured for optimal performance? For example, if my dataserver is configured for 4 threads, should I ask my UNIX admin for a zone with 4 'virtual' CPUs that are actually four threads on the same core?

Thanks for your input,

Simon


1 Answer

  • Best Answer
    Apr 14, 2016 at 02:46 PM

    A lot depends on the actual CPU usage by ASE.

    First, as you likely understand, a chip thread is not 100% scalable. It only helps when the CPU core is idle while waiting for a multi-cycle operation (e.g. a main memory fetch) to return. For example, a CPU core that is 60% busy can only gain 40% at most by adding one thread, assuming that the one thread can keep it busy. If a very computationally intensive task or other CPU-intensive task hits and can keep a core pegged at 100%.....then threading buys you nothing....or worse.
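    The headroom argument above is simple arithmetic; here is a quick sketch (the utilization figures are illustrative, not measured values):

    ```shell
    # Toy calculation: a second hardware thread can only use the cycles
    # the first thread leaves idle (e.g. during main-memory stalls).
    core_busy=60                    # core already 60% busy (assumed figure)
    headroom=$((100 - core_busy))   # at most 40% left for an added thread
    echo "extra thread can gain at most ${headroom}%"

    core_busy=100                   # core pegged by a compute-bound task
    headroom=$((100 - core_busy))
    echo "extra thread can gain at most ${headroom}%"   # 0% - threading buys nothing
    ```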

    Remember, the OS sees virtual processors as CPUs - so it schedules tasks across all the available "CPUs". Generally, it will try to distribute tasks across the cores first and then across the threads. If you had a really CPU-intensive process, it would "starve" other OS processes on the same core - consequently, chip threading includes some form of timesharing - e.g. whether the task hit a main memory fetch or not, at a certain point it simply yields the CPU to the other thread. For SPARC T, this happens every tick (or used to....may have changed)....IBM POWER uses a timeslice.

    The point being that if the *real* CPU usage of the engines is 30% each, then a single core with any number of threads is going to have an issue once the number of engines is >3. You need to get a close look at the current CPU usage during peak periods at the chip level - not sure whether OS commands will do this for you or not. Thankfully, with larger data caches, DBMS engines do a lot of main memory fetches, so they tend to do okay with threads - provided you can keep the engines busy.
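    Putting numbers on that (again, the 30% is an assumed figure - measure your own): the demand in whole cores is roughly per-engine usage times engine count, and one core overflows past 3 engines:

    ```shell
    # Back-of-envelope: hardware threads share one core's execution pipeline,
    # so a single core caps out at ~100% regardless of how many strands it shows.
    per_engine=30   # % of a core per ASE engine (assumed figure)
    for engines in 2 3 4; do
      demand=$((per_engine * engines))   # % of one core needed
      echo "${engines} engines need ${demand}% of a core"
    done
    # 4 engines -> 120%: more than one core can deliver, threads or not
    ```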

    So, I am a bit leery of suggesting that all the virtual processors be threads on the same core. However, we do recommend that when multiple cores are in the same domain/HW partition, they be from the same socket. This reduces system bus flushing and maximizes the benefit of the L1/L2/L3 caches.

    I also am a bit leery of over-threading, as this has been a huge problem with SPARC in the past. For example, early T2-T4 chips with 8 cores per chip and 8 threads per core presented 64 virtual processors at the OS level. If you only ran 16 ASE engines, the OS would schedule a lot of the OS tasks (print daemons, etc.) on the others - the net effect was that the ASE engines got less CPU time than if there had been fewer threads. Not sure if T6/T7 manage this better - there has been a LOT of improvement in SPARC cores in recent years.

    Best thing - get the sysadmins to find out the actual CPU usage per core and be a bit conservative. In other words, if you have 8 engines, rather than allocating a single core with 8 threads (even if the CPU utilization adds up), you might want to instead allocate 2 cores w/ 16 threads.....or 4 cores w/ 4 threads (and consider reducing the threads per core - you can do this on Solaris).
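    For the "reducing the threads per core" part, a sketch of the usual Solaris approach (an admin-command fragment to run as root in the global zone; the CPU IDs below are placeholders, check your own layout first):

    ```shell
    # Show how virtual processors (strands) map onto physical cores
    psrinfo -pv

    # Take individual strands offline so fewer threads share each core
    # (CPU IDs 9-11 are made up - use the IDs psrinfo reports)
    psradm -f 9 10 11

    # Bring them back online later if needed
    psradm -n 9 10 11
    ```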


    • Former Member

      Hi Jeff,

      I'm currently reading a whitepaper you wrote (Managing Workloads with ASE: Techniques for OLTP Scaling and Performance Management with Large SMP and Shared Disk Clusters).

      On page 10, it's written the following:

      "the total number of strands enabled on all the cores should be approximately the same as the number of engines anticipated to be used by ASE."

      Based on that, I have a Solaris zone with 9 virtual CPUs, and my dataserver is configured with 6 threads for the 'syb_default_pool'.

      sp_configure output:

      max online engines 1 9531 6 6 number static

      sp_helpthread output:

      Name Type CurrentSize TargetSize IdleTimeout Description InstanceName

      ----------------- -------------------- ----------- ---------- ----------- -------------------------------------------- ------------

      syb_blocking_pool Run To Completion 4 4 0 A pool dedicated to executing blocking calls NULL

      syb_default_pool Engine (Multiplexed) 6 6 100 The default pool to run query sessions NULL

      syb_system_pool Run To Completion 3 3 0 The I/O and system task pool NULL

      Should the total number of threads across those 3 thread pools be <= the total number of virtual CPUs, or should just the size of syb_default_pool be <= the total number of virtual CPUs?

      Thank you

      Simon