This article is written in Englsih and Portuguese (original version here)
Este artigo está escrito em Inglês e Português (versão original aqui)
English version
What I'm writing about is not really a new feature... It was introduced in 11.70.xC1 and even then it was a port from Informix XPS and I suppose other RDBMs may have something like this. The reason why I'm writing about this now is that I recently went through some situations that made evident the reason why it was introduced
Problem description
It may not be evident to most people working in databases, but the way the data is physically distributed on disk will have a serious impact on the query resolution time. Imagine a simple scenario where we're joining two tables by customer_num for example. One of the tables is relatively large (200M rows) and for each customer_num it contains several records (tens). Let's call it the events table. We're joining it with another table (let's just call it a temp table) where we also have the customer_num and some other fields to which we apply some conditions.
The query plan is pretty obvious. Filters on the temp table and a nested loop with an index lookup on the events table.
This description mimics a real customer situation where they were complaining that the same query, roughly with the same amount of data and similar distribution jumped from around 1H to several hours. Apparently nothing relevant was changed. So why would the query take so much longer now?!
After examining the situation the conclusion was pretty obvious, although the root cause was still unknown: The query plan was the same, but the query now was "always" waiting for I/O. Assuming no problems were happening on the I/O system, what could explain this?
The explanation
After some discussion we figured out what changed. In previous situations, the events table was loaded by a process that worked as a single thread, and the way it compiled the information made the INSERTs in "batch mode". In practice, all the records from a single customer were inserted sequentially, so they were physically near on the disk. Due to some other changes, now the table was loaded using parallel threads, each working on a group of customers. So in practice each customer's records were split across several (sometimes far) physical disk pages.
It's important to note that from the informix perspective we were doing exactly the same steps:
- Filter the records on the temp table, extracting a stream of customer_num
- For each customer_num (NESTED LOOP), do an index lookup on the events table. This returns a set of rowids (physical page pointers on the data structure)
- For each rowid extracted before, access the physical page that contains the matching data.
Why would it take longer then?:
- In the previous situation, with a single access to a page we would retrieve several rows of data
- Even when the number of records for a customer_num would require more than one page, it was very likely the pages were physical contiguous. Due to several layers of caches (physical disks, controllers, arrays etc.) the second page access was probably already in cache (the Informix page size is small and no disk will transfer "only" 2 or 4KB on each operation). So the sub-sequent disk I/Os for the same customer would probably not require a real physical I/O
Is this enough to explain the performance difference? Yes! Definitely. We tested to load ordered data into the table and got the expected results.
This is a good reason for the existence of CLUSTERED INDEXES.
INDEX SKIP SCAN
So, how is the above problem related to the feature called INDEX SKIP SCAN? To be honest it isn't... INDEX SKIP SCAN will not be used in the above situation. But I'm not (yet) completely crazy. In practice this feature tries to solve the same problem. How does it work? The skip scan was introduced when we released the multi index scan. When we use more than one index on the same table for a query, we gather several sets of ROWIDs. And it's highly likely that some of these ROWIDs are repeated among different sets. So it makes sense that we eliminate (skip) some ROWIDs from the scan (to avoid fetching the same rows several times). That's where the name comes from. In order to do this, each set of ROWIDs must be ordered and then merged.
Although it was designed for multi index access, the process can be useful for a single index also. Before the feature, Informix would fetch the page for each rowid retrieved from the index. With the feature it will try to buffer the rowids, sort them, and access the pages after. The result of this is that we increase the probability of fetching sequential (or at least close enough) pages, so that we take advantage of the underlying caches.
Testing
In order to show the effect of this physical layout, I prepared a relatively simple test case. I generated 2M rows of a table using an AWK script. This table has just two columns:
- col1 INTEGER
- col2 CHAR(75)
The data was created in a way that the values for column col1 vary between 1 and 40000. On average each value in col1 has 50 rows with that value. Maximum repetition is 82:
castelo@primary:informix-> dbaccess stores -
Database selected.
> select first 3 col1 c1, count(*) c2 from test_data group by 1 order by 2 desc;
select first 3 col1 c1, count(*) c2 from test_data group by 1 order by 2;
select avg(c2) from (select col1 c1, count(*) c2 from test_data group by 1 );
c1 c2
38137 82
30478 79
29224 79
3 row(s) retrieved.
>
c1 c2
33710 24
39038 26
32313 26
3 row(s) retrieved.
>
(avg)
50.0000000000000
1 row(s) retrieved.
>
I then created a few SQL scripts:
- test_query_count
Does a simple count WHERE col1 BETWEEN 1000 and 1400
- test_query_data_full
Does a SELECT * of the same records, forcing a SEQUENTIAL SCAN
- test_query_data_index_skip_scan
Does a SELECT * of the same records forcing the new feature
- test_query_data_index
Does a SELECT * of the same records using the normal INDEX PATH
- test_query_force_sort
This is the trickiest one. I do a SELECT * using the ROWIDs gathered from an inline view that uses the same criteria and ORDERs the ROWIDs. The goal here is to emulate the feature in a way that it's compatible with older versions
All the scripts write the query plan to a file and insert a record into a control table that has the identifier for the query and the start and stop time.
castelo@primary:informix-> for script in test_query*.sql
do
printf '===============================================\n%s\n===============================================\n' $script
cat $script
printf "\n\n\n"
done
===============================================
test_query_count.sql
===============================================
!rm sqexplain_query_count.out
SET EXPLAIN FILE TO 'sqexplain_query_count.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Simple count', CURRENT YEAR TO FRACTION(3));
SELECT --+ EXPLAIN
COUNT(*) FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Simple count';
===============================================
test_query_data_full.sql
===============================================
!rm sqexplain_data_full.out
SET EXPLAIN FILE TO 'sqexplain_data_full.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Sequential scan (force)', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null
SELECT --+ EXPLAIN, FULL (test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Sequential scan (force)';
===============================================
test_query_data_index_skip_scan.sql
===============================================
!rm sqexplain_data_index_skip_scan.out
SET EXPLAIN FILE TO 'sqexplain_data_index_skip_scan.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index skip scan (force)', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null SELECT --+ EXPLAIN, MULTI_INDEX(test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index skip scan (force)';
===============================================
test_query_data_index.sql
===============================================
!rm sqexplain_data_index.out
SET EXPLAIN FILE TO 'sqexplain_data_index.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index', CURRENT YEAR TO FRACTION(5));
UNLOAD TO /dev/null SELECT --+ EXPLAIN
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index';
===============================================
test_query_force_sort.sql
===============================================
!rm sqexplain_force_sort.out
SET EXPLAIN FILE TO 'sqexplain_force_sort.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index and sorted rowIDs', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null SELECT --+ EXPLAIN
a.* FROM test_data a, (SELECT rowid r FROM test_data c WHERE col1 BETWEEN 1000 AND 1400 ORDER BY 1) b
WHERE a.rowid = b.r;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index and sorted rowIDs';
Finally I created a SHELL script that recreates and loads the data table and the control table and then executes each of these scripts. It restarts the engine between each script to clean up the caches Let's see the results:
castelo@primary:informix-> cat test_script.sh;printf "\n
\n";./test_script.sh 2>&1| grep -v "^$"
#!/bin/bash
printf "Recreating and loading table...\n"
dbaccess stores <<EOF
DROP TABLE IF EXISTS test_data;
DROP TABLE IF EXISTS test_data_results;
CREATE RAW TABLE test_data
(
col1 INTEGER,
col2 CHAR(75)
) IN dbs1 EXTENT SIZE 5000 NEXT SIZE 5000;
BEGIN WORK;
LOCK TABLE test_data IN EXCLUSIVE MODE;
LOAD FROM test_data.unl INSERT INTO test_data;
COMMIT WORK;
ALTER TABLE test_data TYPE (standard);
CREATE INDEX ix_test_data_1 ON test_data(col1) IN dbs2;
CREATE TABLE test_data_results
(
access_type VARCHAR(255),
query_start DATETIME YEAR TO FRACTION(3),
query_end DATETIME YEAR TO FRACTION(3)
);
EOF
printf "Done...\n"
for SCRIPT in test_query_count.sql test_query_data_full.sql test_query_data_index_skip_scan.sql test_query_data_index.sql test_query_force_sort.sql
do
printf "Stopping informix...\n"
onmode -ky
printf "Starting informix...\n"
oninit -w
printf "Running script $SCRIPT\n"
dbaccess stores $SCRIPT
printf "\nDone...\n"
done
dbaccess stores <<EOF
SELECT
access_type, query_start, query_end, query_end - query_start elapsed_time
FROM
test_data_results;
EOF
Recreating and loading table...
Database selected.
Table dropped.
Table dropped.
Table created.
Started transaction.
Table locked.
2000000 row(s) loaded.
Data committed.
Table altered.
Index created.
Table created.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_count.sql
Database selected.
Explain set.
1 row(s) inserted.
(count(*))
20267
1 row(s) retrieved.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_full.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_index_skip_scan.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_index.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_force_sort.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Database selected.
access_type Simple count
query_start 2014-09-02 13:08:12.141
query_end 2014-09-02 13:08:12.259
elapsed_time 0 00:00:00.118
access_type Sequential scan (force)
query_start 2014-09-02 13:08:29.215
query_end 2014-09-02 13:08:36.838
elapsed_time 0 00:00:07.623
access_type Data with index skip scan (force)
query_start 2014-09-02 13:08:54.250
query_end 2014-09-02 13:09:03.403
elapsed_time 0 00:00:09.153
access_type Data with index
query_start 2014-09-02 13:09:21.684
query_end 2014-09-02 13:10:53.474
elapsed_time 0 00:01:31.790
access_type Data with index and sorted rowIDs
query_start 2014-09-02 13:11:12.682
query_end 2014-09-02 13:11:22.874
elapsed_time 0 00:00:10.192
5 row(s) retrieved.
Database closed.
castelo@primary:informix->
We can see the results in a table:
Query
|
Start
|
Stop
|
Elapsed
|
Simple count
|
13:08:12.141
|
13:08:12.259
|
00:00:00.118 (0s)
|
Sequential scan (force)
|
13:08:29.215
|
13:08:36.838
|
00:00:07.623 (7.6s)
|
Data with index skip scan (force)
|
13:08:54.250
|
13:09:03.403
|
00:00:09.153 (9.1s)
|
Data with index
|
13:09:21.684
|
13:10:53.474
|
00:01:31.790 (1m31s)
|
Data with index and sorted rowIDs
|
13:11:12.682
|
13:11:22.874
|
00:00:10.192 (10.1s)
|
So, here are the important points:
- SELECT COUNT(*) is extremely fast. Just count the index entries that match the criteria
- A sequential scan takes around 8s. The table size is small.
- Forcing the MULTINDEX path allows the INDEX SKIP SCAN feature and it's around the same as a full sequential scan
- Going through the normal index PATH is extremely slow. 1.5 minutes compared to ~10s for the other options. That's the price we pay for scattered reads.
- By simulating the feature we get a time very close to the feature itself
I also used a script I mentioned before, called ixprofiling, to show the real work being done do solve each query. For the query that uses INDEX SKIP SCAN:
castelo@primary:informix-> onmode -ky;oninit -w;ixprofiling -z -i stores test1.sql
Database selected.
Engine statistics RESETed. Query results:
Query start time: 17:31:06.017924000
UNLOAD TO /dev/null SELECT --+ EXPLAIN, MULTI_INDEX(test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
20267 row(s) unloaded.
Query stop time: 17:31:15.806221000
Thread profiles (SID: 6)
LkReq LkWai DLks TOuts LgRec IsRd IsWrt IsRWr IsDel BfRd BfWrt LgUse LgMax SeqSc Srts DskSr SrtMx Sched CPU Time Name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----------- ------------
40641 0 0 0 0 526 0 0 0 18335 0 0 0 2 1 0 0 36962 0.340086802 sqlexec
LkWs IOWs nIOW IdxBR Name
------------ ------------ ------------ ------------ -----------------------------------
0.0 9.3030077117 18157 0 sqlexec
Partitions profiles (Database: stores)
LkReq LkWai DLks TOuts DskRd DskWr IsRd IsWrt IsRWr IsDel BfRd BfWrt SeqSc Object name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------------------------------------------------------
20267 0 0 0 18018 0 495 0 0 0 18076 0 1 test_data
20307 0 0 0 109 0 0 0 0 0 365 0 0 test_data#ix_test_data_1
Database closed.
real 0m11.807s
user 0m0.000s
sys 0m0.090s
castelo@primary:informix->
For the normal query:
castelo@primary:informix-> onmode -ky;oninit -w;ixprofiling -z -i stores test2.sql
Database selected.
Engine statistics RESETed. Query results:
Query start time: 17:45:39.873507000
UNLOAD TO /dev/null SELECT --+ EXPLAIN
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
20267 row(s) unloaded.
Query stop time: 17:47:11.803271000
Thread profiles (SID: 6)
LkReq LkWai DLks TOuts LgRec IsRd IsWrt IsRWr IsDel BfRd BfWrt LgUse LgMax SeqSc Srts DskSr SrtMx Sched CPU Time Name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----------- ------------
40720 0 0 0 0 31 0 0 0 20608 0 0 0 1 0 0 0 950 0.261092628 sqlexec
LkWs IOWs nIOW IdxBR Name
------------ ------------ ------------ ------------ -----------------------------------
0.0 6.2963172635 320 0 sqlexec
Partitions profiles (Database: stores)
LkReq LkWai DLks TOuts DskRd DskWr IsRd IsWrt IsRWr IsDel BfRd BfWrt SeqSc Object name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------------------------------------------------------
20267 0 0 0 17921 0 0 0 0 0 39562 0 0 test_data
20386 0 0 0 206 0 0 0 0 0 587 0 0 test_data#ix_test_data_1
Database closed.
real 1m35.757s
user 0m0.050s
sys 0m0.080s
castelo@primary:informix->
Things to note:
- The number of disk reads is roughly the same.
- The number of buffer reads on the partitions is significantly higher in the second case
- The CPU time is actually higher (slightly) on the first query
If we compare sections of onstat -g ioa we confirm the work done in terms of I/O was similar, but we know one of them took much longer:
I/O queues:
AIO I/O queues with INDEX SKIP SCAN:
q name/id len maxlen totalops dskread dskwrite dskcopy
fifo 0 0 0 0 0 0 0
drda_dbg 0 0 0 0 0 0 0
sqli_dbg 0 0 0 0 0 0 0
kio 0 0 32 725 408 317 0
kio 1 0 33 20342 20274 68 0
adt 0 0 0 0 0 0 0
msc 0 0 1 4 0 0 0
AIO I/O queues for normal query:
q name/id len maxlen totalops dskread dskwrite dskcopy
fifo 0 0 0 0 0 0 0
drda_dbg 0 0 0 0 0 0 0
sqli_dbg 0 0 0 0 0 0 0
kio 0 0 33 9667 9653 14 0
kio 1 0 33 11051 11035 16 0
adt 0 0 0 0 0 0 0
msc 0 0 1 4 0 0 0
Apparently the slower one did less work! Some more writes for the first. Hard to tell if significant.
AIO I/O vps with INDEX SKIP SCAN:
class/vp/id s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors tempops
fifo 7 0 i 0.0 0 0 0 0 0 0.0 0 0
kio -1 0 i 0.5 379 210 169 0 579 0.7 0 0
kio -1 1 i 25.4 19839 19792 47 0 33783 0.6 0 0
msc 6 0 i 0.0 4 0 0 0 4 1.0 0 9
aio 5 0 i 0.6 469 195 84 0 469 1.0 0 0
[...]
AIO I/O vps for normal query:
class/vp/id s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors tempops
fifo 7 0 i 0.0 0 0 0 0 0 0.0 0 0
kio -1 0 i 56.2 9506 9492 14 0 15327 0.6 0 0
kio -1 1 i 61.5 10395 10379 16 0 14895 0.7 0 0
msc 6 0 i 0.0 4 0 0 0 4 1.0 0 9
aio 5 0 i 2.7 459 195 74 0 459 1.0 0 460
[...]
Roughly the same work, but more evely split across the two kio threads for the slower query. Possibly an indication that the I/Os are slower.
AIO big buffer usage summary with INDEX SKIP SCAN:
class reads writes
pages ops pgs/op holes hl-ops hls/op pages ops pgs/op
fifo 0 0 0.00 0 0 0.00 0 0 0.00
drda_dbg 0 0 0.00 0 0 0.00 0 0 0.00
sqli_dbg 0 0 0.00 0 0 0.00 0 0 0.00
kio 21665 20002 1.08 983 151 6.51 871 216 4.03
AIO big buffer usage summary for normal query:
class reads writes
pages ops pgs/op holes hl-ops hls/op pages ops pgs/op
fifo 0 0 0.00 0 0 0.00 0 0 0.00
drda_dbg 0 0 0.00 0 0 0.00 0 0 0.00
sqli_dbg 0 0 0.00 0 0 0.00 0 0 0.00
kio 22576 19871 1.14 1888 279 6.77 498 30 16.60
Not a huge difference for the number of operations.
Conclusion
Don't underestimate the real effect of data dispersion on disk. This situation shows clearly the impact. On another customer situation I had very recently I saw around 4 times performance degradation when comparing the sorted rowids method to the natural method.
I'd like to see this feature being more naturally used in Informix (outside the scenario of MULTI_INDEX path).
Versão Portuguesa
O assunto sobre o qual vou escrever não é propriamente novidade. Foi introduzido na versão 11.70.xC1 e mesmo aí foi um
port de uma funcionalidade que já existia no Informix XPS e penso que em outras bases de dados do mercado. A razão porque decidi descrever esta funcionalidade prende.se com o facto de ter estado envolvido em situações recentemente onde ficou evidente a razão para a sua criação.
Descrição do problema
Pode não ser evidente para todas as pessoas que trabalham com bases de dados, mas a forma como os dados estão físicamente dispostos nos discos terá um impacto significativo na resolução das queries. Imagine um cenário simples onde façamos um JOIN entre duas tabelas por customer_num por exemplo. Uma dessas tabelas é relativamente grande (200M registos) e para cada customer_num contém vários registos (dezenas). Vamos chamar a esta tabela "eventos". Estamos a fazer o JOIN com outra tabela (chamemos-lhe apenas uma tabela temporária) onde temos o campo customer_num e outros campos aos quais aplicamos uma ou mais condições. O plano de execução será bastante óbvio. Filtros aplicados na tabela temporária e um NESTED LOOP JOIN com procura por índice na tabela de eventos.
Esta descrição é uma réplica de uma situação real num cliente onde se estavam a queixar que para o mesmo processo, grosseiramente com o mesmo volume de dados e distribuições semelhantes o processo tinha passado de 1H para várias. Aparentemente nada tinha mudado. Sendo assim de onde vinha a lentidão?!
Depois de analizar a situação a conclusão foi bastante óbvia, apesar se a causa original ser ainda desconhecida: O plano de execução não se tinha alterado, mas agora a query estava "sempre" à espera de I/O. Assumindo que não havia problemas no sistema de I/O o que poderia causar isto?
A explicação
Após alguma discussão, descobrimos o que tinha mudado. Na situação anterior, a tabela de eventos era carregada por um único processo e a forma de processamento resultava em INSERTs em "batch". Na prática, todos os registos de um mesmo customer_num eram inseridos sequencialmente, pelo que resultava numa ordenação física no disco. Devido a algumas mudanças nos processos, agora a tabela era carregada usando processos paralelos, cada um trabalhando num conjunto distinto de clientes. Por isso na prática, os registos para um mesmo customer_num acabavam dispersos (por vezes muito afastadas) em várias páginas em disco.
É importante salientar que do ponto de vista do Informix estávamos a fazer exactamente os mesmos passos:
- Filtrar os registos na tabela temporária, obtendo uma lista de valores de customer_num
- Para cada customer_num (NESTED LOOP), fazer uma pesquisa no índice da tabela de eventos. Isto retorna uma lista de rowIDs (ponteiros para as páginas físicas dos dados)
- Para cada rowID obtido anteriromente, aceder à página de dados e obter os dados correspondentes
Porque demoraria mais então?
- Na situação anterior, com um único acesso a uma página obtinhamos normalmente mais que um registo de dados (os acessos futuros já estavam em cache
- Mesmo quando o número de registos para um customer_num precisasse de mais de uma página, as páginas sub-sequentes estavam contíguas em disco. E devido a vários níveis de caches (discos físicos, controladores, arrays etc.) o acesso às segundas páginas já seria resolvido em cache (o tamanho de página do Informix é pequeno e nenhum sistema de armazenamento faz transferências de "apenas" 2 ou 4KB em cada operação). Ou seja, os segundos acessos raramente necessitariam de uma operação física em disco
Será isto suficiente para explicar a diferença de performance? Sim! Sem sombra para dúvidas. Testámos carregar a tabela de eventos de forma "ordenada" e os tempos de execução voltaram a rondar 1H.
Eis uma boa razão para a existência de CLUSTERED INDEXES.
INDEX SKIP SCAN
Bom, então como está relacionado o problema acima com a funcionalidade chamada INDEX SKIP SCAN? Para ser honesto não está... INDEX SKIP SCAN não seria utilizado na situação acima. Mas (ainda) não estou completamente louco. Na prática esta funcionalidade resolve o mesmo problema. Como funciona? O INDEX SKIP SCAN foi introduzido juntamente com o MULTI INDEX SCAN. Quando usamos mais que um índice numa mesma tabela, obtemos tantos conjuntos de rowIDs quanto os índices que estamos a utilizar. E é muito provável que haja duplicações entre estes conjuntos. Por isso faz todo o sentido que eliminemos (SKIP) os duplicados para evitar tentar obter a mesma linha duas vezes. É daí que vem o nome. Cada conjunto de rowIDs têm de ser ordenado e depois é feito o
merge dos mesmos.
Apesar de ter sido desenhado para o MULTI INDEX ACCESS, o processo pode ser útil para um único indíce também. Antes desta funcionalidade o acesso às páginas de dados era directo e pela ordem que os rowIDs eram encontrados. Com a funcionalidade activa é feita uma ordenação dos valores de rowID e depois então é feito o acesso aos dados. Daqui resulta uma probabilidade muito maior de que se façam I/Os de páginas consecutivas ou pelo menos mais próximas. Isto permite aproveitar melhor as
caches que estejam envolvidas.
Testando
Com o objectivo de mostrar os efeitos da distribuição fisica dos dados, preparei um caso de teste relativamente simples. Gerei 2M de linhas para uma tableam usando um
script AWK. Esta tabela tem apenas duas colunas:
- col1 INTEGER
- col2 CHAR(75)
Os dados foram criados de uma forma em que os valores da col1 variam entre 1 e 40000. Em média, cada um destes valores corresponde a 50 registos na tabela. O valor mais repeitdo tem 82 ocorrências:
castelo@primary:informix-> dbaccess stores -
Database selected.
> select first 3 col1 c1, count(*) c2 from test_data group by 1 order by 2 desc;
select first 3 col1 c1, count(*) c2 from test_data group by 1 order by 2;
select avg(c2) from (select col1 c1, count(*) c2 from test_data group by 1 );
c1 c2
38137 82
30478 79
29224 79
3 row(s) retrieved.
>
c1 c2
33710 24
39038 26
32313 26
3 row(s) retrieved.
>
(avg)
50.0000000000000
1 row(s) retrieved.
>
Depois criei alguns
scripts SQL:
- test_query_count
Faz um COUNT(*) simples WHERE col1 BETWEEN 1000 and 1400
- test_query_data_full
Faz um SELECT * dos mesmos registos, forçando um SEQUENTIAL SCAN
- test_query_data_index_skip_scan
Faz um SELECT * dos mesmos registos forçando a nova funcionalidade
- test_query_data_index
Faz um SELECT * dos mesmos registos, usando o normal INDEX PATH
- test_query_force_sort
Este é o mais rebuscado. Faz um SELECT * usando ROWIDs obtidos numa inline view que usa o mesmo critério e um ORDERs dos ROWIDs.
O objectivo é simular a funcionalidade, numa forma que seja compatível com versões anteriores
Todos os
scripts escrevem o plano de execução num ficheiro e inserem numa tabela de controlo um registo que tem o identificador da query e o tempo de início e fim da mesma..
castelo@primary:informix-> for script in test_query*.sql
do
printf '===============================================\n%s\n===============================================\n' $script
cat $script
printf "\n\n\n"
done
===============================================
test_query_count.sql
===============================================
!rm sqexplain_query_count.out
SET EXPLAIN FILE TO 'sqexplain_query_count.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Simple count', CURRENT YEAR TO FRACTION(3));
SELECT --+ EXPLAIN
COUNT(*) FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Simple count';
===============================================
test_query_data_full.sql
===============================================
!rm sqexplain_data_full.out
SET EXPLAIN FILE TO 'sqexplain_data_full.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Sequential scan (force)', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null
SELECT --+ EXPLAIN, FULL (test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Sequential scan (force)';
===============================================
test_query_data_index_skip_scan.sql
===============================================
!rm sqexplain_data_index_skip_scan.out
SET EXPLAIN FILE TO 'sqexplain_data_index_skip_scan.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index skip scan (force)', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null SELECT --+ EXPLAIN, MULTI_INDEX(test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index skip scan (force)';
===============================================
test_query_data_index.sql
===============================================
!rm sqexplain_data_index.out
SET EXPLAIN FILE TO 'sqexplain_data_index.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index', CURRENT YEAR TO FRACTION(5));
UNLOAD TO /dev/null SELECT --+ EXPLAIN
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index';
===============================================
test_query_force_sort.sql
===============================================
!rm sqexplain_force_sort.out
SET EXPLAIN FILE TO 'sqexplain_force_sort.out';
INSERT INTO test_data_results (access_type, query_start) VALUES ('Data with index and sorted rowIDs', CURRENT YEAR TO FRACTION(3));
UNLOAD TO /dev/null SELECT --+ EXPLAIN
a.* FROM test_data a, (SELECT rowid r FROM test_data c WHERE col1 BETWEEN 1000 AND 1400 ORDER BY 1) b
WHERE a.rowid = b.r;
UPDATE test_data_results SET query_end = CURRENT YEAR TO FRACTION(3) WHERE access_type = 'Data with index and sorted rowIDs';
Por último criei um
script SHELL que recria e carrega a tabela de dados e a tabela de controlo. Depois executa cada um destes
scripts, fazendo um reinício do motor para limpar
caches. Vejamos os resultados:
castelo@primary:informix-> cat test_script.sh;printf "\n\n";./test_script.sh 2>&1| grep -v "^$"
#!/bin/bash
printf "Recreating and loading table...\n"
dbaccess stores <<EOF
DROP TABLE IF EXISTS test_data;
DROP TABLE IF EXISTS test_data_results;
CREATE RAW TABLE test_data
(
col1 INTEGER,
col2 CHAR(75)
) IN dbs1 EXTENT SIZE 5000 NEXT SIZE 5000;
BEGIN WORK;
LOCK TABLE test_data IN EXCLUSIVE MODE;
LOAD FROM test_data.unl INSERT INTO test_data;
COMMIT WORK;
ALTER TABLE test_data TYPE (standard);
CREATE INDEX ix_test_data_1 ON test_data(col1) IN dbs2;
CREATE TABLE test_data_results
(
access_type VARCHAR(255),
query_start DATETIME YEAR TO FRACTION(3),
query_end DATETIME YEAR TO FRACTION(3)
);
EOF
printf "Done...\n"
for SCRIPT in test_query_count.sql test_query_data_full.sql test_query_data_index_skip_scan.sql test_query_data_index.sql test_query_force_sort.sql
do
printf "Stopping informix...\n"
onmode -ky
printf "Starting informix...\n"
oninit -w
printf "Running script $SCRIPT\n"
dbaccess stores $SCRIPT
printf "\nDone...\n"
done
dbaccess stores <<EOF
SELECT
access_type, query_start, query_end, query_end - query_start elapsed_time
FROM
test_data_results;
EOF
Recreating and loading table...
Database selected.
Table dropped.
Table dropped.
Table created.
Started transaction.
Table locked.
2000000 row(s) loaded.
Data committed.
Table altered.
Index created.
Table created.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_count.sql
Database selected.
Explain set.
1 row(s) inserted.
(count(*))
20267
1 row(s) retrieved.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_full.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_index_skip_scan.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_data_index.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Stopping informix...
Starting informix...
Running script test_query_force_sort.sql
Database selected.
Explain set.
1 row(s) inserted.
20267 row(s) unloaded.
1 row(s) updated.
Database closed.
Done...
Database selected.
access_type Simple count
query_start 2014-09-02 13:08:12.141
query_end 2014-09-02 13:08:12.259
elapsed_time 0 00:00:00.118
access_type Sequential scan (force)
query_start 2014-09-02 13:08:29.215
query_end 2014-09-02 13:08:36.838
elapsed_time 0 00:00:07.623
access_type Data with index skip scan (force)
query_start 2014-09-02 13:08:54.250
query_end 2014-09-02 13:09:03.403
elapsed_time 0 00:00:09.153
access_type Data with index
query_start 2014-09-02 13:09:21.684
query_end 2014-09-02 13:10:53.474
elapsed_time 0 00:01:31.790
access_type Data with index and sorted rowIDs
query_start 2014-09-02 13:11:12.682
query_end 2014-09-02 13:11:22.874
elapsed_time 0 00:00:10.192
5 row(s) retrieved.
Database closed.
castelo@primary:informix->
Podemos ver os resultados numa tabela:
Query
|
Start
|
Stop
|
Elapsed
|
Simple count
|
13:08:12.141
|
13:08:12.259
|
00:00:00.118 (0s)
|
Sequential scan (force)
|
13:08:29.215
|
13:08:36.838
|
00:00:07.623 (7.6s)
|
Data with index skip scan (force)
|
13:08:54.250
|
13:09:03.403
|
00:00:09.153 (9.1s)
|
Data with index
|
13:09:21.684
|
13:10:53.474
|
00:01:31.790 (1m31s)
|
Data with index and sorted rowIDs
|
13:11:12.682
|
13:11:22.874
|
00:00:10.192 (10.1s)
|
Aqui ficam os pontos importantes:
- SELECT COUNT(*) é extremamente rápido. Apenas conta o número de entradas no índice que verificam o critério
- Uma leitura sequencial e integral da tabela demora cerca de 10s. O tamanho da tabela é pequeno.
- Forçando o acesso com o MULTI_INDEX permitimos o INDEX SKIP SCAN e demora aproximadamente o mesmo que o sequential scan neste caso.
- Através do normal INDEX PATH, é extremamente lento. 1.5 minutos comparado com cerca de 10s das outras alternativas. É este o preço que pagamos por leituras dispersas
- Simulando a funcionalidade, com o truque de ordenar os rowids, conseguimos um valor parecido com o obtido usando a própria funcionalidade
Também usei um
script que já referi noutr(s) artigos, chamado ixprofiling, para demonstrar o trabalho real feito por cada query. Para a query que usa o INDEX SKIP SCAN:
castelo@primary:informix-> onmode -ky;oninit -w;ixprofiling -z -i stores test1.sql
Database selected.
Engine statistics RESETed. Query results:
Query start time: 17:31:06.017924000
UNLOAD TO /dev/null SELECT --+ EXPLAIN, MULTI_INDEX(test_data)
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
20267 row(s) unloaded.
Query stop time: 17:31:15.806221000
Thread profiles (SID: 6)
LkReq LkWai DLks TOuts LgRec IsRd IsWrt IsRWr IsDel BfRd BfWrt LgUse LgMax SeqSc Srts DskSr SrtMx Sched CPU Time Name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----------- ------------
40641 0 0 0 0 526 0 0 0 18335 0 0 0 2 1 0 0 36962 0.340086802 sqlexec
LkWs IOWs nIOW IdxBR Name
------------ ------------ ------------ ------------ -----------------------------------
0.0 9.3030077117 18157 0 sqlexec
Partitions profiles (Database: stores)
LkReq LkWai DLks TOuts DskRd DskWr IsRd IsWrt IsRWr IsDel BfRd BfWrt SeqSc Object name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------------------------------------------------------
20267 0 0 0 18018 0 495 0 0 0 18076 0 1 test_data
20307 0 0 0 109 0 0 0 0 0 365 0 0 test_data#ix_test_data_1
Database closed.
real 0m11.807s
user 0m0.000s
sys 0m0.090s
castelo@primary:informix->
Para a query normal:
castelo@primary:informix-> onmode -ky;oninit -w;ixprofiling -z -i stores test2.sql
Database selected.
Engine statistics RESETed. Query results:
Query start time: 17:45:39.873507000
UNLOAD TO /dev/null SELECT --+ EXPLAIN
* FROM test_data WHERE col1 BETWEEN 1000 AND 1400;
20267 row(s) unloaded.
Query stop time: 17:47:11.803271000
Thread profiles (SID: 6)
LkReq LkWai DLks TOuts LgRec IsRd IsWrt IsRWr IsDel BfRd BfWrt LgUse LgMax SeqSc Srts DskSr SrtMx Sched CPU Time Name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----------- ------------
40720 0 0 0 0 31 0 0 0 20608 0 0 0 1 0 0 0 950 0.261092628 sqlexec
LkWs IOWs nIOW IdxBR Name
------------ ------------ ------------ ------------ -----------------------------------
0.0 6.2963172635 320 0 sqlexec
Partitions profiles (Database: stores)
LkReq LkWai DLks TOuts DskRd DskWr IsRd IsWrt IsRWr IsDel BfRd BfWrt SeqSc Object name
----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ------------------------------------------------------
20267 0 0 0 17921 0 0 0 0 0 39562 0 0 test_data
20386 0 0 0 206 0 0 0 0 0 587 0 0 test_data#ix_test_data_1
Database closed.
real 1m35.757s
user 0m0.050s
sys 0m0.080s
castelo@primary:informix->
Notas:
- O número de leituras em disco é aproximadamente o mesmo
- O número de leituras de buffers por partição na segunda query é sinificativamente mais alto
- O tempo de CPU na verdade é mais alto (pouco) na primeira opção
Se compararmos algumas secções de um
onstat -g ioa podemos confirmar as semlhanças entre o trabalho feito de uma forma e de outra:
I/O queues:
AIO I/O queues com INDEX SKIP SCAN:
q name/id len maxlen totalops dskread dskwrite dskcopy
fifo 0 0 0 0 0 0 0
drda_dbg 0 0 0 0 0 0 0
sqli_dbg 0 0 0 0 0 0 0
kio 0 0 32 725 408 317 0
kio 1 0 33 20342 20274 68 0
adt 0 0 0 0 0 0 0
msc 0 0 1 4 0 0 0
AIO I/O queues para a query normal:
q name/id len maxlen totalops dskread dskwrite dskcopy
fifo 0 0 0 0 0 0 0
drda_dbg 0 0 0 0 0 0 0
sqli_dbg 0 0 0 0 0 0 0
kio 0 0 33 9667 9653 14 0
kio 1 0 33 11051 11035 16 0
adt 0 0 0 0 0 0 0
msc 0 0 1 4 0 0 0
Aparentemente o mais lento até fez menos trabalho. Mais algumas escritas para o primeiro. É difícil extrair o significado disto
AIO I/O vps com INDEX SKIP SCAN:
class/vp/id s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors tempops
fifo 7 0 i 0.0 0 0 0 0 0 0.0 0 0
kio -1 0 i 0.5 379 210 169 0 579 0.7 0 0
kio -1 1 i 25.4 19839 19792 47 0 33783 0.6 0 0
msc 6 0 i 0.0 4 0 0 0 4 1.0 0 9
aio 5 0 i 0.6 469 195 84 0 469 1.0 0 0
[...]
AIO I/O vps para a query normal:
class/vp/id s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors tempops
fifo 7 0 i 0.0 0 0 0 0 0 0.0 0 0
kio -1 0 i 56.2 9506 9492 14 0 15327 0.6 0 0
kio -1 1 i 61.5 10395 10379 16 0 14895 0.7 0 0
msc 6 0 i 0.0 4 0 0 0 4 1.0 0 9
aio 5 0 i 2.7 459 195 74 0 459 1.0 0 460
[...]
Grosso modo o mesmo trabalho, mas mais distribuído entre as
threads kio no caso da query mais lenta. Possivelmente uma indicação de que as operações estão a demorar mais.
AIO big buffer usage summary com INDEX SKIP SCAN:
class reads writes
pages ops pgs/op holes hl-ops hls/op pages ops pgs/op
fifo 0 0 0.00 0 0 0.00 0 0 0.00
drda_dbg 0 0 0.00 0 0 0.00 0 0 0.00
sqli_dbg 0 0 0.00 0 0 0.00 0 0 0.00
kio 21665 20002 1.08 983 151 6.51 871 216 4.03
AIO big buffer usage summary para a query normal:
class reads writes
pages ops pgs/op holes hl-ops hls/op pages ops pgs/op
fifo 0 0 0.00 0 0 0.00 0 0 0.00
drda_dbg 0 0 0.00 0 0 0.00 0 0 0.00
sqli_dbg 0 0 0.00 0 0 0.00 0 0 0.00
kio 22576 19871 1.14 1888 279 6.77 498 30 16.60
Não existe uma grande diferença no número de operações
Conclusão
Nunca subestime o efeito real da dispersão de dados em disco. Esta situação demonstra o potencial impacto com clareza. Numa outra situação num cliente, que tive muito recentemente, verifiquei uma degradação de
performance de 4 vezes quando comparava o método de ordenar os ROWIDs com o método "natural". Gostaria de ver esta funcionalidade ser usada mais naturalmente no Informix (fora dos cenários de utilização do MULTI_INDEX path).