If you have a table with a column included as the first column in a multi-column index and then again with its own index, you may be over-indexing: Postgres will use the multi-column index for queries on the first column. First, a pointer to the Postgres docs that I can never find, and then some data on the performance of multi-column indexes versus single-column ones.
From the docs
A multicolumn B-tree index can be used with query conditions that involve any subset of the index’s columns, but the index is most efficient when there are constraints on the leading (leftmost) columns.
Performance
If you click around that section of the docs, you’ll surely come across the discussion of multicolumn indexes and performance, in particular this passage:
You could also create a multicolumn index on (x, y). This index would typically be more efficient than index combination for queries involving both columns, but as discussed in Section 11.3, it would be almost useless for queries involving only y, so it should not be the only index. A combination of the multicolumn index and a separate index on y would serve reasonably well. For queries involving only x, the multicolumn index could be used, though it would be larger and hence slower than an index on x alone.
Life is full of performance tradeoffs, so we should explore just how much slower it is to use a multi-column index for single-column queries.
First, let’s create a dummy table:
CREATE TABLE foos_and_bars
(
  id serial NOT NULL,
  foo_id integer,
  bar_id integer,
  CONSTRAINT foos_and_bars_pkey PRIMARY KEY (id)
);
Then, using R, we’ll create 3 million rows of nicely distributed data:
# 3 million rows: foo_id drawn uniformly from 1..250000,
# bar_id drawn uniformly from 1..20 (sampling with replacement)
rows = 3000000
foo_ids = seq(1, 250000, 1)
bar_ids = seq(1, 20, 1)
data = data.frame(foo_id = sample(foo_ids, rows, TRUE), bar_id = sample(bar_ids, rows, TRUE))
Dump that to a text file and load it up with COPY, and we’re good to go.
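As a sketch of that load step (the file path and the write.table arguments are my assumptions, not part of the original workflow):

-- assuming the data frame was dumped tab-separated, without headers or
-- row names, e.g. in R:
--   write.table(data, "/tmp/foos_and_bars.txt", sep = "\t",
--               row.names = FALSE, col.names = FALSE)
-- then, from psql, load it client-side:
\copy foos_and_bars (foo_id, bar_id) from '/tmp/foos_and_bars.txt'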
Create the compound index
CREATE INDEX foo_id_and_bar_id_index
  ON foos_and_bars
  USING btree
  (foo_id, bar_id);
Run a simple query to make sure the index is used:
test_foo=# explain analyze select * from foos_and_bars where foo_id = 123;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foos_and_bars  (cost=4.68..55.74 rows=13 width=12) (actual time=0.026..0.038 rows=8 loops=1)
   Recheck Cond: (foo_id = 123)
   ->  Bitmap Index Scan on foo_id_and_bar_id_index  (cost=0.00..4.68 rows=13 width=0) (actual time=0.020..0.020 rows=8 loops=1)
         Index Cond: (foo_id = 123)
 Total runtime: 0.072 ms
(5 rows)
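As an aside, the docs’ claim that the compound index is larger is easy to verify. This check is my addition (and foo_id_index is my name for the single-column index used in the comparison below):

-- compare on-disk sizes once both indexes exist
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('foo_id_and_bar_id_index', 'foo_id_index');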
Now we’ll run 100 queries by foo_id with this index in place, and then repeat with the single-column index installed, using this code:
require 'rubygems'
require 'benchmark'
require 'pg'

TEST_IDS = [...] # 100 ids randomly selected in R

conn = PGconn.open(:dbname => 'test_foo')

# returns the wall-clock time for a single query by foo_id
def perform_test(conn, foo_id)
  Benchmark.realtime do
    res = conn.exec("select * from foos_and_bars where foo_id = #{foo_id}")
    res.clear
  end
end

TEST_IDS.map { |id| perform_test(conn, id) } # warm things up?
data = TEST_IDS.map { |id| perform_test(conn, id) }
data.each do |d|
  puts d
end
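The post doesn’t show the index swap for the second run; I’d assume something along these lines, dropping the compound index so the planner has to use the single-column one (again, foo_id_index is my naming):

-- replace the compound index with a single-column one for the second pass
DROP INDEX foo_id_and_bar_id_index;

CREATE INDEX foo_id_index
  ON foos_and_bars
  USING btree
  (foo_id);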
How do things stack up? I’d say about evenly.
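The original comparison was presented as a chart that isn’t reproduced here. As a minimal sketch, assuming each run’s timings were redirected to compound.txt and single.txt, the distributions could be compared in R like so:

# read the per-query timings (in seconds) printed by the Ruby script
compound <- scan("compound.txt")
single   <- scan("single.txt")

summary(compound)
summary(single)

# side-by-side distributions of query times
boxplot(list(compound = compound, single = single), ylab = "query time (s)")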
If you’re hooking up a Mac OS X machine to a 1080p monitor via a Mini DisplayPort to HDMI adapter, you may find your display settings don’t include a 1920×1080 option, and the 1080p setting produces an image with the edges cut off. Adjusting the overscan/underscan slider will make the image fit, but it turns fuzzy.
Solution: check the monitor’s settings. On my ViewSonic VX2453, the HDMI inputs have two settings, “AV” and “PC”. Switching it to “PC” solved the problem, and now the picture is exactly the right size and crisp.
I spent some time futzing around with SwitchRes and several fruitless reboots before discovering the setting, so I hope this saves someone time!