[isf-wifidog] Re: [wirelesstoronto-discuss] Re: PostGresql

Benoit Grégoire bock at step.polymtl.ca
Mar 24 Jan 15:08:29 EST 2006


On January 24, 2006 12:13 pm, Rein Petersen wrote:
> Hey Rob,
>
> Have you come across any discussion at ISF about their database schema
> problem (if it is still a problem)? I wonder if they really know that it is
> a problem? I'm a little surprised that they're using a powerful db like pg
> but have a completely screwed up db full of text fields - really bad
> form... Maybe we just have an older build and it has been fixed in the
> build on the devauth server. I just need a pg client on devauth to check
> that out... Rob can you set up phpPgAdmin on devauth ? Or, have you come
> across some db.phpscript that creates the database and tables?
>
> *If it hasn't changed*, and before I go rushing to change anything, I think
> it is wise try to help ISF understand the problem of a db with tables full
> of pointers to text blobs (changing a database schema in any collaborative
> software projects raises ire).
>
> Any database table or index is scanned in "pages" or "extents" (depending
> on the nomenclature of the database you are using). In MS SQL a "page" is
> 64K, in Oracle an "extent" is 84K. This means that each query for a given
> table or view will start at the beginning of that table and read the table
> in chunks of 64 or 84 K (I don't know what it is for postgresql but I'm
> certain it is in that range).
>
> So, a table with an integer id field (32 bytes), a char(12), a char(20)
> will produce rows that consume 64 bytes. Thats means, the first page that
> returns to the db from a query of this table will hold 1024 rows (1024 * 64
> bytes = 64K). So this explains the importance of keeping tables slim as it
> will be able to return more rows for each "page" or "extent" scan.
>
> table 1                 table2
> ___________________     ______________________________________
>
> This is a major factor in database design for performance.
>
> Now the 'Text' field or 'Blob' field is a little different and
> understanding the difference for good database design is very important.
> Any 'Text' or 'blob' field actually contains a 32byte pointer to another
> location on the harddrive where all such blobs are stored. And while you
> might argue that storing a name (ie.a longer field for storing a full venue
> name) that might otherwise assume 64bytes (char64) can keep the row smaller
> by only storing a 32 byte text pointer will provide greater speed - is
> actually causing a completely new "extent" or "page" scan on a different
> place on the drive. Do this with several fields and you'll have wild disk
> thrashing.

This is all a very interesting exposé on classic database implementation 
performance tips.  However, before everyone thinks that no one at ISF knows 
what they are talking about, I would kindly point you to postgresql 
documentation (which is more authoritative than you and I):  
http://www.postgresql.org/docs/8.1/interactive/datatype-character.html

There are NO performance difference in postgres between text, varchar(n), 
varchar(10), or even char(10).  Long values are stored in background tables 
so they don't interfere with fast access to shorter values.  Shorter values 
are stored in the the same extent as the rest of the data. In fact the manual 
makes the following recommandation:

Tip:  There are no performance differences between these three types, apart 
from the increased storage size when using the blank-padded type. While 
character(n) has performance advantages in some other database systems, it 
has no such advantages in PostgreSQL. In most situations text or character 
varying should be used instead.

> Worse, both the db and middle tier db connector must make allowances for
> the potential size of the results where each 'text' field usually consumes
> 2048KB - memory is put aside to hold that for text field in a result row.
> So what should be a simple query is causing major disk thrashing and insane
> (no really, really insane) memory consumption on both the db server and web
> server.

Depending on the driver design, it may indeed decide to reserve large amounts 
of memory.  I doubt it for a postgres ont, since the text is actually 
unlimited in length.  So how much memory a given driver pre-allocates is 
implementation dependent.

Crazy trashing doesn't happen except for very long actual data (not data 
types).  Even then, it would only happen if you do select * without needing 
the whole data, which is a bad idea anyway for reasons I don't need to 
explain.  

> I understand that in development, it is handy to have elbow room for fields
> that may need the potential to grow, but I think wifidog is well beyond
> that point and a schema can always grow without losing data (reducing the
> size is what causes problems).

Quite true, and since there are no performance in portgres we decided to 
expand most of the types to text early on.

> Here is a the schema for the nodes table on the auth server:
>
> nodes ------------------------------------------------------------
> last_heartbeat_ip           char(16)    [should be inet]

That was because MySql didn't support it.  Now that we abandoned the idea of 
supporting MySql, that is obviously no longer an issue.

> last_heartbeat_timestamp    timetz      [why no timezone?]

I don't know, we could put a timezone.  It doesn't really matter however since 
the timestamp is generated by postgres, so it should always be in the same 
timezone.

> description                 text        [normalize!]
> mass_transit_info           text        [normalize!]

I'm not sure what you mean by normalize in that context.

> node_deployment_status      char(32)    [normalize or constraint!]

It's constrained by the node_deployment_status table.

> venue_type                  text        [normalize or constraint!]

It's constrained by the venue_type table.             

> max_monthly_incoming        int8        [should be int4]
> max_monthly_outgoing        int8        [should be int4]

No, that would limit to the maximum data transferred by a not to 4 gigabytes.

> quota_reset_day_of_month    integer     [should be int2]

Fair enough.

(For clarity, I removed all columns you didn't comment on, and all those that 
had should be varchar(*), which I adressed above)

> The truth of the matter is that this wifidog db needs to be trimmed in size
> in a major way. The node and user tables in particular because they are hit
> frequently and most often. Altering all the unnecessary text fields to
> appropriate types and sizes in this db schema will result in at least 1000
> fold increase in performance (I don't know the actual number but I know it
> would actually be waaaay more than that so it is easy to boast 1000).

I'll be very humbled if it does, but I'm pretty certain it will be barely 
measurable, if it exists.

> Everybody scratches their head wondering why the simple node and user
> queries take painfully long and slams the hell out of our server - now you
> know why. Before we go into production with our new server, this *must* be
> addressed.

That is not the reason that simple node and user queries slam the hell out of 
your server.  Postgres being a true ACID database, every time a hotspot 
heartbeats it creates a additional row in the nodes table.  It would appear 
that you have a problem with your vacuuming.  In postgres it's an externally 
controlled process (like MySQL 5).  I recommend you try running VACUUM FULL 
ANALYSE once, and see if your problem is solved.  If it is, make sure you run 
VACUUM ANALYSE once an hour or at least once a day from a script.

Finallly, this kind of discussion should really be on the wifidog mailing 
list.
-- 
Benoit Grégoire, http://benoitg.coeus.ca/
-------------- section suivante --------------
Une pièce jointe non texte a été nettoyée...
Nom: non disponible
Type: application/pgp-signature
Taille: 189 octets
Desc: non disponible
Url: http://listes.ilesansfil.org/pipermail/wifidog/attachments/20060124/6b5d3976/attachment.pgp


More information about the WiFiDog mailing list