[isf-wifidog] Re: [wirelesstoronto-discuss] Re: PostGresql

Rein Petersen rein.petersen at gmail.com
Mar 24 Jan 15:55:05 EST 2006


Wow, Postgresql is an interesting beast. I can see how the architecture
would make it the perfect statistical database because it pulls out larger
character data into background tables and leaves the main table nice 'n
slim. Using MS SQL, I've had to normalize character data into referencing
tables for statistical solutions.

In databases where character information in king (like most web solutions),
I think the architecture is less desirable.

But you're right in that there was no need to raise alarm about the Wifidog
db schema while Postgresql is the intended server.

Thanks for the info on Postgresql, I'm looking forward to using it (for
statistics) in the future.

Rein

On 1/24/06, Benoit Grégoire <bock at step.polymtl.ca> wrote:
>
> On January 24, 2006 12:13 pm, Rein Petersen wrote:
> > Hey Rob,
> >
> > Have you come across any discussion at ISF about their database schema
> > problem (if it is still a problem)? I wonder if they really know that it
> is
> > a problem? I'm a little surprised that they're using a powerful db like
> pg
> > but have a completely screwed up db full of text fields - really bad
> > form... Maybe we just have an older build and it has been fixed in the
> > build on the devauth server. I just need a pg client on devauth to check
> > that out... Rob can you set up phpPgAdmin on devauth ? Or, have you come
> > across some db.phpscript that creates the database and tables?
> >
> > *If it hasn't changed*, and before I go rushing to change anything, I
> think
> > it is wise try to help ISF understand the problem of a db with tables
> full
> > of pointers to text blobs (changing a database schema in any
> collaborative
> > software projects raises ire).
> >
> > Any database table or index is scanned in "pages" or "extents"
> (depending
> > on the nomenclature of the database you are using). In MS SQL a "page"
> is
> > 64K, in Oracle an "extent" is 84K. This means that each query for a
> given
> > table or view will start at the beginning of that table and read the
> table
> > in chunks of 64 or 84 K (I don't know what it is for postgresql but I'm
> > certain it is in that range).
> >
> > So, a table with an integer id field (32 bytes), a char(12), a char(20)
> > will produce rows that consume 64 bytes. Thats means, the first page
> that
> > returns to the db from a query of this table will hold 1024 rows (1024 *
> 64
> > bytes = 64K). So this explains the importance of keeping tables slim as
> it
> > will be able to return more rows for each "page" or "extent" scan.
> >
> > table 1                 table2
> > ___________________     ______________________________________
> >
> > This is a major factor in database design for performance.
> >
> > Now the 'Text' field or 'Blob' field is a little different and
> > understanding the difference for good database design is very important.
> > Any 'Text' or 'blob' field actually contains a 32byte pointer to another
> > location on the harddrive where all such blobs are stored. And while you
> > might argue that storing a name (ie.a longer field for storing a full
> venue
> > name) that might otherwise assume 64bytes (char64) can keep the row
> smaller
> > by only storing a 32 byte text pointer will provide greater speed - is
> > actually causing a completely new "extent" or "page" scan on a different
> > place on the drive. Do this with several fields and you'll have wild
> disk
> > thrashing.
>
> This is all a very interesting exposé on classic database implementation
> performance tips.  However, before everyone thinks that no one at ISF
> knows
> what they are talking about, I would kindly point you to postgresql
> documentation (which is more authoritative than you and I):
> http://www.postgresql.org/docs/8.1/interactive/datatype-character.html
>
> There are NO performance difference in postgres between text, varchar(n),
> varchar(10), or even char(10).  Long values are stored in background
> tables
> so they don't interfere with fast access to shorter values.  Shorter
> values
> are stored in the the same extent as the rest of the data. In fact the
> manual
> makes the following recommandation:
>
> Tip:  There are no performance differences between these three types,
> apart
> from the increased storage size when using the blank-padded type. While
> character(n) has performance advantages in some other database systems, it
> has no such advantages in PostgreSQL. In most situations text or character
> varying should be used instead.
>
> > Worse, both the db and middle tier db connector must make allowances for
> > the potential size of the results where each 'text' field usually
> consumes
> > 2048KB - memory is put aside to hold that for text field in a result
> row.
> > So what should be a simple query is causing major disk thrashing and
> insane
> > (no really, really insane) memory consumption on both the db server and
> web
> > server.
>
> Depending on the driver design, it may indeed decide to reserve large
> amounts
> of memory.  I doubt it for a postgres ont, since the text is actually
> unlimited in length.  So how much memory a given driver pre-allocates is
> implementation dependent.
>
> Crazy trashing doesn't happen except for very long actual data (not data
> types).  Even then, it would only happen if you do select * without
> needing
> the whole data, which is a bad idea anyway for reasons I don't need to
> explain.
>
> > I understand that in development, it is handy to have elbow room for
> fields
> > that may need the potential to grow, but I think wifidog is well beyond
> > that point and a schema can always grow without losing data (reducing
> the
> > size is what causes problems).
>
> Quite true, and since there are no performance in portgres we decided to
> expand most of the types to text early on.
>
> > Here is a the schema for the nodes table on the auth server:
> >
> > nodes ------------------------------------------------------------
> > last_heartbeat_ip           char(16)    [should be inet]
>
> That was because MySql didn't support it.  Now that we abandoned the idea
> of
> supporting MySql, that is obviously no longer an issue.
>
> > last_heartbeat_timestamp    timetz      [why no timezone?]
>
> I don't know, we could put a timezone.  It doesn't really matter however
> since
> the timestamp is generated by postgres, so it should always be in the same
> timezone.
>
> > description                 text        [normalize!]
> > mass_transit_info           text        [normalize!]
>
> I'm not sure what you mean by normalize in that context.
>
> > node_deployment_status      char(32)    [normalize or constraint!]
>
> It's constrained by the node_deployment_status table.
>
> > venue_type                  text        [normalize or constraint!]
>
> It's constrained by the venue_type table.
>
> > max_monthly_incoming        int8        [should be int4]
> > max_monthly_outgoing        int8        [should be int4]
>
> No, that would limit to the maximum data transferred by a not to 4
> gigabytes.
>
> > quota_reset_day_of_month    integer     [should be int2]
>
> Fair enough.
>
> (For clarity, I removed all columns you didn't comment on, and all those
> that
> had should be varchar(*), which I adressed above)
>
> > The truth of the matter is that this wifidog db needs to be trimmed in
> size
> > in a major way. The node and user tables in particular because they are
> hit
> > frequently and most often. Altering all the unnecessary text fields to
> > appropriate types and sizes in this db schema will result in at least
> 1000
> > fold increase in performance (I don't know the actual number but I know
> it
> > would actually be waaaay more than that so it is easy to boast 1000).
>
> I'll be very humbled if it does, but I'm pretty certain it will be barely
> measurable, if it exists.
>
> > Everybody scratches their head wondering why the simple node and user
> > queries take painfully long and slams the hell out of our server - now
> you
> > know why. Before we go into production with our new server, this *must*
> be
> > addressed.
>
> That is not the reason that simple node and user queries slam the hell out
> of
> your server.  Postgres being a true ACID database, every time a hotspot
> heartbeats it creates a additional row in the nodes table.  It would
> appear
> that you have a problem with your vacuuming.  In postgres it's an
> externally
> controlled process (like MySQL 5).  I recommend you try running VACUUM
> FULL
> ANALYSE once, and see if your problem is solved.  If it is, make sure you
> run
> VACUUM ANALYSE once an hour or at least once a day from a script.
>
> Finallly, this kind of discussion should really be on the wifidog mailing
> list.
> --
> Benoit Grégoire, http://benoitg.coeus.ca/
>
>
>


--
Rein Petersen MCP MCP+I MCSE MCDBA MCAD
Software Engineer
-------------- section suivante --------------
Une pièce jointe HTML a été enlevée...
URL: http://listes.ilesansfil.org/pipermail/wifidog/attachments/20060124/ad295956/attachment-0001.htm


More information about the WiFiDog mailing list