Is PostgreSQL good enough?

tldr; you can do jobs, queues, real time change feeds, time series, object store, document store, full text search with PostgreSQL. How to, pros/cons, rough performance and complexity levels are all discussed. Many sources and relevant documentation is linked to.

Your database is first. But can PostgreSQL be second?

Web/app projects these days often have many distributed parts. It's not uncommon for groups to use the right tool for the job. The right tools are often something like the choice below.

Redis for queuing, and caching.
Elastic Search for searching, and log stash.
Influxdb or RRD for timeseries.
S3 for an object store.
PostgreSQL for relational data with constraints, and validation via schemas.
Celery for job queues.
Kafka for a buffer of queues or stream processing.
Exception logging with PostgreSQL (perhaps using Sentry)
KDB for low latency analytics on your column oriented data.
Mongo/ZODB for storing documents JSON (or mangodb for /dev/null replacement)
SQLite for embedded.
Neo4j for graph databases.
RethinkDB for your realtime data, when data changes, other parts 'react'.
...

For all the different nodes this could easily cost thousands a month, require lots of ops knowledge and support, and use up lots of electricity. To set all this up from scratch could cost one to four weeks of developer time depending on if they know the various stacks already. Perhaps you'd have ten nodes to support.

Could you gain an ops advantage by using only PostgreSQL? Especially at the beginning when your system isn't all that big, and your team size is small, and your requirements not extreme? Only one system to setup, monitor, backup, install, upgrade, etc.

This article is my humble attempt to help people answer the question...

Is PostgreSQL good enough?

Can it be 'good enough' for all sorts of different use cases? Or do I need to reach into another toolbox?

Every project is different, and often the requirements can be different. So this question by itself is impossible to answer without qualifiers. Many millions of websites and apps in the world have very few users (less than thousands per month), they might need to handle bursty traffic at 100x the normal rate some times. They might need interactive, or soft realtime performance requirements for queries and reports. It's really quite difficult to answer the question conclusively for every use case, and for every set of requirements. I will give some rough numbers and point to case studies, and external benchmarks for each section.

Most websites and apps don't need to handle 10 million visitors a month, or have 99.999% availability when 95% availability will do, ingest 50 million metric rows per day, or do 400,000 jobs per second, or query over TB's of data with sub millisecond response times.

Tool choice.

I've used a LOT of different databases over time. CDB, Elastic Search, Redis, SAP (is it a db or a COBOL?), BSDDB/GDBM, SQLite... Even written some where the requirements were impossible to match with off the shelf systems and we had to make them ourselves (real time computer vision processing of GB/second in from the network). Often PostgreSQL simply couldn't do the job at hand (or mysql was installed already, and the client insisted). But sometimes PostgreSQL was just merely not the best tool for the job.

A Tool Chest

Recently I read a book about tools. Woodworking tools, not programming tools. The whole philosophy of the book is a bit much to convey here... but The Anarchist's Tool Chest is pretty much all about tool choice (it's also a very fine looking book, that smells good too). One lesson it teaches is about when selecting a plane (you know the things for stripping wood). There are dozens of different types perfect for specific situations. There's also some damn good general purpose planes, and if you just select a couple of good ones you can get quite a lot done. Maybe not the best tool for the job, but at least you will have room for them in your tool chest. On the other hand, there are also swiss army knives, and 200 in one tools off teevee adverts. I'm pretty sure PostgreSQL is some combination of a minimal tool choice and the swiss army knife tool choice in the shape of a big blue solid elephant.

“PostgreSQL is an elephant sized tool chest that holds a LOT of tools.”

Batteries included?

Does PostgreSQL come with all the parts for full usability? Often the parts are built in, but maybe a bit complicated, but not everything is built in. But luckily there are some good libraries which make the features more usable ("for humans").

For from scratch people, I'll link to the PostgreSQL documentation. I'll also link to already made systems which already use PostgreSQL for (queues, time series, graphs, column stores, document data bases), which you might be able to use for your needs. This article will slanted towards the python stack, but there are definitely alternatives in the node/ruby/perl/java universes. If not, I've listed the PostgreSQL parts and other open source implementations so you can roll your own.

By learning a small number of PostgreSQL commands, it may be possible to use 'good enough' implementations yourself. You might be surprised at what other things you can implement by combining these techniques together.

Task, or job queues.

Recent versions of PostgeSQL support a couple of useful technologies for efficient and correct queues.

First is the LISTEN/NOTIFY. You can LISTEN for events, and have clients be NOTIFY'd when they happen. So your queue workers don't have to keep polling the database all the time. They can get NOTIFIED when things happen.

The recent addition in 9.5 of the SKIP LOCKED locking clause to PostgreSQL SELECT, enables efficient queues to be written when you have multiple writers and readers. It also means that a queue implementation can be correct [2].

Finally 9.6 saw plenty of VACUUM performance enhancements which help out with queues.

Batteries included?

~~A very popular job and task system is celery. It can support various SQL backends, including PostgreSQL through sqlalchemy and the Django ORM.~~ [ED: version 4.0 of celery doesn't have pg support]

A newer, and smaller system is called pq. It sort of models itself off the redis python 'rq' queue API. However, with pq you can have a transactional queue. Which is nice if you want to make sure other things are committed AND your job is in the queue. With a separate system this is a bit harder to guarantee.

Is it fast enough? pq states in its documentation that you can do 1000 jobs per second per core... but on my laptop it did around 2000. In the talk "Can elephants queue?" 10,000 messages per second are mentioned with eight clients.

More reading.

Full text search.

“Full text search — Searching the full text of the document, and not just the metadata.”

PostgreSQL has had full text search for quite a long time as a separate extension, and now it is built in. Recently, it's gotten a few improvements which I think now make it "good enough" for many uses.

The big improvement in 9.6 is phrase search. So if I search for "red hammer" I get things which have both of them - not things that are red, and things that are a hammer. It can also return documents where the first word is red, and then five words later hammer appears.

One other major thing that elastic search does is automatically create indexes on all the fields. You add a document, and then you can search it. That's all you need to do. PostgreSQL is quite a lot more manual than that. You need to tell it which fields to index, and update the index with a trigger on changes (see triggers for automatic updates). But there are some libraries which make things much easier. One of them is sqlalchemy_searchable. However, I'm not aware of anything as simple and automatic as elastic search here.

What about faceted search? These days it's not so hard to do at speed. [6][7]
What about substring search on an index (fast LIKE)? It can be made fast with a trigram index. [8][9]
Stemming? Yes. [11]
"Did you mean" fuzzy matching support? Yes. [11]
Accent support? (My name is René, and that last é breaks sooooo many databases). Yes. [11]
Multiple languages? Yes. [11]
Regex search when you need it? Yes. [13]

If your main data store is PostgreSQL and you export your data into Elasticsearch (you should NOT use elastic search as the main store, since it still crashes sometimes), then that's also extra work you need to do. With elastic search you also need to manually set weighting of different fields if you want the search to work well. So in the end it's a similar amount of work.

Using the right libraries, I think it's a similar amount of work overall with PostgreSQL. Elasticsearch is still easier initially. To be fair Lucene (which elasticsearch is based on) is a much more advanced text searching system.

What about the speed? They are index searches, and return fast - as designed. At [1] they mention that the speed is ok for 1-2 million documents. They also mention 50ms search time. It's also possible to make replicas for read queries if you don't want to put the search load on your main database. There is another report for searches taking 15ms [10]. Note that elastic search often takes 3-5ms for a search on that same authors hardware. Also note, that the new asyncpg PostgreSQL driver gives significant latency improvements for general queries like this (35ms vs 2ms) [14].

Hybrid searches (relational searches combined with full text search) is another thing that PostgreSQL makes pretty easy. Say you wanted to ask "Give me all companies who have employees who wrote research papers, stack overflow answers, github repos written with the text 'Deep Learning' where the authors live with within 50km of Berlin. PostgreSQL could do those joins fairly efficiently for you.

The other massive advantage of PostgreSQL is that you can keep the search index in sync. The search index can be updated in the same transaction. So your data is consistent, and not out of date. It can be very important for some applications to return the most recent data.

How about searching across multiple human natural languages at once? PostgreSQL allows you to efficiently join across multiple language search results. So if you type "red hammer" into a German hardware website search engine, you can actually get some results.

Anyone wanting more in-depth information should read or watch this FTS presentation [15] from last year. It's by some of the people who has done a lot of work on the implementation, and talks about 9.6 improvements, current problems, and things we might expect to see in version 10. There is also a blog post [16] with more details about various improvements in 9.6 to FTS.

You can see the RUM index extension (which has faster ranking) at https://github.com/postgrespro/rum

More reading.

Time series.

“Data points with timestamps.”

Time series databases are used a lot for monitoring. Either for monitoring server metrics (like cpu load) or for monitoring sensors and all other manner of things. Perhaps sensor data, or any other IoT application you can think of.

RRDtool from the late 90s.

To do efficient queries of data over say a whole month or even a year, you need to aggregate the values into smaller buckets. Either minute, hour, day, or month sized buckets. Some data is recorded at such a high frequency, that doing an aggregate (sum, total, ...) of all that data would take quite a while.

Round robin databases don't even store all the raw data, but put things into a circular buffer of time buckets. This saves a LOT of disk space.

The other thing time series databases do is accept a large amount of this type of data. To efficiently take in a lot of data, you can use things like COPY IN, rather than lots of individual inserts, or use SQL arrays of data. In the future (PostgreSQL 10), you should be able to use logical replication to have multiple data collectors.

Materialized views can be handy to have a different view of the internal data structures. To make things easier to query.

date_trunc can be used to truncate a timestamp into the bucket size you want. For example SELECT date_trunc('hour', timestamp) as timestamp.

Array functions, and binary types can be used to store big chunks of data in a compact form for processing later. Many time series databases do not need to know the latest results, and some time lag is good enough.

A BRIN index (new in 9.5) can be very useful for time queries. Selecting between two times on a field indexed with BRIN is much quicker. "We managed to improve our best case time by a factor of 2.6 and our worst case time by a factor of 30" [7]. As long as the rows are entered roughly in time order [6]. If they are not for some reason you can reorder them on disk with the CLUSTER command -- however, often time series data comes in sorted by time.

Monasca can provide graphana and API, and Monasca queries PostgreSQL. There's still no direct support in grapha for PostgreSQL, however work has been in progress for quite some time. See the pull request in grafana.

Another project which uses time series in PostgreSQL is Tgres. It's compatible with statsd, graphite text for input, and provides enough of the Graphite HTTP API to be usable with Grafana. The author also blogs[1] a lot about different optimal approaches to use for time series databases.

See this talk by Steven Simpson at the fosdem conference about infrastructure monitoring with PostgreSQL. In it he talks about using PostgreSQL to monitor and log a 100 node system.

In an older 'grisha' blog post [5], he states "I was able to sustain a load of ~6K datapoints per second across 6K series" on a 2010 laptop.

Can we get the data into a dataframe structure for analysis easily? Sure, if you are using sqlalchemy and pandas dataframes, you can load dataframes like this...

df = pd.read_sql(query.statement, query.session.bind)

This lets you unleash some very powerful statistics, and machine learning tools on your data. (there's also a to_sql).

Some more reading.

Object store for binary data.

“Never store images in your database!”

I'm sure you've heard it many times before. But what if your images are your most important data? Surely they deserve something better than a filesystem? What if they need to be accessed from more than one web application server? The solution to this problem is often to store things in some cloud based storage like S3.

BYTEA is the type to use for binary data in PostgreSQL if the size is less than 1GB.

CREATE TABLE files (
    id serial primary key,
    filename text not null,
    data bytea not null
)

Note, however, that streaming the file is not really supported with BYTEA by all PostgreSQL drivers. It needs to be entirely in memory.

However, many images are only 200KB or up to 10MB in size. Which should be fine even if you get hundreds of images added per day. A three year old laptop benchmark for you... Saving 2500 1MB iPhone sized images with python and psycopg2 takes about 1 minute and 45 seconds, just using a single core. (That's 2.5GB of data). It can be made 3x faster by using COPY IN/TO BINARY [1], however that is more than fast enough for many uses.

If you need really large objects, then PostgreSQL has something called "Large Objects". But these aren't supported by some backup tools without extra configuration.

Batteries included? Both the python SQL libraries (psycopg2, and sqlalchemy) have builtin support for BYTEA.

But how do you easily copy files out of the database and into it? I made a image save and get gist here to save and get files with a 45 line python script. It's even easier when you use an ORM, since the data is just an attribute (open('bla.png').write(image.data)).

A fairly important thing to consider with putting gigabytes of binary data into your PostgreSQL is that it will affect the backup/restore speed of your other data. This isn't such a problem if you have a hot spare replica, have point in time recovery(with WALL-e, pgbarman), use logical replication, or decide to restore selective tables.

How about speed? I found it faster to put binary data into PostgreSQL compared to S3. Especially on low CPU clients (IoT), where you have to do full checksums of the data before sending it on the client side to S3. This also depends on the geographical location of S3 you are using, and your network connections to it.

S3 also provides other advantages and features (like built in replication, and it's a managed service). But for storing a little bit of binary data, I think PostgreSQL is good enough. Of course if you want a highly durable globally distributed object store with very little setup then things like S3 are first.

More reading.

http://stackoverflow.com/questions/8144002/use-binary-copy-table-from-with-psycopg2/8150329#8150329

Realtime, pubsub, change feeds, Reactive.

Change feeds are a feed you can listen to for changes. The pubsub (or Publish–subscribe pattern), can be done with LISTEN / NOTIFY and TRIGGER.

Implement You've Got Mail functionality.

This is quite interesting if you are implementing 'soft real time' features on your website or apps. If something happens to your data, then your application can 'immediately' know about it. Websockets is the name of the web technology which makes this perform well, however HTTP2 also allows server push, and various other systems have been in use for a long time before both of these. Say you were making a chat messaging website, and you wanted to make a "You've got mail!" sound. Your Application can LISTEN to PostgreSQL, and when some data is changed a TRIGGER can send a NOTIFY event which PostgreSQL passes to your application, your application can then push the event to the web browser.

PostgreSQL can not give you hard real time guarantees unfortunately. So custom high end video processing and storage systems, or specialized custom high speed financial products are not domains PostgreSQL is suited.

How well does it perform? In the Queue section, I mentioned thousands of events per core on an old laptop.

Issues for latency are the query planner and optimizer, and VACUUM, and ANALYZE.

The query planner is sort of amazing, but also sort of annoying. It can automatically try and figure out the best way to query data for you. However, it doesn't automatically create an index where it might think one would be good. Depending on environmental factors, like how much CPU, IO, data in various tables and other statistics it gathers, it can change the way it searches for data. This is LOTS better than having to write your queries by hand, and then updating them every time the schema, host, or amount of data changes.

But sometimes it gets things wrong, and that isn't acceptable when you have performance requirements. William Stein (from the Sage Math project) wrote about some queries mysteriously some times being slow at [7]. This was after porting his web app to use PostgreSQL instead of rethinkdb (TLDR; the port was possible and the result faster). The solution is usually to monitor those slow queries, and try to force the query planner to follow a path that you know is fast. Or to add/remove or tweak the index the query may or may not be using. Brady Holt wrote a good article on "Performance Tuning Queries in PostgreSQL".

Later on I cover the topic of column databases, and 'real time' queries over that type of data popular in financial and analytic products (pg doesn't have anything built in yet, but extensions exist).

VACUUM ANALYZE is a process that cleans things up with your data. It's a garbage collector (VACUUM) combined with a statistician (ANALYZE). It seems every release of PostgreSQL improves the performance for various corner cases. It used to have to be run manually, and now automatic VACUUM is a thing. Many more things can be done concurrently, and it can avoid having to read all the data in many more situations. However, sometimes, like with all garbage collectors it makes pauses. On the plus side, it can make your data smaller and inform itself about how to make faster queries. If you need to, you can turn off the autovacuum, and do things more manually. Also, you can just do the ANALYZE part to gather statistics, which can run much faster than VACUUM.

To get better latency with python and PostgreSQL, there is asyncpg by magicstack. Which uses an asynchronous network model (python 3.5+), and the binary PostgreSQL protocol. This can have 2ms query times and is often faster than even golang, and nodejs. It also lets you read in a million rows per second from PostgreSQL to python per core [8]. Memory allocations are reduced, as is context switching - both things that cause latency.

For these reasons, I think it's "good enough" for many soft real time uses, where the occasional time budget failure isn't the end of the world. If you load test your queries on real data (and for more data than you have), then you can be fairly sure it will work ok most of the time. Selecting the appropriate client side driver can also give you significant latency improvements.

More reading.

Log storage and processing

Being able to have your logs in a central place for queries, and statistics is quite helpful. But so is grepping through logs. Doing relational or even full text queries on them is even better.

rsyslog allows you to easily send your logs to a PostgeSQL database [1]. You set it up so that it stores the logs in files, but sends them to your database as well. This means if the database goes down for a while, the logs are still there. The rsyslog documentation has a section on high speed logging by using buffering on the rsyslog side [4].

systemd is the more modern logging system, and it allows logging to remote locations with systemd-journal-remote. It sends JSON lines over HTTPS. You can take the data in with systemd (using it as a buffer) and then pipe it into PostgreSQL with COPY at high rates. The other option is to use the systemd support for sending logs to traditional syslogs like rsyslog, which can send it into a PostgreSQL.

Often you want to grep your logs. SELECT regex matches can be used for grep/grok like functionality. It can also be used to parse your logs into a table format you can more easily query.

TRIGGER can be used to parse the data every time a log entry is inserted. Or you can use MATERIALIZED VIEWs if you don't need to refresh the information as often.

Is it fast enough? See this talk by Steven Simpson at the fosdem conference about infrastructure monitoring with PostgreSQL. In it he talks about using PostgreSQL to monitor and log a 100 node system. PostgreSQL on a single old laptop can quite happy ingest at a rate in the hundreds of thousands of messages per second range. Citusdata is an out of core solution which builds on PostgreSQL(and contributes to it ya!). It is being used to process billions of events, and is used by some of the largest companies on the internet (eg. Cloudflare with 5% of internet traffic uses it for logging). So PostgreSQL can scale up too(with out of core extensions).

Batteries included? In the timeseries database section of this article, I mentioned that you can use grafana with PostgreSQL (with some effort). You can use this for dashboards, and alerting (amongst other things). However, I don't know of any really good systems (Sentry, Datadog, elkstack) which have first class PostgreSQL support out of the box.

One advantage of having your logs in there is that you can write custom queries quite easily. Want to know how many requests per second from App server 1 there were, and link it up to your slow query log? That's just a normal SQL query, and you don't need to have someone grep through the logs... normal SQL tools can be used. When you combine this functionality with existing SQL analytics tools, this is quite nice.

I think it's good enough for many small uses. If you've got more than 100 nodes, or are doing a lot of events, it might not be the best solution (unless you have quite a powerful PostgreSQL cluster). It does take a bit more work, and it's not the road most traveled. However it does let you use all the SQL analytics tools with one of the best metrics and alerting systems.

More reading.

Queue for collecting data

When you have traffic bursts, it's good to persist the data quickly, so that you can queue up processing for later. Perhaps you normally get only 100 visitors per day, but then some news article comes out or your website is mentioned on the radio (or maybe spammers strike) -- this is bursty traffic.

Storing data, for processing later is things that systems like Kafka excel at.
Using the COPY command, rather than lots of separate inserts can give you a very nice speedup for buffering data. If you do some processing on the data, or have constraints and indexes, all these things slow it down. So instead you can just put it in a normal table, and then process the data like you would with a queue.

A lot of the notes for Log storage, and Queuing apply here. I guess you're starting to see a pattern? We've been able to use a few building blocks to implement efficient patterns that allow us to use PostgreSQL which might have required specialized databases in the past.

The fastest way to get data into PostgreSQL from python? See this answer [1] where 'COPY {table} FROM STDIN WITH BINARY' is shown to be the fastest way.

More reading.

http://stackoverflow.com/questions/8144002/use-binary-copy-table-from-with-psycopg2/8150329#8150329

High availability, elasticity.

“Will the database always be there for you? Will it grow with you?”

To get things going quickly there are a number of places which offer PostgreSQL as a service [3][4][5][6][7][8]. So you can get them to setup replication, monitoring, scaling, backups, and software updates for you.

The Recovery Point Objective (RPO), and Recovery Time Objective (RTO) are different for every project. Not all projects require extreme high availability. For some, it is fine to have the recovery happen hours or even a week later. Other projects can not be down for more than a few minutes or seconds at a time. I would argue that for many non-critical websites a hot standby and offsite backup will be 'good enough'.

I would highly recommend this talk by Gunnar Bluth - "An overview of PostgreSQL's backup, archiving, and replication". However you might want to preprocess the sound with your favourite sound editor (eg. Audacity) to remove the feedback noise. The slides are there however with no ear destroying feedback sounds.

By using a hot standby secondary replication you get the ability to quickly fail over from your main database. So you can be back up within minutes or seconds. By using pgbarman or wall-e, you get point in time recovery offsite backup of the database. To make managing the replicas easier, a tool like repmgr can come in handy.

Having really extreme high availability with PostgreSQL is currently kind of hard, and requires out of core solutions. It should be easier in version 10.0 however.

Patroni is an interesting system which helps you deploy a high availability cluster on AWS (with Spilo which is used in production), and work is in progress so that it works on Kubernetes clusters. Spilo is currently being used in production and can do various management tasks, like auto scaling, backups, node replacement on failure. It can work with a minimum of three nodes.

As you can see there are multiple systems, and multiple vendors that help you scale PostgreSQL. On the low end, you can have backups of your database to S3 for cents per month, and a hotstandby replica for $5/month. You can also scale a single node all the way up to a machine with 24TB of storage, 32 cores and 244GB of memory. That's not in the same range as casandra installations with thousands of nodes, but it's still quite an impressive range.

More reading.

Column store, graph databases, other databases, ... finally The End?

This article is already way too long... so I'll go quickly over these two topics.

Graph databases like Neo4j allow you to do complex graph queries. Edges, nodes, and hierarchies. How to do that in PostgreSQL? Denormalise the data, and use a path like attribute and LIKE. So to find things in a graph, say all the children, you can pre-compute the path inside a string, rather than do complex recursive queries and joins using foreign keys.

SELECT * FROM nodes WHERE path LIKE '/parenta/child2/child3%';

Then you don't need super complex queries to get the graph structure from parent_id, child_ids and such. (Remember before how you can put a trigram index for fast LIKEs?) You can also use other pattern matching queries on this path, to do things like find all the parents up to 3 levels high that have a child.

Tagging data with a fast LIKE becomes very easy as well. Just store the tags in a comma separated field and use an index on it.

Column stores are where the data is stored in a column layout, instead of in rows. Often used for real time analytic work loads. One the oldest and best of these is Kdb+. Google made one, Druid is another popular one, and there are also plenty of custom ones used in graphics.

But doesn't PostgreSQL store everything in row based format? Yes it does. However, there is an open source extension called cstore_fdw by Citus Data which is a column-oriented store for PostgreSQL.

So how fast is it? There is a great series of articles by Mark Litwintschik, where he benchmarks a billion taxi ride data set with PostgreSQL and with kdb+ and various other systems. Without cstore_fdw, or parallel workers PostgreSQL took 3.5 hours to do a query. With 4 parallel workers, it was reduced to 1 hour and 1 minute. With cstore_fdw it took 2 minutes and 32 seconds. What a speed up!

The End.

I'm sorry that was so long. But it could have been way longer. It's not my fault...

PostgreSQL carries around such a giant Tool Chest.

Hopefully all these words may be helpful next time you want to use PostgreSQL for something outside of relational data. Also, I hope you can see that it can be possible to replace 10 database systems with just one, and that by doing so you can a gain significant ops advantage.

Any corrections or suggestions? Please leave a comment, or see you on twitter @renedudfield
There was discussion on hn and python reddit.

Search This Blog

making apps, making webs.