XML Storage Stuff
Via Bill Humphries, I found today the PostgreSQL and XPath extension. Looks nice and this is exactly what I meant when Sandro asked me, what I'd like to have in MySQL.
Not really sure, but it looks like the PostgreSQL UDF reads all the relevant BLOBs into the memory, if you have an XPath-expression in a WHERE clause. So i'm wondering what the performance of this is. If it would index the XML-BLOBs, this would be great.
I'm particularly interested in these discussions, because Bitflux CMS does at the moment store its unstructered XML data in MySQL, which is not always ideal. Therefore we're looking into a better solution. One way would be the approach the above extension is taking (RDBMS do have advantages, if it comes to structured and relational data).
I also looked into exist - "an Open Source native XML database", which looks quite nice. But I don't like going down the Java road for a PHP project and it doesn't solve the relational relationship problem. Although it does support xinclude, you can't search in them (at the moment). Xindice is another Open Source native XML DB, which I didn't really test 'till now, 'cause the project looks quite stalled and the 1.0 version has problems (according to the developers) with large unstructured content. Hope this will change.
One of the main goals of the next Bitflux CMS will be to abstract the storage interface. With this approach, it shouldn't really matter if you want to use MySQL, PGSQL, exist or anything else. Just use the best tool for the job ;) You should be even able to use different storages in the same project (for example, use MySQL for relational data, an XMLDB for really unstructured content and the filesystem for binary data).
Not really sure, but it looks like the PostgreSQL UDF reads all the relevant BLOBs into the memory, if you have an XPath-expression in a WHERE clause. So i'm wondering what the performance of this is. If it would index the XML-BLOBs, this would be great.
I'm particularly interested in these discussions, because Bitflux CMS does at the moment store its unstructered XML data in MySQL, which is not always ideal. Therefore we're looking into a better solution. One way would be the approach the above extension is taking (RDBMS do have advantages, if it comes to structured and relational data).
I also looked into exist - "an Open Source native XML database", which looks quite nice. But I don't like going down the Java road for a PHP project and it doesn't solve the relational relationship problem. Although it does support xinclude, you can't search in them (at the moment). Xindice is another Open Source native XML DB, which I didn't really test 'till now, 'cause the project looks quite stalled and the 1.0 version has problems (according to the developers) with large unstructured content. Hope this will change.
One of the main goals of the next Bitflux CMS will be to abstract the storage interface. With this approach, it shouldn't really matter if you want to use MySQL, PGSQL, exist or anything else. Just use the best tool for the job ;) You should be even able to use different storages in the same project (for example, use MySQL for relational data, an XMLDB for really unstructured content and the filesystem for binary data).
Comments
ZZ/OSS weblog
@ 05.08.2003 21:31 CEST
[...] 05.08.2003, 19:31, by ordnas, Category: Databases
Christian has some info on XML and XPath features in PostgreSQL, provided via UDF. Yes, I would also love to see something similar in MySQ [...]
[...] 05.08.2003, 19:31, by ordnas, Category: Databases
Christian has some info on XML and XPath features in PostgreSQL, provided via UDF. Yes, I would also love to see something similar in MySQ [...]
ZZ/OSS Weblog
@ 05.08.2003 21:31 CEST
(Trackback)
XML Features in PostgreSQL
Christian has some info on XML and XPath features in PostgreSQL, provided via UDF. Yes, I would also love to see something similar in MySQL!
XML Features in PostgreSQL
Christian has some info on XML and XPath features in PostgreSQL, provided via UDF. Yes, I would also love to see something similar in MySQL!
You're right about the way that the Postgres XML functions work, and while the libxml2 library which they are based on is astonishingly fast, you would run into performance problems querying against XPath statements on thousands of records. However, if you are extracting the values of XML nodes from resultsets based on meta-data queries, it's a very efficient and effective solution:
SELECT xpath_string(xml_column,'/article/summary') AS summary
FROM articles
WHERE category = 'birds';
PostgreSQL's flexible indexing mechanisms will support dramatic performance improvements for frequently used XPath statements.
The latest version of our XML functions include better handling of multiple rows (a tricky area when XML meets relational data) and fast XSLT support, allowing the homogenisation of incoming data or database-side conversion of XML to HTML, for example.
Let me know if you're interested in this stuff and I'll email you the latest version.
Have you considered using the Sedna XML Database?, it's written in C++, is fast, reliable and provides many APIs including PHP.
add a comment
Your email adress will never be published.
Comment spam will be deleted!
