solr: Too many open files

We recently reimplemented the search in a client project with solr (version 3.5). For the communication between our PHP application and the solr server we used the PHP library solr-php-client (http://solr-php-client.googlecode.com).

After the first release to the stage server our client ran load tests on the whole application, and the results for the search use case were pretty bad: an 80% failure rate and an average response time of 30 seconds. The load test was run with 1000 parallel requests over 10 hours.

When checking the server we noticed that after around 3-4 hours the library did not get any responses back from solr anymore. The solr logs, however, were empty from the point where it stopped responding to the search requests until the load test was over. The system administrators found out that around the same time that solr stopped responding there were system errors saying that the maximum number of allowed open files on the system had been exceeded. This limit was set to 1024 open files. We started to regularly check the number of open files during the load tests and noticed that it went up to 25000.
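To watch this during the test runs we periodically counted the file descriptors of the solr process. A minimal sketch of such a watcher in PHP, assuming a Linux host where descriptors show up under /proc/<pid>/fd (the PID and the interval are placeholders; a shell one-liner around lsof would do the same job):

<?php
// Hypothetical watcher run next to the load test. On Linux every
// open file descriptor of a process is an entry in /proc/<pid>/fd.
$pid = 12345; // placeholder: PID of the solr process

while (true) {
    // scandir() also returns '.' and '..', so subtract those two.
    $entries = scandir("/proc/$pid/fd");
    $openFiles = ($entries === false) ? 0 : count($entries) - 2;
    echo date('H:i:s') . " open files: $openFiles\n";
    sleep(60);
}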

On the internet we found a lot of blog posts saying that setting useCompoundFile to true and mergeFactor to a lower number, e.g. 2 instead of 10, would reduce the number of index files and hence also the number of files opened during a search. However, these blog posts were always talking about indexes with over 1'000'000 documents, whereas our index only contained about 50'000 documents. So we started investigating in another direction.

netstat finally pointed us in the right direction: we noticed that some connections to the solr server were not properly closed after a search request. solr-php-client by default uses the function file_get_contents() to do requests to solr. This function opens a file handle even when you use it to open a stream. Hence, after some hours of constantly sending requests to solr through file_get_contents(), with some of these connections not being closed properly, the maximum number of open files was exceeded.
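For illustration, a request through the default transport boils down to roughly the following; the URL and query are made up, but every such call opens a new stream, and with it a file descriptor, for the HTTP connection:

<?php
// Rough sketch of what the default transport does per request:
// file_get_contents() with an HTTP stream context. Each call opens
// a new stream (a file descriptor) for the connection; the URL is
// only an example.
$context = stream_context_create(array(
    'http' => array(
        'method'  => 'GET',
        'timeout' => 30,
    ),
));

$response = file_get_contents(
    'http://localhost:8983/solr/select?q=foo&wt=json',
    false,
    $context
);

If such a connection lingers after the request (e.g. in CLOSE_WAIT) instead of being closed cleanly, its descriptor stays in use, which is exactly what added up over the hours of the load test.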

solr-php-client gives you the possibility to choose the HTTP transport: file_get_contents() or curl. After changing the HTTP transport to curl for all our requests to solr, the failure rate of the load tests went down to 0%, the average response time dropped to 0.4 seconds, and the maximum number of open files during the load test was at around 250.
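For completeness, this is roughly how the switch looks in code (class names from the solr-php-client version we used; host, port and path are examples, so check them against your copy of the library):

<?php
require_once 'Apache/Solr/Service.php';
require_once 'Apache/Solr/HttpTransport/Curl.php';

// Use the cURL-based transport instead of the default
// file_get_contents() one.
$transport = new Apache_Solr_HttpTransport_Curl();
$solr = new Apache_Solr_Service('localhost', 8983, '/solr/', $transport);

// A plain search: query, offset, limit.
$response = $solr->search('foo', 0, 10);

As far as we could tell, the curl transport reuses its underlying handle across requests, which explains why the open file count stayed flat instead of growing with every search.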


Comments [5]

Till, 21.06.2012 11:23 CET

I'd check if the library still does an extra ping prior to each request to Solr. So e.g. in our case, 50 requests to Solr were really 25 because it did an extra ping to check (from within the lib/PHP userland) if the Solr server was available.

Ben Davies, 21.06.2012 11:38 CET

Thanks for this.
We have changed our implementation to use curl.

Any reason why you used solr-php-client rather than Solarium, which is now very powerful and stable?

nick, 21.06.2012 11:53 CET

thanks! you saved my day!

Lea, 25.06.2012 07:18 CET

Ben, the only reason was that the project was already using solr-php-client for another solr search and we didn't want to use two different libraries.

suvi, 26.09.2012 15:35 CET

On Unix/Linux, pipes and sockets are files as well ;-)
