#1

I need to store some information with my Ruby program and I am not sure what the best method would be. I'm mostly concerned about what would be the most efficient use of CPU resources.

Basically, I will have a list of names, each belonging to one of 5 categories. Sort of like this:

Cat1
-name1
-name2
-name3
-etc...

Cat2
-name4
-name5
-name6
-etc...

Cat3
-name7
-name8
-name9
-etc...

There will be hundreds of names, evenly divided between the categories. But each name will go in only one category; there is no relation between categories or anything like that. All the information will be completely rewritten once a day and then read several times throughout the day.

My choices for storage are an SQLite database (using ActiveRecord), a flat text file of my own design, a YAML file, or an XML file.

--
Posted via http://www.ruby-forum.com/.

#2

2008/4/1, James Dinkel <jdinkel@gmail.com>:
> There will be hundreds of names, evenly divided between the categories.

That's not much. I'd probably use XML - but that also depends on what generates the data and what needs to be able to read it. You can efficiently generate it and read it (using a stream parser for example, but that seems unnecessary for hundreds of names only).

But ultimately it depends on what you want to do with the data. In some cases a DB might be a better choice, for example if your volume is going to increase dramatically.

> My choices for storage are an sqlite database (using ActiveRecord), a
> flat text file of my own design, a YAML file, or an XML file.

YAML is another nice alternative because it is human readable. And you can use Marshal if producer and consumer of the data are Ruby programs.

Kind regards

robert

--
use.inject do |as, often| as.you_can - without end
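
For illustration, the YAML option might look roughly like the sketch below. It is only a sketch under assumptions: the hash-of-arrays layout, the `SAMPLE_NAMES` constant, the `save_names`/`load_names` helpers and the `names.yml` file name are all made up for this example and do not come from the thread.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical sample data: category name => list of names.
SAMPLE_NAMES = {
  "Cat1" => %w[name1 name2 name3],
  "Cat2" => %w[name4 name5 name6],
  "Cat3" => %w[name7 name8 name9]
}

# Write the whole structure out as YAML (the once-a-day rewrite).
def save_names(path, names)
  File.write(path, names.to_yaml)
end

# Read the YAML file back into a Hash of Arrays.
def load_names(path)
  YAML.load_file(path)
end

# Demo in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = File.join(dir, "names.yml")
  save_names(path, SAMPLE_NAMES)
  puts File.read(path)        # the file is plain, human-readable text
  p load_names(path)["Cat2"]  # the names stored under Cat2
end
```

Because the file only contains strings, arrays and a hash, it stays easy to inspect or edit by hand, which is the readability point made above.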

#3

James Dinkel wrote:
> But each name will go in only one category, there is no relation between
> categories or anything like that. All the information will be
> completely rewritten once a day and then read several times throughout
> the day.
>
> My choices for storage are an sqlite database (using ActiveRecord), a
> flat text file of my own design, a YAML file, or an XML file.

IMHO databases are best when you have concurrent access to data being modified regularly and want to enforce constraints during concurrent write accesses. In your case, the data is mostly static and constraints are easily handled outside the storage layer (you overwrite all data with another consistent version in one pass).

I'd advise using the simplest storage method, which is probably a YAML dump of an object holding all this data. Marshal.dump/load is an option too. It may be faster than YAML if this matters to you (I've not benchmarked it, so you'd better do that if you need fast reads/writes). It's not human-readable, which can be a drawback when debugging.

That was the code/integration complexity side of your problem. For the performance side of the problem: if you dump your data to a temporary file and then rename it to overwrite the final destination, you can use a neat hack for long-running processes needing fresh data: you can design a little cache that checks the mtime of the backing store (the final destination) on read accesses and reloads it when it changes.

mtime checks are cheap and simple to code, and if the need arises for really high throughput you can minimize them by coding TTL logic.

Lionel
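
The write-then-rename step and the mtime-checking cache described above could be sketched as follows. This is one possible reading, not Lionel's actual code: the `atomic_write_yaml` helper, the `MtimeCache` class and the `.tmp` suffix are hypothetical names, and YAML is assumed as the storage format.

```ruby
require "yaml"
require "tmpdir"

# Writer side: dump to a temporary file, then rename it over the final path.
# Assumption: both paths are on the same filesystem, where a rename replaces
# the target in one step on POSIX systems, so readers never see a partial file.
def atomic_write_yaml(path, data)
  tmp = "#{path}.tmp"
  File.write(tmp, data.to_yaml)
  File.rename(tmp, path)
end

# Reader side: hypothetical cache that reloads only when the mtime changes.
class MtimeCache
  def initialize(path)
    @path = path
    @mtime = nil
    @data = nil
  end

  # Return the cached data, reloading the file if it was replaced.
  def data
    current = File.mtime(@path)
    if @mtime != current
      @data = YAML.load_file(@path)
      @mtime = current
    end
    @data
  end
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "names.yml")
  atomic_write_yaml(path, { "Cat1" => %w[name1 name2] })
  cache = MtimeCache.new(path)
  p cache.data  # first call loads the file
  p cache.data  # later calls reuse the parsed data until the mtime changes
end
```

The TTL idea would sit on top of this: remember when the mtime was last checked and skip the `File.mtime` call until a chosen number of seconds has passed.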

#4

> But ultimately it depends on what you want to do with the data.

Yeah, it's kinda hard to describe without just posting my entire script, which I doubt people will want to read.

The data will be accessed by one Ruby script, running on one computer. The data will be read in, then the file closed and done for a couple of hours. So no concurrent access, no relations, no keeping the connection open for extended periods of time, which is why I thought a database would probably be overkill and just add overhead.

But I didn't know if maybe reading a file into memory would take more effort than reading entries from a database. Also, I was a little off on the numbers: I meant to say that there are hundreds of names per category, so total names could be over a thousand. That size will likely never ever change beyond +/- 100 at the most.

Thanks for the info. I'm really a newb at this, so any thoughts on storing data using any of these methods are helpful.

James.
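
For a data set of roughly this size, the load cost can simply be measured rather than guessed. The sketch below compares parsing YAML with loading a Marshal dump using Ruby's standard `benchmark` library. The `build_names` helper, the 5 x 200 layout and the run count are assumptions chosen for illustration, and the timings only cover in-memory parsing, not disk or database access.

```ruby
require "benchmark"
require "yaml"

# Hypothetical data set: 5 categories of 200 names each (about 1,000 names).
def build_names(categories = 5, per_category = 200)
  (1..categories).each_with_object({}) do |c, h|
    h["Cat#{c}"] = (1..per_category).map { |n| "name#{c}_#{n}" }
  end
end

names = build_names
yaml_text = names.to_yaml            # what a YAML file would contain
marshal_bytes = Marshal.dump(names)  # what a Marshal file would contain

runs = 50
Benchmark.bm(8) do |bm|
  bm.report("YAML")    { runs.times { YAML.load(yaml_text) } }
  bm.report("Marshal") { runs.times { Marshal.load(marshal_bytes) } }
end
```

Since the script reads the data only a few times a day, whichever number comes out lower is unlikely to matter much in practice; the measurement mainly shows whether the cost is worth worrying about at all.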

#5

[Note: parts of this message were removed to make it a legal post.]
Seems like the type of problem that's perfect for YAML.

On Tue, Apr 1, 2008 at 11:32 AM, James Dinkel <jdinkel@gmail.com> wrote:
> The data will be accessed by one ruby script, running on one computer.
> The data will be read in, then the file closed and done for a couple
> hours.

#6

On Tue, Apr 1, 2008 at 10:32 AM, James Dinkel <jdinkel@gmail.com> wrote:
> So no concurrent access, no relations, no keeping the connection
> open for extended periods of time, which is why I thought a database
> would probably be overkill and just add overhead.

I'm going to slightly disagree with Lionel -- and also Robert -- on this one.

First of all, a database is not necessarily just for concurrency. It's for data integrity, and it allows you to build reports on that data that you can trust because of the strict nature of the underlying data store (I'm talking about RDBMSs, but I've kept my eyes open about OO databases as well; stay away from Pick, though!!).

Here's the problem with relational databases (RDBMSs), though: it's hard to model a hierarchy (which you can pull off somewhat clumsily with XML).

If you are not going to do serious queries and inserts on the db, and your data isn't complex, then a flat file approach might work. It works, after all, for software builds. I strongly recommend against it in higher languages, though, even for small apps.

And, no, I am not a database vendor. I always tell people they should learn SQL, but nowadays I'm getting a cold shoulder, especially with OO people.

The other important thing that I've noticed about data and storage is: what do you want to do with it and how often? Store it, query it (and how), add to it, move it around, archive it, etc. These are important factors to consider.

Todd

#7

Oh wait, Lionel already suggested that.

#8

Don't forget: you could put the data into a hash, and marshal it to
disc. Not a DB, but better than a flat file!
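
A minimal sketch of that idea, assuming a hash that maps category names to arrays of names. The `dump_hash`/`load_hash` helpers and the `names.dat` file name are hypothetical and only serve the example.

```ruby
require "tmpdir"

# Save a Hash to disk in Ruby's binary Marshal format.
def dump_hash(path, hash)
  File.binwrite(path, Marshal.dump(hash))
end

# Load it back. Marshal.load should only be used on files the program
# itself wrote, because it can instantiate arbitrary Ruby objects.
def load_hash(path)
  Marshal.load(File.binread(path))
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  path = File.join(dir, "names.dat")
  dump_hash(path, { "Cat1" => %w[name1 name2], "Cat2" => %w[name3] })
  names = load_hash(path)
  puts names["Cat1"].include?("name2")  # membership check on the reloaded data
end
```

The file is binary and tied to Ruby's Marshal format, so it suits the case where the same Ruby script both writes and reads the data.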