05.13
Today I needed to migrate data from a Drupal site to Ruby on Rails. It is pretty easy to export data from my Drupal database (MySQL) into CSV format that I can use to import into my Rails application (Postgre SQL). So that’s where I started. I’m running Drupal 6.x, here is my database query to pull nodes out of the database. I ran this in phpMyAdmin then exported the results to a CSV file.
SELECT * FROM www_node n INNER JOIN www_content_type_release t INNER JOIN www_location l INNER JOIN www_location_instance i INNER JOIN www_node_revisions r ON t.nid=n.nid AND n.nid=i.nid AND i.lid=l.lid AND r.nid=n.nid WHERE n.type="release"
Back in Rails, I already have my application setup to support the press releases I’ll be importing and I won’t be covering that code here, it’s a standard scaffolding type MVC. You’ll need to update the code below to match your application naming.
First, we need a model to store the uploaded file and some related information. To upload and attach the CSV data, I’m going to use the Paperclip plugin, thanks to jimneath.org for the blog post on integrating Paperclip into your rails application. If you have more questions on Paperclip, please refer to their documentation or jimneath.org.
script/plugin install git://github.com/thoughtbot/paperclip.git
Continuing on with the code, you’ll need to generate a model for you data.
ruby script/generate model import
Edit the model import.rb
class Import < ActiveRecord::Base # Paperclip has_attached_file :csv validates_attachment_presence :csv validates_attachment_content_type :csv, :content_type => ['text/csv','text/comma-separated-values','text/csv','application/csv','application/excel','application/vnd.ms-excel','application/vnd.msexcel','text/anytext','text/plain'] end
Edit the migration
class CreateImports < ActiveRecord::Migration def self.up create_table :imports do |t| t.string :datatype t.integer :processed, :default => 0 t.string :csv_file_name t.string :csv_content_type t.integer :csv_file_size t.timestamps end end def self.down drop_table :imports end end
Update your database:
rake db:migrate
Next, lets generate a controller to upload files and process the data
ruby script/generate controller imports
In imports_controller.rb I’m going to add methods to process the data I’ll be importing as well as a private methods to read the CSV file and store the records in the database. Note: I installed the gem, fastercsv | http://fastercsv.rubyforge.org/ to parse the CSV data.
class ImportsController < ApplicationController
before_filter :login_required #protect controller from anonymous users
def new
@import = Import.new
end
def create
@import = Import.new(params[:import])
respond_to do |format|
if @import.save!
flash[:notice] = 'CSV data was successfully imported.'
format.html { redirect_to(@import) }
else
flash[:error] = 'CSV data import failed.'
format.html { render :action => "new" }
end
end
end
def show
@import = Import.find(params[:id])
end
def proc_csv
@import = Import.find(params[:id])
lines = parse_csv_file(@import.csv.path)
lines.shift #comment this line out if your CSV file doesn't contain a header row
if lines.size > 0
@import.processed = lines.size
lines.each do |line|
case @import.datatype
when "releases"
new_release(line)
end
end
@import.save
flash[:notice] = "CSV data processing was successful."
redirect_to :action => "show", :id => @import.id
else
flash[:error] = "CSV data processing failed."
render :action => "show", :id => @import.id
end
end
private
def parse_csv_file(path_to_csv)
lines = []
#if not installed run, sudo gem install fastercsv
#http://fastercsv.rubyforge.org/
require 'fastercsv'
FasterCSV.foreach(path_to_csv) do |row|
lines << row
end
lines
end
def new_release(line)
params = Hash.new
params[:release] = Hash.new
params[:release]["title"] = line[0]
params[:release]["rdate"] = line[1]
params[:release]["body"] = line[3]
params[:release]["notes"] = line[4]
release = Release.new(params[:release])
release.save
end
end
Last, we need to create two views, the first allows you to upload the CSV file, the second shows the results of the upload and allows you to process the data. Note: At this point you should probably reload your web server so the fastercsv gem and paperclip plugin are available.
new.html.erb
Upload a CSV file to import into the database
<% form_for @import, :html => { :multipart => true } do |f| %> <%= f.file_field :csv %> <select name="import[datatype]" size="1"> <option value="releases"> Press Releases </option> </select> <%= f.submit "Import" %> <% end %>
show.html.erb
Results of your CSV Upload
<%= @import.csv.path %> <% if @import.processed > 0 %> Success! This import contained <%= @import.processed %> records <% else %> <%= link_to "process", import_proc_path(@import.id) %> <% end %>
Update your routes.rb, add the following line:
map.resources :imports map.import_proc '/import/proc/:id', :controller => "imports", :action => "proc_csv"
That’s pretty much it. You can alter/expand on this to support multiple models in your application. Just make sure to pay close attention to detail when your mapping columns in the CSV file to values in your model. You may want to add a call somewhere to destroy old CSV import records since you likely don’t need to store the CSV data indefinately. Also, there could be a lot more error checking to this process to make sure the upload is valid data and each row is processed correctly. I hope to follow up this post soon with a Part 2 which adds more data processing details. Last, I know there is a lot of stuff placed in the controller, it’s something i’ve been in the habit of doing. Any input on moving processing to the model would be welcome.

[...] I decided to allow csv imports for creating members in a two step process similar to this one, http://goodbadtech.com/2009/05/13/ruby-on-rails-import-csv-data-into-database/ [...]
This sounds so good, but then I get …
NoMethodError in Import#new
Showing app/views/import/new.html.erb where line #1 raised:
undefined method `imports_path’ for #<ActionView::Base:0x2588a0c>
Extracted source (around line #1):
1: <% form_for @import, :html => { :multipart => true } do |f| %>
2: <%= f.file_field :csv %>
3: <select name=”import[datatype]” size=”1″>
4: <%= f.submit “Import” %>
Can you help me on this?
Thanks, Val
It looks like you’re missing the required links in the routes.rb, specifically,
map.resources :importsMake sure that’s added and you should be okay. Let me know if that works.Hi this is a fantastic tutorial, with all the things shared that are needed.
But i get the following problem after Iidid exactly as described in your tutorial but it came out with this errormessage and i can’t quite figure out why:
Unknown action
No action responded to index. Actions: create, new, pages_for, proc_csv, and show
Hope you can help me quickly with it would be great to get a response.
Thanks, Markus
@Markus, at what point in the process is this error showing up? There isn’t an index action defined in the controller, so I wonder if there is something wrong in the routes.rb file.
It shows at the point, when i get on the new or show page, which i have created for the upload.
my routes.rb says the following:
ActionController::Routing::Routes.draw do |map|
map.resources :inventarylists
map.resources :imports
map.import_proc ‘/import/proc/:id’, :controller => “imports”, :action => “proc_csv”
map.root :controller => “inventarylists”
map.connect ‘:controller/:action/:id’
map.connect ‘:controller/:action/:id.:format’
end
maybe theres something wrong in the map.imports?
I think the problem is URL you’re accessing. There is no index action defined in the controller or the views so if that’s what the web server is looking for it will error out. The correct URL you should be accessing is: http://yourdomain.com/imports/new, also, make sure there is no trailing slash on new.
Thank you very very much, how could i overlook this mistake =(
its me again, having some trouble, everything worked quite allright but when i upload some csv file i get this errormessage:
ActiveRecord::RecordInvalid in ImportsController#create
Validation failed: Csv is not one of the allowed file types.
is this a problem of the csv? or do i have to create the csv with a specific program?
Hi Markus,
You’ll want to review import.rb, and make sure you have all of the correct file types entered into the validates_attachment_content_type method. If you’re still getting validation errors I’m guessing your CSV file has some other mime type. Maybe copy/paste all the text into Notepad/TextEdit and save as a .csv file. That should set an accepted mime type for the file.
Hi,
thank you for your quick answer . Do i have to add these file types or just correct them in some way? Can’t quite figure out how to adjust them.
An addition to the above post:
when i get it right in the import.rb i put the name of my application in application, but what do i put in text and does my csv have to be named like my application?
I get this error:
compile error/imports/new.html.erb:6: syntax error, unexpected tIDENTIFIER, expecting kEND
select name="import[datatype]" size="1">
^
/imports/new.html.erb:7: syntax error, unexpected '<'
<%option value="releases"> Press Releases </option>
^
/imports/new.html.erb:7: unknown type of %string
<%option value="releases"> Press Releases </option>
^
Where am I going wrong? I've checked the routes.rb file as well and it has the required links.
anything on above ^? Any help would be greatly appreciated.
Thanks!
@Markus
I’m not 100% sure on what you’re asking, but to clarify file types, in import.rb you’ll need line below (as noted in the post above)
validates_attachment_content_type :csv, :content_type => ['text/csv','text/comma-separated-values','text/csv','application/csv','application/excel','application/vnd.ms-excel','application/vnd.msexcel','text/anytext','text/plain']
To adjust them to suport additional file types you can add the files to this method call. For instance, after ‘text/plain’, you could add “,’my/filetype/csv’”
As far as naming the file, the name of the file itself shouldn’t matter. I would suggest keeping the extension .csv in place though.
@Ashish
There seems to be an encoding error that just showed up. The correct code should be (remove the % signs after the first < on each line):
<select name=”import[datatype]” size=”1″>
<option value=”releases”> Press Releases </option>
</select>
I’ll correct the post. Thanks.
That worked! Thank you!
I was wondering if theres anyway you’d be able to share the scaffolding code as well. I’m new to Rails and im not sure how to get the CSV records into a database. It works right now and shows me what file was imported and how many records, but im not sure how to go about creating a separate controller/model/db to actually view the records in MySQL db.
Any help would be great.
Thanks again!
Hi Ashish,
The key lines in this code that determine what is actually done with the imported data are lines 62-71 of imports_controller.rb which define a method, new_release, which is called in line 35 of the proc_csv method.
In my code, I use a model called Release that contains title, rdate, body, and notes columns. If you wanted to create a model with exact same properties as mine, you could use the following:
script/generate scaffold Release title:string rdate:date body:text notes:text
I hope this helps.
Hi,
I’ve been using your model to import csv data into the db. Up until now, ive been using a combination of ActiveRecord importation and importing through MySQL queries. Problem is, if i do it through MySQL queries, I’m restricted to how many records I can import. It creates a new connection for each record..which means at the most I can import 1000 records. Is there a way to import using ActiveRecord but also allowing for UPDATING Duplicate entries? I’ve gotten to where I can import several thousand entries through activeRecord, but if there’s a duplicate, it throws an exception. I am trying to use ar-extensions on_duplicate_key_update, however that does not support Rube 2.3.4.
Any help would be greatly appreciated!
Thanks.
Hi, sorry for the delay in response. I would suggest looking that the method
new_release(line)in the controller. Before callingrelease = Release.new(params[:release]), you could query for an existing record with the same parameters, maybeif Release.not_duplicate(params[:release] == truerelease = Release.new(params[:release])
release.save
end
You’d have to add the method, not_duplicate, to your model, have it requery the table for existing records with the same data, and return true if nothing is found and return false otherwise.
Hi,
Thanks for the reply. I ended up resolving the issue. If you’re interested:
rescue ActiveRecord::StatementInvalid => e
raise e unless /Mysql::Error: Duplicate entry/.match(e)
if /Mysql::Error: Duplicate entry/.match(e)
then
MySQL UPDATE query here….
So in short, i ended up running the regex to catch the exception thrown by MySQL when it encounters a duplicate record, and then I update that specific record with a MySQL update query and update/concat the values i need to. Works quite well actually.
Thanks for all your help!
Me thinks a lot of your code in your controller should be moved into your model, to keep your controllers clean. There’s no need to have to use private and muck up your controller like that.
hello there,
i’m finding a problem, when i accessed myhomedomain/imports/new
then appear the error, said that
undefined method `file’ for #<Class:0×5419808>
do you know what the problem is?
any help would be appreciated, thanks
-bayu-
I would say double check your code in new.html.erb, if you’re still having an issue please please let me know.
HI,
i tried and i tried and i tried =)
and i get this as a mistake
FasterCSV::MalformedCSVError in ImportsController#proc_csv
Illegal quoting on line 2.
can you tell me what went wrong?
solved that problem, was just a matter of the csv file
but how can i make the now imported data visible in my database? it wont show up in my viewfiles
by that i mean, i have alle the columns and stuff but i dont see the actual data, only many many empty rows
Hi Markus,
I didn’t add any code to my imports view because I use a different controller to actually view imported data. It would take several additions to correctly view the imported data.
You’ll see on lines 62-71 of imports_controller.rb, that each line of imported data gets inserted into a Release model that exists elsewhere in my application, but none of the imported data is stored in the Import model, only the number of rows imported is stored there.
I would suggest adding a view to your Release model, or whatever model you’re importing data into that displays the imported rows.
@Darren
You are correct. I actually mentioned that at the very end of my post. It’s an habit I’m always trying to break. Thanks for the feedback.
Hi thanks for your great answers so fat, but i the problem continues, as you mentioned i should create another view, did that originally for the uploaded data, but it doesnt import my data right now.
All my data is crashed together in the first two columns of about 10 or so. So it looks disturbing, plus it doesnt view all of them correctly, it somehow corrupts my pagination.
got an idea for that? =)
hi good tutorial this is want i am looking so far..
Hi,
Thanks for the tutorial – it was helpful.
I think there is a slight error on line 12 of the controller;
@import.save! should be @import.save
Thanks for the feedback Sai.
The call to save! is actually a way to force all validations on a model to be run and capture any errors. You can check it out here, http://api.rubyonrails.org/classes/ActiveRecord/Base.html#M002330
Ah, thank you for clarifying this. What are the benefits of <% .save! %> over <% .save %>? I thought the validations would have been run regardless.
Also, save! appears to raise ActiveRecord::RecordInvalid, as opposed to the neater error_messages which are available via the view. Is it possible to raise a neater error message? As far as I can tell, it isn’t, but I may be wrong.
Hello,
Thanks for the tutorial, it helped extremely.
I do have a little problem.
The import is created but the proc_csv does not.
I get an …
“
undefined method `to_sym' for nil:NilClass" error on the Imports/show view - Line 11.8:9: <% else %>
10:
11: <%= link_to "process", import_proc_path(@import.id) %>
12:
13: <% end %>
Thank you for any help with this.
John
I can replace Line 11 with…
<%= link_to “process”, :controller => ‘imports’, :action => ‘proc_csv’, :id => @import.id %>
and everything works fine.
John
Found it.
My routes were messed up.
map.import_proc ‘/import/proc/id’, :controller => ‘imports’, :action => ‘proc_csv’
should be…
map.import_proc ‘/import/proc/:id’, :controller => ‘imports’, :action => ‘proc_csv’
John
Hi John,
I’m glad the post was helpful and that you already tracked down answer for the question you had.
Regards,
Spencer
In regards to moving the logic into the controller: http://gist.github.com/313281
This is just a quick example and I’m not sure it would work perfect. My setup is quite a lot different, but a good starting point.
Also error handling could be done a few ways. For me first I’m using delayed_job and after a csv file is uploaded I past off the load to delayed_job b/c we have imports of 100,000 + lines. As load_csv method is looping over rows we collect errors, if any, into an array and send out the results in an email.
Err I meant model
Just implemented for a project. Works great. Many thanks for the work.
Thanks for the feedback. I’m glad it helped you out.
Forgot to mention in my last comment but if you push this to a background job you should do some (rescue FasterCSV::MalformedCSVError) action. Or even better do it if not tossing to background job so you don’t get app errors if users upload a file that isn’t parsed perfectly.
I hope this thread/message board is still active…
I’ve implemented FasterCSV into my project for a long time now. Actually now I’ve moved to importing XLS files and parsing the data and importing into MySQL. Problem I’m running into now, after the database has become significantly large (>200,000 records) is that when parsing and importing data now…rails is painfully slow. A file of about a 1000 records which previously used to take minutes…now takes at least 30. I’m having major memory leaks and performance problems.
Any possible explanations or remedies? I’ve gone through the code and noticed that there are several connection strings to the database which are constantly connecting and quitting..I know this is probably a big issue and I’m in the process of fixing it. However, any other pointers on where I can look to increase performance and decrease memory issues?