varchar mysql errors #30

Open
anthonymobile opened this issue May 4, 2016 · 4 comments

anthonymobile commented May 4, 2016

Trying to import a GTFS file into MySQL using the following, and getting a ton of "VARCHAR requires a length" errors from SQLAlchemy. What am I doing wrong?

import pygtfs
sched = pygtfs.Schedule('mysql://gtfs:gtfs@localhost:5432/gtfs_njt_rail')
pygtfs.append_feed(sched, "/home/anthony/code/pi_transitsign/data/gtfsdata/njt_rail_data")

Here's the output:

python2 import_rail_mysql.py

Traceback (most recent call last):
  File "import_rail_mysql.py", line 17, in <module>
    sched = pygtfs.Schedule('mysql://gtfs:gtfs@localhost:5432/gtfs_njt_rail')
  File "/usr/lib/python2.7/site-packages/pygtfs/schedule.py", line 36, in __init__
    Base.metadata.create_all(self.engine)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/schema.py", line 3695, in create_all
    tables=tables)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1856, in _run_visitor
    conn._run_visitor(visitorcallable, element, **kwargs)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1481, in _run_visitor
    **kwargs).traverse_single(element)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
    return meth(obj, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 730, in visit_metadata
    _is_metadata_operation=True)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 121, in traverse_single
    return meth(obj, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 764, in visit_table
    include_foreign_key_constraints=include_foreign_key_constraints
  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute
    return meth(self, multiparams, params)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 68, in _execute_on_connection
    return connection._execute_ddl(self, multiparams, params)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 962, in _execute_ddl
    compiled = ddl.compile(dialect=dialect)
  File "<string>", line 1, in <lambda>
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 494, in compile
    return self._compiler(dialect, bind=bind, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/ddl.py", line 26, in _compiler
    return dialect.ddl_compiler(dialect, self, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 190, in __init__
    self.string = self.process(self.statement, **compile_kwargs)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 213, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 2170, in visit_create_table
    (table.description, column.name, ce.args[0])
  File "/usr/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 2159, in visit_create_table
    and not first_pk)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 213, in process
    return obj._compiler_dispatch(self, **kwargs)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 2190, in visit_create_column
    first_pk=first_pk
  File "/usr/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 1961, in get_column_specification
    column.type, type_expression=column)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 261, in process
    return type_._compiler_dispatch(self, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/visitors.py", line 81, in _compiler_dispatch
    return meth(self, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/compiler.py", line 2594, in visit_unicode
    return self.visit_VARCHAR(type_, **kw)
  File "/usr/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2313, in visit_VARCHAR
    self.dialect.name)
sqlalchemy.exc.CompileError: (in table '_feed', column 'feed_name'): VARCHAR requires a length on dialect mysql

jarondl (Owner) commented May 5, 2016

Hi @anthonymobile,
You are doing nothing wrong. So far I have only tested on SQLite and PostgreSQL, not on MySQL, which is why this bug was not discovered.
However, this raises a design issue. We use SQLAlchemy's Unicode type [1][2]. SQLite leaves the VARCHAR length undefined and PostgreSQL treats it as optional, but MySQL demands that we specify one. I couldn't find any string length constraint in the GTFS reference [3].

We can choose one of the following options.

  1. Announce that PyGTFS is not MySQL compatible, and tell users to pick a different database.
  2. Pick a length (255 is a popular choice), and apply it all the time. (Should we truncate?)
  3. Pick a length, but only when the database requires it (see the sketch below).
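
For example, option 3 might look roughly like this (an untested sketch; gtfs_string is a hypothetical helper name, but with_variant is standard SQLAlchemy and only overrides the type on the mysql dialect):

from sqlalchemy.types import Unicode

def gtfs_string():
    # Length-free Unicode everywhere, but an explicit VARCHAR(255) on MySQL,
    # which refuses to create a VARCHAR column without a length.
    return Unicode().with_variant(Unicode(255), 'mysql')

The model columns would then use gtfs_string() instead of a bare Unicode.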

What do you think?

BTW, cool project!
https://github.com/anthonymobile/pi_transitsign

[1] http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Unicode
[2] http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.String
[3] https://developers.google.com/transit/gtfs/reference

anthonymobile (Author) commented

Thanks... I think for me, at least, this is no longer needed. I was looking at MySQL because of performance issues on a Raspberry Pi, but I have since refactored my code, so that is moot.

However, I was thinking about this a bit. It seems kind of crazy to -have- to load the entire GTFS database into memory. Is there a way to select a subset of the full database through a different SQLAlchemy query, without substantially modifying the pygtfs code?

jarondl (Owner) commented May 22, 2016

I am not sure what you mean by that. You can store your database on disk with SQLite or PostgreSQL, and then the amount of memory used depends on your database settings.
You can also store it in-memory in SQLite, but that does seem crazy for large GTFS datasets.

If you mean you want to put only a subset of the CSV files in your database, then that is way tougher to implement in pygtfs. You can filter the files yourself before putting them in the database.
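
That said, about your idea of selecting a subset through a different SQLAlchemy query: once the feed is in the database, you don't have to pull everything into memory. Something along these lines should work (a rough sketch; it assumes Schedule exposes its SQLAlchemy session as .session and that the ORM classes live in pygtfs.gtfs_entities, so double-check those names against the source):

import pygtfs
from pygtfs.gtfs_entities import Stop

sched = pygtfs.Schedule("db.sqlite")

# Let the database do the filtering instead of iterating over every stop in Python.
newark_stops = (sched.session.query(Stop)
                .filter(Stop.stop_name.like('%Newark%'))
                .all())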

jarondl (Owner) commented Nov 11, 2018

To clarify my previous comment: you can use the included gtfs2db script to convert your zip file to SQLite:

gtfs2db append  gtfs.zip  db.sqlite

Then you can use the SQLite database in pygtfs:

sched = pygtfs.Schedule("db.sqlite")

The nice thing here is that SQLite is quite smart about loading only the data you actually query.
If you are really constrained by memory, you can always do the full preprocessing somewhere else and store only what you need on your Pi.
