Release notes
Semantic Versioning
`rows` uses [semantic versioning][semver]. Note that this means we do not guarantee API backwards compatibility on 0.x.y versions (but we try our best to).
Version 0.4.2dev0
Released on: (in development)
General Changes and Enhancements
- `export_to_html` is now available even if `lxml` is not installed
- Add Jupyter Notebook integration (implements `_repr_html_`, `.head` and `.tail`)
- Fix code to remove some warnings
- Add support to read compressed files directly (like in `rows.import_from_csv("filename.csv.gz")`)
- `rows.Table` now returns a new table when sliced
- Remove functions `export_data` and `get_filename_and_fobj` (the new `Source` implements the features better).
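The compressed-file support mentioned above can be pictured with a small standard-library sketch. It is not the code used by `rows`: the helper name `open_text` and the extension-to-opener mapping are assumptions made for illustration only.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from file extension to opener; the extensions that
# rows actually recognizes are not listed in these notes.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_text(filename, mode="r", encoding="utf-8"):
    """Open a possibly compressed file in text mode, based on its extension."""
    opener = OPENERS.get(Path(filename).suffix.lower(), open)
    # gzip/bz2/lzma default to binary mode, so "t" is requested explicitly
    return opener(filename, mode=mode + "t", encoding=encoding)
```

A caller could then pass `open_text("filename.csv.gz")` to `csv.reader` exactly as it would a plain file object, which is the kind of convenience the entry describes.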
Plugins
- Add param `max_rows` to `create_table` (import only part of a table, all plugins are supported)
- Add `start_row`, `end_row`, `start_column` and `end_column` to ODS plugin
- Prevent `xlrd` (XLS plugin) from printing wrong sector size warning ("WARNING *** file size (551546) not 512 + multiple of sector size (512)")
- Set `rows.Table` name (`table.meta["name"]`) for ODS, XLS and XLSX plugins
- Add option to set `<caption>` tag in `export_to_html`
- Use correct table name when exporting to PostgreSQL
- Carefully close all fobjs in pgimport/pgexport
- Added CSV dialect "excel-semicolon"
- Improved PostgreSQL import from CSV (pgimport) when dealing with null values
- PDF now supports `page_numbers` as string (range of numbers)
- Add support to export to multiple sheets on the same XLSX file
Command-Line Interface
- `rows schema` is now "lazy" (before it imported the whole file, even if samples were defined)
- Add support for compressed files output on `rows pdf-to-text` and `rows schema`
- HTTP cache is disabled by default (this may change in the future)
- Accept URI schemes in `rows convert`
- `rows convert` now supports compressed files
- `rows pgexport` now accepts query instead of table name (useful for selecting from a view since `\copy` cannot use a view but can use a query instead of a table name).
- Detect input encoding whenever possible
- Add `--quiet` to some commands (fix `progress`)
- Add plugins' input/output options to `convert`
- Add `rows csv-merge` (lazily merge CSV files even if they don't share a common schema)
- Add `rows csv-clean` (lazily clean a CSV file, removing empty columns and creating a consistent output format)
- Add `rows list-sheets` (prints sheet names for ODS, XLS and XLSX files)
Utils
- Add support for CSV format on schema export
- Use dataclasses to describe Source
- `import_from_source` now supports compressed files (and so all CLI commands)
- Add support for passing a `context` to `load_schema`
Bug Fixes
- #314 rows pgimport fails if using --schema
- #309 Fix file-magic detection
- #320 Get correct data if ODS spreadsheet has empty cells
- Fix slug function (so `"a/b"` will turn into `"a_b"`)
- Detect as fallback type if all values are empty
- Fix output on `rows schema` (was printing to stdout even if output file is provided)
- Fix `rows schema` (some output formats were not working properly)
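To make the slug fix concrete, here is a minimal sketch of a slug helper with the behaviour described above (`"a/b"` becomes `"a_b"`). It is a hypothetical stand-in written for these notes, not the function shipped in `rows`; the accent stripping and the `separator` parameter are assumptions.

```python
import re
import unicodedata


def slug(text, separator="_"):
    """Hypothetical slug helper: lowercase ASCII words joined by `separator`."""
    # Strip accents: decompose characters and drop the non-ASCII remainder
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    # Any run of non-alphanumeric characters (such as "/") becomes a single
    # separator, so doubled separators cannot appear in the result
    slugged = re.sub(r"[^a-z0-9]+", separator, ascii_text.lower())
    return slugged.strip(separator)
```

Collapsing every run of non-alphanumeric characters in one substitution handles both the `/` case and repeated underscores in a single step.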
Version 0.4.1
(bugfix release)
Released on: 2019-02-14
General Changes and Enhancements
- Add new way to make docs (removes Sphinx and uses mkdocs + click-man + pycco)
- Update Dockerfile
Bug Fixes
- #305 "0" was not being deserialized by `IntegerField`
Version 0.4.0
Released on: 2019-02-09
General Changes and Enhancements
- #243 Change license to LGPL3.0.
- Added official Python 3.6 support.
- `Table.__add__` does not depend on table sizes anymore.
- Implemented `Table.__iadd__` (`table += other` will work).
- #234 Remove `BinaryField` from the default list of detection types.
Plugins
- #224 Add `|` as possible delimiter (CSV dialect detection).
- Export CSV in batches.
- Change CSV dialect detection sample size to 256KiB.
- #225 Create export callbacks (CSV and SQLite plugins).
- #270 Added options to export pretty text table frames (TXT plugin).
- #274 `start_row` and `start_column` now behave the same way in XLS and XLSX (starting from 0).
- #261 Add support to `end_row` and `end_column` on XLS and XLSX (thanks @Lrcezimbra for the suggestion).
- #4 Add PostgreSQL plugin (thanks to @juliano777).
- #290 Fix percent formatting reading on XLSX and ODS file formats (thanks to @jsbueno).
- #220 Do not use non-import_fields and force_types columns on type detection algorithm.
- #50 Create PDF extraction plugin with two backend libraries (`pymupdf` and `pdfminer.six`) and 3 table extraction algorithms.
- #294 Decrease XLSX reading time (thanks to @israelst).
- Change to pure Python version of Apache Thrift library (parquet plugin)
- @299 Change CSV field limit
Command-Line Interface
- #242 Add `--fields`/`--fields-exclude` to `convert`, `join` and `sum` (and rename `--fields-exclude` on `print`), also remove `--fields` from `query` (is not needed).
- #235 Implement `--http-cache` and `--http-cache-path`.
- #237 Implement `rows schema` (generates schema in text, SQL and Django models).
- Enable progress bar when downloading files.
- Create `pgimport` and `pgexport` commands.
- Create `csv-to-sqlite` and `sqlite-to-csv` commands.
- Create `pdf-to-text` command.
- Add shortcut for all command names: `2` can be used instead of `-to-` (so `rows pdf2text` is a shortcut to `rows pdf-to-text`).
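The `2` shortcut for `-to-` can be pictured as a simple name expansion. The sketch below is only an illustration: `expand_shortcut` and the `COMMANDS` set are hypothetical names, and how the CLI actually registers its aliases is not described in these notes.

```python
import re

# Command names taken from the 0.4.0 notes above
COMMANDS = frozenset({"csv-to-sqlite", "sqlite-to-csv", "pdf-to-text"})


def expand_shortcut(name, commands=COMMANDS):
    """Map a `2` shortcut (e.g. `pdf2text`) to its `-to-` command name."""
    if name in commands:
        return name
    # Only a "2" sitting between two letters is treated as a separator
    candidate = re.sub(r"(?<=[a-z])2(?=[a-z])", "-to-", name)
    return candidate if candidate in commands else None
```

Returning `None` for unknown names keeps the expansion from silently inventing commands that do not exist.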
Utils
- Create `utils.open_compressed` helper function: can read/write files, automatically dealing with on-the-fly compression.
- Add progress bar support to `utils.download_file` (thanks to `tqdm` library).
- Add helper class `utils.CsvLazyDictWriter` (write as `dict`s without needing to pass the keys in advance).
- Add `utils.pgimport` and `utils.pgexport` functions.
- Add `utils.csv2sqlite` and `utils.sqlite2csv` functions.
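The idea behind a lazy dict writer (writing `dict`s without passing the keys in advance) can be sketched with the standard `csv` module. The class below is a hypothetical stand-in, not the `utils.CsvLazyDictWriter` implementation, and its constructor signature is an assumption.

```python
import csv
import io


class LazyDictWriter:
    """Hypothetical stand-in: the header is taken from the first row written."""

    def __init__(self, fobj):
        self.fobj = fobj
        self._writer = None

    def writerow(self, row):
        if self._writer is None:
            # Create the real writer only now, when the keys are known
            self._writer = csv.DictWriter(self.fobj, fieldnames=list(row.keys()))
            self._writer.writeheader()
        self._writer.writerow(row)


buffer = io.StringIO()
writer = LazyDictWriter(buffer)
writer.writerow({"name": "Alice", "age": 30})
writer.writerow({"name": "Bob", "age": 25})
```

Deferring the `csv.DictWriter` creation until the first row arrives is what lets the caller stream rows from a generator without inspecting them beforehand.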
Bug Fixes
- #223 `UnicodeDecodeError` on dialect detection.
- #214 Problem detecting dialect.
- #181 Create slugs inside `Table.__init__`.
- #221 Error on `pip install rows`.
- #238 `import_from_dicts` supports generator as input
- #239 Use correct field ordering
- #299 Integer field detected for numbers started with zero
Version 0.3.1
Released on: 2017-05-08
Enhancements
- Move information on README to a site, organize and add more examples. Documentation is available at turicas.info/rows. Thanks to @ellisonleao for Sphinx implementation and @ramiroluz for new examples.
- Little code refactorings.
Bug Fixes
- #200 Escape output when exporting to HTML (thanks to @arloc)
- Fix some tests
- #215 DecimalField does not handle negative values correctly if using locale (thanks to @draug3n for reporting)
Version 0.3.0
Released on: 2016-09-02
Backwards Incompatible Changes
Bug Fixes
- Return `None` on XLS blank cells;
- #188 Change `sample_size` on encoding detection.
Enhancements and Refactorings
- `rows.fields.detect_fields` will consider `BinaryField` if all the values are `str` (Python 2)/`bytes` (Python 3) and all other fields will work only with `unicode` (Python 2)/`str` (Python 3);
- Plugins HTML and XPath now use a better way to return inner HTML (when `preserve_html=True`);
- #189 Optimize `Table.__add__`.
New Features
- Support for Python 3 (finally!);
- `rows.fields.BinaryField` now automatically uses base64 to encode/decode;
- Added `encoding` information to `rows.Table` metadata in text plugins;
- Added `sheet_name` information to `rows.Table` metadata in XLS and XLSX plugins;
- #190 Add `query_args` to `import_from_sqlite`;
- #177 Add `dialect` to `export_to_csv`.
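The base64 handling mentioned for `BinaryField` amounts to a round trip between bytes and text. The functions below are a hypothetical sketch using the standard `base64` module; whether the library uses the standard alphabet or these exact semantics is an assumption.

```python
import base64


def serialize_binary(value):
    """Encode raw bytes as base64 text, suitable for text-based formats."""
    return base64.b64encode(value).decode("ascii")


def deserialize_binary(text):
    """Decode base64 text back into the original bytes."""
    return base64.b64decode(text)
```

Encoding to ASCII text is what allows binary values to pass through formats such as CSV or JSON, which can only carry text.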
Version 0.2.1
Released on: 2016-08-10
Backwards Incompatible Changes
- `rows.utils.export_to_uri` signature is now like `rows.export_to_*` (first the `rows.Table` object, then the URI)
- Changed default table name in `import_from_sqlite` and `export_to_sqlite` (from `rows` and `rows_{number}` to `table{number}`)
Bug Fixes
- #170 (SQLite plugin) Error converting `int` and `float` when value is `None`.
- #168 Use `Field.serialize` if does not know the field type (affecting: XLS, XLSX and SQLite plugins).
- #167 Use more data to detect dialect, delimit the possible delimiters and fallback to excel if can't detect.
- #176 Problem using quotes on CSV plugin.
- #179 Fix double underscore problem on `rows.utils.slug`
- #175 Fix `None` serialization/deserialization in all plugins (and also field types)
- #172 Expose all tables in `rows query` for SQLite databases
- Fix `examples/cli/convert.sh` (missing `-`)
- Avoids SQL injection in table name
Enhancements and Refactorings
- Refactor `rows.utils.import_from_uri`
- Encoding and file type are better detected on `rows.utils.import_from_uri`
- Added helper functions to `rows.utils` regarding encoding and file type/plugin detection
- There's a better description of plugin metadata (MIME types accepted) on `rows.utils` (should be refactored to be inside each plugin)
- Moved `slug` and `ipartition` functions to `rows.plugins.utils`
- Optimize `rows query` when using only one SQLite source
Version 0.2.0
Released on: 2016-07-15
Backwards Incompatible Changes
- `rows.fields.UnicodeField` was renamed to `rows.fields.TextField`
- `rows.fields.BytesField` was renamed to `rows.fields.BinaryField`
Bug Fixes
- Fix import errors on older versions of urllib3 and Python (thanks to @jeanferri)
- #156 `BoolField` should not accept "0" and "1" as possible values
- #86 Fix `Content-Type` parsing
- Fix locale-related tests
- #85 Fix `preserve_html` if `fields` is not provided
- Fix problem with big integers
- #131 Fix problem when empty sample data
- Fix problem with `unicode` and `DateField`
- Fix `PercentField.serialize(None)`
- Fix bug with `Decimal` receiving `''`
- Fix bug in `PercentField.serialize(Decimal('0'))`
- Fix nested table behaviour on HTML plugin
General Changes
- (EXPERIMENTAL) Add `rows.FlexibleTable` class (with help on tests from @maurobaraildi)
- Lots of refactorings
- Add `rows.operations.transpose`
- Add `Table.__repr__`
- Rename `rows.fields.UnicodeField` to `rows.fields.TextField` and `rows.fields.ByteField` to `rows.fields.BinaryField`
- Add a man page (thanks to @kretcheu)
- #40 The package is available on Debian!
- #120 The package is available on Fedora!
- Add some examples
- #138 Add `rows.fields.JSONField`
- #146 Add `rows.fields.EmailField`
- Enhance encoding detection using file-magic library
- #160 Add support for column get/set/del in `rows.Table`
Tests
- Fix "\r\n" on tests to work on Windows
- Enhance tests with `mock` to assure some functions are being called
- Improve some tests
Plugins
- Add plugin JSON (thanks @sxslex)
- #107 Add `import_from_txt`
- #149 Add `import_from_xpath`
- (EXPERIMENTAL) Add `import_from_ods`
- (EXPERIMENTAL) Add `import_from_parquet`
- Add `import_from_sqlite` and `export_to_sqlite` (implemented by @turicas with help from @infog)
- Add `import_from_xlsx` and `export_to_xlsx` (thanks to @RhenanBartels)
- Autodetect delimiter in CSV files
- Export to TXT, JSON and XLS also support an already opened file and CSV can export to memory (thanks to @jeanferri)
- #93 Add HTML helpers inside `rows.plugins.html`: `count_tables`, `extract_text`, `extract_links` and `tag_to_dict`
- #162 Add `import_from_dicts` and `export_to_dicts`
- Refactor `export_to_txt`
Utils
- Create `rows.plugins.utils`
- #119 Rename field name if name is duplicated (to "field_2", "field_3", ..., "field_N") or if starts with a number.
- Add option to import only some fields (`import_fields` parameter inside `create_table`)
- Add option to export only some fields (`export_fields` parameter inside `prepare_to_export`)
- Add option `force_types` to force field types in some columns (instead of detecting) on `create_table`.
- Support lazy objects on `create_table`
- Add `samples` parameter to `create_table`
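The duplicated-name rule from #119 ("field_2", "field_3", ..., "field_N") can be sketched as below. This is an illustrative helper with a hypothetical name, not the library's code, and it covers only the duplication case; how names starting with a number are rewritten is not stated in these notes, so it is left out.

```python
def make_unique_names(names):
    """Append _2, _3, ... to names that repeat, keeping the first one as-is."""
    result = []
    used = set()
    for name in names:
        candidate, counter = name, 1
        # Keep increasing the suffix until the candidate is not taken
        while candidate in used:
            counter += 1
            candidate = f"{name}_{counter}"
        used.add(candidate)
        result.append(candidate)
    return result
```

Checking candidates against every name already emitted also avoids a clash when the input itself contains a name such as `field_2`.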
CLI
- Add option to disable SSL verification (`--verify-ssl=no`)
- Add `print` command
- Add `--version`
- CLI is not installed by default (should be installed as `pip install rows[cli]`)
- Automatically detect default encoding (if not specified)
- Add `--order-by` to some commands and remove `sort` command. #111
- Do not use locale by default
- Add `query` command: converts (from many sources) internally to SQLite, executes the query and then exports
Version 0.1.1
Released on: 2015-09-03
- Fix code to run on Windows (thanks @sxslex)
- Fix locale (name, default name etc.)
- Remove `filemagic` dependency (waiting for `python-magic` to be available on PyPI)
- Write log of changes for `0.1.0` and `0.1.1`
Version 0.1.0
Released on: 2015-08-29
- Implement `Table` and its basic methods
- Implement basic plugin support with many utilities and the following formats:
  - `csv` (input/output)
  - `html` (input/output)
  - `txt` (output)
  - `xls` (input/output)
- Implement the following field types - many of them with locale support:
  - `ByteField`
  - `BoolField`
  - `IntegerField`
  - `FloatField`
  - `DecimalField`
  - `PercentField`
  - `DateField`
  - `DatetimeField`
  - `UnicodeField`
- Implement basic `Table` operations:
  - `sum`
  - `join`
  - `transform`
  - `serialize`
- Implement a command-line interface with the following commands:
  - `convert`
  - `join`
  - `sort`
  - `sum`
- Add examples to the repository