Release Notes
2.2.0 -> 2.3.0 2025-May-05
Maintenance release.
- changed the development environment to VS Code as the IDE
- switched to Poetry as the dependency management tool
- the user is informed if a newer version is available on pypi.org
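The version check mentioned above comes down to comparing the installed version with the latest release published on PyPI. The following is a minimal sketch, not the library's actual code. It assumes plain numeric versions such as 2.3.0 and does not handle pre-release suffixes. The helper names parse_version, is_newer_version and fetch_latest_version are hypothetical. The lookup uses PyPI's public JSON endpoint.

```python
import json
import urllib.request
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain numeric version such as '2.3.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_newer_version(installed: str, latest: str) -> bool:
    """Return True if `latest` is a higher version than `installed`."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 3, 0).
    return parse_version(latest) > parse_version(installed)


def fetch_latest_version(project: str = "secfsdstools") -> str:
    """Read the latest released version from PyPI's JSON API (needs network access)."""
    url = f"https://pypi.org/pypi/{project}/json"
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)["info"]["version"]
```

Comparing tuples rather than strings keeps "2.10.0" above "2.3.0". For full PEP 440 versions, packaging.version.Version would be the more robust choice.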
2.1.0 -> 2.2.0 2025-March-07
- New
- CIKXXFilter was introduced for RawDataBag and JoinedDataBag
- A ciks_filter parameter was added to the load methods of RawDataBag and JoinedDataBag
- The notebook 09_00_segments_basics shows how you can work with the information in the segments column.
- The concat processes ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess now have a switch to choose whether in_memory or file_based concatenation should be used
- ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess now also support the concatenation of StandardizedBag
- StandardizeProcess now also works with multiple subfolders, each of which contains BS, CF, and IS folders
- A new example of a memory-optimized pipeline was introduced: secfsdstools.x_examples.automation.memory_optimized_automation.define_extra_processes.
Have a look at the description of this pipeline in 08_02_automation_a_memory_optimized_example_2.2.0
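Here is a minimal sketch of what a CIK filter does, using plain Python lists instead of the bag classes. In the SEC datasets, the sub table links each report (adsh) to the filing company (cik). The num rows carry only the adsh, so the filter has to go through the report ids. The function filter_by_ciks and the sample values are hypothetical, and this is not the CIKXXFilter implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple


def filter_by_ciks(sub_rows: List[Dict], num_rows: List[Dict],
                   ciks: Iterable[int]) -> Tuple[List[Dict], List[Dict]]:
    """Keep only the reports (and their datapoints) filed by the given CIKs."""
    wanted: Set[int] = set(ciks)
    # sub has one row per report: adsh identifies the report, cik the company
    kept_sub = [row for row in sub_rows if row["cik"] in wanted]
    kept_adshs = {row["adsh"] for row in kept_sub}
    # num has no cik column, so datapoints are matched via their report's adsh
    kept_num = [row for row in num_rows if row["adsh"] in kept_adshs]
    return kept_sub, kept_num


# Hypothetical sample data
sub = [{"adsh": "a1", "cik": 1}, {"adsh": "a2", "cik": 2}, {"adsh": "a3", "cik": 1}]
num = [{"adsh": "a1", "tag": "Assets"}, {"adsh": "a2", "tag": "Assets"},
       {"adsh": "a3", "tag": "Revenues"}]
kept_sub, kept_num = filter_by_ciks(sub, num, [1])
```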
- Changes
- The is_xxx_bag_path methods in the module secfsdstools.d_container.databagmodel have been moved into the RawDataBag and JoinedDataBag classes, respectively, in the same module.
The StandardizedBag now also has an is_xxx_bag_path method.
- Other
- A GitHub Sponsors account was activated: https://github.com/sponsors/HansjoergW
- GitHub Discussions was activated: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions
2.0.0 -> 2.1.0 2025-February-18
The main goal of this release was to reduce the memory footprint when working with the framework.
This mainly includes support for Predicate Pushdown in the load methods, as well
as the ability to concatenate bags directly on the file system, which significantly reduces the
memory footprint of the concatenation steps when using “automation”.
Check out the notebook: bulk_data_processing_memory_efficiency
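Predicate Pushdown means that filter conditions are applied while the data is read, not after the whole table has been loaded. The sketch below shows the idea with the standard csv module on tab-separated text, which is the format of the SEC source files. The bags themselves are stored as Parquet, where readers such as pyarrow can push such filters down into the file scan. The function load_filtered and its parameters are hypothetical and do not reflect the signature of the load methods.

```python
import csv
import io
from typing import Dict, List, Optional, Set, TextIO


def load_filtered(source: TextIO,
                  adshs: Optional[Set[str]] = None,
                  tags: Optional[Set[str]] = None) -> List[Dict[str, str]]:
    """Read tab-separated rows and keep only those matching the given filters.

    Non-matching rows are dropped while reading, so they never accumulate
    in the result list.
    """
    result: List[Dict[str, str]] = []
    for row in csv.DictReader(source, delimiter="\t"):
        if adshs is not None and row["adsh"] not in adshs:
            continue  # report not requested
        if tags is not None and row["tag"] not in tags:
            continue  # tag not requested
        result.append(row)
    return result


# Hypothetical sample data: two reports, two tags
SAMPLE = "adsh\ttag\tvalue\na1\tAssets\t10\na1\tRevenues\t5\na2\tAssets\t7\n"
rows = load_filtered(io.StringIO(SAMPLE), adshs={"a1"}, tags={"Assets"})
```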
- New
- Predicate Pushdown in the load methods of RawDataBag and JoinedDataBag: filters for adshs, statements, forms, and tags are applied directly while the data is loaded
- concat_filebased concatenates RawDataBag and JoinedDataBag folders without loading them into memory
- ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess use concat_filebased
- save for RawDataBag, JoinedDataBag, and StandardizedBag creates the target folder if it does not exist.
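File-based concatenation avoids loading all parts into memory by streaming each input into the target file. The sketch below assumes tab-separated text files that share the same header. The real concat_filebased works on bag folders, so the function concat_files and the file names are hypothetical. Like save, the sketch creates the target folder if it is missing.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable


def concat_files(inputs: Iterable[Path], target: Path) -> None:
    """Concatenate files with identical header lines into one target file.

    Each input is streamed in chunks, so only a small buffer is held in memory.
    """
    target.parent.mkdir(parents=True, exist_ok=True)  # create target folder if missing
    header_written = False
    with target.open("w", encoding="utf-8", newline="") as out:
        for path in inputs:
            with path.open("r", encoding="utf-8", newline="") as src:
                header = src.readline()
                if not header_written:  # keep the header of the first file only
                    out.write(header)
                    header_written = True
                shutil.copyfileobj(src, out)  # stream the remaining lines


# Hypothetical usage with two small quarterly parts
with tempfile.TemporaryDirectory() as tmp:
    part1 = Path(tmp, "q1.txt")
    part1.write_text("adsh\ttag\na1\tAssets\n", encoding="utf-8")
    part2 = Path(tmp, "q2.txt")
    part2.write_text("adsh\ttag\na2\tRevenues\n", encoding="utf-8")
    merged = Path(tmp, "joined", "num.txt")
    concat_files([part1, part2], merged)
    merged_text = merged.read_text(encoding="utf-8")
```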
1.8.2 -> 2.0.0 2025-February-11
Introducing the new version of the datasets that includes the “segments” column in the num tables.
The main purpose of this version is to ensure that the new “segments” column does not interfere with existing logic.
The following has changed:
- At startup, the library checks whether only data from the new datasets is present. If not, the data has to be reloaded
- New NoSegmentInfo filter for raw and joined bags: removes datapoints with non-empty segment info
- StandardPresenter has a new show_segments flag. If True, datapoints with segment information are displayed as well
- Notebook 03_explore_with_interactive_notebook has a new option show_segments for displaying the details of a report
- Support for Daily-Datasets has been removed
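The NoSegmentInfo filter keeps only datapoints without segment information. Below is a minimal sketch on plain dictionaries. It assumes each num row has an optional segments entry that is empty for datapoints without segment details. drop_segment_rows and the sample segment value are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def drop_segment_rows(num_rows: List[Row]) -> List[Row]:
    """Keep only datapoints whose segments entry is missing or empty."""
    # `or ""` treats a missing key and a None value like an empty string
    return [row for row in num_rows if not (row.get("segments") or "").strip()]


# Hypothetical sample rows
rows = [
    {"tag": "Revenues", "segments": ""},
    {"tag": "Revenues", "segments": "Region=Europe;"},
    {"tag": "Assets"},
]
kept = drop_segment_rows(rows)
```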
1.8.1 -> 1.8.2 2025-January-20
- Ensures data is read only from the archived version of the datasets without the segments column in num.
1.8.0 -> 1.8.1 2025-January-12
- Fixed a problem with a circular import when using the new FilterProcess module in secfsdstools.g_pipeline
1.7.0 -> 1.8.0 2025-January-10
1.6.2 -> 1.7.0 2024-December-22
- Fix for new path to zip files on SEC.gov
- The SEC changed the location of the zip files, and this version fixes the path to them
1.6.1 -> 1.6.2 2024-September-15
- Major changes
- Compatibility for Python 3.7 is no longer checked
- Compatibility for Python 3.11 was added
- Minor changes
- secfsdstools.__version__ now returns the version of the library
- IncomeStatementStandardizer
- Calculation for OutstandingShares and EarningsPerShare was simplified and improved
- Validation rule for EarningsPerShare was added
- Please have a look at the comments in 07_02_IS_standardizer
- Ability to customize the standardizer was improved
- The columns that are merged from sub_df into the final result can be extended
- Additional tags that should appear in the final result can be defined
- All constructor parameters of the Standardizer base class can be overridden via the constructor of the three standardizer classes
- New notebook that shows the different possibilities for customization: 07_04_customize_standardizer
1.6 -> 1.6.1 2024-August-20
- Minor improvements
- A filed column was added to the result of the present method of the standardizer
- StandardizedBag now has a concat() method to concatenate multiple instances into one
- Standardizer checks that the data contains just one currency
- IncomeStatementStandardizer now also returns the OutstandingShares and EarningsPerShare tags
- 03_explore_with_interactive_notebook.ipynb includes use of the CashFlowStandardizer
- Improvements in the README.md (thanks to Hamid Ebadi)
- Documentation
1.6.0 2024-July-12
- New
- Introducing Cash Flow Standardizer
The Cash Flow Standardizer makes the cash flow statements easily comparable.
07_03_CF_standardizer
- Improvements
- Small improvements in the Standardizer framework and rules
1.5.0 2024-May-18
- New
- Introducing Income Statement Standardizer
The Income Statement Standardizer makes the income statements easily comparable.
07_02_IS_standardizer
- Improvements
- Small improvements in the Standardizer framework and rules
1.4.2 2024-Mar-29
- Fix
- The StandardStatementPresenter didn’t consider qtrs when displaying the data. This was a problem for the Income Statement and the Cash Flow.
- Improvements
- Several improvements in the Standardizer in preparation for implementing the Income Statement and Cash Flow Standardizers.
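In the num table, qtrs is the number of quarters a value covers: 0 for point-in-time values such as balance sheet items, 1 for a single quarter, 4 for a full year. Income statement and cash flow values for the same tag and date can therefore appear several times with different durations. For that reason, qtrs has to be part of the key when the values are laid out. The following is a minimal sketch that assumes rows are plain dictionaries. pivot_values and the sample numbers are hypothetical, and this is not the presenter's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def pivot_values(num_rows: List[Dict]) -> Dict[str, Dict[Tuple[int, int], float]]:
    """Arrange values as tag -> {(ddate, qtrs): value}.

    Keeping qtrs in the key prevents a 3-month value and a year-to-date value
    reported for the same tag and date from overwriting each other.
    """
    table: Dict[str, Dict[Tuple[int, int], float]] = defaultdict(dict)
    for row in num_rows:
        table[row["tag"]][(row["ddate"], row["qtrs"])] = row["value"]
    return dict(table)


# Hypothetical rows: one quarter and a six-month year-to-date value, same date
rows = [
    {"tag": "Revenues", "ddate": 20230630, "qtrs": 1, "value": 100.0},
    {"tag": "Revenues", "ddate": 20230630, "qtrs": 2, "value": 190.0},
]
pivoted = pivot_values(rows)
```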
1.4.0 2024-Feb-02
- New
- Introducing the Standardizer Framework and the Balance Sheet Standardizer as a first implementation.
The Balance Sheet Standardizer makes the balance sheets easily comparable.
Check out the following notebooks:
07_00_standardizer_basics
07_01_BS_standardizer
- Improvements
- Efficiency improvements for MultiReportCollector: every zip file is opened just once if there are multiple reports to load from the same zip file.
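The idea behind the MultiReportCollector improvement can be sketched as grouping the requested reports by the zip file they come from, so that each archive is opened only once. The functions below are a minimal, hypothetical sketch. load_reports takes a caller-supplied read_report function because the actual reading logic is not described here.

```python
import zipfile
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def group_by_zip(requests: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each zip file to all report ids (adsh) that have to be read from it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for zip_path, adsh in requests:
        grouped[zip_path].append(adsh)
    return dict(grouped)


def load_reports(requests: Iterable[Tuple[str, str]],
                 read_report: Callable[[zipfile.ZipFile, str], object]) -> Iterator[object]:
    """Open every zip file once and read all requested reports from it."""
    for zip_path, adshs in group_by_zip(requests).items():
        with zipfile.ZipFile(zip_path) as archive:  # opened a single time per file
            for adsh in adshs:
                yield read_report(archive, adsh)
```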
1.3.0 2023-Dec-28
- New
- Notebook 06_bulk_data_processing_deep_dive
This first version shows how datasets can be created with data from all available zip files. It shows a faster
parallel approach, which uses more memory and CPU resources, and a slower serial approach, which uses significantly
fewer resources.
- Package u_usecases introduced.
This package is a place to provide concrete examples showing what you can do with the secfsdstools library.
As a first use case, the logic shown and explained in the 06_bulk_data_processing_deep_dive is provided within the module bulk_loading.
1.2.0 2023-Dec-02
- API Changes
- MainCoregFilter was renamed to MainCoregRawFilter
- OfficialTagsOnlyFilter was renamed to OfficialTagsOnlyRawFilter
- New
- secfsdstools.e_filter.rawfiltering.USDOnlyRawFilter is new and removes non-USD currency datapoints
- All filters have been implemented for the JoinedDataBag as well: secfsdstools.e_filter.joinedfiltering
- Notebook 05_filter_deep_dive
1.1.0 2023-Oct-28
- API Changes
- Zipcollector now has a factory method that can load multiple zip files as one
- Zipcollector now has a factory method that can load all zip files at once
- Zipcollector factory methods have a new filter parameter “post_load_filter”
- New
- Filter for official tags only -> company specific tags are removed
- RawDataBag and JoinedDataBag now have a copy_bag method
- Notebook 04_collector_deep_dive
1.0.1 2023-Oct-16
- README.md adapted
Added information about using the library on Windows, because the multiprocessing package is used
https://docs.python.org/3.10/library/multiprocessing.html#the-process-class
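The multiprocessing documentation linked above requires that the main module can be imported safely by a new interpreter. On Windows, this means guarding the script's entry point. The following is a minimal sketch of such a script. square and main are placeholders, and in practice main() would contain the calls into secfsdstools that use multiprocessing.

```python
import multiprocessing
from typing import List


def square(value: int) -> int:
    """Placeholder worker; defined at module level so it can be pickled."""
    return value * value


def main() -> List[int]:
    # Placeholder for the work that starts child processes
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Child processes started with the "spawn" method (the default on Windows)
    # re-import this module; the guard keeps them from running main() again.
    print(main())
```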
1.0.0 2023-Sep-28
ApiChanges:
- The API has completely changed; it should be more structured now.
Please check out the README.md and the 01_quickstart notebook for details
0.5.0 2023-Jun-02
- use parquet as the storage format instead of zip files with csv files -> 5-10x faster access to data
- automatic discovery of new zip files on sec.gov
- the first-time download of the zip files is launched without calling the update method
ApiChanges:
- package secfsdstools.d_index was renamed to secfsdstools.c_index
0.4.0 2023-Mar-25
- new MultiReportReader - reads reports from different zipfiles at once
- new CompanyCollector - reads reports for one company from different zipfiles at once
- new merge_pre_and_num() method which only merges the pre and num data but does not pivot it
- new notebook that shows how the data can be analyzed with an interactive jupyter notebook
BugFixes:
- coreg was not considered correctly when merging the data
0.3.0 2023-Feb-04
- integration of https://rapidapi.com/hansjoerg.wingeier/api/daily-sec-financial-statement-dataset. Daily updates instead of quarterly updates.
0.2.1 2023-Jan-21
- class ZipReportReader: helps to read data from a whole zip file; has the same interface as the report reader
- class IndexSearch: helps with searching the index_report table
- added a getting started notebook: https://nbviewer.org/github/HansjoergW/sec-fincancial-statement-data-set/blob/main/notebooks/01_quickstart.ipynb
- ensure it also runs with Python 3.7
- improvements in the API documentation
0.2.0 2023-Jan-14
- first simple API documentation on GitHub Pages: https://hansjoergw.github.io/sec-fincancial-statement-data-set/secfsdstools/
- renaming of internal package structure
0.1.3 2023-Jan-08
- dependencies added into pyproject.toml
0.1.1 2023-Jan-07